Comparing the text in two PDF files

Need to compare the text in two PDF files to find differences?

Try the following (for Windows):

  1. Download Xpdf for pdftotext.exe.
  2. Download GNU utilities for Win32 for diff.exe.
  3. Extract the text from each PDF file while preserving the layout (this writes file1.txt and file2.txt next to the PDFs) with:
    pdftotext -layout file1.pdf
    pdftotext -layout file2.pdf
  4. Determine the differences and store these side by side (-y) in a text file with:
    diff -y --width=220 file1.txt file2.txt > file1_file2_diff.txt
    1. Compare the extracted .txt files, not the PDFs themselves. You might need to try different values after --width= to prevent long lines from being truncated.
  5. Open file1_file2_diff.txt to look at the differences.
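
The steps above can be wrapped in a small script. The following is only a sketch, assuming a POSIX-style shell (for example one installed alongside the GNU utilities on Windows) and that pdftotext and GNU diff are on the PATH. The function names diff_side_by_side and compare_pdfs are made up for illustration and are not part of either tool.

```shell
#!/bin/sh
# Sketch: extract text from two PDFs and write a side-by-side diff.
# Assumption: pdftotext (Xpdf) and GNU diff are available on the PATH.

# diff_side_by_side LEFT.txt RIGHT.txt OUT.txt [WIDTH]
# Writes a side-by-side comparison of two text files to OUT.txt.
# Returns diff's exit status: 0 = identical, 1 = differences, 2 = trouble.
diff_side_by_side() {
    width="${4:-220}"    # same default width as in the steps above
    diff -y --width="$width" "$1" "$2" > "$3"
}

# compare_pdfs FILE1.pdf FILE2.pdf [WIDTH]
# Extracts the text of both PDFs with the layout preserved, then diffs it.
compare_pdfs() {
    base1="${1%.pdf}"
    base2="${2%.pdf}"
    # Name the text files explicitly so later steps know where they are.
    pdftotext -layout "$1" "$base1.txt" || return 2
    pdftotext -layout "$2" "$base2.txt" || return 2
    # e.g. file1.pdf and file2.pdf give file1_file2_diff.txt
    diff_side_by_side "$base1.txt" "$base2.txt" \
        "${base1}_${base2##*/}_diff.txt" "${3:-220}"
}
```

After loading these functions into the shell, running compare_pdfs file1.pdf file2.pdf 220 should leave file1.txt, file2.txt and file1_file2_diff.txt in the current folder. An exit status of 1 only means that differences were found, not that something failed.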