Comparing the text in two PDF files
Need to compare the text in two PDF files to find differences?
Try the following (for Windows):
Download Xpdf to get pdftotext.exe. You also need a Windows build of GNU diff, for example the DiffUtils package from GnuWin32.
Extract the text from each PDF file while preserving the layout with:
pdftotext -layout file.pdf
This writes file.txt next to the PDF. Run it for both files to get file1.txt and file2.txt.
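To convert both files in one go, the extraction step can be scripted. Below is a minimal Python sketch, assuming pdftotext.exe is on your PATH (otherwise pass its full path as exe). The helper names pdftotext_command and extract_text are hypothetical and are not part of Xpdf.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, exe="pdftotext"):
    """Hypothetical helper: build the command that writes <name>.txt next to the PDF."""
    pdf = Path(pdf_path)
    txt = pdf.with_suffix(".txt")
    # -layout keeps the physical layout, as in the manual step
    return [exe, "-layout", str(pdf), str(txt)]


def extract_text(pdf_path, exe="pdftotext"):
    """Hypothetical helper: run pdftotext and return the path of the text file."""
    cmd = pdftotext_command(pdf_path, exe)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if pdftotext fails
    return Path(cmd[-1])
```

For example, extract_text("file1.pdf") runs pdftotext -layout file1.pdf file1.txt and returns the path to file1.txt.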
Compare the two extracted text files and store the differences side by side (-y) in a text file with:
diff -y --width=220 file1.txt file2.txt > file1_file2_diff.txt
--width sets the total number of output columns, which diff splits between the two files. You might need to try different values after --width= to prevent long lines from being cut off.
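Open file1_file2_diff.txt to look at the differences. Lines that differ are marked with |, lines that appear only in the first file with <, and lines that appear only in the second file with >.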
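If GNU diff is not available, Python's difflib can produce a similar side-by-side listing. This is a swapped-in technique, not what diff -y does internally, so changed lines may be paired differently than in diff's output. The function side_by_side is hypothetical; it uses the same markers as diff -y (| for changed lines, < for lines only in the first file, > for lines only in the second) and truncates each column so a row fits in width characters.

```python
import difflib


def side_by_side(left_lines, right_lines, width=220):
    """Hypothetical helper: ' ' same, '|' changed, '<' left only, '>' right only."""
    col = (width - 3) // 2  # two columns plus a 3-character gutter fit in `width`

    def row(a, marker, b):
        # Cut each side to its column width, then drop trailing padding
        return f"{a[:col]:<{col}} {marker} {b[:col]}".rstrip()

    rows = []
    matcher = difflib.SequenceMatcher(None, left_lines, right_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left, right = left_lines[i1:i2], right_lines[j1:j2]
        if tag == "equal":
            rows += [row(a, " ", b) for a, b in zip(left, right)]
            continue
        # Pair changed lines; leftovers (and pure deletes/inserts) are one-sided
        paired = min(len(left), len(right))
        rows += [row(left[k], "|", right[k]) for k in range(paired)]
        rows += [row(a, "<", "") for a in left[paired:]]
        rows += [row("", ">", b) for b in right[paired:]]
    return rows
```

To use it, read each text file into a list of lines (for example with Path("file1.txt").read_text(errors="replace").splitlines()), pass both lists to side_by_side, and write "\n".join(rows) to file1_file2_diff.txt.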