Investigating Corrupted PDF


Hello Friends,

Today, I will show you how to investigate a corrupted PDF. For this purpose, I have created a sample PDF. Before reading this article, I suggest you read my other article, PDF Overview, for a better understanding of the PDF structure.






Tools Required



1. PDF Reader
2. Notepad++ for editing.

So, let's get our hands dirty. 🙂

First, download this sample PDF and try to open it.
You will see this error message:


Now open this PDF in Notepad++.

Note: For simplicity, I have not encoded the PDF content streams with any filters.
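Real-world PDFs usually do compress their content streams, most commonly with the /FlateDecode filter (zlib/deflate data), and then the stream bytes are unreadable in Notepad++. As a minimal sketch of how you could inflate such a stream, here is some Python; `inflate_first_flate_stream` is a hypothetical helper, and the regex search assumes a simple, unencrypted file whose first filtered stream uses /FlateDecode alone.

```python
import re
import zlib


def inflate_first_flate_stream(pdf_bytes: bytes) -> bytes:
    """Return the decompressed data of the first /FlateDecode stream.

    Hypothetical helper, regex-based sketch: it does not parse the PDF,
    it just looks for '/FlateDecode ... stream<EOL> ... endstream'.
    """
    match = re.search(rb"/FlateDecode.*?stream\r?\n(.*?)endstream", pdf_bytes, re.S)
    if match is None:
        raise ValueError("no /FlateDecode stream found")
    # decompressobj stops at the end of the zlib data, so the optional EOL
    # before 'endstream' ends up in unused_data instead of raising an error.
    return zlib.decompressobj().decompress(match.group(1))
```

The function returns the raw content-stream operators (BT, Tf, Td, Tj, ET and so on), which are exactly what you see in plain text in the sample below.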

1 0 obj
<<
	/Pages 2 0 R
	/Type /Catalog
>>
endobj
2 0 obj
<<
	/Count 2
	/Kids [ 3 0 R 5 0 R 7 0 R 9 0 R 11 0 R ]
	/Type /Pages
>>
endobj
3 0 obj
<<
	/MediaBox [ 0 0 795 842 ]
	/Parent 2 0 R
	/Contents 4 0 R
	/Resources <<
		/Font <<
			/F1 <<
				/Name /F1
				/BaseFont /Helvetica
				/Subtype /Type1
				/Type /Font
			>>
		>>
	>>
	/Type /Page
>>
endobj
4 0 obj
<<
	/Length 55
>>stream
BT
/F1 18 Tf
186 690 Td
20 TL
(www.secsavvy.com) Tj
ET

endstream
endobj
5 0 obj
<<
	/MediaBox [ 0 0 795 842 ]
	/Parent 2 0 R
	/Contents 6 0 R
	/Resources <<
		/Font <<
			/F1 <<
				/Name /F1
				/BaseFont /Helvetica
				/Subtype /Type1
				/Type /Font
			>>
		>>
	>>
	/Type /Page
>>
endobj
6 0 obj
<<
	/Length 45
>>stream
BT
/F1 15 Tf
186 690 Td
20 TL
(Page 1) Tj
ET

endstream
endobj
7 0 obj
<<
	/MediaBox [ 0 0 795 842 ]
	/Parent 2 0 R
	/Contents 8 0 R
	/Resources <<
		/Font <<
			/F1 <<
				/Name /F1
				/BaseFont /Helvetica
				/Subtype /Type1
				/Type /Font
			>>
		>>
	>>
	/Type /Page
>>
endobj
8 0 obj
<<
	/Length 45
>>stream
BT
/F1 15 Tf
186 690 Td
20 TL
(Page 2) Tj
ET

endstream
endobj
9 0 obj
<<
	/MediaBox [ 0 0 795 842 ]
	/Parent 2 0 R
	/Contents 10 0 R
	/Resources <<
		/Font <<
			/F1 <<
				/Name /F1
				/BaseFont /Helvetica
				/Subtype /Type1
				/Type /Font
			>>
		>>
	>>
	/Type /Page
>>
endobj
10 0 obj
<<
	/Length 45
>>stream
BT
/F1 15 Tf
186 690 Td
20 TL
(Page 3) Tj
ET

endstream
endobj
11 0 obj
<<
	/MediaBox [ 0 0 795 842 ]
	/Parent 2 0 R
	/Content 12 0 R
	/Resources <<
		/Font <<
			/F1 <<
				/Name /F1
				/BaseFont /Helvetica
				/Subtype /Type1
				/Type /Font
			>>
		>>
	>>
	/Type /Page
>>
endobj
12 0 obj
<<
	/Length 47
>>stream
BT
/F1 15 Tf
186 690 Td
20 TL
(Password) Tj
ET

endstream
endobj
xref
0 13
0000000000 65535 f
0000000010 00000 n
0000000067 00000 n
0000000161 00000 n
0000000398 00000 n
0000000510 00000 n
0000000747 00000 n
0000000849 00000 n
0000001086 00000 n
0000001188 00000 n
0000001426 00000 n
0000001529 00000 n
0000001768 00000 n
trailer
<<
	/Root 1 0 R
	/Size 13
>>
startxref
1873
%%EOF

A PDF file consists of four elements:

  • A header identifying the version of the PDF specification the file conforms to.
  • A body containing the objects that make up the document.
  • A cross-reference table containing information about the indirect objects in the file.
  • A trailer giving the location of the cross-reference table and of certain special objects within the body of the file.
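Checking for these four parts is the first thing to do with a suspicious file. Here is a minimal Python sketch of a quick triage check; `check_structure` is a hypothetical helper that only looks for the marker keywords (it does not validate offsets, and it assumes a classic xref table rather than a PDF 1.5+ cross-reference stream).

```python
import re


def check_structure(pdf_bytes: bytes) -> dict:
    """Report which of the four top-level PDF parts appear to be present.

    Heuristic sketch: keyword search only, no real parsing.
    """
    tail = pdf_bytes[-1024:]  # startxref and %%EOF should be near the end
    return {
        # Header: '%PDF-x.y' expected at the very start of the file.
        "header": pdf_bytes.startswith(b"%PDF-"),
        # Body: at least one indirect object of the form 'N G obj'.
        "body": re.search(rb"\d+\s+\d+\s+obj\b", pdf_bytes) is not None,
        # Cross-reference table: the 'xref' keyword at the start of a line
        # (this deliberately does not match the 'xref' inside 'startxref').
        "xref": re.search(rb"(^|[\r\n])xref\s", pdf_bytes) is not None,
        # Trailer dictionary plus the startxref pointer and end-of-file marker.
        "trailer": b"trailer" in pdf_bytes and b"startxref" in tail and b"%%EOF" in tail,
    }
```

For the file listed above, this check would report the header as missing and the other three parts as present.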

In this case, however, there is no header, so we will add a PDF header at the very top of the file and try to open the PDF again.


%PDF-1.7



Now we are able to open this PDF.
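If you have several files with the same problem, this fix can be scripted. A minimal Python sketch, assuming a missing header is the only problem; `add_missing_header` is a hypothetical helper name.

```python
def add_missing_header(data: bytes, version: str = "1.7") -> bytes:
    """Return data with a '%PDF-<version>' header line prepended if missing.

    Hypothetical helper. Note that prepending bytes shifts every object, so
    xref offsets computed without a header end up off by the header length.
    """
    # The header must be the first line of the file, e.g. b"%PDF-1.7".
    if data.startswith(b"%PDF-"):
        return data  # header already present, leave the bytes untouched
    return b"%PDF-" + version.encode("ascii") + b"\n" + data
```

For example, `Path("sample_fixed.pdf").write_bytes(add_missing_header(Path("sample.pdf").read_bytes()))` with `from pathlib import Path`, where both file names are placeholders. Many readers silently rebuild a cross-reference table whose offsets do not match, which can hide the offset shift mentioned in the comment.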



We can see that this PDF consists of 2 pages, as shown in the image above, but let's investigate further to verify it.








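Looking at object 2 (the /Pages object), we find that its /Kids array actually references 5 page objects (3, 5, 7, 9 and 11), while /Count says 2. So the PDF really has 5 pages: edit /Count from 2 to 5 and open the PDF again.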


%PDF-1.7
1 0 obj
<< /Pages 2 0 R /Type /Catalog >>
endobj
2 0 obj
<< /Count 5
/Kids [ 3 0 R 5 0 R 7 0 R 9 0 R 11 0 R ]
/Type /Pages
>>
endobj
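The comparison we just did by eye, counting the /Kids references against /Count, can also be scripted. Below is a minimal Python sketch; `check_page_count` is a hypothetical helper, and the regex parsing assumes uncompressed objects (no object streams) and a flat page tree like the one in this sample.

```python
from __future__ import annotations

import re


def check_page_count(pdf_bytes: bytes) -> list[tuple[int, int, int]]:
    """Return (object number, /Count, number of /Kids) for each mismatch.

    Hypothetical helper. Assumes a flat page tree: if a kid were itself a
    /Pages node, its leaves would need to be counted recursively.
    """
    mismatches = []
    for obj in re.finditer(rb"(\d+)\s+\d+\s+obj(.*?)endobj", pdf_bytes, re.S):
        body = obj.group(2)
        # Only page tree nodes ('/Type /Pages'), not leaf '/Type /Page' objects.
        if not re.search(rb"/Type\s*/Pages\b", body):
            continue
        count = re.search(rb"/Count\s+(\d+)", body)
        kids = re.search(rb"/Kids\s*\[(.*?)\]", body, re.S)
        if count is None or kids is None:
            continue
        # Each kid is an indirect reference of the form 'N G R'.
        n_kids = len(re.findall(rb"\d+\s+\d+\s+R", kids.group(1)))
        if int(count.group(1)) != n_kids:
            mismatches.append((int(obj.group(1)), int(count.group(1)), n_kids))
    return mismatches
```

Run on the original sample, it should report `[(2, 2, 5)]`, meaning object 2 claims 2 pages but lists 5 kids; after the edit it returns an empty list.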



Now we can see all 5 pages, but the last page is blank, so let's investigate further.
The last page is referenced by the indirect object reference 11 0 R.


11 0 obj
<< /MediaBox [ 0 0 795 842 ] /Parent 2 0 R /Content 12 0 R
/Resources << /Font << /F1 << /Name /F1 /BaseFont /Helvetica /Subtype /Type1 /Type /Font >>
>>
>>
/Type /Page
>>
endobj







The /Contents entry of a page object points to the content stream that describes what is drawn on that page. If this entry is absent, the page is empty.
But in object 11, the key is written as /Content instead of /Contents. The PDF reader does not recognize the name /Content, so it silently ignores the entry without giving any error, and the page is rendered as empty.
Replace /Content with /Contents and open the PDF again. Now you are able to see all five pages with their content. 🙂
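A typo like this is easy to miss by eye in a large file. The following Python sketch flags every page object that has no /Contents entry and points out the /Content misspelling specifically; `find_pages_without_contents` is a hypothetical helper with the same regex-based assumptions as above (uncompressed, unencrypted objects).

```python
from __future__ import annotations

import re


def find_pages_without_contents(pdf_bytes: bytes) -> dict[int, str]:
    """Map page object numbers to a short diagnosis when /Contents is missing.

    Hypothetical helper, regex-based sketch rather than a real PDF parser.
    """
    problems = {}
    for obj in re.finditer(rb"(\d+)\s+\d+\s+obj(.*?)endobj", pdf_bytes, re.S):
        body = obj.group(2)
        # Leaf page objects only: '/Type /Page' (the \b rejects '/Pages').
        if not re.search(rb"/Type\s*/Page\b", body):
            continue
        if re.search(rb"/Contents\b", body):
            continue  # page has a content stream, nothing to report
        number = int(obj.group(1))
        if re.search(rb"/Content\b", body):
            problems[number] = "misspelled key /Content (should be /Contents)"
        else:
            problems[number] = "no /Contents entry: page renders empty"
    return problems
```

On the original sample, it should flag object 11 only.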



You can download this corrected PDF from this link.






Demo (High Quality)





Or you can also watch it on YouTube.

If you are interested in reading more about PDF, I recommend visiting Didier Stevens' excellent blog.

I hope you enjoyed this post. Feel free to comment!