Well, another post on PDF. This time it's a small snippet that helps decode FlateDecode streams in a PDF. FlateDecode is a commonly used filter based on the DEFLATE algorithm, the same one used by zlib and Zip. A stream in a PDF file can be passed through several filters, among them ASCII85Decode and ASCIIHexDecode. When a stream is filtered with one or more filters, its object dictionary lists them, like the one below.
5 0 obj<< /Length 120 /Filter [/FlateDecode /FlateDecode] >>
stream
This means that the stream following the dictionary is filtered twice with FlateDecode. When something is filtered twice with the same filter, we have to decompress it twice to get the data back. I quickly put together a Python script that decompresses a FlateDecode stream. All you have to do is carve the stream section out of the PDF with your favorite hex editor and save it to a file. Once you have done that, feed the stream file to the Python script to get the decompressed data. The script does a single pass, so for a double-filtered stream run it again on its own output.
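import sys
import zlib

# Expect exactly two arguments: the carved stream file and the output file.
if len(sys.argv) != 3:
    print("Usage: python.exe " + sys.argv[0] + " <input> <output>")
    sys.exit(1)

in_path = sys.argv[1]
out_path = sys.argv[2]

# Read the raw stream bytes; binary mode keeps them from being mangled.
with open(in_path, 'rb') as file_read:
    data = file_read.read()

# One pass of FlateDecode (zlib) decompression.
decomp = zlib.decompress(data)

# Write in binary mode too, since the decoded data may not be text.
with open(out_path, 'wb') as file_write:
    file_write.write(decomp)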
PS: The script was tested under python version - 2.6.4
Same code on pastie - http://pastie.org/10425585 - Please use this.
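If you would rather not reach for a hex editor, the carving step can be roughly sketched in Python as well. This is only a naive illustration, assuming Python 3: the helper name extract_streams is made up for this example, it ignores the /Length entry, and it simply grabs whatever sits between the stream and endstream keywords, so it can be fooled by odd files.

```python
import re

# Naive pattern: the 'stream' keyword, an end-of-line, the data,
# an optional end-of-line, then 'endstream'. /Length is ignored.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)

def extract_streams(pdf_bytes):
    """Return the raw bytes of every stream section found in pdf_bytes."""
    return [match.group(1) for match in STREAM_RE.finditer(pdf_bytes)]
```

Each item in the returned list can be written to a file and handed to the decompression script.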
You can try the decompression script against this sample: http://didierstevens.com/files/data/win7-puzzle.zip
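Since a stream can list more than one filter, here is a minimal sketch of walking the /Filter array and undoing each filter in the order it is listed. It assumes Python 3, the function name decode_stream is hypothetical, and only FlateDecode and ASCIIHexDecode are handled; anything else raises an error rather than guessing.

```python
import binascii
import zlib

def decode_stream(data, filters):
    """Apply each named filter to data, in the order given in /Filter."""
    for name in filters:
        if name == "FlateDecode":
            data = zlib.decompress(data)
        elif name == "ASCIIHexDecode":
            # Hex digits, whitespace ignored, terminated by '>'.
            digits = b"".join(data.split(b">")[0].split())
            if len(digits) % 2:
                digits += b"0"  # an odd final digit is padded with 0
            data = binascii.unhexlify(digits)
        else:
            raise ValueError("unsupported filter: " + name)
    return data
```

For the object shown earlier, the call would be decode_stream(raw, ["FlateDecode", "FlateDecode"]), which decompresses twice.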
This post was really cool. The only change I would make would be as follows: file_read = open(input,'r') into file_read = open(input,'rb')
I think some streams might get truncated otherwise!
Thanks for the code snippet, very useful.
Thanks for the script - that was very useful!
Tomek
I get this error.
zlib.error: Error -3 while decompressing data: unknown compression method
Probably the data you are trying to decompress may not be zlib compressed.
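A quick way to check that before decompressing is to look at the two-byte zlib header. The sketch below assumes Python 3, and the helper name looks_like_zlib is made up for illustration. It only tests the header rules from the zlib format (RFC 1950), so a pass does not guarantee the rest of the data is intact.

```python
import zlib

def looks_like_zlib(data):
    """Return True if data starts with a plausible zlib header."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # The low nibble of the first byte must be 8 (deflate), and the
    # two header bytes read as a big-endian number must divide by 31.
    return (cmf & 0x0F) == 8 and (cmf * 256 + flg) % 31 == 0

sample = zlib.compress(b"some stream data")
print(looks_like_zlib(sample))
print(looks_like_zlib(b"stream\n" + sample))
```

If the check fails, one thing worth looking at is whether extra bytes, such as the stream keyword or its line ending, were left at the front of the carved file.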
I get an error as below;
print("Usage python.exe " +args[0]+ " ")
^
IndentationError: expected an indented block
I don't have time to do this but I need help with it I do not really understand it
@Anonymous - http://pastie.org/10425585 - Use this.
Works like a charm (after I fixed up the indentation).
I had no idea flate could be that simple.
Thank you
Van do
Someone decode this PDF-1.4
1 0 obj
<>
endobj
2 0 obj
<>stream
[binary stream data, unreadable as pasted]