Friday, June 4, 2010

FlateDecode decoder

Well, another post on PDF. This time around it is a small snippet that helps decode FlateDecode streams in a PDF. FlateDecode is a commonly used filter based on zlib/DEFLATE compression, the same algorithm used in Zip. A stream in a PDF file can be passed through several filters, for example ASCII85Decode and ASCIIHexDecode. If a stream is filtered with more than one filter, its dictionary lists them in a notation like the one below.

5 0 obj<< /Length 120 /Filter [/FlateDecode /FlateDecode] >>
stream


This means that the stream following the dictionary has been filtered twice with FlateDecode, so we have to decompress it twice to get the data. I quickly put together a Python script that decompresses a FlateDecode stream. All you have to do is cut the stream data out of the PDF with your favorite hex editor and save it to a file. You can then feed that file to the script to get the decompressed data. The script performs a single decompression pass, so for a double-filtered stream run it again on its own output.
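To illustrate the idea of undoing a filter chain, here is a minimal sketch that applies the decoders in the order they are listed in /Filter. The names `apply_filters`, `ascii_hex_decode` and `DECODERS` are made up for this example and only two filters are covered; it is not a complete PDF filter implementation.

```python
import binascii
import zlib


def ascii_hex_decode(data):
    # Drop whitespace and stop at the '>' end-of-data marker.
    digits = b"".join(data.split()).split(b">")[0]
    if len(digits) % 2:
        digits += b"0"  # an odd trailing digit is padded with 0
    return binascii.unhexlify(digits)


# Hypothetical lookup table: filter name -> decoding function.
DECODERS = {
    "FlateDecode": zlib.decompress,
    "ASCIIHexDecode": ascii_hex_decode,
}


def apply_filters(data, filters):
    # Decode with each filter in turn, in the order given in /Filter.
    for name in filters:
        data = DECODERS[name](data)
    return data
```

For the object shown above, the call would be `apply_filters(raw_stream, ["FlateDecode", "FlateDecode"])`, where `raw_stream` holds the bytes cut out of the PDF.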

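import sys
import zlib

args = sys.argv

# Expect exactly two arguments: the input stream file and the output file.
if len(args) != 3:
    print("Usage: python.exe " + args[0] + " <input_file> <output_file>")
    sys.exit(1)

input_path = args[1]
output_path = args[2]

# Read the raw stream bytes in binary mode.
with open(input_path, 'rb') as file_read:
    buffer = file_read.read()

decomp = zlib.decompress(buffer)

# Write in binary mode so the decompressed bytes are not altered.
with open(output_path, 'wb') as file_write:
    file_write.write(decomp)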

PS: The script was tested under Python 2.6.4.


Same code on pastie - http://pastie.org/10425585 - Please use this.
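If zlib reports something like "unknown compression method", the bytes cut out with the hex editor probably do not start with a zlib header, for instance because the line break after the `stream` keyword was copied along. The sketch below checks the two header bytes before decompressing. The function names `looks_like_zlib` and `decompress_checked` are hypothetical, and stripping leading CR/LF bytes is an assumption about how the stream was extracted.

```python
import zlib


def looks_like_zlib(data):
    # zlib header (RFC 1950): the low nibble of the first byte is 8
    # (deflate) and the first two bytes, read as a big-endian number,
    # are a multiple of 31.
    if len(data) < 2:
        return False
    cmf, flg = bytearray(data[:2])
    return (cmf & 0x0F) == 8 and (cmf * 256 + flg) % 31 == 0


def decompress_checked(data):
    # Assumption: a stray CR/LF from the "stream" line may precede the data.
    data = data.lstrip(b"\r\n")
    if not looks_like_zlib(data):
        raise ValueError("data does not start with a zlib header")
    return zlib.decompress(data)
```

A `ValueError` here suggests the file is either not a FlateDecode stream or still wrapped in another filter such as ASCIIHexDecode.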

9 comments:

  1. You can use this script against - http://didierstevens.com/files/data/win7-puzzle.zip

  2. This post was really cool. The only change I would make would be as follows: file_read = open(input,'r') into file_read = open(input,'rb')

    I think some streams might get truncated otherwise!

  3. Thanks for the code snippet, very useful.

  4. Thanks for the script - that was very useful!
    Tomek

  5. I get this error.
    zlib.error: Error -3 while decompressing data: unknown compression method

  6. The data you are trying to decompress is probably not zlib compressed.

  7. I get an error as below;

    print("Usage python.exe " +args[0]+ " ")
    ^
    IndentationError: expected an indented block

  8. I don't have time to do this, but I need help with it. I do not really understand it.

  9. @Anonymous - http://pastie.org/10425585 - Use this.
