Earlier in the year I spent some time looking for PDF parsing solutions. If you’re ever stuck with a PDF file and no easy access to the data that generated it, you may need to parse meaningful information out of the PDF itself. This is difficult because a PDF describes where to draw text on a page, not the tables, rows, or columns that text belongs to.

Long story short: commercial solutions can knock this out of the park by converting the PDF to a parseable format such as CSV while maintaining the structure of the document. At the time I was researching this, free solutions could convert the document to a parseable format but could not maintain the document structure, leaving you with a jumble of text strings.
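
To make the "jumble of text strings" problem concrete, here is a minimal Python sketch of the kind of work a structure-preserving converter has to do. It assumes you already have text fragments with page coordinates; the `FRAGMENTS` data, the function names, and the `y_tolerance` value are all made up for illustration and do not come from any particular tool. Real tables (merged cells, wrapped text, multiple columns of tables) are much harder than this.

```python
import csv
import io

# Hypothetical fragments as (x, y, text). This is invented sample data,
# not the output of any real PDF library. Assumes y grows toward the top
# of the page, as in default PDF coordinates.
FRAGMENTS = [
    (200.0, 700.2, "Price"),
    (50.0, 700.0, "Item"),
    (50.0, 680.1, "Widget"),
    (200.0, 680.0, "4.50"),
    (200.0, 660.0, "12.00"),
    (50.0, 659.8, "Gadget"),
]


def fragments_to_rows(fragments, y_tolerance=2.0):
    """Group fragments with similar y into rows, then order each row by x."""
    rows = []  # each entry: (y of the row's first fragment, [(x, text), ...])
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):  # top first
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    return [[text for _, text in sorted(cells)] for _, cells in rows]


def rows_to_csv(rows):
    """Serialize the reconstructed rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(fragments_to_rows(FRAGMENTS)))
```

Without the grouping step, all you have is the fragments in whatever order they happen to be stored, which is roughly what the text-only converters hand back.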


The rest of the story, aka the notes I had:

A Non-Exhaustive But Solid List of Commercial Solutions with Support for Tables

One Free Solution with Support for Tables

Text Only Solutions