Today I was asked whether I knew of a way to extract a table from a PDF file.
I did know of one CLI tool, pdftotext, which converts a PDF file to a text
file, and I remembered having used it for tables in the past. pdftotext is
developed by Glyph & Cog, along with several other CLI tools for manipulating
PDF files and the PDF viewer Xpdf. On Debian (and Debian derivatives),
pdftotext and the other CLI tools are included in the package poppler-utils
(built from Poppler, a fork of the Xpdf code), which can be installed with
apt-get (sudo apt-get install poppler-utils).
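Once installed, running pdftotext on a PDF file does the conversion: pdftotext
file.pdf (file.pdf being a placeholder for your own file) writes the extracted
text to file.txt in the same directory.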
There are several additional options, and if one means to extract a table, the
-layout option is pretty helpful as it maintains the original physical layout
(as explained in the documentation). Options go before the file name, as in
pdftotext -layout file.pdf.
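Since -layout keeps the columns visually aligned, the text output can often be turned into rows of cells with very little code. Below is a minimal Python sketch of that idea, not something from the original workflow: the function names pdf_to_layout_text and split_rows are made up for illustration, and the rule "two or more spaces separate columns" is an assumption that will not hold for every table.

```python
import re
import subprocess


def pdf_to_layout_text(pdf_path):
    """Run `pdftotext -layout` on pdf_path and return the text.

    Assumes pdftotext is installed and on the PATH.
    """
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
        encoding="utf-8",
    )
    return result.stdout


def split_rows(layout_text):
    """Split each non-empty line into cells.

    Assumption: columns are separated by runs of two or more
    whitespace characters, while single spaces belong to a cell.
    """
    rows = []
    for line in layout_text.splitlines():
        stripped = line.strip()
        if stripped:
            rows.append(re.split(r"\s{2,}", stripped))
    return rows
```

Calling split_rows(pdf_to_layout_text("file.pdf")) would then give a list of rows, each a list of strings. Empty cells and columns separated by a single space are not handled, so the result usually still needs a quick look by eye.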
Pretty sweet 😄!