Image: Plumbers Tool Box by pszz on Flickr.

Over the years, I’ve been using a variety of open-source software tools for solving all sorts of issues with PDF documents. This post is an attempt to (finally) bring together my go-to PDF analysis and processing tools and commands for a variety of common tasks in one single place. It is largely based on a multitude of scattered lists, cheat-sheets and working notes that I made earlier. Starting with a brief overview of some general-purpose PDF toolkits, I then move on to a discussion of specific tasks, including:

- Document information and metadata extraction
- Inspection of embedded image information
- File size reduction of PDF with hi-res graphics
- View, search and extract low-level PDF objects

Even though this post covers a lot of ground, the selection of tasks and tools presented here is by no means meant to be exhaustive. It was guided to a great degree by the PDF-related issues I’ve encountered myself in my day-to-day work. Some of these tasks could be done using other tools (including ones that are not mentioned here), and in some cases these other tools may well be better choices. So there’s probably a fair amount of selection bias here, and I don’t want to make any claims of presenting the “best” way to do any of these tasks. Also, many of the example commands in this post can be further refined to particular needs (e.g. using additional options or alternative output formats), and they are probably best seen as (hopefully useful) starting points for the reader’s own explorations.

All of the tools presented here are published as open-source, and most of them have a command-line interface.
Package information: extra/poppler, version 23.03.0-1 (Arch Linux), license GPL.

The options -listenc, -meta, -js, -struct, and -struct-text only print the requested information.

OPTIONS

- `-f number`: Specifies the first page to examine. If multiple pages are requested using the "-f" and "-l" options, the size of each requested page (and, optionally, the bounding boxes for each requested page) are printed.
- `-l number`: Specifies the last page to examine.
- `-box`: Prints the page bounding boxes: MediaBox, CropBox, BleedBox, TrimBox, and ArtBox.
- `-meta`: Prints document-level metadata. (This is the "Metadata" stream from the PDF file's Catalog object.)
- `-custom`: Prints custom and standard metadata.
- `-struct`: Prints the logical document structure of a Tagged-PDF file.
- `-struct-text`: Prints the textual content along with the document structure of a Tagged-PDF file. Note that extracting text this way might be slow for big PDF files. (Implies -struct.)
- `-url`: Prints all URLs in the PDF. Only the URL types supported by Poppler are listed. Currently, this is limited to Annotations: only URLs referenced by PDF objects such as Link Annotations are listed. pdfinfo does not attempt to extract strings that look like URLs from the text content.
- `-isodates`: Prints dates in ISO-8601 format (including the time zone).
- `-rawdates`: Prints the raw (undecoded) date strings, directly from the PDF file.
- `-dests`: Prints a list of all named destinations. If a page range is specified using "-f" and "-l", only destinations in the page range are listed.
- `-enc encoding-name`: Sets the encoding to use for text output.
- `-listenc`: Lists the available encodings.
- `-opw password`: Specifies the owner password for the PDF file.
- `-upw password`: Specifies the user password for the PDF file.
- `-v`: Prints copyright and version information.

EXIT CODES

The Xpdf tools use the following exit codes:

- 0: No error.

AUTHOR

The pdfinfo software and documentation are copyright 1996-2011 Glyph & Cog, LLC.

SEE ALSO

pdftocairo(1), pdftohtml(1), pdftoppm(1)
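As a starting point for scripting the options listed above, here is a minimal Python sketch that assembles a `pdfinfo` command line and splits its "Key: value" output into a dictionary. The helper names (`build_pdfinfo_cmd`, `parse_pdfinfo_output`, `run_pdfinfo`) are hypothetical and not part of Poppler. The sketch assumes `pdfinfo` is on the `PATH` and that the default output consists of one "Key: value" pair per line.

```python
import subprocess
from typing import Dict, List, Optional


def build_pdfinfo_cmd(pdf_path: str,
                      first_page: Optional[int] = None,
                      last_page: Optional[int] = None,
                      iso_dates: bool = False,
                      boxes: bool = False) -> List[str]:
    """Assemble a pdfinfo argument list (hypothetical helper)."""
    cmd = ["pdfinfo"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]   # first page to examine
    if last_page is not None:
        cmd += ["-l", str(last_page)]    # last page to examine
    if iso_dates:
        cmd.append("-isodates")          # dates in ISO-8601 format
    if boxes:
        cmd.append("-box")               # page bounding boxes
    cmd.append(pdf_path)
    return cmd


def parse_pdfinfo_output(text: str) -> Dict[str, str]:
    """Split 'Key: value' lines into a dict; lines without a colon are skipped."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def run_pdfinfo(pdf_path: str, **options) -> Dict[str, str]:
    """Run pdfinfo and return its parsed output (assumes pdfinfo is installed)."""
    result = subprocess.run(build_pdfinfo_cmd(pdf_path, **options),
                            capture_output=True, text=True, check=True)
    return parse_pdfinfo_output(result.stdout)
```

A call such as `run_pdfinfo("example.pdf", iso_dates=True)` (with `example.pdf` as a placeholder file name) would return the reported fields as a dictionary. Because of `check=True`, any non-zero exit code raises `subprocess.CalledProcessError`.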