Editing PDF with XPDF (or with something else)

I would like to ask whether it is possible to edit PDF files using the xpdf library and, if so, how. I guess it is possible, but I could not find any tutorial or documentation for xpdf, so I really have no idea 🙁. I'm also open to using another library if one supports PDF editing. My only requirements are that it be a C++ library (or at least a C one) and that it be cross-platform (Windows and Linux).

Just so you understand the scope of what you’re getting into, “basic editing” of PDF content is nearly always non-trivial.

Page content in PDF is represented by short RPN programs that paint on the page. It’s a small language similar to PostScript in semantics, but without looping structures or function definitions (so there is no halting problem). In a sane world, your text on the page is going to be represented by something like this:

BT /F1 12 Tf 72 720 Td (this is a text in a pdf document) Tj ET

which when translated into something more familiar, is this:

SetFont(F1, 12.0); // Font 1, 12.0 pt
TextMoveTo(72, 720);
ShowText("this is a text in a pdf document");
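To make the mapping between the two notations concrete, here is a minimal C++ sketch that emits the content-stream text from calls like the ones above. `TextWriter` and `escapePdfString` are made-up names for illustration and are not part of xpdf or any other library. The sketch assumes plain single-byte text and only escapes parentheses and backslashes inside the literal string.

```cpp
#include <sstream>
#include <string>

// Escape the characters that are special inside a PDF literal string:
// '(' , ')' and the backslash itself.
std::string escapePdfString(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '(' || c == ')' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper (not a real library class): builds one BT ... ET text object.
class TextWriter {
public:
    TextWriter() { os_ << "BT"; }

    // Emits "/<name> <size> Tf". The font name must already be a page resource.
    TextWriter& setFont(const std::string& name, double size) {
        os_ << " /" << name << ' ' << size << " Tf";
        return *this;
    }

    // Emits "<x> <y> Td", which moves the text position.
    TextWriter& moveTo(double x, double y) {
        os_ << ' ' << x << ' ' << y << " Td";
        return *this;
    }

    // Emits "(<text>) Tj", which shows a string in the current font.
    TextWriter& showText(const std::string& text) {
        os_ << " (" << escapePdfString(text) << ") Tj";
        return *this;
    }

    // Closes the text object and returns the operator sequence.
    std::string finish() const { return os_.str() + " ET"; }

private:
    std::ostringstream os_;
};
```

Calling `TextWriter().setFont("F1", 12.0).moveTo(72, 720).showText("this is a text in a pdf document").finish()` returns the same operator sequence as the content stream shown above.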

So in this case, to show the word "text" in a different font (F2), you have to transform this into something like this:

SetFont(F1, 12.0); // Font 1, 12.0 pt
TextMoveTo(72, 720);
ShowText("this is a ");
SetFont(F2, 12.0); // switch to Font 2 for the single word
ShowText("text");
SetFont(F1, 12.0); // switch back to Font 1
ShowText(" in a pdf document");

which would become:

BT /F1 12 Tf 72 720 Td (this is a ) Tj /F2 12 Tf (text) Tj /F1 12 Tf
( in a pdf document) Tj ET

in the equivalent PDF. The issue is many-fold:

1. You have to tear out the page and all its resources (non-trivial).
2. You have to create a new page, putting in new resources (you are adding a new font) and embedding the font if permitted.
3. You have to modify the content stream of the page to include your changed content.
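For the one well-behaved case shown earlier, the rewrite in the last list item amounts to splitting a string. Here is a minimal C++ sketch, assuming the text sits in a single Tj run, contains no characters that need escaping, and the second font is already available as a page resource. `restyleWord` is a made-up helper, not an API of xpdf or any other library.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: rewrites the operators for one "(text) Tj" run so that
// the first occurrence of `word` is shown in `altFont`, then switches back to
// `baseFont`. If the word is not found, the run is emitted unchanged.
std::string restyleWord(const std::string& text, const std::string& word,
                        const std::string& baseFont, const std::string& altFont,
                        double size) {
    std::ostringstream os;
    const std::size_t pos = word.empty() ? std::string::npos : text.find(word);
    if (pos == std::string::npos) {
        os << '(' << text << ") Tj";
        return os.str();
    }

    const std::string before = text.substr(0, pos);
    const std::string after = text.substr(pos + word.size());

    // Text before the word stays in the font that is already selected.
    if (!before.empty()) os << '(' << before << ") Tj ";

    // Switch font, show the word, switch back.
    os << '/' << altFont << ' ' << size << " Tf (" << word << ") Tj /"
       << baseFont << ' ' << size << " Tf";

    // Remainder of the run, again in the base font.
    if (!after.empty()) os << " (" << after << ") Tj";
    return os.str();
}
```

With the example sentence, `restyleWord("this is a text in a pdf document", "text", "F1", "F2", 12.0)` yields the three Tj runs with the two font switches between them. It only works because the whole sentence arrives as one tidy string.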

And 3 is where you get hung up, because there are an infinite number of ways to make a page that has the content you describe and, with that much variety, you are likely to have a hard time handling even 70% of them. Let me quickly describe why this is as bad as it sounds. There are PDF generation programs (I'm looking at you, troff) that lay out all the plain text on a page first, then all the italic text, then all the bold text. I swear, I am not making this up. Some programs want to lay text down very precisely, so if you're lucky, they'll use the TJ operator, which lays out text with specific kerning. If you're not lucky (which is the majority of the time), they lay out the text individually, with a series of moves before every single glyph on the page. And what if your text is laid out on a curve or in a special orientation (charts, ads)? What about the cases where someone subtly changes the font size for a greater contrast between upper and lower case, or to simulate small caps?
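To give a feel for even the "lucky" TJ case, here is a rough C++ sketch that recovers plain text from a TJ operand. In a TJ array, the numbers between strings are adjustments in thousandths of a text-space unit that are subtracted from the current position, so a large negative number opens a gap. The function name `textFromTJ` and the `gapThreshold` value of 200 are assumptions for illustration. Real documents need font metrics to decide what counts as a space, and this sketch ignores hex strings and most escape sequences.

```cpp
#include <cstdlib>
#include <string>

// Recover plain text from the operand of a TJ operator, e.g.
//   [(Hel) -20 (lo) -300 (World)]
// gapThreshold is a made-up heuristic: an adjustment more negative than
// -gapThreshold is treated as a word space.
std::string textFromTJ(const std::string& array, double gapThreshold = 200.0) {
    std::string out;
    std::size_t i = 0;
    while (i < array.size()) {
        const char c = array[i];
        if (c == '(') {
            // Literal string: copy until the matching ')', honoring nesting.
            int depth = 1;
            ++i;
            while (i < array.size() && depth > 0) {
                const char d = array[i++];
                if (d == '\\' && i < array.size()) {
                    out += array[i++];  // simplified: keep the escaped char as-is
                } else if (d == '(') {
                    ++depth;
                    out += d;
                } else if (d == ')') {
                    if (--depth > 0) out += d;
                } else {
                    out += d;
                }
            }
        } else if (c == '-' || c == '+' || c == '.' || (c >= '0' && c <= '9')) {
            // Position adjustment between strings.
            const char* start = array.c_str() + i;
            char* end = nullptr;
            const double adj = std::strtod(start, &end);
            const std::size_t consumed = static_cast<std::size_t>(end - start);
            if (consumed == 0) { ++i; continue; }  // lone sign or dot: skip it
            if (adj < -gapThreshold) out += ' ';
            i += consumed;
        } else {
            ++i;  // brackets and whitespace
        }
    }
    return out;
}
```

For the operand in the comment, `textFromTJ("[(Hel) -20 (lo) -300 (World)]")` returns `Hello World`. Text positioned glyph by glyph with separate move operators never goes through this path at all, which is where the real work starts.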

This is why, when I wrote the find text tool for Acrobat 1.0, it took me 2 months of sweat to handle as many of the edge cases as I could. And that is not editing text; it is only looking for a single word or phrase.

I'm not going to recommend a library for you (sorry). I gave xpdf a quick look, and it's unclear whether it has PDF generation capabilities or is just a consumer of PDF. PdfLib, which is a commercial product, appears to generate PDF, although it's not clear whether it can consume it, but you could certainly get both sides by gluing the two together.

If it were me, I would use tools that I have written, and I would still be a little gun-shy of this task. My library is being used by Atalasoft, the company I work for, to generate PDFs from whole cloth and to do editing within a very limited domain (annotations, document metadata). The hardest part is that we do our best to hide the complexity of PDF from our customers. Generally, our customers want us to understand the spec instead of them and to make the rest easy, but tasks like this (redaction is another one) are very hard to do without understanding the depth of the PDF specification. If you start getting into the library world of PDF manipulation, you should start by reading the spec, especially chapter 8 (Graphics) and chapter 9 (Text), and you'll get a better understanding of just what you're going to ask of the library.
