Quick Tutorial on Generating PDF from HTML with OpenPDF

Marcelo Abreu, founder of pdforge

Marcelo | Founder

Nov 4, 2024

Introduction to OpenPDF for HTML to PDF Conversion

OpenPDF is an open-source Java library that enables developers to create and manipulate PDF documents programmatically. In SaaS applications, generating dynamic PDF reports from HTML content is a common requirement. While OpenPDF doesn’t support direct HTML to PDF conversion out of the box, it can be combined with other tools to achieve effective results.

You can check out the full documentation here.

Comparison Between OpenPDF and Other Java PDF Libraries

When choosing a PDF library or tool for your Java project, it’s crucial to consider features, licensing, community support, and how well it integrates with your technology stack.

  • OpenPDF: An LGPL/MPL-licensed library suitable for commercial use, focusing on PDF creation and manipulation within Java applications.

  • iText: A powerful library with extensive features, but versions 5 and later are AGPL-licensed (or available under a commercial license), which may not fit all projects due to licensing restrictions.

  • Apache PDFBox: Allows low-level PDF manipulation but lacks comprehensive HTML to PDF conversion support.

  • Flying Saucer: Specializes in rendering XHTML and CSS 2.1 to PDF, making it suitable for HTML to PDF tasks, though it’s less actively maintained.

  • Playwright: Primarily a browser automation tool that supports headless browser operations. It can render complex HTML and CSS to PDF by leveraging Chromium’s print to PDF capabilities, making it a good choice for generating PDFs from web content.

Guide to generate pdf from html using Java OpenPDF

Setting Up OpenPDF in Your Java Project

To start using OpenPDF, add it to your project’s dependencies.

For Maven projects:

<dependency>
    <groupId>com.github.librepdf</groupId>
    <artifactId>openpdf</artifactId>
    <version>1.3.30</version>
</dependency>

For Gradle projects:

implementation 'com.github.librepdf:openpdf:1.3.30'

Installing OpenPDF: A Quick Start Guide

  1. Add the Dependency: Include OpenPDF in your pom.xml or build.gradle.

  2. Refresh Dependencies: Update your project to fetch the new library.

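  3. Verify Imports: Ensure you can import OpenPDF classes in your code. In the 1.3.x line used here, they live under the com.lowagie.text package (for example, com.lowagie.text.Document).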
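To confirm step 3, a minimal program like the one below should compile and produce a one-page PDF. It is a sketch assuming the OpenPDF 1.3.x dependency shown above; the class name HelloOpenPdf and the output file hello.pdf are arbitrary choices for illustration.

```java
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

import java.io.FileOutputStream;
import java.io.OutputStream;

// Hypothetical smoke test: if this compiles and hello.pdf opens, OpenPDF is on the classpath.
class HelloOpenPdf {
    public static void main(String[] args) throws Exception {
        Document document = new Document(); // default page size is A4
        try (OutputStream out = new FileOutputStream("hello.pdf")) {
            PdfWriter.getInstance(document, out);
            document.open();
            document.add(new Paragraph("Hello, OpenPDF!"));
            document.close(); // finishes the PDF; must run before the stream is closed
        }
    }
}
```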

Configuring Your Environment for HTML to PDF

Since OpenPDF doesn’t natively support HTML to PDF conversion, you’ll need to integrate it with an HTML parser like JSoup to manually map HTML elements to PDF elements.
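In practice, "manually mapping" means deciding, tag by tag, which OpenPDF element stands in for each HTML element. The sketch below assumes only h1, h2, and p need support and simply skips everything else; the class HtmlBlockMapper and its font sizes are hypothetical choices, not part of jsoup or OpenPDF.

```java
import com.lowagie.text.Font;
import com.lowagie.text.Paragraph;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps top-level block tags in <body> to OpenPDF paragraphs.
final class HtmlBlockMapper {
    private static final Font H1 = new Font(Font.HELVETICA, 18, Font.BOLD);
    private static final Font H2 = new Font(Font.HELVETICA, 14, Font.BOLD);
    private static final Font BODY = new Font(Font.HELVETICA, 12);

    static List<Paragraph> map(String html) {
        List<Paragraph> result = new ArrayList<>();
        // Visit the direct children of <body> in document order
        for (Element el : Jsoup.parse(html).body().children()) {
            switch (el.tagName()) {
                case "h1": result.add(new Paragraph(el.text(), H1)); break;
                case "h2": result.add(new Paragraph(el.text(), H2)); break;
                case "p":  result.add(new Paragraph(el.text(), BODY)); break;
                default:   break; // tables, images, etc. need their own mapping
            }
        }
        return result;
    }
}
```

Each returned paragraph can then be passed to document.add(...) in order.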

Converting HTML to PDF with OpenPDF

Let’s walk through creating a PDF invoice by parsing an HTML template.

Creating a Complete Invoice HTML/CSS File as an Example

Create an invoice.html file:

<!DOCTYPE html>
<html>
<head>
    <style>
        body { font-family: Arial, sans-serif; }
        h1 { color: navy; }
        table { width: 100%; border-collapse: collapse; }
        th, td { border: 1px solid gray; padding: 8px; text-align: left; }
        .total { font-weight: bold; }
    </style>
</head>
<body>
    <h1>Invoice #12345</h1>
    <p>Date: 2024-11-04</p>
    <p>Customer: Jane Smith</p>
    <table>
        <tr>
            <th>Description</th><th>Quantity</th><th>Unit Price</th><th>Total</th>
        </tr>
        <tr>
            <td>Widget A</td><td>2</td><td>$25.00</td><td>$50.00</td>
        </tr>
        <tr>
            <td>Widget B</td><td>1</td><td>$75.00</td><td>$75.00</td>
        </tr>
        <tr>
            <td colspan="3" class="total">Grand Total</td><td>$125.00</td>
        </tr>
    </table>
</body>
</html>

Writing Java Code for HTML to PDF Conversion

Add JSoup to your dependencies:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.15.3</version>
</dependency>

Implement the conversion:

import com.lowagie.text.Document;
import com.lowagie.text.Font;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.PdfPCell;
import com.lowagie.text.pdf.PdfPTable;
import com.lowagie.text.pdf.PdfWriter;
import org.jsoup.Jsoup;
import java.awt.Color;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
// OpenPDF 1.3.x classes live under com.lowagie.text. Java has no import aliases, so the
// jsoup Document/Element types, whose names clash with OpenPDF's, are fully qualified below.
public class HtmlToPdfConverter {
    public static void main(String[] args) {
        Document pdfDoc = new Document();
        try (OutputStream out = new FileOutputStream("invoice.pdf")) {
            org.jsoup.nodes.Document htmlDoc = Jsoup.parse(new File("invoice.html"), "UTF-8");
            PdfWriter.getInstance(pdfDoc, out);
            pdfDoc.open();
            // Set up fonts (Helvetica is a standard PDF font, so it does not need embedding)
            BaseFont baseFont = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED);
            Font font = new Font(baseFont, 12);
            Font boldFont = new Font(baseFont, 12, Font.BOLD);
            // Extract and add title
            String title = htmlDoc.select("h1").text();
            pdfDoc.add(new Paragraph(title, new Font(baseFont, 16)));
            // Extract and add date and customer info
            for (org.jsoup.nodes.Element p : htmlDoc.select("p")) {
                pdfDoc.add(new Paragraph(p.text(), font));
            }
            // Extract table data
            org.jsoup.nodes.Element table = htmlDoc.selectFirst("table");
            PdfPTable pdfTable = new PdfPTable(4); // The invoice table has 4 columns
            pdfTable.setWidthPercentage(100);       // Mirrors "width: 100%" in the CSS
            pdfTable.setSpacingBefore(10);
            // Add table headers
            for (org.jsoup.nodes.Element header : table.select("th")) {
                PdfPCell cell = new PdfPCell(new Paragraph(header.text(), font));
                cell.setBackgroundColor(new Color(230, 230, 250)); // OpenPDF uses java.awt.Color
                pdfTable.addCell(cell);
            }
            // Add table rows (the header row has no <td> cells, so it adds nothing here)
            for (org.jsoup.nodes.Element row : table.select("tr")) {
                for (org.jsoup.nodes.Element td : row.select("td")) {
                    PdfPCell cell = new PdfPCell(new Paragraph(td.text(), td.hasClass("total") ? boldFont : font));
                    // Honor colspan: PdfPTable drops incomplete rows, so without this the
                    // "Grand Total" row (2 cells in a 4-column table) would not be rendered
                    if (td.hasAttr("colspan")) {
                        cell.setColspan(Integer.parseInt(td.attr("colspan")));
                    }
                    pdfTable.addCell(cell);
                }
            }
            pdfDoc.add(pdfTable);
            pdfDoc.close(); // Finishes the PDF; must run before the output stream is closed
            System.out.println("PDF generated successfully.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
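The converter hardcodes new PdfPTable(4). If your templates vary, one option is to derive the column count from the HTML itself. This is a sketch assuming the widest row, with each cell's colspan counted the way a browser would, gives the column count; TableColumns.count is a hypothetical helper, not a jsoup or OpenPDF API.

```java
import org.jsoup.nodes.Element;

// Hypothetical helper: width of the widest row in an HTML table, counting colspan.
final class TableColumns {
    static int count(Element table) {
        int max = 0;
        for (Element row : table.select("tr")) {
            int cols = 0;
            for (Element cell : row.children()) {
                String tag = cell.tagName();
                if (tag.equals("td") || tag.equals("th")) {
                    cols += colspan(cell);
                }
            }
            max = Math.max(max, cols);
        }
        return max;
    }

    // A missing or malformed colspan attribute counts as a single column
    private static int colspan(Element cell) {
        try {
            return Math.max(1, Integer.parseInt(cell.attr("colspan").trim()));
        } catch (NumberFormatException e) {
            return 1;
        }
    }
}
```

With this in place, new PdfPTable(TableColumns.count(table)) would replace the hardcoded value.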

Handling Dynamic Data in Your PDF

To deal with dynamic data, you can use placeholders in your HTML template and replace them at runtime.

Example:

<h1>Invoice #{{invoiceNumber}}</h1>
<p>Date: {{date}}</p>
<p>Customer: {{customerName}}</p>

In your Java code:

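import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import org.jsoup.Jsoup;
Map<String, String> data = new HashMap<>();
data.put("invoiceNumber", "12345");
data.put("date", "2024-11-04");
data.put("customerName", "Jane Smith");
String htmlContent = new String(Files.readAllBytes(Paths.get("invoice.html")), StandardCharsets.UTF_8);
// Replace each {{key}} placeholder with its value
for (Map.Entry<String, String> entry : data.entrySet()) {
    htmlContent = htmlContent.replace("{{" + entry.getKey() + "}}", entry.getValue());
}
// Parse the filled-in template with jsoup, then map it to PDF elements as before
org.jsoup.nodes.Document htmlDoc = Jsoup.parse(htmlContent);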
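The replace loop works, but it silently leaves unknown placeholders in the output and inserts values verbatim, so a customer name containing "<" or "&" would be parsed by jsoup as markup. A sketch of a stricter alternative, assuming placeholders are word characters inside {{...}}; TemplateRenderer is a hypothetical helper, not part of any library used in this article.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills {{key}} placeholders, HTML-escaping each value
// and failing fast when a placeholder has no value.
final class TemplateRenderer {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    static String render(String template, Map<String, String> data) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String key = m.group(1);
            String value = data.get(key);
            if (value == null) {
                throw new IllegalArgumentException("No value for placeholder {{" + key + "}}");
            }
            // quoteReplacement stops '$' (as in "$125.00") being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(escapeHtml(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```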

Handling CSS and Images in Your PDF Output

OpenPDF doesn’t interpret CSS, so you have to reapply the template’s styles manually through OpenPDF’s own API.

• Fonts and Colors: Use OpenPDF’s Font class to set font styles and colors.

• Images: Extract image sources from HTML and add them using OpenPDF’s Image class.

Example of adding an image:

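org.jsoup.nodes.Element img = htmlDoc.selectFirst("img"); // null if the template has no <img>
if (img != null) {
    // com.lowagie.text.Image accepts a file path or a URL string
    Image image = Image.getInstance(img.attr("src"));
    pdfDoc.add(image);
}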

Example of styling text:

Font boldFont = new Font(baseFont, 12, Font.BOLD);
pdfDoc.add(new Paragraph("Bold Text", boldFont));
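The invoice stylesheet sets h1 { color: navy; } and a gray cell border, while OpenPDF expects a java.awt.Color, so CSS color values have to be converted by hand. A minimal sketch assuming only #rgb, #rrggbb, and a small, deliberately incomplete set of named colors; CssColors is hypothetical and nowhere near a full CSS parser.

```java
import java.awt.Color;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: converts a few CSS color syntaxes to java.awt.Color.
final class CssColors {
    // Only a handful of CSS named colors; extend as your templates require
    private static final Map<String, Color> NAMED = Map.of(
            "black", new Color(0, 0, 0),
            "white", new Color(255, 255, 255),
            "gray", new Color(128, 128, 128),
            "navy", new Color(0, 0, 128),
            "red", new Color(255, 0, 0));

    static Color parse(String css) {
        String v = css.trim().toLowerCase(Locale.ROOT);
        if (NAMED.containsKey(v)) {
            return NAMED.get(v);
        }
        if (v.matches("#[0-9a-f]{3}")) {
            // #abc is shorthand for #aabbcc
            v = "#" + v.charAt(1) + v.charAt(1) + v.charAt(2) + v.charAt(2) + v.charAt(3) + v.charAt(3);
        }
        if (v.matches("#[0-9a-f]{6}")) {
            return new Color(Integer.parseInt(v.substring(1), 16));
        }
        throw new IllegalArgumentException("Unsupported CSS color: " + css);
    }
}
```

The result can be passed to OpenPDF’s Font(BaseFont, float, int, Color) constructor, for example new Font(baseFont, 16, Font.NORMAL, CssColors.parse("navy")) for the invoice heading.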

Best Practices for Using OpenPDF in Production

• Error Handling: Implement comprehensive exception management to catch and log errors.

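• Resource Management: Use try-with-resources for output streams, and make sure Document.close() always runs (for example, in a finally block) so the PDF is completely written and the writer is released.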

• Performance Optimization: Reuse fonts and images to optimize memory usage and performance.
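These practices can be combined in a small service-style sketch: the BaseFont is created once and reused, the document is rendered into memory (convenient when the PDF is returned in an HTTP response rather than written to disk), and the Document is closed even if adding an element fails. PdfRenderer and its render method are hypothetical names, not an OpenPDF API.

```java
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Font;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.PdfWriter;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;

// Hypothetical renderer illustrating font reuse and guaranteed cleanup.
final class PdfRenderer {
    // Created once in the constructor and reused for every document this renderer builds
    private final BaseFont baseFont;

    PdfRenderer() throws IOException, DocumentException {
        this.baseFont = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED);
    }

    byte[] render(List<String> lines) throws DocumentException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfWriter.getInstance(document, out);
        document.open();
        try {
            Font font = new Font(baseFont, 12);
            for (String line : lines) {
                document.add(new Paragraph(line, font));
            }
        } finally {
            document.close(); // always finish and release the writer, even if add() failed
        }
        return out.toByteArray();
    }
}
```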

How to Use a PDF API to Automate PDF Creation at Scale

For SaaS platforms, automating PDF generation at scale might require offloading the heavy lifting to a PDF API.

Alternatively, integrating a third-party API like pdforge lets you handle high-volume PDF generation, complex formatting, and post-processing, all from a single backend call.

Here’s an example of how to call pdforge from Java to convert HTML content into a PDF via an API call:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PdfForgeExample {
    public static void main(String[] args) {
        try {
            URL url = new URL("https://api.pdforge.com/v1/pdf/sync");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Bearer your-api-key");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);

            String jsonInputString = "{ \"templateId\": \"your-template\", \"data\": { \"html\": \"your-html\" } }";

            // Send the JSON body explicitly as UTF-8
            try (OutputStream os = conn.getOutputStream()) {
                os.write(jsonInputString.getBytes(StandardCharsets.UTF_8));
            }

            int responseCode = conn.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response body; its exact format is defined by the pdforge API
                try (InputStream is = conn.getInputStream()) {
                    String body = new String(is.readAllBytes(), StandardCharsets.UTF_8);
                    System.out.println(body);
                }
            } else {
                // Handle errors (details, if any, are available via conn.getErrorStream())
                System.err.println("Request failed with HTTP " + responseCode);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

This code sends a POST request to the pdforge API and, on success, reads the response body. Check the pdforge API documentation for the exact response format and how to retrieve the generated PDF.
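The request body in the example hardcodes "your-html". Real HTML contains quotes, backslashes, and newlines that must be escaped before it can sit inside a JSON string. A dependency-free sketch follows; in production a JSON library such as Jackson or Gson would normally handle this, and the templateId and data.html field names are simply copied from the example rather than checked against pdforge’s documentation. PdfForgePayload is a hypothetical helper.

```java
// Hypothetical helper: builds the example's request body with proper JSON string escaping.
final class PdfForgePayload {
    static String build(String templateId, String html) {
        return "{ \"templateId\": " + quote(templateId)
                + ", \"data\": { \"html\": " + quote(html) + " } }";
    }

    // Escapes a Java string as a JSON string literal (RFC 8259, section 7)
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the four-digit hex escape form
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

The resulting string can replace jsonInputString in the example, for instance PdfForgePayload.build("your-template", htmlContent).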

Conclusion

OpenPDF is a solid choice when you need fine-grained control over PDF generation and are comfortable with manual mapping from HTML to PDF elements. It’s suitable for projects where you have simple HTML content and need extensive customization of the PDF output.

If your project involves complex HTML and CSS that need to be converted to PDF, libraries like Flying Saucer or iText, or tools like Playwright may be more fitting. Playwright can render complex web pages and generate PDFs using headless browsers, which is particularly useful when your content relies heavily on modern web technologies.

For scaling PDF generation or when you prefer not to handle the conversion logic yourself, third-party services like pdforge provide robust APIs. They handle complex HTML and CSS conversions efficiently, allowing you to focus on other aspects of your application.

Generating PDFs at scale can be quite complicated!

We take care of all of this, so you can focus on what truly matters in your product!
