How to Generate PDF from HTML Using Playwright Java

Marcelo Abreu, founder of pdforge

Marcelo | Founder

Nov 7, 2024

Why Choose Playwright Java for HTML to PDF Conversion

Playwright is a browser automation library, maintained by Microsoft, that controls Chromium, Firefox, and WebKit through a single API. Because it renders pages with a real browser engine, it handles complex layouts, web fonts, and JavaScript-generated content. That makes it a good fit for converting HTML into PDFs in Java applications, especially SaaS platforms that generate reports dynamically. Note that PDF generation (page.pdf()) is only supported in headless Chromium.

You can find the full API reference in the official Playwright for Java documentation.

Playwright vs. Other Java PDF Libraries: A Comparative Insight

When considering HTML to PDF conversion in Java, several libraries come to mind:

  • Flying Saucer: A Java library that renders well-formed XML/XHTML styled with CSS 2.1. It handles basic layouts but does not execute JavaScript or support modern CSS such as flexbox and grid.

  • iText: A powerful library for creating and manipulating PDFs. However, it has a steep learning curve, and because it is licensed under the AGPL, closed-source commercial use requires a paid license.

  • Apache PDFBox: An open-source library for working with PDF documents. It's excellent for manipulating existing PDFs but has no built-in HTML rendering.

  • OpenPDF: A fork of iText released under the open-source LGPL and MPL licenses. It offers similar low-level APIs but only limited support for HTML and none for modern CSS.

Playwright stands out by rendering HTML and CSS with a real browser engine, so the generated PDFs match what users see in the browser, including JavaScript-rendered content and modern CSS features such as flexbox and grid.

Guide to generate pdf from html using Java Playwright

Setting Up Playwright in Your Java Project

Quick Start: Installing Playwright Java

Add the following dependency to your pom.xml file if you're using Maven (1.34.0 is the version used in this guide; check Maven Central for the latest release):

<dependencies>
    <dependency>
        <groupId>com.microsoft.playwright</groupId>
        <artifactId>playwright</artifactId>
        <version>1.34.0</version>
    </dependency>
</dependencies>

For Gradle, include:

dependencies {
    implementation 'com.microsoft.playwright:playwright:1.34.0'
}

Configuring Dependencies

Initialize Playwright and launch Chromium to confirm everything works. On first run, Playwright for Java downloads the browser binaries it needs automatically:

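import com.microsoft.playwright.*;

public class PlaywrightSetup {
    public static void main(String[] args) {
        // Both Playwright and Browser are AutoCloseable, so they are released automatically
        try (Playwright playwright = Playwright.create();
             Browser browser = playwright.chromium().launch()) {
            System.out.println("Playwright is set up successfully. Chromium version: " + browser.version());
        }
    }
}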

Verifying Your Setup with a Test HTML Page

Create a simple HTML invoice named invoice.html to use as test input in the examples below:

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Invoice</title>
    <style>
        body { font-family: 'Arial', sans-serif; margin: 20px; }
        h1 { text-align: center; }
        .invoice-details { width: 100%; margin-top: 20px; border-collapse: collapse; }
        .invoice-details th, .invoice-details td { padding: 10px; border: 1px solid #ccc; text-align: left; }
        .total { text-align: right; font-weight: bold; }
    </style>
</head>
<body>
    <h1>Invoice</h1>
    <p>Date: <strong>2023-10-01</strong></p>
    <p>Invoice #: <strong>INV-1001</strong></p>
    <table class="invoice-details">
        <tr>
            <th>Description</th>
            <th>Quantity</th>
            <th>Price</th>
            <th>Total</th>
        </tr>
        <tr>
            <td>Product A</td>
            <td>2</td>
            <td>$50</td>
            <td>$100</td>
        </tr>
        <tr>
            <td>Service B</td>
            <td>5</td>
            <td>$20</td>
            <td>$100</td>
        </tr>
        <tr>
            <td colspan="3" class="total">Grand Total</td>
            <td>$200</td>
        </tr>
    </table>
</body>
</html>

Implementing HTML to PDF Conversion with Playwright Java

Converting HTML to PDF Using Playwright

Method 01: Rendering the PDF from a URL

If your HTML invoice is hosted, you can navigate to its URL and generate a PDF:

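import com.microsoft.playwright.*;
import com.microsoft.playwright.options.WaitUntilState;
import java.nio.file.Paths;

public class PdfFromUrl {
    public static void main(String[] args) {
        try (Playwright playwright = Playwright.create()) {
            // page.pdf() is only supported in headless Chromium (the default for launch())
            Browser browser = playwright.chromium().launch();
            Page page = browser.newPage();
            // Wait until network activity settles so fonts, images, and scripts have loaded
            page.navigate("https://yourdomain.com/invoice.html",
                    new Page.NavigateOptions().setWaitUntil(WaitUntilState.NETWORKIDLE));

            // Render once: the PDF is saved to disk and also returned as a byte array
            byte[] pdfBytes = page.pdf(new Page.PdfOptions().setPath(Paths.get("invoice.pdf")));
            System.out.println("PDF generated from URL and saved locally (" + pdfBytes.length + " bytes).");
            // Use pdfBytes to upload or process as needed
        }
    }
}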
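If the invoice isn't hosted yet, you can still use this method by pointing page.navigate at the local invoice.html through a file:// URL. The sketch below assumes the file sits on the same machine as the browser; the class name LocalInvoiceUrl and the placeholder HTML it writes are illustrative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class LocalInvoiceUrl {
    // Converts a local file path into an absolute file:// URL string.
    static String toFileUrl(Path htmlFile) {
        return htmlFile.toAbsolutePath().normalize().toUri().toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny placeholder invoice to a temporary directory for demonstration.
        Path dir = Files.createTempDirectory("invoice-demo");
        Path html = dir.resolve("invoice.html");
        Files.writeString(html, "<!DOCTYPE html><html><body><h1>Invoice</h1></body></html>",
                StandardCharsets.UTF_8);

        String url = toFileUrl(html);
        // This string can be passed to page.navigate(url) in the example above.
        System.out.println(url);
    }
}
```

Using Path.toUri() rather than concatenating "file://" by hand takes care of platform differences such as Windows drive letters and characters that need percent-encoding.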

Method 02: Rendering the PDF from HTML Content

For dynamic content or when the HTML is generated on the fly:

import com.microsoft.playwright.*;
import java.nio.file.Paths;

public class PdfFromContent {
    public static void main(String[] args) {
        String htmlContent = "<!DOCTYPE html><html><head><meta charset='UTF-8'><title>Invoice</title>"
                + "<style>"
                + "body { font-family: 'Arial', sans-serif; margin: 20px; }"
                + "h1 { text-align: center; }"
                + ".invoice-details { width: 100%; margin-top: 20px; border-collapse: collapse; }"
                + ".invoice-details th, .invoice-details td { padding: 10px; border: 1px solid #ccc; text-align: left; }"
                + ".total { text-align: right; font-weight: bold; }"
                + "</style></head>"
                + "<body>"
                + "<h1>Invoice</h1>"
                + "<p>Date: <strong>2023-10-01</strong></p>"
                + "<p>Invoice #: <strong>INV-1001</strong></p>"
                + "<table class='invoice-details'>"
                + "<tr><th>Description</th><th>Quantity</th><th>Price</th><th>Total</th></tr>"
                + "<tr><td>Product A</td><td>2</td><td>$50</td><td>$100</td></tr>"
                + "<tr><td>Service B</td><td>5</td><td>$20</td><td>$100</td></tr>"
                + "<tr><td colspan='3' class='total'>Grand Total</td><td>$200</td></tr>"
                + "</table>"
                + "</body></html>";
        try (Playwright playwright = Playwright.create()) {
            Browser browser = playwright.chromium().launch();
            Page page = browser.newPage();
            // Load the HTML string directly; no server or file is needed
            page.setContent(htmlContent);
            // Render once: the PDF is saved to disk and also returned as a byte array
            byte[] pdfBytes = page.pdf(new Page.PdfOptions().setPath(Paths.get("invoice.pdf")));
            System.out.println("PDF generated from HTML content and saved locally.");
            // Use pdfBytes as needed
        }
    }
}
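Concatenating strings as above is fine for fixed content, but if values such as item descriptions come from users or a database, escape them before inserting them into the markup; otherwise a stray < or & can break the layout or inject HTML. Here is a minimal sketch of a hypothetical escapeHtml helper. An established library, or FreeMarker's HTML auto-escaping (enabled for templates with the .ftlh extension), is usually the safer choice in production.

```java
class HtmlEscaper {
    // Escapes the five characters that are significant in HTML text and
    // attribute values, so user-supplied data cannot alter the markup.
    static String escapeHtml(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical user-supplied description containing special characters
        String description = "Widgets <Pro> & \"Co\"";
        String row = "<tr><td>" + escapeHtml(description) + "</td></tr>";
        System.out.println(row);
    }
}
```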

HTML Template Engines

To generate dynamic invoices, you can use template engines like FreeMarker or Thymeleaf. Here’s an example using FreeMarker:

import com.microsoft.playwright.*;
import freemarker.template.*;
import java.io.*;
import java.nio.file.Paths;
import java.util.*;

public class PdfWithTemplate {
    public static void main(String[] args) throws Exception {
        // Configure FreeMarker to load templates from the /templates folder on the classpath
        Configuration cfg = new Configuration(Configuration.VERSION_2_3_30);
        cfg.setClassForTemplateLoading(PdfWithTemplate.class, "/templates");
        cfg.setDefaultEncoding("UTF-8");
        // Load template
        Template template = cfg.getTemplate("invoice.ftl");
        // Data model
        Map<String, Object> data = new HashMap<>();
        data.put("date", "2023-10-01");
        data.put("invoiceNumber", "INV-1001");
        List<Map<String, String>> items = new ArrayList<>();
        Map<String, String> item1 = new HashMap<>();
        item1.put("description", "Product A");
        item1.put("quantity", "2");
        item1.put("price", "$50");
        item1.put("total", "$100");
        items.add(item1);
        Map<String, String> item2 = new HashMap<>();
        item2.put("description", "Service B");
        item2.put("quantity", "5");
        item2.put("price", "$20");
        item2.put("total", "$100");
        items.add(item2);
        data.put("items", items);
        data.put("grandTotal", "$200");
        // Generate HTML content
        Writer out = new StringWriter();
        template.process(data, out);
        String htmlContent = out.toString();
        // Generate PDF with Playwright
        try (Playwright playwright = Playwright.create()) {
            Browser browser = playwright.chromium().launch();
            Page page = browser.newPage();
            page.setContent(htmlContent);
            // Render once: the PDF is saved to disk and also returned as a byte array
            byte[] pdfBytes = page.pdf(new Page.PdfOptions().setPath(Paths.get("invoice.pdf")));
            System.out.println("PDF generated from template and saved locally.");
            // Use pdfBytes as needed
        }
    }
}

Save the following template as invoice.ftl in a templates folder on the classpath (for example, src/main/resources/templates in a Maven project):

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Invoice</title>
    <style>
        body { font-family: 'Arial', sans-serif; margin: 20px; }
        h1 { text-align: center; }
        .invoice-details { width: 100%; margin-top: 20px; border-collapse: collapse; }
        .invoice-details th, .invoice-details td { padding: 10px; border: 1px solid #ccc; text-align: left; }
        .total { text-align: right; font-weight: bold; }
    </style>
</head>
<body>
    <h1>Invoice</h1>
    <p>Date: <strong>${date}</strong></p>
    <p>Invoice #: <strong>${invoiceNumber}</strong></p>
    <table class="invoice-details">
        <tr>
            <th>Description</th>
            <th>Quantity</th>
            <th>Price</th>
            <th>Total</th>
        </tr>
        <#list items as item>
        <tr>
            <td>${item.description}</td>
            <td>${item.quantity}</td>
            <td>${item.price}</td>
            <td>${item.total}</td>
        </tr>
        </#list>
        <tr>
            <td colspan="3" class="total">Grand Total</td>
            <td>${grandTotal}</td>
        </tr>
    </table>
</body>
</html>
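The data model above hardcodes each line total and the grand total as strings. A sketch that derives them from quantity and unit price with BigDecimal, then formats them as US dollars, could look like the following. The Line class and buildModel method are hypothetical names; the map keys match those used in invoice.ftl. Note that amounts come out with cents (for example, $50.00), unlike the original strings.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class InvoiceTotals {
    // One invoice line: quantity and unit price are inputs, the total is derived.
    static final class Line {
        final String description;
        final int quantity;
        final BigDecimal unitPrice;

        Line(String description, int quantity, BigDecimal unitPrice) {
            this.description = description;
            this.quantity = quantity;
            this.unitPrice = unitPrice;
        }

        BigDecimal total() {
            return unitPrice.multiply(BigDecimal.valueOf(quantity));
        }
    }

    // Formats an amount as US dollars with two decimals, e.g. 100 becomes "$100.00".
    static String usd(BigDecimal amount) {
        NumberFormat fmt = NumberFormat.getCurrencyInstance(Locale.US);
        return fmt.format(amount.setScale(2, RoundingMode.HALF_UP));
    }

    // Builds the map expected by invoice.ftl: header fields, items, and grandTotal.
    static Map<String, Object> buildModel(String date, String invoiceNumber, List<Line> lines) {
        List<Map<String, String>> items = new ArrayList<>();
        BigDecimal grandTotal = BigDecimal.ZERO;
        for (Line line : lines) {
            Map<String, String> item = new HashMap<>();
            item.put("description", line.description);
            item.put("quantity", Integer.toString(line.quantity));
            item.put("price", usd(line.unitPrice));
            item.put("total", usd(line.total()));
            items.add(item);
            grandTotal = grandTotal.add(line.total());
        }
        Map<String, Object> data = new HashMap<>();
        data.put("date", date);
        data.put("invoiceNumber", invoiceNumber);
        data.put("items", items);
        data.put("grandTotal", usd(grandTotal));
        return data;
    }

    public static void main(String[] args) {
        List<Line> lines = List.of(
                new Line("Product A", 2, new BigDecimal("50")),
                new Line("Service B", 5, new BigDecimal("20")));
        Map<String, Object> data = buildModel("2023-10-01", "INV-1001", lines);
        System.out.println(data.get("grandTotal")); // $200.00
    }
}
```

The resulting map can be passed to template.process(data, out) in place of the hand-built one. BigDecimal avoids the rounding errors you would get by doing currency arithmetic with double.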

Adding Headers, Footers, and Page Numbers with Playwright

Enhance your PDF by adding custom headers, footers, and page numbers:

// Requires: import com.microsoft.playwright.options.Margin; and import java.nio.file.Paths;
Page.PdfOptions pdfOptions = new Page.PdfOptions()
    .setPath(Paths.get("invoice_with_header_footer.pdf"))
    .setDisplayHeaderFooter(true)
    .setHeaderTemplate("<div style='font-size:10px; width:100%; text-align:center;'>My Company</div>")
    .setFooterTemplate("<div style='font-size:10px; width:100%; text-align:center;'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></div>")
    // Top and bottom margins reserve the space where the header and footer are drawn
    .setMargin(new Margin().setTop("50px").setBottom("50px"))
    .setPrintBackground(true);

page.pdf(pdfOptions);
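Header and footer templates are plain HTML strings. Chromium fills elements that carry the classes date, title, url, pageNumber, or totalPages, but it does not evaluate scripts in these templates and page styles are not visible inside them, so styling has to be inline. The hypothetical helper below assembles templates that way and escapes the text it inserts; treat it as a sketch, not part of Playwright's API.

```java
class HeaderFooterTemplates {
    // Chromium replaces the contents of elements with these class names when printing.
    static final String PAGE_NUMBER = "<span class='pageNumber'></span>";
    static final String TOTAL_PAGES = "<span class='totalPages'></span>";
    static final String DATE = "<span class='date'></span>";

    // Minimal escaping for text placed inside the template markup ('&' must go first).
    static String escape(String text) {
        return text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("'", "&#39;")
                .replace("\"", "&quot;");
    }

    // Styles are inline because page styles do not apply inside header/footer templates.
    static String footer(String companyName, int fontSizePx) {
        return "<div style='font-size:" + fontSizePx + "px; width:100%; text-align:center;'>"
                + escape(companyName) + " | Page " + PAGE_NUMBER + " of " + TOTAL_PAGES
                + "</div>";
    }

    static String header(String title, int fontSizePx) {
        return "<div style='font-size:" + fontSizePx + "px; width:100%; text-align:center;'>"
                + escape(title) + " | " + DATE + "</div>";
    }

    public static void main(String[] args) {
        // These strings can be passed to setHeaderTemplate and setFooterTemplate.
        System.out.println(header("Invoice INV-1001", 10));
        System.out.println(footer("My Company", 10));
    }
}
```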

All Options from the pdf Method

The page.pdf() method accepts a Page.PdfOptions object whose setters (for example, setFormat or setLandscape) customize the PDF output:

  • path: The file path to save the PDF to. page.pdf() always returns the PDF as a byte array; if path is omitted, nothing is written to disk.

  • scale: Sets the scale of the webpage rendering. Defaults to 1 and must be between 0.1 and 2.

  • displayHeaderFooter: When set to true, includes header and footer in the PDF.

  • headerTemplate and footerTemplate: HTML templates for the header and footer. Chromium injects values into elements with the classes date, title, url, pageNumber, and totalPages, for example <span class='pageNumber'></span>.

  • printBackground: When set to true, prints background graphics.

  • landscape: When set to true, prints the PDF in landscape orientation.

  • pageRanges: Specifies the page ranges to print, e.g., "1-5, 8, 11-13".

  • format: Sets the paper format, such as "A4" or "Letter". If set, it takes priority over width and height. Defaults to "Letter".

  • width and height: Set the paper width and height. Values can be labeled with px, in, cm, or mm; unlabeled values are treated as pixels.

  • margin: Sets margins for the PDF. Accepts a Margin object with top, right, bottom, left properties.

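  • preferCSSPageSize: When set to true, any CSS @page size declared in the page takes priority over width, height, and format. Defaults to false, which scales the content to fit the paper size.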
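To see exactly which pages a pageRanges value such as "1-5, 8, 11-13" selects, or to validate user input before passing it to setPageRanges, a small parser helps. This sketch (a hypothetical PageRanges class) handles only single pages and closed ranges; Chromium performs its own parsing when printing.

```java
import java.util.ArrayList;
import java.util.List;

class PageRanges {
    // Expands a pageRanges string such as "1-5, 8, 11-13" into explicit page numbers.
    // An empty string returns an empty list, which Playwright interprets as "all pages".
    static List<Integer> expand(String ranges) {
        List<Integer> pages = new ArrayList<>();
        if (ranges.isBlank()) {
            return pages;
        }
        for (String part : ranges.split(",")) {
            String token = part.trim();
            int dash = token.indexOf('-');
            if (dash < 0) {
                pages.add(Integer.parseInt(token)); // single page, e.g. "8"
            } else {
                int start = Integer.parseInt(token.substring(0, dash).trim());
                int end = Integer.parseInt(token.substring(dash + 1).trim());
                if (start < 1 || start > end) {
                    throw new IllegalArgumentException("Invalid range: " + token);
                }
                for (int p = start; p <= end; p++) {
                    pages.add(p);
                }
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(expand("1-5, 8, 11-13")); // [1, 2, 3, 4, 5, 8, 11, 12, 13]
    }
}
```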

Example of using multiple options:

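// Requires: import com.microsoft.playwright.options.Margin; and import java.nio.file.Paths;
Page.PdfOptions pdfOptions = new Page.PdfOptions()
    .setPath(Paths.get("custom_invoice.pdf"))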
    .setFormat("A4")
    .setLandscape(false)
    .setPrintBackground(true)
    .setDisplayHeaderFooter(true)
    .setHeaderTemplate("<div style='font-size:10px; text-align:center;'>Invoice Header</div>")
    .setFooterTemplate("<div style='font-size:10px; text-align:center;'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></div>")
    .setMargin(new Margin().setTop("60px").setBottom("60px").setLeft("20px").setRight("20px"))
    .setScale(1.0)
    .setPageRanges("1-2")
    .setPreferCSSPageSize(true);

page.pdf(pdfOptions);
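Margins, width, and height can mix units, which makes it hard to reason about how much space is left for content. Since CSS defines 1in as 96px (and 1in = 2.54cm = 25.4mm), the sketch below converts Playwright-style length strings to pixels. For example, an A4 page is 297mm tall, about 1122.5px, so the 60px top and bottom margins above leave roughly 1002.5px of vertical space. The PrintUnits class is hypothetical.

```java
import java.util.Map;

class PrintUnits {
    // CSS absolute units expressed in CSS pixels (1in = 96px).
    private static final Map<String, Double> PX_PER_UNIT = Map.of(
            "px", 1.0,
            "in", 96.0,
            "cm", 96.0 / 2.54,
            "mm", 96.0 / 25.4);

    // Converts a length such as "60px", "1in" or "20mm" to CSS pixels.
    // Unlabeled values are treated as pixels.
    static double toPixels(String length) {
        String value = length.trim().toLowerCase();
        for (Map.Entry<String, Double> unit : PX_PER_UNIT.entrySet()) {
            if (value.endsWith(unit.getKey())) {
                String number = value.substring(0, value.length() - unit.getKey().length()).trim();
                return Double.parseDouble(number) * unit.getValue();
            }
        }
        return Double.parseDouble(value);
    }

    public static void main(String[] args) {
        double a4Height = toPixels("297mm");
        double contentHeight = a4Height - toPixels("60px") - toPixels("60px");
        System.out.printf("A4 height: %.1fpx, content area: %.1fpx%n", a4Height, contentHeight);
    }
}
```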

How to Use a PDF API to Automate PDF Creation at Scale

For SaaS platforms, automating PDF generation at scale might require offloading the heavy lifting to a PDF API.

Alternatively, by integrating a third-party API such as pdforge, you can handle high-volume PDF generation, complex formatting, and post-processing from a single backend call.

Here's an example of calling the pdforge API from Java to convert HTML content into a PDF:

import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.URL;

public class PdfForgeExample {
    public static void main(String[] args) {
        try {
            URL url = new URL("https://api.pdforge.com/v1/pdf/sync");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Bearer your-api-key");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);

            String jsonInputString = "{ \"templateId\": \"your-template\", \"data\": { \"html\": \"your-html\" } }";

            try (OutputStreamWriter writer = new OutputStreamWriter(conn.getOutputStream(), java.nio.charset.StandardCharsets.UTF_8)) {
                writer.write(jsonInputString);
                writer.flush();
            }

            int responseCode = conn.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response and process the PDF
            } else {
                // Handle errors
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

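This code sends a POST request to the pdforge API and checks the response status. Reading the response and saving the PDF are left as placeholders, since how you do that depends on the API's response format.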
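Building the JSON body by string concatenation breaks as soon as the HTML contains quotes or line breaks. A sketch using Java 11's java.net.http.HttpClient with proper JSON string escaping might look like this. The endpoint, header names, and payload shape are copied from the example above and have not been checked against pdforge's current API reference; your-api-key and your-template are placeholders, and PdfApiRequest is a hypothetical class name.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class PdfApiRequest {
    // Quotes and escapes a Java string as a JSON string literal.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters use a four-digit hex escape
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds the POST request; endpoint and payload shape follow the example above.
    static HttpRequest buildRequest(String apiKey, String templateId, String html) {
        String body = "{ \"templateId\": " + jsonString(templateId)
                + ", \"data\": { \"html\": " + jsonString(html) + " } }";
        return HttpRequest.newBuilder(URI.create("https://api.pdforge.com/v1/pdf/sync"))
                .timeout(Duration.ofSeconds(60))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest("your-api-key", "your-template",
                "<h1 class=\"title\">Invoice</h1>");
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // The response format depends on the API; check its documentation before parsing.
        System.out.println("Status: " + response.statusCode());
    }
}
```

In a real project, a JSON library such as Jackson or Gson would usually replace the hand-written jsonString helper.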

Conclusion

Playwright Java offers a robust solution for converting HTML to PDF, capturing the nuances of modern web content with high fidelity. Its use of real browser engines ensures that your PDFs accurately reflect your HTML designs, making it ideal for generating complex documents like invoices.

However, if your application needs to manipulate PDFs beyond rendering, such as merging documents, filling forms, or adding annotations, libraries like iText or Apache PDFBox are more suitable, since Playwright only renders new PDFs.

For SaaS platforms needing to automate PDF creation at scale without managing the rendering process, leveraging third-party PDF APIs like pdforge can be a strategic choice, offering scalability and reducing infrastructure overhead.

Generating PDFs at scale can be quite complicated!

We take care of all of this, so you can focus on what truly matters in your product!
