Why Choose Playwright Java for HTML to PDF Conversion
Playwright is a cutting-edge automation library that allows developers to control web browsers with remarkable precision. It excels in rendering complex web pages, making it an ideal choice for converting HTML content into PDFs in Java applications, especially for SaaS platforms that require dynamic report generation.
The official Playwright for Java documentation covers the full API in detail.
Playwright vs. Other Java PDF Libraries: A Comparative Insight
When considering HTML to PDF conversion in Java, several libraries come to mind:
Flying Saucer: A Java library that renders XML/XHTML and CSS 2.1 content. It handles basic HTML and CSS but struggles with modern web features.
iText: A powerful PDF library capable of creating and manipulating PDFs. However, it has a steep learning curve, and its AGPL license means closed-source commercial use requires a paid license.
Apache PDFBox: An open-source library for working with PDF documents. It’s excellent for manipulating existing PDFs but isn’t optimized for HTML to PDF conversion.
OpenPDF: A derivative of iText, offering similar functionalities with an open-source license. It shares the complexities of iText without full support for advanced HTML content.
Playwright stands out by using actual browser engines to render HTML and CSS, ensuring that the generated PDFs accurately reflect modern web designs, including advanced JavaScript and CSS features.
Setting Up Playwright in Your Java Project
Quick Start: Installing Playwright Java
Add the following dependency to your pom.xml file if you’re using Maven:
<dependencies>
    <dependency>
        <groupId>com.microsoft.playwright</groupId>
        <artifactId>playwright</artifactId>
        <version>1.34.0</version>
    </dependency>
</dependencies>
For Gradle, include:
dependencies {
    implementation 'com.microsoft.playwright:playwright:1.34.0'
}
Configuring Dependencies
Initialize Playwright and launch a browser. The first run downloads the necessary browser binaries:
import com.microsoft.playwright.*;

public class PlaywrightSetup {
    public static void main(String[] args) {
        try (Playwright playwright = Playwright.create()) {
            // Launch headless Chromium to confirm the browser binaries are available.
            Browser browser = playwright.chromium().launch();
            browser.close();
            System.out.println("Playwright is set up successfully.");
        }
    }
}
Verifying Your Setup with a Test HTML Page
Create a simple HTML invoice template named invoice.html:
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Invoice</title>
    <style>
        body { font-family: 'Arial', sans-serif; margin: 20px; }
        h1 { text-align: center; }
        .invoice-details { width: 100%; margin-top: 20px; border-collapse: collapse; }
        .invoice-details th, .invoice-details td { padding: 10px; border: 1px solid #ccc; text-align: left; }
        .total { text-align: right; font-weight: bold; }
    </style>
</head>
<body>
    <h1>Invoice</h1>
    <p>Date: <strong>2023-10-01</strong></p>
    <p>Invoice #: <strong>INV-1001</strong></p>
    <table class="invoice-details">
        <tr>
            <th>Description</th>
            <th>Quantity</th>
            <th>Price</th>
            <th>Total</th>
        </tr>
        <tr>
            <td>Product A</td>
            <td>2</td>
            <td>$50</td>
            <td>$100</td>
        </tr>
        <tr>
            <td>Service B</td>
            <td>5</td>
            <td>$20</td>
            <td>$100</td>
        </tr>
        <tr>
            <td colspan="3" class="total">Grand Total</td>
            <td>$200</td>
        </tr>
    </table>
</body>
</html>
Implementing HTML to PDF Conversion with Playwright Java
Converting HTML to PDF Using Playwright
Method 01: Rendering the PDF from a URL
If your HTML invoice is hosted, you can navigate to its URL and generate a PDF:
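import com.microsoft.playwright.*;
import java.nio.file.Paths;

public class PdfFromUrl {
    public static void main(String[] args) {
        try (Playwright playwright = Playwright.create()) {
            // launch() starts headless Chromium by default.
            Browser browser = playwright.chromium().launch();
            Page page = browser.newPage();
            page.navigate("https://yourdomain.com/invoice.html");
            // setPath expects a java.nio.file.Path; pdf() also returns the PDF bytes,
            // so a single call both saves the file and gives you the content in memory.
            byte[] pdfBytes = page.pdf(new Page.PdfOptions().setPath(Paths.get("invoice.pdf")));
            System.out.println("PDF generated from URL and saved locally (" + pdfBytes.length + " bytes).");
        }
    }
}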
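If the invoice is not hosted anywhere yet, one option is to load invoice.html from disk through a file:// URL. The following is a minimal sketch that uses only the JDK; the class name LocalInvoiceUrl and the helper toFileUrl are illustrative and not part of Playwright.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class LocalInvoiceUrl {
    // Converts a (possibly relative) file path into an absolute file:// URL string.
    static String toFileUrl(String fileName) {
        Path path = Paths.get(fileName).toAbsolutePath().normalize();
        return path.toUri().toString();
    }

    public static void main(String[] args) {
        // Prints the file URL for invoice.html in the current working directory.
        System.out.println(toFileUrl("invoice.html"));
    }
}
```

The resulting string can be passed to page.navigate(...) in place of the https URL used above.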
Method 02: Rendering the PDF from HTML Content
For dynamic content or when the HTML is generated on the fly:
import com.microsoft.playwright.*;
import java.nio.file.Paths;

public class PdfFromContent {
    public static void main(String[] args) {
        String htmlContent = "<!DOCTYPE html><html><head><meta charset='UTF-8'><title>Invoice</title>"
            + "<style>"
            + "body { font-family: 'Arial', sans-serif; margin: 20px; }"
            + "h1 { text-align: center; }"
            + ".invoice-details { width: 100%; margin-top: 20px; border-collapse: collapse; }"
            + ".invoice-details th, .invoice-details td { padding: 10px; border: 1px solid #ccc; text-align: left; }"
            + ".total { text-align: right; font-weight: bold; }"
            + "</style></head>"
            + "<body>"
            + "<h1>Invoice</h1>"
            + "<p>Date: <strong>2023-10-01</strong></p>"
            + "<p>Invoice #: <strong>INV-1001</strong></p>"
            + "<table class='invoice-details'>"
            + "<tr><th>Description</th><th>Quantity</th><th>Price</th><th>Total</th></tr>"
            + "<tr><td>Product A</td><td>2</td><td>$50</td><td>$100</td></tr>"
            + "<tr><td>Service B</td><td>5</td><td>$20</td><td>$100</td></tr>"
            + "<tr><td colspan='3' class='total'>Grand Total</td><td>$200</td></tr>"
            + "</table>"
            + "</body></html>";
        try (Playwright playwright = Playwright.create()) {
            Browser browser = playwright.chromium().launch();
            Page page = browser.newPage();
            // Load the HTML string directly instead of navigating to a URL.
            page.setContent(htmlContent);
            // setPath expects a java.nio.file.Path; the same call returns the PDF bytes.
            byte[] pdfBytes = page.pdf(new Page.PdfOptions().setPath(Paths.get("invoice.pdf")));
            System.out.println("PDF generated from HTML content and saved locally (" + pdfBytes.length + " bytes).");
        }
    }
}
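When the HTML is assembled from application data rather than written by hand, values such as product names should be escaped before they are concatenated into the markup. The sketch below shows one way to build table rows safely with plain Java; InvoiceHtmlBuilder, escapeHtml, and row are hypothetical names introduced for illustration.

```java
import java.util.List;

final class InvoiceHtmlBuilder {
    // Escapes the characters that have special meaning in HTML text and attribute values.
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds one <tr> element from the given cell values, escaping each value.
    static String row(List<String> cells) {
        StringBuilder sb = new StringBuilder("<tr>");
        for (String cell : cells) {
            sb.append("<td>").append(escapeHtml(cell)).append("</td>");
        }
        return sb.append("</tr>").toString();
    }

    public static void main(String[] args) {
        System.out.println(row(List.of("Fish & Chips <large>", "2", "$50", "$100")));
    }
}
```

The rows produced this way can replace the hard-coded table rows in htmlContent before it is passed to page.setContent(...).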
HTML Template Engines
To generate dynamic invoices, you can use template engines like FreeMarker or Thymeleaf. Here’s an example using FreeMarker:
import com.microsoft.playwright.*;
import freemarker.template.*;
import java.io.*;
import java.nio.file.Paths;
import java.util.*;

public class PdfWithTemplate {
    public static void main(String[] args) throws Exception {
        // Load templates from the /templates folder on the classpath.
        Configuration cfg = new Configuration(Configuration.VERSION_2_3_30);
        cfg.setClassForTemplateLoading(PdfWithTemplate.class, "/templates");
        Template template = cfg.getTemplate("invoice.ftl");

        // Build the data model referenced by the template.
        Map<String, Object> data = new HashMap<>();
        data.put("date", "2023-10-01");
        data.put("invoiceNumber", "INV-1001");
        List<Map<String, String>> items = new ArrayList<>();
        Map<String, String> item1 = new HashMap<>();
        item1.put("description", "Product A");
        item1.put("quantity", "2");
        item1.put("price", "$50");
        item1.put("total", "$100");
        items.add(item1);
        Map<String, String> item2 = new HashMap<>();
        item2.put("description", "Service B");
        item2.put("quantity", "5");
        item2.put("price", "$20");
        item2.put("total", "$100");
        items.add(item2);
        data.put("items", items);
        data.put("grandTotal", "$200");

        // Render the template to an HTML string.
        Writer out = new StringWriter();
        template.process(data, out);
        String htmlContent = out.toString();

        try (Playwright playwright = Playwright.create()) {
            Browser browser = playwright.chromium().launch();
            Page page = browser.newPage();
            page.setContent(htmlContent);
            // setPath expects a java.nio.file.Path; the same call returns the PDF bytes.
            byte[] pdfBytes = page.pdf(new Page.PdfOptions().setPath(Paths.get("invoice.pdf")));
            System.out.println("PDF generated from template and saved locally (" + pdfBytes.length + " bytes).");
        }
    }
}
The invoice.ftl template file, placed in the /templates directory on the classpath (for example src/main/resources/templates in a Maven project):
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Invoice</title>
    <style>
        body { font-family: 'Arial', sans-serif; margin: 20px; }
        h1 { text-align: center; }
        .invoice-details { width: 100%; margin-top: 20px; border-collapse: collapse; }
        .invoice-details th, .invoice-details td { padding: 10px; border: 1px solid #ccc; text-align: left; }
        .total { text-align: right; font-weight: bold; }
    </style>
</head>
<body>
    <h1>Invoice</h1>
    <p>Date: <strong>${date}</strong></p>
    <p>Invoice #: <strong>${invoiceNumber}</strong></p>
    <table class="invoice-details">
        <tr>
            <th>Description</th>
            <th>Quantity</th>
            <th>Price</th>
            <th>Total</th>
        </tr>
        <#list items as item>
        <tr>
            <td>${item.description}</td>
            <td>${item.quantity}</td>
            <td>${item.price}</td>
            <td>${item.total}</td>
        </tr>
        </#list>
        <tr>
            <td colspan="3" class="total">Grand Total</td>
            <td>${grandTotal}</td>
        </tr>
    </table>
</body>
</html>
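The example data model hard-codes the line totals and the grand total as strings. In a real application these would usually be computed from quantities and unit prices before being put into the model. Here is a minimal sketch of that arithmetic using BigDecimal; the InvoiceTotals class and its methods are hypothetical, and the two-decimal "$100.00" formatting is an assumption that differs slightly from the "$100" style in the sample.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class InvoiceTotals {
    // Line total = unit price x quantity, kept as BigDecimal to avoid floating-point rounding.
    static BigDecimal lineTotal(int quantity, BigDecimal unitPrice) {
        return unitPrice.multiply(BigDecimal.valueOf(quantity));
    }

    // Formats an amount as dollars with two decimal places, e.g. $100.00.
    static String format(BigDecimal amount) {
        return "$" + amount.setScale(2, RoundingMode.HALF_UP).toPlainString();
    }

    public static void main(String[] args) {
        BigDecimal productA = lineTotal(2, new BigDecimal("50")); // Product A: 2 x $50
        BigDecimal serviceB = lineTotal(5, new BigDecimal("20")); // Service B: 5 x $20
        BigDecimal grandTotal = productA.add(serviceB);
        System.out.println(format(productA) + " + " + format(serviceB) + " = " + format(grandTotal));
    }
}
```

The formatted strings can then be stored under the total and grandTotal keys that invoice.ftl reads.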
Adding Headers, Footers, and Page Numbers with Playwright
Enhance your PDF by adding custom headers, footers, and page numbers:
// Requires: import com.microsoft.playwright.options.Margin; and import java.nio.file.Paths;
Page.PdfOptions pdfOptions = new Page.PdfOptions()
    .setPath(Paths.get("invoice_with_header_footer.pdf"))
    .setDisplayHeaderFooter(true)
    .setHeaderTemplate("<div style='font-size:10px; width:100%; text-align:center;'>My Company</div>")
    .setFooterTemplate("<div style='font-size:10px; width:100%; text-align:center;'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></div>")
    // Leave room at the top and bottom so the header and footer do not overlap the content.
    .setMargin(new Margin().setTop("50px").setBottom("50px"))
    .setPrintBackground(true);
page.pdf(pdfOptions);
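Besides pageNumber and totalPages, Playwright documents the placeholder classes date, title, and url for header and footer templates. The sketch below builds such template strings with small helper methods so the markup stays consistent; HeaderFooterTemplates and its methods are illustrative names, not part of the Playwright API.

```java
final class HeaderFooterTemplates {
    // Wraps content in a full-width, centered block with an explicit font size.
    static String centered(String innerHtml) {
        return "<div style='font-size:10px; width:100%; text-align:center;'>" + innerHtml + "</div>";
    }

    // An empty span carrying a placeholder class; the browser injects the value when printing.
    static String placeholder(String cssClass) {
        return "<span class='" + cssClass + "'></span>";
    }

    // Header showing the document title and the print date.
    static String header() {
        return centered(placeholder("title") + " | " + placeholder("date"));
    }

    // Footer showing "Page X of Y".
    static String footer() {
        return centered("Page " + placeholder("pageNumber") + " of " + placeholder("totalPages"));
    }

    public static void main(String[] args) {
        System.out.println(header());
        System.out.println(footer());
    }
}
```

The returned strings can be passed to setHeaderTemplate(...) and setFooterTemplate(...) in place of the inline literals.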
All Options from the pdf Method
The page.pdf() method offers a variety of options to customize the PDF output:
path: Specifies the file path to save the PDF. If omitted, the PDF will be returned as a byte array.
scale: Sets the scale of the webpage rendering (default is 1.0).
displayHeaderFooter: When set to true, includes header and footer in the PDF.
headerTemplate and footerTemplate: HTML templates for the header and footer. Can include placeholders like <span class='pageNumber'></span>.
printBackground: When set to true, prints background graphics.
landscape: When set to true, prints the PDF in landscape orientation.
pageRanges: Specifies the page ranges to print, e.g., "1-5, 8, 11-13".
format: Sets the paper format, such as "A4", "Letter".
width and height: Sets the width and height of the paper in units (px, in, cm, mm).
margin: Sets margins for the PDF. Accepts a Margin object with top, right, bottom, left properties.
preferCSSPageSize: When set to true, uses the @page size defined in CSS.
Example of using multiple options:
// Requires: import com.microsoft.playwright.options.Margin; and import java.nio.file.Paths;
Page.PdfOptions pdfOptions = new Page.PdfOptions()
    .setPath(Paths.get("custom_invoice.pdf"))
    .setFormat("A4")
    .setLandscape(false)
    .setPrintBackground(true)
    .setDisplayHeaderFooter(true)
    .setHeaderTemplate("<div style='font-size:10px; text-align:center;'>Invoice Header</div>")
    .setFooterTemplate("<div style='font-size:10px; text-align:center;'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></div>")
    .setMargin(new Margin().setTop("60px").setBottom("60px").setLeft("20px").setRight("20px"))
    .setScale(1.0)
    .setPageRanges("1-2")
    .setPreferCSSPageSize(true);
page.pdf(pdfOptions);
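Because width, height, and margin values accept px, in, cm, and mm, it can help to compare them in a single unit. The following sketch converts such strings to CSS pixels using the standard CSS definitions (96 px per inch, 2.54 cm per inch). It is a standalone helper for reasoning about sizes, and PaperUnits is a hypothetical name; it is not a description of how Playwright converts units internally.

```java
import java.util.Locale;

final class PaperUnits {
    // Converts a length such as "60px", "1in", "2.54cm" or "10mm" to CSS pixels.
    static double toCssPixels(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.endsWith("px")) return number(v);
        if (v.endsWith("in")) return number(v) * 96.0;
        if (v.endsWith("cm")) return number(v) * 96.0 / 2.54;
        if (v.endsWith("mm")) return number(v) * 96.0 / 25.4;
        throw new IllegalArgumentException("Unsupported unit in: " + value);
    }

    // Parses the numeric part, assuming a two-character unit suffix.
    private static double number(String v) {
        return Double.parseDouble(v.substring(0, v.length() - 2).trim());
    }

    public static void main(String[] args) {
        // A4 paper is 210mm x 297mm; print its size in CSS pixels.
        System.out.printf(Locale.ROOT, "A4: %.1f x %.1f px%n", toCssPixels("210mm"), toCssPixels("297mm"));
    }
}
```

For example, a 60px top margin on an A4 page can be compared directly against the page height computed this way.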
How to Use a PDF API to Automate PDF Creation at Scale
For SaaS platforms, automating PDF generation at scale might require offloading the heavy lifting to a PDF API.
One option is to integrate with a third-party API such as pdforge, which lets you handle high-volume PDF generation, complex formatting, and post-processing from a single backend call.
Here’s an example of how to call pdforge from Java to convert HTML content into a PDF via an API request:
import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PdfForgeExample {
    public static void main(String[] args) {
        try {
            URL url = new URL("https://api.pdforge.com/v1/pdf/sync");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Bearer your-api-key");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            String jsonInputString = "{ \"templateId\": \"your-template\", \"data\": { \"html\": \"your-html\" } }";
            // Send the JSON body as UTF-8.
            try (OutputStreamWriter writer = new OutputStreamWriter(conn.getOutputStream(), StandardCharsets.UTF_8)) {
                writer.write(jsonInputString);
                writer.flush();
            }
            int responseCode = conn.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Success: read the response from conn.getInputStream() here.
                System.out.println("Request succeeded.");
            } else {
                // Failure: conn.getErrorStream() may contain details from the server.
                System.err.println("Request failed with HTTP status " + responseCode);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
This code sends a POST request to the pdforge API and checks the HTTP status. Reading the response and saving the generated PDF locally belongs in the success branch.
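Real HTML contains quotes and line breaks, so it has to be escaped before being embedded in a JSON string. The sketch below builds the same request with the java.net.http API (assuming Java 11 or later) and a small JSON escaper. The endpoint and the templateId and html field names are copied from the example above and are not verified against the pdforge documentation; PdfApiRequest and its methods are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class PdfApiRequest {
    // Escapes a string so it can be embedded inside a JSON string literal.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Remaining control characters become four-digit hex escapes.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the JSON body in the shape used by the example above (assumed, not verified).
    static String body(String templateId, String html) {
        return "{ \"templateId\": \"" + jsonEscape(templateId)
             + "\", \"data\": { \"html\": \"" + jsonEscape(html) + "\" } }";
    }

    // Builds the POST request without sending it.
    static HttpRequest build(String apiKey, String templateId, String html) {
        return HttpRequest.newBuilder()
            .uri(URI.create("https://api.pdforge.com/v1/pdf/sync"))
            .header("Authorization", "Bearer " + apiKey)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body(templateId, html)))
            .build();
    }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, ...), choosing a body handler that matches whatever the API actually returns.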
Conclusion
Playwright Java offers a robust solution for converting HTML to PDF, capturing the nuances of modern web content with high fidelity. Its use of real browser engines ensures that your PDFs accurately reflect your HTML designs, making it ideal for generating complex documents like invoices.
However, if your application requires extensive PDF manipulation beyond rendering, traditional libraries like iText or Apache PDFBox might be more suitable, since they are designed for editing and annotating existing PDFs.
For SaaS platforms needing to automate PDF creation at scale without managing the rendering process, leveraging third-party PDF APIs like pdforge can be a strategic choice, offering scalability and reducing infrastructure overhead.