How to Generate PDF from HTML with Playwright with Node.js: Full Guide

Marcelo Abreu | Founder of pdforge

Sep 24, 2024


Introduction to PDF Generation with Playwright

If you’re developing a SaaS platform, you likely need to generate PDF reports. Playwright is a powerful tool that makes it easy to render HTML and convert it to a PDF. Below is a detailed guide to generating PDFs with Playwright.

Generating PDFs from HTML content is a core requirement for many web applications today, whether for invoices, reports, or downloadable documentation. HTML’s flexibility, combined with PDF’s static nature, allows developers to create content that is dynamic online and fixed in print form. Playwright emerges as a compelling tool for this task.

Initially known for browser test automation and scraping, Playwright has rapidly become a go-to solution for automating HTML-to-PDF conversions.

Why use Playwright for this task?

Playwright excels in browser automation because it supports multiple browsers: Chromium, Firefox, and WebKit. Note that PDF generation through page.pdf() is only supported in Chromium, so the PDF captures the visual state of the page exactly as Chromium renders it.

Unlike many traditional tools that convert HTML to PDF, Playwright operates at the browser level, meaning it captures the actual page rendering, including JavaScript-driven dynamic content. This makes Playwright an invaluable tool for automating HTML to PDF processes across modern web applications.

Playwright also has well-written, robust documentation that you can use if you run into any trouble.

Key Benefits of Playwright Over Other Tools

Playwright offers several advantages over other PDF generation tools. First, it runs headless, which makes it efficient for server-side workflows where no GUI is needed. Unlike simpler libraries, Playwright provides full control over the page rendering process, allowing you to fine-tune the PDF output, whether that means adjusting page size, setting margins, or dealing with dynamic content. Its ability to handle complex JavaScript scenarios also makes it more versatile than basic HTML-to-PDF converters. Additionally, because each Playwright release is tied to a specific browser build, PDFs render consistently across different environments.

[Figure: npmtrends chart showing Playwright's download growth]

Among its competitors, Playwright has seen the most significant growth in downloads and popularity in recent years, as the npmtrends data shows, solidifying its position as a leading HTML-to-PDF JavaScript library.

Html to pdf using playwright

Setting Up Playwright for HTML to PDF Conversion

Installation Guide: How to Install Playwright with Node.js

You can also use Playwright with other programming languages such as Python, Java, and .NET, but this guide focuses on Node.js.

Setting up Playwright with Node.js is a straightforward process. If you haven’t installed Node.js yet, you’ll need to do that first. Once Node.js is ready, Playwright can be installed via npm (Node Package Manager) with a simple command:

npm install playwright

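This command installs the Playwright library. Playwright supports Chromium, WebKit, and Firefox, but depending on your Playwright version the browser binaries may not be downloaded automatically during npm install. If a browser is missing, run npx playwright install chromium to download only the engine you plan to use (Chromium is the one required for PDF generation).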

Configuring Your Environment for HTML to PDF Conversion

After installation, the next step is to configure Playwright for your specific needs. Playwright’s PDF generation process is based on controlling the rendering of web pages in a headless browser. You’ll want to configure the environment to ensure that the system has enough resources to handle browser automation tasks, especially if you’re processing large volumes of HTML.
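
One way to keep resource usage predictable under load is to reuse a single browser instance and cap how many pages render at the same time. The sketch below is a small, dependency-free limiter; createLimiter and the limit of 2 are illustrative choices, not part of Playwright's API.

```javascript
// Hypothetical helper: caps how many async jobs (e.g. PDF renders) run at once.
function createLimiter(maxConcurrent) {
  let active = 0;
  const queue = [];

  const next = () => {
    if (active >= maxConcurrent || queue.length === 0) return;
    active += 1;
    const { job, resolve, reject } = queue.shift();
    // Run the job on the next microtask so synchronous throws become rejections.
    Promise.resolve()
      .then(job)
      .then(resolve, reject)
      .finally(() => {
        // Free the slot and start the next queued job, if any.
        active -= 1;
        next();
      });
  };

  const run = (job) =>
    new Promise((resolve, reject) => {
      queue.push({ job, resolve, reject });
      next();
    });

  // Expose counters so callers can monitor load.
  run.stats = () => ({ active, queued: queue.length });
  return run;
}

// Allow at most two renders at a time (an arbitrary example value).
const limitRender = createLimiter(2);
```

Each render would then be wrapped as limitRender(() => renderOnePdf(url)), where renderOnePdf stands for whatever function opens a page and calls page.pdf() in your code.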

Basic Example: Generating PDF from a URL

Once the environment is configured, it’s time to write your first Playwright script to convert HTML into a PDF. This basic script captures a web page and saves it as a PDF:

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://your-saas-url.com/report', { waitUntil: 'networkidle' });
  await page.pdf({ path: 'report.pdf', format: 'A4' });
  await browser.close();
})();

This script launches a headless Chromium browser, navigates to a webpage, and generates a PDF. It’s a simple starting point, but from here, you can add customization options to tailor the PDF output.
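
When the PDF is returned from an API endpoint instead of being written to disk, page.pdf() can be called without the path option, in which case it resolves to a Buffer. The following is a minimal sketch under that assumption; renderUrlToPdf and isPdfBuffer are hypothetical helper names, and Playwright is required lazily so the validation helper can be used on its own.

```javascript
// Hypothetical helper: renders a URL and returns the PDF bytes instead of writing a file.
async function renderUrlToPdf(url) {
  // Required inside the function so this module loads even without Playwright installed.
  const { chromium } = require('playwright');
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle' });
    // Without a `path` option, page.pdf() resolves to a Buffer.
    return await page.pdf({ format: 'A4' });
  } finally {
    // Close the browser even if navigation or rendering throws.
    await browser.close();
  }
}

// Hypothetical sanity check: PDF files start with the bytes "%PDF-".
function isPdfBuffer(buffer) {
  return Buffer.isBuffer(buffer) && buffer.subarray(0, 5).toString('latin1') === '%PDF-';
}
```

Wrapping the work in try/finally is a design choice that avoids leaving orphaned browser processes behind when a page fails to load.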

If you want to launch a different browser context, you can easily do so by following the documentation here.

Alternative: Generating PDF from an HTML file

If you have a pre-compiled HTML file, Playwright can also convert it to a PDF:

1. Create an index.html file:

<!DOCTYPE html>
<html>
<head>
  <title>Report</title>
</head>
<body>
  <h1>Monthly SaaS Report</h1>
  <p>Your detailed analytics for the month.</p>
</body>
</html>

2. Use Playwright to load and convert the HTML:

const { chromium } = require('playwright');
const fs = require('fs');

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const content = fs.readFileSync('index.html', 'utf8');
  await page.setContent(content);
  await page.pdf({ path: 'local-report.pdf', format: 'A4' });
  await browser.close();
})();
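
Content loaded with setContent() has no file location of its own, so relative references in the HTML (for example ./styles.css or ./logo.png) may fail to resolve. One alternative is to navigate to the file through a file:// URL. The sketch below uses Node's built-in url.pathToFileURL; toFileUrl and renderFileToPdf are illustrative helper names.

```javascript
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper: converts a local path into a file:// URL that page.goto() accepts.
function toFileUrl(htmlPath) {
  return pathToFileURL(path.resolve(htmlPath)).href;
}

// Hypothetical helper: renders a local HTML file so relative assets stay resolvable.
async function renderFileToPdf(htmlPath, outputPath) {
  const { chromium } = require('playwright'); // loaded only when rendering
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(toFileUrl(htmlPath), { waitUntil: 'load' });
    await page.pdf({ path: outputPath, format: 'A4' });
  } finally {
    await browser.close();
  }
}
```

A call such as renderFileToPdf('index.html', 'local-report.pdf') would produce the same file as the script above while letting the browser load stylesheets and images stored next to index.html.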


HTML Template Engines

If you want to generate the HTML dynamically before converting it to PDF, consider using a templating engine. Some common ones include:

Handlebars: Allows logic in your templates, such as loops or conditions.

EJS: Simple and effective, great for embedding JavaScript in HTML.

Pug: Offers a shorthand syntax for HTML, reducing boilerplate.

Here's an example with Handlebars:

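// Requires: npm install handlebars
const Handlebars = require('handlebars');
const template = Handlebars.compile('<h1>{{title}}</h1><p>{{body}}</p>');
const html = template({ title: 'Monthly Report', body: 'Detailed analysis for the month.' });
// `page` is a Playwright page created as in the scripts above; run this inside the async function.
await page.setContent(html);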


Writing Playwright Script for PDF Generation

Writing Playwright scripts for PDF generation is highly flexible. Playwright’s API gives you full control over how HTML is rendered and saved as a PDF. From customizing page layouts to handling dynamic content, Playwright ensures that the final PDF is a faithful representation of the original web page.

Step-by-Step Guide to Writing a Playwright Script for HTML-to-PDF

A basic HTML-to-PDF script follows this structure: launch the browser, navigate to the page, and call Playwright’s pdf() function to generate the file. However, Playwright allows for much more customization.

await page.pdf({
  path: 'output.pdf',
  format: 'A4',
  printBackground: true,
  margin: { top: '20px', bottom: '20px', right: '10px', left: '10px' }
});

In this example, the format is set to A4, and margins are customized. Playwright also supports features like printing background graphics, which can be essential for web pages with complex designs.

Customization is one of the strongest aspects of Playwright’s PDF generation. You can set the page format to common sizes such as A4 or Letter, adjust margins for optimal content layout, and choose between portrait or landscape orientation depending on the document’s content.
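
The options mentioned above can be collected in a small builder so that every report uses the same defaults. This is only a sketch: buildPdfOptions and its default values are assumptions for illustration, while format, landscape, printBackground, margin, and path are options accepted by page.pdf().

```javascript
// Hypothetical helper: builds an options object for page.pdf() from a few report settings.
function buildPdfOptions({ path, format = 'A4', landscape = false, margin = '20px' } = {}) {
  const options = {
    format,                // e.g. 'A4' or 'Letter'
    landscape,             // false = portrait, true = landscape
    printBackground: true, // include background colors and images
    // Apply the same margin to all four sides.
    margin: { top: margin, right: margin, bottom: margin, left: margin },
  };
  if (path) options.path = path; // only set `path` when a file should be written
  return options;
}

// Example: a wide table printed on Letter paper in landscape orientation.
const wideReport = buildPdfOptions({ path: 'wide-report.pdf', format: 'Letter', landscape: true });
// Inside an async function with an open page: await page.pdf(wideReport);
```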

Working with Dynamic Content: Converting Web Forms and Interactive Elements

One of Playwright’s key strengths is its ability to handle dynamic web content. Many web pages today are powered by JavaScript, with forms or interactive components that may not render properly with simpler PDF tools. Playwright captures the actual state of the page after all scripts have run, ensuring that forms, buttons, or interactive elements are correctly represented in the final PDF.

await page.goto('https://your-saas-url.com/report', { waitUntil: 'networkidle' });
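
The networkidle setting only indicates that network traffic has settled; a chart or table that finishes rendering later can still be missing from the PDF. A possible refinement is to wait for a specific element and for web fonts before printing. In this sketch the helper name waitForReportReady, the selector '#report-ready', and the 15-second timeout are assumptions about your page, not Playwright defaults.

```javascript
// Hypothetical helper: waits until the page signals that the report is fully rendered.
async function waitForReportReady(page, selector = '#report-ready', timeout = 15000) {
  // Wait until the element exists and is visible (rejects if the timeout expires).
  await page.waitForSelector(selector, { state: 'visible', timeout });
  // Wait until web fonts have finished loading inside the browser.
  await page.evaluate(() => document.fonts.ready.then(() => true));
}
```

It would be called between page.goto() and page.pdf(), for example await waitForReportReady(page, '#revenue-chart').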

Using Playwright’s PDF API: Fine-Tuning Your PDF Output

Playwright’s PDF API offers options to fine-tune the output. For instance, you can set the scale of the content, define paper size, or even add headers and footers dynamically. These options give developers fine-grained control over the final document.

Adding Headers, Footers, and Page Numbers with Playwright

Many official documents require headers, footers, or page numbers. With Playwright, you can dynamically add these elements by injecting HTML templates into the PDF output. This ensures consistency across all pages.

await page.pdf({
  path: 'header-footer-report.pdf',
  format: 'A4',
  displayHeaderFooter: true,
  headerTemplate: '<span style="font-size:10px;">SaaS Report Header</span>',
  footerTemplate: '<span style="font-size:10px;">Page <span class="pageNumber"></span> of <span class="totalPages"></span></span>',
  margin: { top: '50px', bottom: '50px' }
});
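
Inside header and footer templates, Chromium fills in elements that carry the classes date, title, url, pageNumber, and totalPages. The page's own stylesheets are not applied to these templates, so any styling has to be inline. The sketch below builds a footer string along those lines; escapeHtml and buildFooterTemplate are hypothetical helpers, and the layout values are arbitrary.

```javascript
// Hypothetical helper: escapes text so it can be embedded safely in template markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: footer with a label on the left and "Page X of Y" on the right.
function buildFooterTemplate(label) {
  return [
    '<div style="font-size:10px; width:100%; padding:0 10mm; display:flex; justify-content:space-between;">',
    `<span>${escapeHtml(label)}</span>`,
    // Chromium replaces the content of these two spans on every page.
    '<span>Page <span class="pageNumber"></span> of <span class="totalPages"></span></span>',
    '</div>',
  ].join('');
}
```

The result can be passed as footerTemplate: buildFooterTemplate('Acme & Co') together with displayHeaderFooter: true and a bottom margin large enough to hold the footer.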

Troubleshooting and Best Practices for Playwright PDF Generation

Even with a solid setup, you may encounter challenges. Understanding how to troubleshoot and optimize your workflow is essential.

Creating a Serverless Service to Scale Your PDF Generation

One advanced option for scaling PDF generation is to create a serverless service. By deploying Playwright on platforms like AWS Lambda, you can build a highly scalable, on-demand PDF generation service that handles multiple requests simultaneously.

We have a full guide on how to deploy Playwright on AWS Lambda here.

Common Issues in Playwright PDF Generation and How to Fix Them

Common issues include incorrect font rendering, missing assets, or layout inconsistencies. These problems often arise due to misconfigured resource paths or improper handling of external files like CSS or images. Ensuring all assets are correctly loaded and available during the page render can resolve most issues.
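
To find out which assets are missing, it can help to log requests that fail or return an error status while the page loads. The sketch below relies on Playwright's requestfailed and response page events; trackAssetProblems is a hypothetical helper name, and treating every status of 400 or above as a problem is an assumption you may want to relax.

```javascript
// Hypothetical helper: records requests that failed or returned an HTTP error status.
function trackAssetProblems(page) {
  const problems = [];
  // Fired when a request never completes (DNS error, blocked, aborted, ...).
  page.on('requestfailed', (request) => {
    const failure = request.failure();
    problems.push({ url: request.url(), reason: failure ? failure.errorText : 'unknown' });
  });
  // Fired for every response; 4xx/5xx usually means a missing or protected asset.
  page.on('response', (response) => {
    if (response.status() >= 400) {
      problems.push({ url: response.url(), reason: `HTTP ${response.status()}` });
    }
  });
  return problems;
}
```

Call trackAssetProblems(page) before page.goto() or page.setContent(), then inspect the returned array just before page.pdf() to see which fonts, stylesheets, or images did not load.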

If an image is visible on the website but doesn't appear properly in the output PDF, keep in mind that page.pdf() renders the page with print CSS media by default. You can emulate the screen media type before calling the pdf() function, as in the following code:

await page.emulateMedia({ media: "screen" });
await page.pdf({ path: 'example.pdf' });

Optimizing Performance: Reducing File Size and Enhancing Output Quality

PDF file size can quickly become an issue when handling large images or extensive content. By compressing assets and using efficient CSS, you can reduce the file size without sacrificing quality. Note that page.pdf() has no option for image quality or compression, so images should be resized and compressed before the HTML references them.

await page.addStyleTag({
  content: `
    @media print {
      body {
        font-size: 12px;  /* Use smaller fonts to fit more content per page */
        margin: 10mm;     /* Narrow margins */
        -webkit-print-color-adjust: exact; /* Show background color */
      }
      img {
        max-width: 100%;  /* Ensure images don't stretch beyond the page */
        height: auto;
      }
      .nav-bar, .footer { /* Hide unnecessary elements */
        display: none;
      }
    }`
});

// Use a fixed viewport so scripts that measure the window behave consistently.
await page.setViewportSize({ width: 1280, height: 800 });

// Generate the PDF with the print optimizations applied
await page.pdf({
  path: 'output.pdf',
  format: 'A4',
  printBackground: true,                    // Enable background graphics
  margin: { top: '10mm', bottom: '10mm' },  // Reduced margins
  scale: 0.9,                               // Scale content down slightly to fit more per page
});

Conclusion

There are other libraries for producing PDFs in JavaScript, such as Puppeteer, pdf-lib, or jsPDF, but Playwright offers a strong combination of power, flexibility, and control for generating PDFs from HTML content. Whether you’re working with static pages or complex, dynamic content, Playwright renders the page in a real browser engine, so your PDFs closely match your original web content.

Generating pdfs at scale can be quite complicated!


We take care of all of this, so you can focus on what truly matters in your product!
