How to Generate PDF from HTML Using Puppeteer in Node.js

Marcelo Abreu, Founder of pdforge

Oct 1, 2024

Introduction and the Growing Need for Automated PDF Reports

Automated PDF generation has become a core feature of SaaS platforms that deliver dynamic documents such as reports, invoices, and contracts. Producing these by hand does not scale, so many SaaS applications need to convert HTML to PDF programmatically. Puppeteer, a Node.js library that controls headless Chrome, lets you automate this process and renders complex layouts and modern web technologies the same way the browser does.

Puppeteer also has extensive official documentation at pptr.dev.

Why It’s Recommended to Use Playwright Instead of Puppeteer

While Puppeteer remains a powerful choice, Playwright, a newer framework developed by Microsoft, has emerged as an attractive alternative. Playwright offers an API similar to Puppeteer’s and adds features such as multi-browser support and better cross-platform reliability. It also handles more complex browser automation scenarios, which makes it a good option for teams that want to scale their PDF generation or run automated tests alongside their PDF workflows.

Download data on npmtrends suggests that users are moving from Puppeteer to Playwright:

[Figure: npmtrends chart showing the trend of Puppeteer users migrating to Playwright]


[Figure: HTML to PDF using Puppeteer]

Setting Up Puppeteer in a Node.js Environment

Installing Puppeteer and Node.js: Step-by-Step Guide

To install Puppeteer, run:

npm install puppeteer

Ensure Node.js is installed and set up your project structure as follows:

/project-root
  /templates
    - report.html
  /src
    - generatePdf.js
  package.json

Exploring Puppeteer’s Capabilities for PDF Creation

Puppeteer gives you full control over rendering web content and saving it as PDFs. With built-in browser automation, it easily handles HTML, CSS, and JavaScript, making it ideal for generating dynamic documents that resemble what users see in the browser.

Converting HTML to PDF Using Puppeteer

Example HTML Report Interface

Here’s a simple HTML template for a mock monthly report:

<!DOCTYPE html>
<html>
<head>
  <title>Monthly Report</title>
  <style>
    body { font-family: Arial, sans-serif; padding: 20px; }
    .header { text-align: center; }
    .report-table { width: 100%; border-collapse: collapse; }
    .report-table th, .report-table td { border: 1px solid #ddd; padding: 8px; }
    .report-table th { background-color: #f2f2f2; }
  </style>
</head>
<body>
  <div class="header">
    <h1>Monthly Financial Report</h1>
    <p>October 2024</p>
  </div>
  <table class="report-table">
    <thead>
      <tr>
        <th>Item</th>
        <th>Amount</th>
        <th>Date</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Revenue</td>
        <td>$20,000</td>
        <td>01/10/2024</td>
      </tr>
      <tr>
        <td>Expenses</td>
        <td>$5,000</td>
        <td>05/10/2024</td>
      </tr>
      <tr>
        <td>Net Profit</td>
        <td>$15,000</td>
        <td>31/10/2024</td>
      </tr>
    </tbody>
  </table>
</body>
</html>

You can use this template to generate a PDF with Puppeteer.

Example 1: Using an HTML File to Generate a PDF Report with Puppeteer

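const path = require('path');
const { pathToFileURL } = require('url');
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // Build a file:// URL from an absolute path so it also works on Windows
    const templatePath = path.resolve(__dirname, '../templates/report.html');
    await page.goto(pathToFileURL(templatePath).href, { waitUntil: 'load' });

    await page.pdf({
      path: 'output.pdf', // written relative to the current working directory
      format: 'A4',
      printBackground: true,
      landscape: false,
      margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' },
      scale: 1,
      displayHeaderFooter: false,
    });
  } finally {
    // Always close the browser, even if navigation or printing fails
    await browser.close();
  }
})().catch((err) => {
  console.error(err);
  process.exit(1);
});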
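
When the report data lives in your application rather than in a static file, one option is to build the HTML string in Node.js and load it with page.setContent(). The sketch below is illustrative only: `escapeHtml`, `buildReportHtml`, and `renderPdfFromHtml` are hypothetical helper names, and the row shape (`item`, `amount`, `date`) is an assumption modeled on the template above. Leaving out `path` makes page.pdf() return the PDF data instead of writing a file.

```javascript
// Escape text so that data values cannot inject markup into the report.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build the report markup from plain data (hypothetical row shape).
function buildReportHtml(title, rows) {
  const body = rows
    .map(
      (r) =>
        `<tr><td>${escapeHtml(r.item)}</td><td>${escapeHtml(r.amount)}</td><td>${escapeHtml(r.date)}</td></tr>`
    )
    .join('');
  return `<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>${escapeHtml(title)}</title></head>
<body>
  <h1>${escapeHtml(title)}</h1>
  <table>
    <thead><tr><th>Item</th><th>Amount</th><th>Date</th></tr></thead>
    <tbody>${body}</tbody>
  </table>
</body>
</html>`;
}

// Render an HTML string on an existing Puppeteer page and return the PDF data.
async function renderPdfFromHtml(page, html) {
  await page.setContent(html, { waitUntil: 'load' });
  return page.pdf({ format: 'A4', printBackground: true });
}
```

With a page obtained from browser.newPage(), `await renderPdfFromHtml(page, buildReportHtml('Monthly Financial Report', rows))` yields data you can send in an HTTP response or upload to storage.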

Example 2: Using an Internal URL of Your SaaS to Generate the PDF with Puppeteer

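const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // Wait until network activity settles so client-rendered data is on the page
    await page.goto('https://your-saas.com/dashboard/report', { waitUntil: 'networkidle0' });

    await page.pdf({
      path: 'dashboard_report.pdf',
      format: 'A4',
      margin: { top: '30px', bottom: '30px' },
      landscape: true,
      printBackground: true,
    });
  } finally {
    // Always close the browser, even if navigation or printing fails
    await browser.close();
  }
})().catch((err) => {
  console.error(err);
  process.exit(1);
});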
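
An internal dashboard URL usually sits behind a login and renders its data with client-side JavaScript, so two extra steps are often needed: authenticate the request, and wait until the report is actually in the DOM. The following is a minimal sketch, assuming your application accepts a bearer token header and that the report matches a CSS selector such as `.report-table`. Both are assumptions about your app, and `printProtectedPage` is a hypothetical helper name.

```javascript
// Print a page that requires authentication and renders its content client-side.
// `page` is a Puppeteer Page; `token` and `selector` are assumptions about your app.
async function printProtectedPage(page, { url, token, selector, outputPath }) {
  // Send the token with every request made by this page.
  await page.setExtraHTTPHeaders({ Authorization: `Bearer ${token}` });

  // networkidle0 resolves once there are no network connections for at least 500 ms.
  await page.goto(url, { waitUntil: 'networkidle0' });

  // Fail after 10 s if the report never appears, instead of printing an empty page.
  await page.waitForSelector(selector, { timeout: 10000 });

  return page.pdf({ path: outputPath, format: 'A4', landscape: true, printBackground: true });
}
```

Because the extra header is attached to every request the page makes, including requests to third-party hosts, a session cookie scoped to your own domain may be a better fit if the dashboard loads external assets.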

Key Puppeteer Methods for PDF Generation: An In-Depth Look

Puppeteer’s page.pdf() function provides flexibility in generating PDFs. Here are the most commonly used options:

- path: File path where the PDF is saved. If omitted, nothing is written to disk and page.pdf() returns the PDF data.

- format: Paper size, e.g., `A4`, `Letter`, `Legal`. Default is `Letter`.

- landscape: Whether to print in landscape mode. Default is `false`.

- scale: Scale of the webpage rendering (between `0.1` and `2`). Default is `1`.

- printBackground: Whether to include background colors and images. Default is `false`.

- margin: Margins for the PDF, specified as an object with `top`, `bottom`, `left`, and `right` properties.

- displayHeaderFooter: Whether to print a header and footer, useful for page numbers. Default is `false`; the content comes from the `headerTemplate` and `footerTemplate` options.
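
To show how these options combine, here is a small sketch that prints a title in the header and "Page X of Y" in the footer. `buildPdfOptions` is a hypothetical helper, and the font size and margins are illustrative values rather than recommendations. Puppeteer injects values into template elements that carry the classes `pageNumber` and `totalPages` (as well as `date`, `title`, and `url`).

```javascript
// Build page.pdf() options with a header title and a "Page X of Y" footer.
// `headerText` is inserted as-is, so pass only trusted text.
function buildPdfOptions(outputPath, headerText) {
  // Templates do not inherit the page's styles, so set the font size explicitly.
  const baseStyle = 'font-size: 9px; width: 100%; text-align: center; color: #555;';
  return {
    path: outputPath,
    format: 'A4',
    printBackground: true,
    displayHeaderFooter: true,
    headerTemplate: `<div style="${baseStyle}">${headerText}</div>`,
    footerTemplate:
      `<div style="${baseStyle}">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>`,
    // Leave room for the header and footer; with small margins they can be clipped.
    margin: { top: '60px', right: '20px', bottom: '60px', left: '20px' },
  };
}
```

Usage would look like `await page.pdf(buildPdfOptions('report.pdf', 'Monthly Financial Report'))`.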

Best Practices for Formatting and Styling HTML Before PDF Conversion

To ensure consistency in your PDFs, structure your HTML using clean and semantic code. Leverage print-specific stylesheets (@media print) to fine-tune your document’s appearance; page.pdf() renders with the print media type by default, so these rules apply automatically. Avoid elements that won’t translate well in a static PDF, like videos or JavaScript-driven interactions.

await page.addStyleTag({
  content: `
    @media print {
      body {
        font-size: 12px;                   /* Use smaller fonts to reduce content size */
        margin: 10mm;                      /* Narrow margins */
        -webkit-print-color-adjust: exact; /* Show background colors (prefixed form) */
        print-color-adjust: exact;         /* Standard form of the same property */
      }
      img {
        max-width: 100%;                   /* Ensure images don't stretch beyond the page */
        height: auto;
      }
      .nav-bar, .footer {                  /* Hide unnecessary elements */
        display: none;
      }
    }`,
});
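
Print styles can also control where pages break, which matters for long tables. The sketch below is one possible set of rules, not a definitive stylesheet: `.page-break` is a hypothetical class name, `printWithPagination` is a hypothetical helper, and how strictly the `break-*` rules are honored can vary between Chrome versions. The `preferCSSPageSize` option tells page.pdf() to give a CSS `@page` size priority over the `format` option.

```javascript
// Print rules that control pagination (adjust the selectors to your markup).
const PAGINATION_CSS = `
  @page { size: A4; }
  @media print {
    thead { display: table-header-group; }                      /* repeat table headers on each page */
    tr, img { break-inside: avoid; page-break-inside: avoid; }  /* keep rows and images whole */
    h1, h2, h3 { break-after: avoid; page-break-after: avoid; } /* keep headings with their content */
    .page-break { break-before: page; page-break-before: always; }
  }`;

// Inject the rules, then let the CSS @page size take priority over `format`.
async function printWithPagination(page, outputPath) {
  await page.addStyleTag({ content: PAGINATION_CSS });
  return page.pdf({ path: outputPath, preferCSSPageSize: true, printBackground: true });
}
```

Adding `<div class="page-break"></div>` before a section would then start that section on a new page.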

Optimizing Your Puppeteer-Based PDF Stack

Making a Serverless Architecture for Scaling Requests and Memory Usage

For large-scale PDF generation, consider serverless platforms such as AWS Lambda. Each invocation runs in its own isolated environment with its own memory allocation, so you can absorb bursts of PDF requests without overloading your application servers.
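
As a rough sketch of the request-handling side, the code below wraps any HTML-to-PDF function in a Lambda handler. It assumes an API Gateway proxy integration and a JSON request body with an `html` field; `toPdfResponse`, `createHandler`, and the injected `renderPdf` function are hypothetical names. Browser setup is deliberately left out: Lambda needs a Chromium build packaged for that environment (community packages such as @sparticuz/chromium paired with puppeteer-core are a common choice), and the details depend on your deployment.

```javascript
// Convert PDF data into the response shape used by an API Gateway
// Lambda proxy integration (binary bodies are sent as base64).
function toPdfResponse(pdfBytes, fileName) {
  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'application/pdf',
      'Content-Disposition': `attachment; filename="${fileName}"`,
    },
    body: Buffer.from(pdfBytes).toString('base64'),
    isBase64Encoded: true,
  };
}

// Create a handler around any async function that turns HTML into PDF data.
// `renderPdf` is injected so browser setup stays separate from request logic.
function createHandler(renderPdf) {
  return async (event) => {
    let html;
    try {
      ({ html } = JSON.parse(event.body || '{}'));
    } catch {
      return { statusCode: 400, body: 'Request body must be valid JSON' };
    }
    if (typeof html !== 'string' || html.length === 0) {
      return { statusCode: 400, body: 'Missing "html" field' };
    }
    const pdfBytes = await renderPdf(html);
    return toPdfResponse(pdfBytes, 'report.pdf');
  };
}
```

In a deployment you might export `createHandler(renderPdf)` as the function's handler, where `renderPdf` launches the browser, calls page.setContent() and page.pdf(), and closes the browser again.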

Debugging and Troubleshooting Common Puppeteer Issues

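Use headful mode for debugging by launching Puppeteer with { headless: false }. This lets you watch the page render in real time, which is especially useful when dealing with dynamic content or CSS issues. Older Puppeteer versions support page.pdf() only in headless mode, so if the call fails while headful, inspect the page visually and switch back to headless to produce the file.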
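
When a visible browser is not available, for example on a server, forwarding the page's own diagnostics to your logs often surfaces the same problems. A minimal sketch using Puppeteer's `console`, `pageerror`, and `requestfailed` page events; `attachDebugLogging` is a hypothetical helper name and the log format is arbitrary.

```javascript
// Forward browser-side diagnostics to a Node.js logger.
// `page` is a Puppeteer Page (an event emitter); `log` defaults to console.log.
function attachDebugLogging(page, log = console.log) {
  // console.* calls made by scripts running inside the page
  page.on('console', (msg) => log(`[console.${msg.type()}] ${msg.text()}`));

  // Uncaught exceptions thrown inside the page
  page.on('pageerror', (err) => log(`[pageerror] ${err.message}`));

  // Assets that failed to load (fonts, images, stylesheets, API calls)
  page.on('requestfailed', (req) => {
    const failure = req.failure();
    log(`[requestfailed] ${req.url()} ${failure ? failure.errorText : ''}`.trim());
  });
}
```

Call `attachDebugLogging(page)` right after browser.newPage() and before page.goto(), so that events fired during navigation are captured. For visual debugging, the `slowMo` launch option (for example `{ headless: false, slowMo: 100 }`) slows each operation down so you can follow along.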

Tips for Optimizing Performance in Large-Scale PDF Generation

Reuse browser instances to save on memory and processing time. Launching a new browser for each request is resource-intensive, especially when handling high volumes. Here’s an optimized approach:

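// Run inside an async function; `urls` is an array of page URLs
const browser = await puppeteer.launch();
try {
  const page = await browser.newPage();
  for (const [index, url] of urls.entries()) {
    await page.goto(url, { waitUntil: 'networkidle0' });
    // A raw URL contains characters such as ":" and "/" that are not valid in file names
    await page.pdf({ path: `report-${index + 1}.pdf` });
  }
} finally {
  await browser.close();
}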
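
A single page processes documents one at a time. To raise throughput while still sharing one browser, a common pattern is to open a short-lived page per job and cap how many run at once. The following is a sketch under stated assumptions: `mapWithConcurrency` and `renderAll` are hypothetical helpers, and the default limit of 4 is arbitrary, so tune it to your memory budget.

```javascript
// Run `worker` over `items` with at most `limit` jobs in flight at once.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function runLane() {
    while (next < items.length) {
      const index = next++; // claim the next job before awaiting
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` lanes; each lane pulls jobs until none are left.
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => runLane());
  await Promise.all(lanes);
  return results;
}

// Render every URL with one shared browser and one short-lived page per job.
async function renderAll(browser, urls, limit = 4) {
  return mapWithConcurrency(urls, limit, async (url, index) => {
    const page = await browser.newPage();
    try {
      await page.goto(url, { waitUntil: 'networkidle0' });
      return await page.pdf({ path: `report-${index + 1}.pdf`, format: 'A4' });
    } finally {
      await page.close(); // release the tab's memory even if rendering fails
    }
  });
}
```

Closing each page in `finally` keeps memory from growing with the number of documents, while the browser launch cost is still paid only once.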

Conclusion

Although other JavaScript tools can also produce PDFs, such as Playwright, pdf-lib, or jsPDF, Puppeteer is a versatile tool for generating PDFs from HTML in Node.js. Because it drives a real headless browser, developers can automate and scale PDF creation. Whether it’s financial reports, invoices, or other dynamic content, Puppeteer provides accurate rendering and flexibility, making it a strong option for SaaS platforms.
