pdforge

Product

Resources

Integrations

Pricing

Try for free

pdforge

PDF Libraries

Python

Generate PDF from HTML Using Playwright Python

Marcelo | Founder

Oct 7, 2024

An Introduction to PDF Generation with Python and Playwright

Generating PDFs from HTML is a common requirement in SaaS applications, whether it’s for creating reports, invoices, or exporting user data. Playwright is a powerful browser automation tool that integrates seamlessly with Python, making it an ideal solution for converting HTML into PDFs. Its headless browser capabilities enable you to accurately render web pages and convert them into high-quality PDFs that support CSS, JavaScript, and dynamic content.

Playwright have a robust documentation that you can access.

If you're using other programming languages, check out our other guides:

Alternative PDF Libraries: How Playwright Compares to Other Tools

While Playwright is highly capable, there are alternative libraries you can consider:

• Pyppeteer (2,063,960 monthly downloads): A Python port of Puppeteer, which can also control a browser for rendering HTML to PDF. However, Playwright offers more flexibility with multi-browser support and superior handling of dynamic content.

• PyPDF2 (9,982,763 monthly downloads): This is a Python library for manipulating existing PDFs. While it cannot convert HTML directly into PDF, it can be useful if you need to merge or split PDF documents after generation.

Playwright’s ability to directly interact with HTML, CSS, and JavaScript gives it a clear advantage for more complex PDF generation needs.

If you want to dig deeper on a comparison between playwright and other python pdf libraries, we also have a detailed article with a full comparison between the best PDF libraries for python in 2025.

Guide to generate pdf from html using python playwright

Setting Up Playwright and Python for HTML to PDF Generation

Installing Playwright and Setting Up Your Python Environment

Start by installing Playwright along with the necessary Python bindings:

pip install playwright
python -m

This setup allows you to control a headless browser using Python, which is essential for rendering HTML into PDF. Once installed, you can launch a browser, load your HTML content, and export it as a PDF.

Configuring Playwright for HTML to PDF Conversion

Here’s a basic script to get started with generating PDFs using Playwright:

from playwright.sync_api import sync_playwright
def generate_pdf(html_file, output_pdf):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(f'file:///{html_file}')
        page.pdf(path=output_pdf)
        browser.close()
generate_pdf('invoice.html', 'output.pdf')

In this example, the script loads an HTML file, renders it, and exports it as a PDF. You can customize the page.pdf() function to adjust formatting options, such as margins, page size, and more.

Creating a Complete Invoice HTML/CSS File for Example

Let’s use a basic invoice template to illustrate how HTML files are converted. The following HTML provides a simple, styled invoice:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Invoice</title>
    <style>
        body { font-family: Arial, sans-serif; padding: 20px; }
        .invoice-header { text-align: center; margin-bottom: 20px; }
        .invoice-details { border: 1px solid #ddd; padding: 20px; }
        .total { font-weight: bold; margin-top: 20px; }
    </style>
</head>
<body>
    <div class="invoice-header">
        <h1>Invoice #12345</h1>
        <p>Date: 2024-10-01</p>
    </div>
    <div class="invoice-details">
        <p>Client: John Doe</p>
        <p>Service: Software Development</p>
        <p>Amount Due: $1000</p>
    </div>
    <div class="total">
        <p>Total: $1000</p>
    </div>
</body>
</html>

This invoice will be rendered into a PDF using Playwright, with the formatting from the CSS directly applied.

Step-by-Step Guide: Generating PDFs from HTML Using Playwright

Using Playwright to Render HTML and Convert It to a PDF

To generate a PDF from HTML, use Playwright’s page.pdf() function. You can specify the output format and customize settings like margins and page orientation:

page.pdf(path='output.pdf', format='A4', margin={'top': '20px', 'bottom': '20px'})

Playwright also supports adding headers and footers, which can include dynamic elements like page numbers:

page.pdf(path='output.pdf', displayHeaderFooter=True, 
         headerTemplate='<div>Invoice Header</div>',
         footerTemplate='<div>Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>')

This provides full control over how the final PDF is structured, making it ideal for professional documents.

HTML Template Engines: Jinja2 for Dynamic Content

When generating dynamic reports or invoices, manually creating HTML files can be cumbersome. Instead, you can use a template engine like Jinja2 to automate this process. Jinja2 allows you to create HTML templates and populate them with dynamic content.

Here’s how you can use Jinja2 with Python:

from jinja2 import Template
html_template = """
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Invoice</title>
</head>
<body>
    <h1>Invoice #{{ invoice_id }}</h1>
    <p>Client: {{ client_name }}</p>
    <p>Amount Due: ${{ amount_due }}</p>
</body>
</html>
"""
template = Template(html_template)
html_content = template.render(invoice_id="12345", client_name="John Doe", amount_due="1000")
with open('invoice.html', 'w') as file:
    file.write(html_content)

Jinja2 templates enable you to generate complex, data-driven HTML dynamically, streamlining the PDF creation process.

Best Practices for Optimizing HTML to PDF Performance in Production

Optimizing HTML to PDF conversion at scale is critical for SaaS applications. Here are some best practices:

• Minimize JavaScript: Use lightweight JavaScript in your HTML templates to avoid performance bottlenecks.

• Use Serverless Architectures: Deploy your PDF generation logic in a serverless environment, such as AWS Lambda or Google Cloud Functions. This allows you to scale your PDF generation automatically based on demand, reducing infrastructure costs. We have a full guide on how to deploy playwright on AWS Lambda here.

• Cache Resources: Caching CSS and other assets helps reduce load times when generating PDFs from the same template multiple times.

• Leverage Asynchronous Processes: Use Python’s async capabilities to handle multiple PDF generation requests simultaneously, improving overall throughput.

Security Considerations: Ensuring Safe PDF Generation in SaaS Systems

When generating PDFs in a SaaS context, security is paramount. Ensure the following:

• Sanitize Input: Prevent malicious content by validating and sanitizing user input in your HTML templates.

• Isolated Environments: Run Playwright in isolated, sandboxed environments to prevent potential security breaches.

• Access Control: Ensure that sensitive data embedded in PDFs is only accessible by authorized users.

How to Use a PDF API to Automate PDF Creation at Scale

pdforge is a third-party pdf generation API. You can create beautiful reports with flexible layouts and complex components with an easy-to-use opinionated no-code builder. Let the AI do the heavy lifting by generating your templates, creating custom components or even filling all the variables for you.

You can handle high-volume PDF generation from a single backend call.

Here’s an example of how to generate pdf with pdforge via an API call:

import requests
import json

url = 'https://api.pdforge.com/v1/pdf/sync'
headers = {
    'Authorization': 'Bearer your-api-key',
    'Content-Type': 'application/json'
}
data = {
    'templateId': 'your_invoice_template_id_here',
    'data': {
          "invoice_number": "your invoice_number here",
          "invoice_date": "your invoice_date here",
          # ...other invoice variables here
    }
}

response = requests.post(url, headers=headers, data=json.dumps(data))
with open('output.pdf', 'wb') as f:
        f.write(response.content)

You can create your account, experience our no-code builder and create your first layout template without any upfront payment clicking here.

Conclusion

Playwright and Python provide an effective solution for generating PDFs from HTML, especially when you need precise control over rendering and layout. However, for applications that require large-scale PDF generation or more advanced features, a third-party service like pdforge may be the best option, offering both scalability and additional functionality.

Generating pdfs at scale can be annoying!

Let us help you make it easier while you focus on what truly matters for your company.

Try for free

7-day free trial

Title

Automate PDF Generation with pdforge

No code or design experience needed

AI creates your template in seconds

Fine tune the design in our no-code builder

Generate PDFs with our API or integrations

Try for free

All services are online

PDF Libraries

PDF Libraries

Python

Python

Generate PDF from HTML Using Playwright Python

An Introduction to PDF Generation with Python and Playwright

Alternative PDF Libraries: How Playwright Compares to Other Tools

Setting Up Playwright and Python for HTML to PDF Generation

Installing Playwright and Setting Up Your Python Environment

Configuring Playwright for HTML to PDF Conversion

Creating a Complete Invoice HTML/CSS File for Example

Step-by-Step Guide: Generating PDFs from HTML Using Playwright

Using Playwright to Render HTML and Convert It to a PDF

HTML Template Engines: Jinja2 for Dynamic Content

Best Practices for Optimizing HTML to PDF Performance in Production

Security Considerations: Ensuring Safe PDF Generation in SaaS Systems

How to Use a PDF API to Automate PDF Creation at Scale

Table of contents