How to Generate PDF from HTML Using ReportLab in Python

Marcelo Abreu, founder of pdforge

Oct 10, 2024

An Introduction to ReportLab: A Python PDF Generation Library

ReportLab is a powerful and flexible Python PDF generation library, well-suited for SaaS applications that require dynamic PDF creation. With ReportLab, you can programmatically create PDFs from scratch, offering deep customization for layouts, fonts, graphics, and tables. Its flexibility allows it to handle complex reporting needs, making it a top choice for developers seeking more than basic HTML-to-PDF conversion.

You can access the full documentation here.

Comparing ReportLab with Pyppeteer and PyPDF2

Figure: Number of monthly downloads for ReportLab

While ReportLab excels in customization and granular control, it is often compared with Pyppeteer (2,063,960 monthly downloads) and PyPDF2 (9,982,763 monthly downloads), two other popular libraries.

Pyppeteer, a Python port of Puppeteer, drives a headless Chromium browser to render HTML into PDFs. This gives high fidelity to web designs, but less control over layout structure and programmatically generated content.

PyPDF2, on the other hand, focuses on manipulating existing PDFs (merging, splitting, and encrypting), which makes it ideal for tasks where PDFs need to be edited rather than created. ReportLab sits between the two: it offers the programmatic creation and customization that Pyppeteer lacks, and unlike PyPDF2 it builds documents from the ground up.


Setting Up the Environment for HTML to PDF with ReportLab

To start working with ReportLab, we need to install the necessary dependencies and create a project setup.

Installing ReportLab and Required Dependencies

First, install ReportLab via pip:

pip install reportlab

We’ll also need lxml for parsing HTML content:

pip install lxml

A Quick Overview of Python and HTML to PDF Conversion

ReportLab doesn’t convert HTML to PDF directly. Instead, you build the PDF from Python objects, which gives you full control over the document structure. We’ll take an HTML invoice as the example and rebuild its structure as a PDF with ReportLab.

Setting Up a Basic Python Project for PDF Generation

Organize your project as follows:

invoice_pdf/
├── html_templates/
│   └── invoice.html
└── pdf_generator.py

In html_templates/invoice.html, we’ll create a simple invoice template:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Invoice</title>
    <style>
        body { font-family: Arial, sans-serif; }
        .invoice-box { max-width: 800px; margin: auto; padding: 30px; border: 1px solid #eee; }
        .heading { background: #eee; padding: 10px; font-weight: bold; }
        .details { margin-top: 20px; }
        .details td { padding: 5px; }
        .totals { margin-top: 20px; text-align: right; }
    </style>
</head>
<body>
    <div class="invoice-box">
        <h1>Invoice</h1>
        <table class="details">
            <tr>
                <td><strong>Invoice #:</strong> 12345</td>
                <td><strong>Date:</strong> 2024-10-07</td>
            </tr>
            <tr>
                <td><strong>Client:</strong> John Doe</td>
                <td><strong>Due Date:</strong> 2024-11-07</td>
            </tr>
        </table>
        <table class="details" border="1">
            <tr class="heading">
                <td>Item</td>
                <td>Price</td>
            </tr>
            <tr>
                <td>Website Design</td>
                <td>$300.00</td>
            </tr>
            <tr>
                <td>Hosting (3 months)</td>
                <td>$75.00</td>
            </tr>
        </table>
        <table class="totals">
            <tr>
                <td><strong>Total:</strong> $375.00</td>
            </tr>
        </table>
    </div>
</body>
</html>

We will convert this invoice into a PDF with ReportLab.

Step-by-Step Guide: How to Generate PDF from HTML Using ReportLab

Creating a Simple HTML Structure for Conversion

Load the HTML content into your Python script using lxml:

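from lxml import etree
# Read the template as UTF-8, matching its <meta charset> declaration
with open('html_templates/invoice.html', 'r', encoding='utf-8') as file:
    html_content = file.read()
# HTMLParser tolerates real-world HTML that a strict XML parser would reject
parser = etree.HTMLParser()
tree = etree.fromstring(html_content, parser)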

The HTML represents an invoice template with basic CSS for styling. ReportLab does not interpret that CSS, so we will use the parsed structure to build the PDF programmatically.
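Once the HTML is parsed, it often helps to pull the table cells out row by row before building anything. The sketch below is one possible approach, assuming lxml is installed; it uses a small inline snippet instead of the full template, and the helper name `table_rows` is hypothetical.

```python
from lxml import etree

SAMPLE_HTML = """
<table>
  <tr><td><strong>Invoice #:</strong> 12345</td><td><strong>Date:</strong> 2024-10-07</td></tr>
  <tr class="heading"><td>Item</td><td>Price</td></tr>
  <tr><td>Website Design</td><td>$300.00</td></tr>
</table>
"""


def table_rows(html_text):
    """Return one list of cell strings per <tr> found in the HTML."""
    tree = etree.fromstring(html_text, etree.HTMLParser())
    rows = []
    for tr in tree.iter("tr"):
        # itertext() includes text inside nested tags such as <strong>;
        # split/join collapses runs of whitespace into single spaces
        cells = [" ".join("".join(td.itertext()).split()) for td in tr.iter("td")]
        if cells:
            rows.append(cells)
    return rows


rows = table_rows(SAMPLE_HTML)
```

Reading each cell with `itertext()` matters here because the text of a cell like `<td><strong>Invoice #:</strong> 12345</td>` lives partly inside the nested tag and partly after it, so `td.text` alone would be empty.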

Converting HTML to PDF with ReportLab: The Core Functions

In pdf_generator.py, we can now use ReportLab’s SimpleDocTemplate and Paragraph components to structure the PDF:

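from xml.sax.saxutils import escape
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Paragraph
# `tree` is the lxml tree parsed from invoice.html in the previous step
doc = SimpleDocTemplate("invoice.pdf")
styles = getSampleStyleSheet()
content = []
# Extracting content from HTML
for element in tree.iter('h1', 'td'):
    # itertext() also collects text inside nested tags such as <strong>;
    # escape() protects characters like & and < in Paragraph markup
    text = escape("".join(element.itertext()).strip())
    if element.tag == 'h1':
        # The Heading1 style supplies the formatting; Paragraph markup has no <h1> tag
        content.append(Paragraph(text, styles['Heading1']))
    elif text:
        content.append(Paragraph(text, styles['BodyText']))
doc.build(content)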

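This code captures the basic text of our invoice and writes it to invoice.pdf. Each heading and table cell becomes its own paragraph, so the row-and-column layout of the original tables is not preserved.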

Customizing PDF Output: Fonts, Styles, and Layout Adjustments

ReportLab allows extensive customization. For example, let’s adjust fonts, margins, and other layout settings:

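from reportlab.lib import colors
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import ParagraphStyle, getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate
# Margins are given in points (1 point = 1/72 inch)
doc = SimpleDocTemplate("invoice.pdf", pagesize=A4, rightMargin=30, leftMargin=30, topMargin=30, bottomMargin=30)
styles = getSampleStyleSheet()
# Customizing font and colors; deriving a new style leaves the stock BodyText untouched
custom_styles = ParagraphStyle('InvoiceBody', parent=styles['BodyText'],
                               fontName='Helvetica-Bold',
                               textColor=colors.HexColor("#333333"))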

This gives you control over how the text appears on the PDF, aligning it with your branding.

Handling Dynamic Content: Generating PDFs from Real-Time Data

Dynamic content is crucial for invoices that pull from databases or APIs. Here’s how you could integrate Python variables into the PDF generation process:

invoice_data = {
    'invoice_number': 12345,
    'client_name': 'John Doe',
    'date': '2024-10-07',
    'due_date': '2024-11-07',
    'items': [
        {'description': 'Website Design', 'price': 300.00},
        {'description': 'Hosting (3 months)', 'price': 75.00}
    ],
    'total': 375.00
}
content.append(Paragraph(f"Invoice #: {invoice_data['invoice_number']}", styles['BodyText']))
content.append(Paragraph(f"Client: {invoice_data['client_name']}", styles['BodyText']))

This approach allows you to dynamically generate invoices based on user input or database records.
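The line items and total can be handled the same way. One possible sketch, using only the standard library, turns the invoice dictionary into rows of display strings that a ReportLab Table or a series of Paragraphs could consume. The helper name `invoice_rows` and the dollar formatting are assumptions for illustration.

```python
def invoice_rows(invoice):
    """Turn an invoice dict into header, item, and total rows of strings."""
    rows = [["Item", "Price"]]
    for item in invoice["items"]:
        rows.append([item["description"], f"${item['price']:,.2f}"])
    # Derive the total from the items so it cannot drift from the line prices
    total = sum(item["price"] for item in invoice["items"])
    rows.append(["Total", f"${total:,.2f}"])
    return rows


invoice_data = {
    "invoice_number": 12345,
    "client_name": "John Doe",
    "items": [
        {"description": "Website Design", "price": 300.00},
        {"description": "Hosting (3 months)", "price": 75.00},
    ],
}

rows = invoice_rows(invoice_data)
```

Floats keep the example short; for real billing data, `decimal.Decimal` is usually a safer choice for currency amounts.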

Modifying an Existing HTML File Using ReportLab

ReportLab does not edit the HTML file itself. Because the PDF is assembled in Python, however, you can change the output built from an HTML template on the fly, for example by adding elements or adjusting styles. Suppose you want to update the invoice by adding a company logo or changing the layout dynamically. Here’s how you could achieve that:

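from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch
from reportlab.platypus import Image, Paragraph, SimpleDocTemplate
# Reuses `tree`, the lxml tree parsed from invoice.html earlier
doc = SimpleDocTemplate("modified_invoice.pdf")
styles = getSampleStyleSheet()
content = []
# Adding a logo to the invoice (the path must point to an existing image file)
logo = "path_to_logo.png"
content.append(Image(logo, 2*inch, 2*inch))
# Adding modified content from the original HTML
for element in tree.iter():
    if element.tag == 'h1':
        content.append(Paragraph(f"Updated {element.text}", styles['Heading1']))
doc.build(content)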

This allows for real-time adjustments to HTML-based templates, providing flexibility for modifying documents.

How to Use a PDF API to Automate PDF Creation at Scale

When dealing with large-scale PDF generation, especially in a SaaS environment, automation becomes key. Although ReportLab is a powerful library for generating PDFs, integrating it with a PDF API can streamline the process, particularly when working with high volumes or requiring web-based solutions.

A popular PDF API such as pdforge allows you to offload the rendering and scaling to an external service. Here’s how you can integrate it:

import requests
import json

url = 'https://api.pdforge.com/v1/pdf/sync'
headers = {
    'Authorization': 'Bearer your-api-key',
    'Content-Type': 'application/json'
}
data = {
    'templateId': 'your-template',
    'data': {
        'html': 'your-html'
    }
}

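response = requests.post(url, headers=headers, data=json.dumps(data), timeout=30)
# Stop on HTTP errors rather than saving an error response as output.pdf
response.raise_for_status()
with open('output.pdf', 'wb') as f:
    f.write(response.content)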

By integrating ReportLab with a PDF API, you can automate bulk PDF generation and scale it across multiple platforms, reducing processing time and system load.

Conclusion

ReportLab offers unparalleled flexibility for creating custom PDF reports in Python, making it ideal for SaaS developers who need dynamic, branded, and highly customized PDFs.

While Pyppeteer is great for pixel-perfect web-to-PDF rendering and PyPDF2 excels at editing existing PDFs, ReportLab stands out with its deep programmatic control over document creation. It’s the best choice for developers needing extensive layout, font, and content customization.

However, for large-scale automation, integrating with a third-party PDF API like pdforge can streamline PDF generation at scale.
