How to Generate a PDF from HTML Using Python-PDFKit

Marcelo Abreu, founder of pdforge

Marcelo | Founder

Oct 11, 2024

An Introduction to PDFKit: a Python PDF Generation Library

PDFKit is a popular Python library that simplifies the process of converting HTML to PDF, providing an easy way to style your documents using familiar web technologies like HTML and CSS. In this guide, we’ll walk through setting up PDFKit, configuring it to generate PDF documents from HTML, and even handling more advanced use cases like dynamic content and asynchronous generation.

The package documentation is available on its PyPI page, where it is published as pdfkit.

Comparing PDFKit with Other Python PDF Libraries

There are many Python libraries available for working with PDFs. ReportLab (4,788,417 monthly downloads) offers powerful document creation tools but requires you to define document structure and layout manually in code, while PyPDF2 (9,982,763 monthly downloads) is aimed mainly at reading, splitting, and merging existing PDFs. Building layouts by hand can become cumbersome, especially if you’re more familiar with HTML and CSS.

PDFKit, on the other hand, is a thin wrapper around wkhtmltopdf, which converts HTML content directly into PDF. This allows you to reuse existing HTML templates, making your documents much easier to maintain and style. Other libraries may provide lower-level control over PDF creation, but PDFKit trades that control for simplicity.

Setting Up Python-PDFKit for HTML to PDF Conversion

Installing PDFKit and Dependencies for a Smooth Setup

To start using PDFKit, you’ll first need to install both the pdfkit Python package and the wkhtmltopdf binary, which handles the heavy lifting of converting HTML to PDF. Start by installing the Python package with pip:

pip install pdfkit

wkhtmltopdf has to be installed separately, and the method depends on your OS. On Ubuntu, for instance, you can install it via apt:

sudo apt-get install wkhtmltopdf

On macOS, you can install it with Homebrew.

Once both are installed, PDFKit is ready to use. If you encounter issues with the installation, ensure that wkhtmltopdf is correctly configured in your system’s PATH.
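
A quick way to confirm the PATH setup is to look the binary up from Python before calling PDFKit. The sketch below uses only the standard library. The helper names find_wkhtmltopdf and wkhtmltopdf_version are illustrative choices for this example and are not part of PDFKit.

```python
import shutil
import subprocess


def find_wkhtmltopdf(binary_name="wkhtmltopdf"):
    """Return the full path of the binary if it is on PATH, else None."""
    return shutil.which(binary_name)


def wkhtmltopdf_version(binary_path):
    """Ask the binary for its version string (runs `wkhtmltopdf --version`)."""
    result = subprocess.run(
        [binary_path, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    path = find_wkhtmltopdf()
    if path is None:
        print("wkhtmltopdf was not found on PATH")
    else:
        print(f"Found {path}: {wkhtmltopdf_version(path)}")
```

If the script reports that the binary was not found, the explicit path configuration described in the next section is the usual workaround.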

Configuring wkhtmltopdf: The Engine Behind Python-PDFKit

wkhtmltopdf is the core engine that powers PDFKit, translating your HTML and CSS into a PDF file. For a smooth experience, make sure you configure the path to wkhtmltopdf correctly in your code. You can set the path manually if necessary:

import pdfkit
# Point PDFKit at the wkhtmltopdf binary explicitly (adjust the path for your system)
pdfkit_config = pdfkit.configuration(wkhtmltopdf='/usr/local/bin/wkhtmltopdf')
pdfkit.from_file('example.html', 'output.pdf', configuration=pdfkit_config)

By explicitly defining the path, you avoid potential issues with the binary not being found, especially in different environments like Docker or cloud servers.
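
Because the binary often lives in a different location on each machine, one option is to resolve the path at runtime instead of hard-coding it. The following is a minimal sketch under assumptions: the environment variable name WKHTMLTOPDF_PATH and both helper functions are hypothetical names chosen for this example, and only pdfkit.configuration comes from PDFKit itself.

```python
import os
import shutil


def resolve_wkhtmltopdf(env=None, default="/usr/local/bin/wkhtmltopdf"):
    """Pick the binary path: env var first, then PATH lookup, then a default."""
    env = os.environ if env is None else env
    # WKHTMLTOPDF_PATH is a hypothetical variable name chosen for this sketch.
    return env.get("WKHTMLTOPDF_PATH") or shutil.which("wkhtmltopdf") or default


def make_config(env=None):
    """Build a pdfkit configuration from the resolved path."""
    import pdfkit  # imported lazily so resolve_wkhtmltopdf works without pdfkit

    return pdfkit.configuration(wkhtmltopdf=resolve_wkhtmltopdf(env))
```

With this in place, a Docker image could set the variable once, and application code would call pdfkit.from_file('example.html', 'output.pdf', configuration=make_config()) unchanged across environments.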

Key Features of Python-PDFKit You Should Know

PDFKit allows you to generate PDFs from URLs, strings, or files. It provides extensive options for customizing the conversion process, such as setting margins, page sizes, and header/footer content.

Some key features include:

• Ability to convert HTML files, strings, or web pages.

• Support for custom page settings, like orientation and margins.

• Options for setting document metadata, such as the title.

• Advanced control over CSS for precise styling.
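
To make the page-setting features above concrete, here is a small sketch that collects them into one options dictionary. The function names are illustrative, not part of PDFKit. The dictionary keys mirror wkhtmltopdf command-line options, and the footer line assumes a wkhtmltopdf build with patched Qt, since builds without the patches do not support headers and footers.

```python
def build_pdf_options(title, landscape=False, margin="10mm", page_size="A4"):
    """Return a wkhtmltopdf options dict for pdfkit's `options=` argument."""
    return {
        "page-size": page_size,
        "orientation": "Landscape" if landscape else "Portrait",
        "margin-top": margin,
        "margin-right": margin,
        "margin-bottom": margin,
        "margin-left": margin,
        "title": title,  # document title stored in the PDF
        "footer-right": "[page] / [topage]",  # page counter (patched Qt builds)
        "encoding": "UTF-8",
    }


def html_to_pdf(html, output_path, title, **option_kwargs):
    """Render an HTML string to a PDF file using the options above."""
    import pdfkit  # lazy import: only needed when a PDF is actually rendered

    options = build_pdf_options(title, **option_kwargs)
    return pdfkit.from_string(html, output_path, options=options)
```

A call such as html_to_pdf('<h1>Report</h1>', 'report.pdf', 'Monthly Report', landscape=True) would then produce a landscape document with uniform margins.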

Essential Python Code Snippets to Convert HTML to PDF

Here’s an example of converting a simple HTML file to PDF using PDFKit:

import pdfkit
# Convert a local HTML file to PDF
pdfkit.from_file('example.html', 'output.pdf')
# Convert an HTML string directly to PDF
html_string = '<h1>Invoice</h1><p>This is an invoice for your order.</p>'
pdfkit.from_string(html_string, 'invoice.pdf')
# Convert a webpage to PDF
pdfkit.from_url('http://example.com', 'webpage.pdf')

With just a few lines of code, PDFKit handles the heavy lifting of transforming your HTML content into a polished PDF document.

Step-by-Step Guide: Generating PDFs from HTML Using PDFKit

Creating a Complete Invoice HTML/CSS File for Example

Let’s walk through creating an invoice PDF from an HTML template. Below is a basic example of an HTML invoice:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Invoice</title>
    <style>
        body { font-family: Arial, sans-serif; }
        .invoice-box { max-width: 800px; margin: auto; padding: 30px; }
        .invoice-table { width: 100%; border-collapse: collapse; }
        .invoice-table th, .invoice-table td { padding: 8px; border-bottom: 1px solid #ddd; }
    </style>
</head>
<body>
    <div class="invoice-box">
        <h1>Invoice</h1>
        <p>Invoice Date: {{ invoice_date }}</p>
        <p>Invoice #: {{ invoice_number }}</p>
        <table class="invoice-table">
            <thead>
                <tr>
                    <th>Item</th>
                    <th>Quantity</th>
                    <th>Price</th>
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td>Product 1</td>
                    <td>2</td>
                    <td>$200</td>
                </tr>
            </tbody>
        </table>
    </div>
</body>
</html>

This HTML template defines the structure of the invoice and includes placeholders for dynamic content such as the invoice date and number.

Using PDFKit to Render HTML and Convert It to a PDF

Once you’ve designed the HTML template, converting it into a PDF with PDFKit is straightforward:

import pdfkit
pdfkit.from_file('invoice.html', 'invoice.pdf')

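The result is a PDF file styled according to your HTML and CSS. Note that placeholders such as {{ invoice_date }} are printed literally at this stage; they are only filled in once the template is rendered with a template engine, as covered later in this guide. You can customize the output further by passing additional options to wkhtmltopdf, like page size or orientation: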

import pdfkit
# Keys match wkhtmltopdf command-line options, without the leading dashes
options = {
    'page-size': 'Letter',
    'orientation': 'Portrait',
    'margin-top': '10mm',
    'margin-bottom': '10mm',
    'margin-left': '10mm',
    'margin-right': '10mm',
}
pdfkit.from_file('invoice.html', 'invoice.pdf', options=options)
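
It can help to picture how such a dictionary ends up as command-line arguments for wkhtmltopdf. The function below is only an approximation written for this article, not PDFKit’s actual implementation: each key becomes a --flag, and an empty or None value stands for a flag that takes no argument.

```python
def options_to_args(options):
    """Approximate how an options dict maps to wkhtmltopdf arguments."""
    args = []
    for key, value in options.items():
        # Accept keys written with or without the leading dashes.
        flag = key if key.startswith("--") else f"--{key}"
        args.append(flag)
        # An empty string or None means "flag without a value".
        if value not in ("", None):
            args.append(str(value))
    return args


if __name__ == "__main__":
    example = {"page-size": "Letter", "orientation": "Portrait", "grayscale": ""}
    print(" ".join(["wkhtmltopdf"] + options_to_args(example)))
```

Seeing the equivalent command line makes it easier to look an option up in the wkhtmltopdf documentation when the output is not what you expected.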

Styling PDFs: Managing CSS for Professional-Looking Documents

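One of the benefits of using HTML/CSS for PDF generation is that you can style your documents with familiar CSS: you can create tables, adjust font sizes, apply background colors, and more. Keep in mind that wkhtmltopdf renders pages with an older WebKit engine, so some modern CSS features, such as grid layouts, may not render as expected. Ensure your stylesheets are correctly linked in the HTML: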

<link rel="stylesheet" href="styles.css">

This allows for clean separation of content and design, making it easier to maintain and update your PDFs.
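
One caveat: when the HTML is passed to PDFKit as a string rather than a file, a relative link such as styles.css has no file location to resolve against. A possible workaround, sketched below with illustrative function names, is to inline the stylesheet into the HTML before conversion.

```python
from pathlib import Path


def inline_stylesheet(html, css_text):
    """Insert css_text as a <style> block just before </head>."""
    style_block = f"<style>\n{css_text}\n</style>\n"
    if "</head>" in html:
        return html.replace("</head>", style_block + "</head>", 1)
    # No <head> section: prepend the styles so they still apply.
    return style_block + html


def render_with_css(html_path, css_path, output_path):
    """Read the HTML and CSS files, inline the CSS, and write the PDF."""
    import pdfkit  # lazy import: only needed when a PDF is actually rendered

    html = inline_stylesheet(
        Path(html_path).read_text(encoding="utf-8"),
        Path(css_path).read_text(encoding="utf-8"),
    )
    return pdfkit.from_string(html, output_path)
```

PDFKit’s from_file and from_string functions also accept a css argument that serves a similar purpose, so this helper is mainly useful when you want full control over where the styles are placed.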

Dynamic Data with HTML Template Engine

For dynamic content like invoices, you can use a templating engine like Jinja2 to populate your HTML template with data:

import pdfkit
from jinja2 import Template
# Load the HTML template and fill in the placeholders
with open('invoice_template.html', encoding='utf-8') as f:
    template = Template(f.read())
html_content = template.render(invoice_date='2024-10-10', invoice_number='12345')
pdfkit.from_string(html_content, 'dynamic_invoice.pdf')

Using Jinja2 ensures that you can dynamically generate content for each PDF without manually editing the HTML file.
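
Real invoices usually have a variable number of line items, which the static table row in the template above does not cover. The sketch below shows one way to prepare that data and render the rows with a Jinja2 loop. The function names, the rows and total variables, and the row template are assumptions made for this example, not part of the original template.

```python
from decimal import Decimal


def build_invoice_context(invoice_number, invoice_date, items):
    """Turn (name, quantity, unit_price) tuples into template variables."""
    rows = []
    total = Decimal("0")
    for name, quantity, unit_price in items:
        # Decimal avoids floating-point rounding errors in money values.
        line_total = Decimal(str(unit_price)) * quantity
        total += line_total
        rows.append(
            {"name": name, "quantity": quantity, "line_total": f"${line_total:.2f}"}
        )
    return {
        "invoice_number": invoice_number,
        "invoice_date": invoice_date,
        "rows": rows,
        "total": f"${total:.2f}",
    }


ROW_TEMPLATE = """
{% for row in rows %}
<tr><td>{{ row.name }}</td><td>{{ row.quantity }}</td><td>{{ row.line_total }}</td></tr>
{% endfor %}
<tr><td colspan="2">Total</td><td>{{ total }}</td></tr>
"""


def render_rows(context):
    """Render the table rows for the invoice body."""
    from jinja2 import Template  # lazy import: requires the jinja2 package

    return Template(ROW_TEMPLATE).render(**context)
```

The same for loop could be placed directly inside the tbody of the invoice template, so that one render call produces the complete document.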

Improving Performance: Asynchronous PDF Generation in Python

In high-traffic applications, generating PDFs synchronously inside a web request can create bottlenecks, because each conversion blocks until wkhtmltopdf finishes. You can offload this work to a task queue such as Celery, which runs the conversion in a background worker and improves performance and user experience:

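import pdfkit
from celery import Celery
app = Celery('tasks', broker='redis://localhost:6379/0')
@app.task
def generate_pdf_async(html, output_path='output.pdf'):
    # Runs in a Celery worker; pass a unique output_path per job so
    # concurrent tasks do not overwrite each other's files
    pdfkit.from_string(html, output_path)
    return output_path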

This approach ensures scalability by allowing your application to handle PDF generation as a background process.
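
If running a message broker is more than your project needs, a thread pool from the standard library is a lighter alternative to Celery for rendering a batch of documents concurrently. This is a different technique from the task queue above, sketched under the assumption that every job runs in one process. All function names here are illustrative.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def unique_pdf_path(output_dir):
    """Return a collision-free file path inside output_dir."""
    return Path(output_dir) / f"{uuid.uuid4().hex}.pdf"


def render_pdf(html, output_path):
    """Render one HTML string to output_path and return the path."""
    import pdfkit  # lazy import: only needed when a PDF is actually rendered

    pdfkit.from_string(html, str(output_path))
    return output_path


def render_many(html_documents, output_dir, render=render_pdf, max_workers=4):
    """Render documents concurrently; results keep the input order."""
    documents = list(html_documents)
    paths = [unique_pdf_path(output_dir) for _ in documents]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(render, documents, paths))
```

Threads are a reasonable fit here because each conversion spends its time waiting on the external wkhtmltopdf process. The render parameter exists so the rendering step can be swapped out, for example in tests.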

Debugging Common Issues When Converting HTML to PDF with PDFKit

Sometimes, HTML elements might not render as expected in your PDF. This is often due to CSS properties that wkhtmltopdf does not support. Use the --debug-javascript flag (passed through PDFKit as the 'debug-javascript' option) to help identify issues with JavaScript execution, and ensure that all assets like fonts or images are correctly loaded.
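
Missing assets are a common cause of broken output, so it can be worth checking them before conversion. The sketch below uses only the standard library to list local files referenced by img, script, and link tags that do not exist on disk. The class and function names are illustrative, and fonts referenced from inside CSS files are not covered by this check.

```python
from html.parser import HTMLParser
from pathlib import Path


class AssetCollector(HTMLParser):
    """Collect src/href values from <img>, <script>, and <link> tags."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in ("img", "script") and attr_map.get("src"):
            self.assets.append(attr_map["src"])
        elif tag == "link" and attr_map.get("href"):
            self.assets.append(attr_map["href"])


def missing_local_assets(html, base_dir="."):
    """Return referenced local files that do not exist under base_dir."""
    collector = AssetCollector()
    collector.feed(html)
    missing = []
    for ref in collector.assets:
        # Skip remote and inline resources; only local paths are checked.
        if ref.startswith(("http://", "https://", "//", "data:")):
            continue
        if not (Path(base_dir) / ref).exists():
            missing.append(ref)
    return missing
```

Running missing_local_assets on the template before calling PDFKit gives a quick list of files to fix, which is usually faster than inspecting a rendered PDF.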

How to Use a PDF API to Automate PDF Creation at Scale

For SaaS applications that need to generate PDFs at scale, integrating with a third-party PDF API can be a more efficient approach. APIs such as pdforge offer extensive features like watermarking, encryption, and built-in scaling capabilities. They simplify the process of handling high volumes of PDF generation requests without the overhead of managing your own infrastructure.

Here’s an example of how you might integrate with a PDF API:

import requests
import json

url = 'https://api.pdforge.com/v1/pdf/sync'
headers = {
    'Authorization': 'Bearer your-api-key',
    'Content-Type': 'application/json'
}
data = {
    'templateId': 'your-template',
    'data': {
        'html': 'your-html'
    }
}

response = requests.post(url, headers=headers, data=json.dumps(data))
response.raise_for_status()  # Stop here if the API returned an error status
with open('output.pdf', 'wb') as f:
    f.write(response.content)

Using a PDF API like this can help offload the heavy lifting involved in creating and managing large-scale PDF generation.

Conclusion

PDFKit is an excellent choice for generating PDFs from HTML when you need flexibility and ease of use in your SaaS application. However, for advanced use cases or large-scale deployments, you may want to consider third-party solutions like pdforge.
