How to Scale HTML to PDF with Serverless and Puppeteer
Introduction to Serverless HTML to PDF Conversion
Looking for a straightforward guide to deploy HTML to PDF capabilities on a serverless architecture using AWS Lambda with Puppeteer? You’ve come to the right place!
While numerous tutorials explain HTML-to-PDF libraries, practical guidance on scaling this setup is rare. In this article, we’ll walk through implementing a scalable solution for generating PDFs in your SaaS environment.
Why Scalable PDF Generation Matters in SaaS
SaaS applications often require PDF generation for invoices, reports, or user-specific documents. Traditional server-based solutions can quickly become resource-heavy and difficult to scale. With AWS Lambda’s serverless model, scaling happens automatically as requests arrive, which reduces operational complexity and can lower costs.
Puppeteer Overview
Puppeteer is a Node.js library that provides a high-level API to control Chrome or Chromium. It’s ideal for rendering web pages and converting them into PDFs. Because Puppeteer runs the browser in headless mode, it is well-suited for serverless environments like AWS Lambda.
Plenty of resources, including our other guides, cover how to use Puppeteer for PDF generation itself, so this article focuses mainly on the serverless architecture: integrating Puppeteer into an AWS Lambda environment.
Setting Up Puppeteer and AWS Lambda for Serverless PDF Generation
Integrating Puppeteer with AWS Lambda lets you generate PDFs on-demand without worrying about underlying server maintenance.
Implementing the HTML to PDF Serverless Function
First, set up a Node.js project and install Puppeteer (for example, with npm init -y followed by npm install puppeteer).
Create a script that converts HTML to PDF:
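A minimal sketch of such a script is shown here. The file name, the helper names (buildPdfOptions, htmlToPdf), and the page options are assumptions for illustration, not values from the original project. It assumes the puppeteer package is installed locally.

```javascript
// html-to-pdf.js: a minimal local sketch (file name and option values are assumptions).

// Build the options passed to page.pdf(); kept separate so defaults are easy to override.
function buildPdfOptions(overrides = {}) {
  return {
    format: 'A4',
    printBackground: true, // include CSS backgrounds in the output
    margin: { top: '20mm', right: '15mm', bottom: '20mm', left: '15mm' },
    ...overrides,
  };
}

// Render an HTML string to a PDF and resolve with the PDF bytes.
async function htmlToPdf(html, pdfOptions = {}) {
  // Required lazily so buildPdfOptions can be used without Chromium installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so fonts and images have loaded.
    await page.setContent(html, { waitUntil: 'networkidle0' });
    return await page.pdf(buildPdfOptions(pdfOptions));
  } finally {
    await browser.close(); // always release the browser process
  }
}

// Example usage (uncomment to run with: node html-to-pdf.js):
// htmlToPdf('<h1>Hello, PDF!</h1>')
//   .then((pdf) => require('fs').writeFileSync('output.pdf', pdf));

module.exports = { buildPdfOptions, htmlToPdf };
```

Closing the browser in a finally block matters more than it looks: a leaked Chromium process keeps holding memory, which becomes costly once the same code runs inside Lambda.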
Configuring AWS Lambda
AWS Lambda doesn’t ship with Chromium by default, so you’ll rely on chrome-aws-lambda for a precompiled binary. Ensure that you’ve deployed your code along with the node_modules that include chrome-aws-lambda and puppeteer-core.
If you need more customization, consider a Lambda Layer containing Chromium binaries. However, chrome-aws-lambda is often the easiest route.
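As a rough sketch, a handler built on chrome-aws-lambda could look like the following. The launch options mirror the usage shown in the chrome-aws-lambda README. The event shape (a url field) and the API Gateway style response are assumptions, and launchBrowser and toPdfResponse are hypothetical helper names.

```javascript
// handler.js: sketch of a Lambda handler (the event shape { url } is an assumption).

// Launch Chromium with the settings chrome-aws-lambda exposes for Lambda.
async function launchBrowser(chromium) {
  return chromium.puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath,
    headless: chromium.headless,
    ignoreHTTPSErrors: true,
  });
}

// Wrap PDF bytes in a base64 response, as expected by an API Gateway proxy integration.
function toPdfResponse(pdf) {
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/pdf' },
    isBase64Encoded: true,
    body: Buffer.from(pdf).toString('base64'),
  };
}

async function handler(event) {
  // Required inside the handler so the helpers above load without the package present.
  const chromium = require('chrome-aws-lambda');
  let browser = null;
  try {
    browser = await launchBrowser(chromium);
    const page = await browser.newPage();
    // Navigate to the page to print and wait for network activity to settle.
    await page.goto(event.url, { waitUntil: 'networkidle0' });
    const pdf = await page.pdf({ format: 'A4', printBackground: true });
    return toPdfResponse(pdf);
  } finally {
    if (browser !== null) {
      await browser.close(); // avoid leaking Chromium between invocations
    }
  }
}

module.exports = { handler };
```

If the function is invoked directly rather than through API Gateway, you would likely upload the PDF to storage or return the bytes in another form instead of a base64 body.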
Configuring Lambda with Docker (Recommended)
To simplify dependencies and ensure a consistent environment, you can bundle everything using Docker.
Dockerfile Example:
For more details on optimizing your Docker image, consider this resource on building custom Docker images for AWS Lambda.
Alternative: Create and Deploy a Chromium Lambda Layer
First, you need to create a Lambda Layer that includes the Chromium binary compatible with AWS Lambda’s execution environment.
Steps to Create the Layer:
1. Download a Compatible Chromium Binary:
You can download a precompiled Chromium binary optimized for AWS Lambda from repositories like alixaxel/chrome-aws-lambda or serverless-chrome. Alternatively, you can build your own Chromium binary tailored to your needs.
2. Prepare the Directory Structure:
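AWS Lambda Layers expect a specific directory structure. For executables, place Chromium in the layer’s bin directory (layer/bin); Lambda extracts layer contents under /opt, so the binary ends up in /opt/bin at runtime.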
3. Add Chromium to the Layer:
Place the downloaded Chromium binary into the layer/bin directory.
4. Create the ZIP Archive:
Zip the layer directory to create the Lambda Layer package.
5. Upload the Layer to AWS Lambda:
• Navigate to the AWS Lambda Console.
• Go to Layers in the left-hand menu.
• Click Create layer.
• Provide a name (e.g., chromium-layer).
• Upload the chromium-layer.zip file.
• Specify the compatible runtime (the Node.js version your function uses, e.g., Node.js 20.x).
• Click Create.
That said, we'd recommend using chrome-aws-lambda instead, since it already bundles a Lambda-compatible Chromium binary.
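If you do go the layer route, the function has to point puppeteer-core at the binary itself. The sketch below assumes the ZIP archive has bin/ at its root and that the binary is named chromium; both are assumptions, so adjust them to match your layer. The Chromium flags listed are commonly used in sandboxed environments and may need tuning.

```javascript
// Sketch: launching puppeteer-core against a Chromium binary shipped in a Lambda Layer.
// The binary name "chromium" is an assumption; use whatever you placed in layer/bin.
const path = require('path');

// Layers are extracted under /opt, so bin/<name> in the archive becomes /opt/bin/<name>.
function layerBinaryPath(binaryName = 'chromium', layerRoot = '/opt') {
  return path.posix.join(layerRoot, 'bin', binaryName);
}

async function launchFromLayer() {
  // Required lazily so layerBinaryPath can be used without puppeteer-core installed.
  const puppeteer = require('puppeteer-core');
  return puppeteer.launch({
    executablePath: layerBinaryPath(),
    headless: true,
    // Flags often needed when Chromium runs inside a restricted sandbox like Lambda.
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
      '--single-process',
    ],
  });
}

module.exports = { layerBinaryPath, launchFromLayer };
```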
Full Lambda Function Example with Dynamic HTML
To generate PDFs from dynamic HTML content (instead of navigating to a URL), modify the handler:
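One way to sketch this: build the HTML from request data and pass it to page.setContent instead of calling page.goto. The request shape (a JSON body with title and items) and the renderHtml template are hypothetical, chosen only to illustrate the flow, and the example assumes chrome-aws-lambda is deployed with the function.

```javascript
// Sketch of a handler that renders HTML from request data (the event shape is an assumption).

// Escape user-supplied values before placing them into markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical template: a title followed by a table of name/price rows.
function renderHtml({ title = 'Document', items = [] } = {}) {
  const rows = items
    .map((item) => `<tr><td>${escapeHtml(item.name)}</td><td>${escapeHtml(item.price)}</td></tr>`)
    .join('');
  return (
    `<!DOCTYPE html><html><head><meta charset="utf-8"><title>${escapeHtml(title)}</title></head>` +
    `<body><h1>${escapeHtml(title)}</h1><table>${rows}</table></body></html>`
  );
}

async function handler(event) {
  const data = JSON.parse(event.body || '{}');
  // Required inside the handler so the template helpers load without the package present.
  const chromium = require('chrome-aws-lambda');
  let browser = null;
  try {
    browser = await chromium.puppeteer.launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless,
    });
    const page = await browser.newPage();
    // Load the generated markup directly instead of navigating to a URL.
    await page.setContent(renderHtml(data), { waitUntil: 'networkidle0' });
    const pdf = await page.pdf({ format: 'A4', printBackground: true });
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/pdf' },
      isBase64Encoded: true,
      body: Buffer.from(pdf).toString('base64'),
    };
  } finally {
    if (browser !== null) {
      await browser.close();
    }
  }
}

module.exports = { handler };
```

Escaping matters here because the HTML is rendered by a real browser: unescaped input could inject scripts or markup into the generated document.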
Uploading the Docker Image to AWS
To deploy via container images:
1. Build the Docker Image:
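If you’re on an M1 Mac, consider building for the target architecture explicitly, for example with docker buildx and --platform linux/amd64 when the function runs on x86_64.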
2. Tag Your Docker Image:
3. Push to ECR:
4. Deploy the Lambda:
In the AWS Lambda console, create a new function using the container image from ECR.
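The steps above can be sketched as a small Node.js script that prints the Docker and AWS CLI commands for review before you run them. The account ID, Region, and repository name below are placeholders, not values from this article, and the script assumes the ECR repository already exists and that Docker and the AWS CLI are installed.

```javascript
// deploy-commands.js: prints the build, tag, login, and push commands for an ECR deployment.
// All identifiers passed in at the bottom are placeholders.

// ECR registry hostnames follow the pattern <account>.dkr.ecr.<region>.amazonaws.com.
function ecrRegistry(accountId, region) {
  return `${accountId}.dkr.ecr.${region}.amazonaws.com`;
}

function deployCommands({ accountId, region, repository, tag = 'latest', platform = 'linux/amd64' }) {
  const registry = ecrRegistry(accountId, region);
  const local = `${repository}:${tag}`;
  const remote = `${registry}/${repository}:${tag}`;
  return [
    // 1. Build the image for the target platform (relevant on Apple Silicon machines).
    `docker build --platform ${platform} -t ${local} .`,
    // 2. Tag the local image with the ECR repository URI.
    `docker tag ${local} ${remote}`,
    // 3. Authenticate Docker against ECR, then push the image.
    `aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${registry}`,
    `docker push ${remote}`,
  ];
}

// Print the commands for a placeholder account and repository.
console.log(
  deployCommands({
    accountId: '123456789012',
    region: 'us-east-1',
    repository: 'html-to-pdf',
  }).join('\n')
);
```

Printing rather than executing keeps the sketch safe to run anywhere; once the output looks right, the commands can be pasted into a shell or a CI step.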
Advanced Topics
Handling Concurrency and Scaling
By default, AWS Lambda allows up to 1,000 concurrent executions per account in each Region. If you expect higher load, request a quota increase in the AWS Service Quotas console.
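To judge whether you are likely to hit that limit, a common rule of thumb is that required concurrency is roughly requests per second multiplied by average duration in seconds. The sketch below applies it; the traffic numbers are made up for illustration, and the function names are hypothetical.

```javascript
// Rough concurrency estimate: requests per second multiplied by average duration in seconds.
function estimateConcurrency(requestsPerSecond, avgDurationSeconds) {
  return Math.ceil(requestsPerSecond * avgDurationSeconds);
}

// Compare the estimate with a concurrency quota (1,000 is the default mentioned above).
function exceedsQuota(requestsPerSecond, avgDurationSeconds, quota = 1000) {
  return estimateConcurrency(requestsPerSecond, avgDurationSeconds) > quota;
}

// Illustrative numbers: 50 PDFs per second at about 4 seconds each.
console.log(estimateConcurrency(50, 4)); // 200
console.log(exceedsQuota(50, 4)); // false
console.log(exceedsQuota(300, 4)); // true (1,200 concurrent executions needed)
```

Because PDF rendering often takes several seconds per document, even moderate request rates can translate into a few hundred concurrent executions.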
Common Puppeteer Issues in AWS Lambda
Memory Constraints:
Chromium can be memory-intensive, so give the function enough memory for your heaviest pages. Cleaning up /tmp after each run can also help manage disk space.
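A cleanup step could be sketched as follows. The cleanDirectory helper and its keep list are hypothetical. The keep list exists because some Chromium packages unpack their binary under /tmp, and removing it would force it to be unpacked again on the next invocation. Call the helper only after the browser has been closed.

```javascript
// Sketch: remove leftover files from a scratch directory after the browser has closed.
const fs = require('fs');
const path = require('path');

// Deletes every entry inside dir except the names listed in keep.
// Returns the number of entries removed; dir itself is left in place.
function cleanDirectory(dir = '/tmp', keep = []) {
  let removed = 0;
  for (const entry of fs.readdirSync(dir)) {
    if (keep.includes(entry)) {
      continue; // e.g. an unpacked browser binary you want to reuse
    }
    try {
      fs.rmSync(path.join(dir, entry), { recursive: true, force: true });
      removed += 1;
    } catch (err) {
      // Cleanup is best effort: log and move on rather than failing the request.
      console.warn(`Could not remove ${entry}: ${err.message}`);
    }
  }
  return removed;
}

module.exports = { cleanDirectory };
```

In a handler, this would typically run in the finally block after browser.close(), so that temporary profiles and downloads do not pile up across warm invocations.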
Architecture Compatibility:
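If developing on an M1 Mac, cross-compile using buildx so the image matches your function’s architecture, for example by passing --platform linux/amd64 to docker buildx build for an x86_64 function.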
Conclusion
Implementing HTML to PDF generation on a serverless architecture using Puppeteer and AWS Lambda provides a scalable, low-maintenance approach. While setting up this environment may require initial effort, the payoff is a flexible, cost-effective, and automated PDF generation pipeline.
If you’d rather not maintain this infrastructure yourself, consider third-party solutions like pdforge, which can offload the complexity and let you focus on building your application.