How to Generate PDF from HTML with Playwright in C#

Marcelo Abreu, founder of pdforge

Oct 26, 2024

Introduction to PDF Generation with Playwright

Playwright, a browser automation library, can convert HTML to PDF from C# in a .NET environment by driving a headless Chromium browser. Generating PDFs from HTML content is a critical feature for many SaaS applications, especially when producing dynamic reports, invoices, or user-generated content. This article shows how to set up Playwright and use it to generate high-quality PDFs.

Playwright also has well-written, robust documentation that you can use if you run into any trouble.

Comparison Between Playwright and Other C# PDF Libraries

When it comes to PDF generation in C#, several libraries are available. Here’s how Playwright compares to some popular options:

Download comparison between PDF libraries using NuGet Trends

PdfSharp: Ideal for creating PDFs from scratch but lacks direct HTML to PDF conversion capabilities.

PuppeteerSharp: A port of Puppeteer for C#, allowing control over Chrome or Chromium browsers. It handles HTML to PDF conversion but can be resource-intensive and may have compatibility issues.

iTextSharp: A comprehensive library for PDF manipulation with extensive features, but it has a steep learning curve and licensing considerations for commercial use.

Playwright stands out due to:

  • Modern Web Rendering: Offers high-fidelity rendering of complex, modern web pages, including JavaScript execution and CSS styling.

  • Cross-Browser Automation: Playwright drives Chromium, Firefox, and WebKit, but PDF generation (page.PdfAsync) is only supported in headless Chromium, so your PDFs are rendered with the same engine as Chrome.

  • Performance: Designed for efficiency, making it suitable for both development and production environments.

Guide to generate pdf from html using C# Playwright

Setting Up Playwright for HTML to PDF Conversion

Installation Guide: How to Install Playwright with Node.js

Before starting, ensure that Node.js and the .NET SDK are installed on your system.

1. Install Node.js: Download and install from the official Node.js website.

2. Create a New .NET Project:

Open a terminal and run:

dotnet new console -o HtmlToPdfExample
cd HtmlToPdfExample

3. Add Playwright Package:

dotnet add package Microsoft.Playwright

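4. Install Playwright Browsers:

The Microsoft.Playwright package generates its own install script when the project is built. The script downloads the browser builds that match the package version (it requires PowerShell; replace net8.0 with your target framework):

dotnet build
pwsh bin/Debug/net8.0/playwright.ps1 install

Running npx playwright install also downloads browsers, but for the Node.js version of Playwright, which may not match the builds the NuGet package expects.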

Configuring Your Environment for HTML to PDF Conversion

Set up your Program.cs with the necessary using directives and an asynchronous Main method:

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Playwright;
class Program
{
    public static async Task Main(string[] args)
    {
        // Your code will go here
    }
}

Basic Example: Generating PDF from an HTML File

Creating a Robust HTML Template

Create an HTML file named invoice.html with comprehensive content:

<!-- invoice.html -->
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Invoice #{{InvoiceNumber}}</title>
    <style>
        body { font-family: Arial, sans-serif; margin: 0; padding: 20px; }
        .invoice-box { max-width: 800px; margin: auto; border: 1px solid #eee; padding: 20px; }
        .invoice-header { text-align: center; margin-bottom: 50px; }
        .invoice-details { width: 100%; margin-bottom: 30px; }
        .invoice-details th, .invoice-details td { padding: 10px; border-bottom: 1px solid #eee; }
        .invoice-items { width: 100%; border-collapse: collapse; }
        .invoice-items th, .invoice-items td { border: 1px solid #eee; padding: 10px; text-align: left; }
        .total { text-align: right; margin-top: 20px; font-size: 18px; }
    </style>
</head>
<body>
    <div class="invoice-box">
        <div class="invoice-header">
            <h1>Invoice #{{InvoiceNumber}}</h1>
            <p>Date: {{Date}}</p>
            <p>Due Date: {{DueDate}}</p>
        </div>
        <table class="invoice-details">
            <tr>
                <th>Billed To:</th>
                <td>{{ClientName}}</td>
            </tr>
            <tr>
                <th>Email:</th>
                <td>{{ClientEmail}}</td>
            </tr>
            <tr>
                <th>Address:</th>
                <td>{{ClientAddress}}</td>
            </tr>
        </table>
        <table class="invoice-items">
            <tr>
                <th>Description</th>
                <th>Quantity</th>
                <th>Unit Price</th>
                <th>Amount</th>
            </tr>
            {{#Items}}
            <tr>
                <td>{{Description}}</td>
                <td>{{Quantity}}</td>
                <td>{{UnitPrice}}</td>
                <td>{{Amount}}</td>
            </tr>
            {{/Items}}
        </table>
        <div class="total">
            <strong>Total: ${{TotalAmount}}</strong>
        </div>
    </div>
</body>
</html>

Generating PDF from the HTML File

Add the following code to Program.cs:

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Playwright;
class Program
{
    public static async Task Main(string[] args)
    {
        // Read the HTML content
        string htmlPath = Path.GetFullPath("invoice.html");
        string htmlContent = File.ReadAllText(htmlPath);
        // Replace placeholders with actual data
        htmlContent = htmlContent
            .Replace("{{InvoiceNumber}}", "INV-1001")
            .Replace("{{Date}}", DateTime.Now.ToString("yyyy-MM-dd"))
            .Replace("{{DueDate}}", DateTime.Now.AddDays(30).ToString("yyyy-MM-dd"))
            .Replace("{{ClientName}}", "Acme Corp")
            .Replace("{{ClientEmail}}", "contact@acme.com")
            .Replace("{{ClientAddress}}", "123 Business Rd, Business City, BC 54321")
            .Replace("{{TotalAmount}}", "1500.00");
        // Replace items (simple example)
        string itemsHtml = @"
            <tr>
                <td>Web Development Services</td>
                <td>1</td>
                <td>$1500.00</td>
                <td>$1500.00</td>
            </tr>
        ";
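        // Swap the whole {{#Items}} ... {{/Items}} section for the rendered rows
        int itemsStart = htmlContent.IndexOf("{{#Items}}", StringComparison.Ordinal);
        int itemsEnd = htmlContent.IndexOf("{{/Items}}", StringComparison.Ordinal) + "{{/Items}}".Length;
        htmlContent = htmlContent.Remove(itemsStart, itemsEnd - itemsStart).Insert(itemsStart, itemsHtml);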
        // Save the modified HTML to a temporary file
        string tempHtmlPath = Path.Combine(Path.GetTempPath(), "temp_invoice.html");
        File.WriteAllText(tempHtmlPath, htmlContent);
        // Initialize Playwright
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync(new BrowserTypeLaunchOptions
        {
            Headless = true
        });
        var context = await browser.NewContextAsync();
        var page = await context.NewPageAsync();
        // Navigate to the HTML file (Uri builds a valid file:// URL on every OS)
        await page.GotoAsync(new Uri(tempHtmlPath).AbsoluteUri);
        await page.WaitForLoadStateAsync(LoadState.NetworkIdle);
        // Generate the PDF
        await page.PdfAsync(new PagePdfOptions
        {
            Path = "invoice.pdf",
            Format = "A4",
            PrintBackground = true,
            Margin = new Margin { Top = "20mm", Bottom = "20mm", Left = "10mm", Right = "10mm" }
        });
        Console.WriteLine("PDF generated successfully.");
    }
}

Ensure your Program.cs contains the complete code as above, which reads the HTML, replaces placeholders, saves a temporary HTML file, and generates a PDF.
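
Because the placeholders above are filled with plain string replacement, values that come from users should be HTML-encoded first, otherwise a stray < or & can break the invoice layout. The following is a minimal sketch of that idea. TemplateFiller is a hypothetical helper written for this article (it is not part of Playwright), and it assumes the same {{Name}} placeholder style as invoice.html:

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper for illustration; the name and shape are assumptions.
public static class TemplateFiller
{
    // Replaces each {{Key}} placeholder with the HTML-encoded value.
    public static string Fill(string template, IReadOnlyDictionary<string, string> values)
    {
        string result = template;
        foreach (var pair in values)
        {
            // WebUtility.HtmlEncode escapes characters such as <, > and &.
            string encoded = WebUtility.HtmlEncode(pair.Value);
            result = result.Replace("{{" + pair.Key + "}}", encoded);
        }
        return result;
    }
}
```

For example, filling the template <td>{{ClientName}}</td> with the value Smith & Sons <Ltd> yields <td>Smith &amp;amp; Sons &amp;lt;Ltd&amp;gt;</td> in the HTML source, which the browser displays as the original text. Pre-rendered fragments such as itemsHtml should be inserted separately, since encoding them would escape their tags.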

Generating PDF from a URL

To generate a PDF from a live website, use the following code:

using System;
using System.Threading.Tasks;
using Microsoft.Playwright;
class Program
{
    public static async Task Main(string[] args)
    {
        // Initialize Playwright
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync(new BrowserTypeLaunchOptions
        {
            Headless = true
        });
        var context = await browser.NewContextAsync();
        var page = await context.NewPageAsync();
        // Navigate to the URL
        await page.GotoAsync("https://www.example.com");
        await page.WaitForLoadStateAsync(LoadState.NetworkIdle);
        // Generate the PDF
        await page.PdfAsync(new PagePdfOptions
        {
            Path = "website.pdf",
            Format = "A4",
            PrintBackground = true,
            Margin = new Margin { Top = "20mm", Bottom = "20mm", Left = "10mm", Right = "10mm" }
        });
        Console.WriteLine("Website PDF generated successfully.");
    }
}

This complete code initializes Playwright, navigates to the specified URL, waits for the page to load, and generates a PDF.
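
In a web application you often want the PDF as bytes (for an HTTP response or blob storage) rather than as a file on disk. The sketch below is one possible variation of the code above: UrlPdfRenderer and its 30-second default timeout are assumptions made for illustration, while the Playwright calls themselves (GotoAsync with PageGotoOptions, and PdfAsync returning a byte array) are part of the library:

```csharp
using System.Threading.Tasks;
using Microsoft.Playwright;

// Hypothetical wrapper for illustration; the class name and defaults are assumptions.
public static class UrlPdfRenderer
{
    // Renders the page at `url` and returns the PDF as bytes instead of writing a file.
    public static async Task<byte[]> RenderAsync(string url, float timeoutMs = 30_000)
    {
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync();
        var page = await browser.NewPageAsync();

        // Wait for network idle as part of the navigation and fail after the timeout.
        await page.GotoAsync(url, new PageGotoOptions
        {
            WaitUntil = WaitUntilState.NetworkIdle,
            Timeout = timeoutMs
        });

        // Without a Path, PdfAsync only returns the bytes.
        return await page.PdfAsync(new PagePdfOptions
        {
            Format = "A4",
            PrintBackground = true
        });
    }
}
```

A caller could then write byte[] pdf = await UrlPdfRenderer.RenderAsync("https://www.example.com"); and stream the result to the client. Launching a browser per call keeps the sketch short; a long-running service would more likely keep one browser instance and open a new page per request.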

Writing Playwright Script for PDF Generation

HTML Template Engines

To manage dynamic data in your HTML templates more efficiently, consider using an HTML template engine. Popular options in C# include:

Razor: The default view engine for ASP.NET MVC, allowing you to embed C# code within HTML.

Scriban: A fast and lightweight templating language with simple syntax.

DotLiquid: A .NET port of the popular Ruby Liquid templating engine.

Using Scriban to Populate the HTML Template

Install the Scriban package:

dotnet add package Scriban

Modify your Program.cs to use Scriban:

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Playwright;
using Scriban;
class Program
{
    public static async Task Main(string[] args)
    {
        // Read the HTML template
        string htmlTemplate = File.ReadAllText("invoice.html");
        // Create a template object
        var template = Template.Parse(htmlTemplate);
        // Define the data model
        var model = new
        {
            InvoiceNumber = "INV-1001",
            Date = DateTime.Now.ToString("yyyy-MM-dd"),
            DueDate = DateTime.Now.AddDays(30).ToString("yyyy-MM-dd"),
            ClientName = "Acme Corp",
            ClientEmail = "contact@acme.com",
            ClientAddress = "123 Business Rd, Business City, BC 54321",
            Items = new[]
            {
                new { Description = "Web Development Services", Quantity = 1, UnitPrice = 1500.00, Amount = 1500.00 }
            },
            TotalAmount = "1500.00"
        };
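        // Render the template with the data model, keeping .NET member names
        // (by default Scriban exposes members in snake_case, e.g. invoice_number)
        string htmlContent = template.Render(model, member => member.Name);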
        // Save the rendered HTML to a temporary file
        string tempHtmlPath = Path.Combine(Path.GetTempPath(), "temp_invoice.html");
        File.WriteAllText(tempHtmlPath, htmlContent);
        // Initialize Playwright and generate PDF as before
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync();
        var context = await browser.NewContextAsync();
        var page = await context.NewPageAsync();
        await page.GotoAsync(new Uri(tempHtmlPath).AbsoluteUri);
        await page.WaitForLoadStateAsync(LoadState.NetworkIdle);
        await page.PdfAsync(new PagePdfOptions
        {
            Path = "invoice.pdf",
            Format = "A4",
            PrintBackground = true,
            Margin = new Margin { Top = "20mm", Bottom = "20mm", Left = "10mm", Right = "10mm" }
        });
        Console.WriteLine("PDF generated successfully using Scriban template.");
    }
}

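With Scriban, placeholders such as {{InvoiceNumber}} are filled from the data model, which makes the process more maintainable and scalable than chained string replacements. Note that Scriban has its own syntax: the Mustache-style {{#Items}} ... {{/Items}} section in invoice.html must be rewritten as a Scriban loop ({{ for item in Items }} ... {{ end }}, with fields referenced as {{ item.Description }}), and .NET property names are exposed in snake_case (invoice_number) unless you pass a member renamer to Render.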
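
To make the loop syntax concrete, here is a small sketch that renders only the invoice rows with Scriban. The ScribanInvoiceRows class and the shortened two-column row template are assumptions made for this example; Template.Parse, HasErrors, Messages, and the Render overload with a member renamer are Scriban APIs:

```csharp
using System;
using Scriban;

// Hypothetical helper for illustration; the class name and row markup are assumptions.
public static class ScribanInvoiceRows
{
    // Scriban loop syntax, used in place of the Mustache-style {{#Items}} section.
    private const string RowsTemplate =
        "{{ for item in Items }}<tr><td>{{ item.Description }}</td><td>{{ item.Quantity }}</td></tr>{{ end }}";

    public static string Render(object model)
    {
        var template = Template.Parse(RowsTemplate);
        if (template.HasErrors)
        {
            // Surface syntax errors instead of rendering a broken document.
            throw new InvalidOperationException(string.Join("; ", template.Messages));
        }

        // Keep .NET member names as-is; by default Scriban exposes them in snake_case.
        return template.Render(model, member => member.Name);
    }
}
```

Calling ScribanInvoiceRows.Render(new { Items = new[] { new { Description = "Web Development Services", Quantity = 1 } } }) produces one table row per item. Checking HasErrors right after parsing is a cheap way to catch leftover Mustache markers such as {{#Items}}, which Scriban reports as syntax errors.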

Working with Dynamic Content: Converting Web Forms and Interactive Elements

When dealing with dynamic content or JavaScript-rendered pages, ensure the page fully loads before generating the PDF:

await page.GotoAsync($"file://{tempHtmlPath}");
await page.WaitForLoadStateAsync(LoadState.NetworkIdle);
// Additional waits if necessary
await page.PdfAsync(/* options */);

Adding Headers, Footers, and Page Numbers with Playwright

Customize headers and footers using HTML templates within the PagePdfOptions:

await page.PdfAsync(new PagePdfOptions
{
    Path = "invoice_with_header_footer.pdf",
    DisplayHeaderFooter = true,
    HeaderTemplate = "<div style='font-size:12px; text-align:center; width:100%;'>My SaaS Application</div>",
    FooterTemplate = "<div style='font-size:12px; text-align:center; width:100%;'>Page <span class=\"pageNumber\"></span> of <span class=\"totalPages\"></span></div>",
    Margin = new Margin { Top = "40mm", Bottom = "20mm", Left = "10mm", Right = "10mm" },
    Format = "A4",
    PrintBackground = true
});
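
Besides pageNumber and totalPages, Chromium also fills elements with the classes date, title, and url inside header and footer templates. The sketch below builds print options with a footer that shows the print date on the left and the page counter on the right. PdfLayout is a hypothetical helper, and the styling values are illustrative assumptions rather than recommended settings:

```csharp
using Microsoft.Playwright;

// Hypothetical helper for illustration; the class name and styling are assumptions.
public static class PdfLayout
{
    // Builds print options whose footer shows the print date and a page counter.
    public static PagePdfOptions WithFooter(string outputPath)
    {
        // Chromium injects values into elements with these class names.
        const string footer =
            "<div style='font-size:10px; width:100%; padding:0 10mm; display:flex; justify-content:space-between;'>" +
            "<span class='date'></span>" +
            "<span><span class='pageNumber'></span> / <span class='totalPages'></span></span>" +
            "</div>";

        return new PagePdfOptions
        {
            Path = outputPath,
            Format = "A4",
            PrintBackground = true,
            DisplayHeaderFooter = true,
            // An explicit empty header, so that no default header content is printed.
            HeaderTemplate = "<span></span>",
            FooterTemplate = footer,
            Margin = new Margin { Top = "20mm", Bottom = "20mm", Left = "10mm", Right = "10mm" }
        };
    }
}
```

It can be passed straight to the page: await page.PdfAsync(PdfLayout.WithFooter("invoice.pdf"));. Header and footer templates are rendered separately from the page, so they need inline styles (including an explicit font size) and enough top and bottom margin to be visible.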

Troubleshooting and Best Practices for Playwright PDF Generation

Common Issues in Playwright PDF Generation and How to Fix Them

Resources Not Loading: Ensure all resources (CSS, images, fonts) are accessible. Use absolute URLs or embed styles directly.

Timing Issues: If the page has animations or delayed content, wait appropriately:

await page.WaitForTimeoutAsync(5000); // Wait for 5 seconds

Large PDFs: For large documents, consider pagination and optimize images to reduce file size.
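
For the timing issues mentioned above, a fixed delay is either too long or too short depending on the machine. Waiting for an explicit condition is usually more reliable. The following sketch assumes the page exposes some element that appears once rendering is complete (the readySelector parameter is hypothetical and depends on your markup); WaitForSelectorAsync, WaitForFunctionAsync, and SetContentAsync are Playwright APIs:

```csharp
using System.Threading.Tasks;
using Microsoft.Playwright;

// Hypothetical wrapper for illustration; the class name and parameters are assumptions.
public static class ReadyPdfRenderer
{
    // Loads `html`, waits until `readySelector` is visible and fonts are loaded, then prints.
    public static async Task<byte[]> RenderAsync(string html, string readySelector)
    {
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync();
        var page = await browser.NewPageAsync();

        // Load the HTML directly, without going through a temporary file.
        await page.SetContentAsync(html);

        // Resolves as soon as the element is attached and visible (or throws on timeout).
        await page.WaitForSelectorAsync(readySelector);

        // Waits for a JavaScript condition; here, that web fonts have finished loading.
        await page.WaitForFunctionAsync("() => document.fonts.status === 'loaded'");

        return await page.PdfAsync(new PagePdfOptions { Format = "A4", PrintBackground = true });
    }
}
```

With this approach, a page that draws a chart asynchronously can add a marker element (for example an element with id chart-ready) when it is done, and the PDF is generated as soon as that marker appears.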

Creating a Serverless Service to Scale Your PDF Generation

Scaling PDF generation can be challenging when handling high volumes. Deploying your PDF generation logic as a serverless service offers several benefits. You can use popular providers such as AWS Lambda or Google Cloud Functions to make it serverless.

// AWS Lambda Function Handler
public class Function
{
    public async Task<APIGatewayProxyResponse> FunctionHandler(APIGatewayProxyRequest request, ILambdaContext context)
    {
        // Parse request, generate PDF using Playwright
        // Return PDF as base64 encoded string or file
        throw new NotImplementedException(); // skeleton only: replace with the PDF logic
    }
}

Benefits of Serverless PDF Generation

  • Scalability: Automatically handles increased load without manual intervention.

  • Cost-Effective: Pay only for the actual compute time used.

  • Maintenance: Reduced operational overhead as there’s no need to manage servers.

By deploying to AWS Lambda or Google Cloud Functions, your application can efficiently handle spikes in PDF generation requests, ensuring reliable performance for your users.
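
As a sketch of the "return PDF as base64 encoded string" step in the handler skeleton above, the helper below wraps PDF bytes in an API Gateway proxy response. PdfResponses and the fileName parameter are assumptions made for illustration; APIGatewayProxyResponse and its properties come from the Amazon.Lambda.APIGatewayEvents package:

```csharp
using System;
using System.Collections.Generic;
using Amazon.Lambda.APIGatewayEvents;

// Hypothetical helper for illustration; the class name and parameters are assumptions.
public static class PdfResponses
{
    // Wraps PDF bytes in a proxy response; binary bodies are sent base64-encoded.
    public static APIGatewayProxyResponse FromPdf(byte[] pdfBytes, string fileName)
    {
        return new APIGatewayProxyResponse
        {
            StatusCode = 200,
            IsBase64Encoded = true,
            Body = Convert.ToBase64String(pdfBytes),
            Headers = new Dictionary<string, string>
            {
                ["Content-Type"] = "application/pdf",
                ["Content-Disposition"] = $"attachment; filename=\"{fileName}\""
            }
        };
    }
}
```

Inside FunctionHandler, the last line would then be something like return PdfResponses.FromPdf(pdfBytes, "invoice.pdf");. Depending on how the API Gateway is configured, binary media types may also need to be enabled so that clients receive the decoded PDF rather than the base64 text.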

We have a full guide on how to deploy Playwright on AWS Lambda here.

How to Use a PDF API to Automate PDF Creation at Scale

For SaaS platforms, automating PDF generation at scale might require offloading the heavy lifting to a PDF API. By integrating an API like pdforge, you can handle high-volume PDF generation, complex formatting, and post-processing, all from a single backend call.

Implementation Example in C#:

using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
namespace PdfApiIntegration
{
    class Program
    {
        static async Task Main(string[] args)
        {
            var client = new HttpClient();
            client.DefaultRequestHeaders.Add("Authorization", "Bearer your-api-key");
            var requestBody = new
            {
                templateId = "your-template",
                data = new { html = "your-html" }
            };
            var content = new StringContent(
                Newtonsoft.Json.JsonConvert.SerializeObject(requestBody),
                Encoding.UTF8,
                "application/json"
            );
            var response = await client.PostAsync("https://api.pdforge.com/v1/pdf/sync", content);
            if (response.IsSuccessStatusCode)
            {
                var pdfBytes = await response.Content.ReadAsByteArrayAsync();
                File.WriteAllBytes("invoice.pdf", pdfBytes);
                Console.WriteLine("PDF generated using pdforge API.");
            }
            else
            {
                Console.WriteLine("Error generating PDF: " + response.ReasonPhrase);
            }
        }
    }
}

This code sends a POST request to the pdforge API, receives the generated PDF, and saves it locally.

Conclusion

When deciding on a method for generating PDFs from HTML in your SaaS application:

Use Playwright when you need precise control over rendering and can manage the necessary infrastructure. It excels at generating PDFs from complex, dynamic web content and gives you a flexible, self-hosted way to convert HTML to PDF.

Consider Other Libraries like PuppeteerSharp or iTextSharp if they better align with your project’s requirements or if you’re already familiar with them.

Opt for Third-Party PDF APIs like pdforge when scalability, ease of use, and reduced maintenance are priorities. These services handle the heavy lifting and can integrate seamlessly with your application.
