Generate PDF from HTML Easily with PuppeteerSharp

Marcelo Abreu, founder of pdforge

Oct 25, 2024

Introduction to PuppeteerSharp for HTML to PDF

PuppeteerSharp is a .NET port of Puppeteer that converts HTML to PDF from C# by driving a headless Chrome browser, which makes implementing PDF reports in your SaaS application much easier. Developers often consider libraries like iTextSharp and PdfSharp, or tools like Playwright, for generating PDFs in a .NET environment. PuppeteerSharp stands out because the PDF is rendered by the browser engine itself, so your HTML and CSS are laid out the way Chrome would print them, preserving styles and layouts.

For the full API reference, see the official PuppeteerSharp documentation.

Comparison Between PuppeteerSharp and Other C# PDF Libraries

While PuppeteerSharp shows a decline in popularity compared to iTextSharp and Playwright based on NuGet trends, it remains a reliable choice for generating PDFs in C# and .NET environments.

[Figure: download comparison between PDF libraries on NuGet Trends]

PdfSharp: Great for creating PDFs from scratch programmatically but lacks HTML to PDF conversion capabilities.

iTextSharp: Offers extensive PDF manipulation features but can be complex, and its AGPL licensing means closed-source commercial use generally requires a paid license.

Playwright: Similar to PuppeteerSharp but designed primarily for end-to-end testing. It is less focused on PDF generation, although its download volume keeps growing over time.

PuppeteerSharp: Provides accurate HTML to PDF conversion using Chromium, ideal for applications requiring precise styling and layout.

Setting Up PuppeteerSharp in a .NET Environment

Getting started with PuppeteerSharp is straightforward. Below is a step-by-step guide to installing and configuring the library in your project.

Installing PuppeteerSharp

1. Install via NuGet Package Manager:

Install-Package PuppeteerSharp

2. Download Chromium (if not already installed):

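// Recent PuppeteerSharp releases download a default browser build when called
// without arguments; older releases took a revision such as BrowserFetcher.DefaultRevision.
await new BrowserFetcher().DownloadAsync();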

3. Initialize the Browser Instance:

var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });

With PuppeteerSharp installed, you’re ready to generate PDFs from HTML content seamlessly.
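As a minimal sketch, the three steps can be combined into one small console program. It assumes a .NET 6+ project with top-level statements and a recent PuppeteerSharp release where DownloadAsync takes no arguments. The HTML string and the hello.pdf file name are placeholders chosen for illustration.

```csharp
using System;
using System.IO;
using PuppeteerSharp;

// Download a compatible browser build on first run; later runs reuse the cached copy.
await new BrowserFetcher().DownloadAsync();

// "await using" closes the page and the browser process when the program ends.
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// Placeholder markup, just enough to prove the pipeline works end to end.
await page.SetContentAsync("<h1>Hello from PuppeteerSharp</h1>");
await page.PdfAsync("hello.pdf");

Console.WriteLine("Wrote " + new FileInfo("hello.pdf").Length + " bytes");
```

Disposing the browser matters in long-running services, because each launch starts a separate Chromium process.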

Generating PDF from HTML Using PuppeteerSharp

Creating a dynamic PDF often starts with an HTML template. Let’s build a complete invoice example to illustrate this.

Creating a Complete Invoice HTML/CSS File

First, design your invoice template using HTML and CSS:

<!DOCTYPE html>
<html>
<head>
    <style>
        /* Invoice CSS styles */
        body { font-family: 'Arial', sans-serif; }
        .invoice-box {
            max-width: 800px;
            margin: auto;
            padding: 30px;
            border: 1px solid #eee;
            box-shadow: 0 0 10px rgba(0, 0, 0, 0.15);
        }
        .invoice-box table { width: 100%; line-height: inherit; text-align: left; }
        .invoice-box table td { padding: 5px; vertical-align: top; }
        .invoice-box table tr.heading td { background: #eee; border-bottom: 1px solid #ddd; font-weight: bold; }
        .invoice-box table tr.item td { border-bottom: 1px solid #eee; }
        .invoice-box table tr.total td:nth-child(2) { border-top: 2px solid #eee; font-weight: bold; }
    </style>
</head>
<body>
    <div class="invoice-box">
        <table>
            <tr class="top">
                <td colspan="2">
                    <h2>Invoice #{{InvoiceNumber}}</h2>
                    <p>Date: {{Date}}</p>
                </td>
            </tr>
            <tr class="information">
                <td>
                    <strong>From:</strong><br>
                    {{CompanyName}}<br>
                    {{CompanyAddress}}
                </td>
                <td>
                    <strong>To:</strong><br>
                    {{CustomerName}}<br>
                    {{CustomerAddress}}
                </td>
            </tr>
            <tr class="heading">
                <td>Description</td>
                <td>Price</td>
            </tr>
            {{#each Items}}
            <tr class="item">
                <td>{{this.Description}}</td>
                <td>{{this.Price}}</td>
            </tr>
            {{/each}}
            <tr class="total">
                <td></td>
                <td>Total: ${{Total}}</td>
            </tr>
        </table>
    </div>
</body>
</html>

HTML Template Engines

To populate dynamic data into your HTML, consider using template engines like Handlebars.NET. For example:

// Requires the Handlebars.Net NuGet package.
using HandlebarsDotNet;
using System.IO;

var template = File.ReadAllText("invoice.html");
var data = new
{
    InvoiceNumber = "INV-1001",
    Date = DateTime.Now.ToString("MM/dd/yyyy"),
    CompanyName = "Acme Corp.",
    CompanyAddress = "123 Business Road, Business City, BC 54321",
    CustomerName = "John Doe",
    CustomerAddress = "789 Residential Ave, Hometown, HT 12345",
    Items = new[]
    {
        new { Description = "Consulting Services", Price = "$1,000.00" },
        new { Description = "Software Development", Price = "$2,500.00" },
        new { Description = "Support and Maintenance", Price = "$500.00" },
    },
    Total = "4,000.00"
};
var htmlContent = Handlebars.Compile(template)(data);
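The Total in the example above is hard-coded. One possible way to derive it from the line items is to keep prices as decimal values and format them only when building the template data. The sketch below uses only the standard library. The tuple array and the FormatAmount helper are hypothetical names introduced for illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical line items; prices stay numeric until display time.
var lineItems = new (string Description, decimal Price)[]
{
    ("Consulting Services", 1000.00m),
    ("Software Development", 2500.00m),
    ("Support and Maintenance", 500.00m),
};

// "N2" with the invariant culture gives thousands separators and two decimals.
string FormatAmount(decimal amount) =>
    amount.ToString("N2", CultureInfo.InvariantCulture);

var data = new
{
    Items = lineItems
        .Select(i => new { i.Description, Price = "$" + FormatAmount(i.Price) })
        .ToArray(),
    // The template already prints the "$" before Total, so it is left out here.
    Total = FormatAmount(lineItems.Sum(i => i.Price)),
};

Console.WriteLine(data.Total); // prints 4,000.00
```

Items and Total have the same shape the invoice template expects, so they could replace the hand-written values in the anonymous object passed to the compiled template.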

Methods to Generate PDF Using PuppeteerSharp

PuppeteerSharp offers multiple ways to generate PDFs:

Generating PDF from HTML Content

var page = await browser.NewPageAsync();
await page.SetContentAsync(htmlContent);
await page.PdfAsync("invoice.pdf", new PdfOptions
{
    Format = PaperFormat.A4,
    PrintBackground = true,
    MarginOptions = new MarginOptions
    {
        Top = "20px",
        Right = "20px",
        Bottom = "20px",
        Left = "20px"
    },
    DisplayHeaderFooter = false,
    Landscape = false
});

Generating PDF from a URL

var page = await browser.NewPageAsync();
await page.GoToAsync("https://yourwebsite.com/invoice/INV-1001");
await page.PdfAsync("invoice.pdf", new PdfOptions
{
    Format = PaperFormat.A4,
    PrintBackground = true,
    MarginOptions = new MarginOptions
    {
        Top = "20px",
        Right = "20px",
        Bottom = "20px",
        Left = "20px"
    },
    DisplayHeaderFooter = false,
    Landscape = false
});
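Both snippets above write the PDF to a file path. In a web application you may prefer to keep the document in memory, for example to return it from an HTTP endpoint. The sketch below uses PdfDataAsync, which returns the PDF as a byte array (PdfStreamAsync is the stream counterpart). It assumes a recent PuppeteerSharp release and top-level statements, and the HTML string is a placeholder.

```csharp
using System;
using System.Text;
using PuppeteerSharp;
using PuppeteerSharp.Media; // PaperFormat and MarginOptions live in this namespace

await new BrowserFetcher().DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// Placeholder content; in practice this would be the rendered invoice HTML.
await page.SetContentAsync("<h1>Invoice INV-1001</h1>");

// Same options object as PdfAsync, but the result never touches the disk.
byte[] pdfBytes = await page.PdfDataAsync(new PdfOptions
{
    Format = PaperFormat.A4,
    PrintBackground = true
});

// Every PDF file begins with the ASCII signature "%PDF".
Console.WriteLine(Encoding.ASCII.GetString(pdfBytes, 0, 4));
Console.WriteLine(pdfBytes.Length + " bytes generated in memory");
```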

Utilizing PuppeteerSharp’s PDF API Features

PuppeteerSharp’s PDF options include:

  • Format: Paper size, e.g., PaperFormat.A4.

  • PrintBackground: Include background graphics.

  • MarginOptions: Set top, right, bottom, and left margins.

  • DisplayHeaderFooter: Show headers and footers.

  • HeaderTemplate and FooterTemplate: HTML templates for headers and footers.

  • Landscape: Set orientation to landscape.

  • Scale: Scale of the webpage rendering (default 1).

  • PageRanges: Specify pages to include, e.g., "1-5".

Example with all options:

await page.PdfAsync("invoice.pdf", new PdfOptions
{
    Scale = 1.0m,
    DisplayHeaderFooter = true,
    HeaderTemplate = "<div style='font-size:10px; text-align:center; width:100%;'>My Company Header</div>",
    FooterTemplate = "<div style='font-size:10px; text-align:center; width:100%;'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></div>",
    PrintBackground = true,
    Landscape = false,
    PageRanges = "1-2",
    Format = PaperFormat.A4,
    MarginOptions = new MarginOptions
    {
        Top = "50px",
        Bottom = "50px",
        Left = "20px",
        Right = "20px"
    }
});

Main Features of PuppeteerSharp

Headless Browser Automation: Mimics a real browser without a GUI.

Precise Rendering: Renders HTML/CSS as Chrome would.

Customization: Adjust page size, margins, headers, and footers.

Interactivity: Execute JavaScript on the page before rendering.

Common Pitfalls and How to Avoid Them

Large File Sizes: Optimize images and resources to reduce PDF size.

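Missing Assets: Ensure all CSS, JS, and image URLs are reachable from the machine running Chromium. When the page is loaded with SetContentAsync, relative paths have no base URL to resolve against, so use absolute URLs or inline the assets.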

Asynchronous Content: Wait for dynamic content to load before generating the PDF:

await page.WaitForSelectorAsync("#content-loaded");
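To make the asynchronous-content pitfall concrete, here is a hedged sketch in which the page finishes building itself half a second after load. The markup, the report.pdf file name, and the 5-second timeout are assumptions made for illustration. Catching the base PuppeteerException is a conservative choice here, since the wait throws a PuppeteerSharp exception when the timeout elapses.

```csharp
using System;
using PuppeteerSharp;

await new BrowserFetcher().DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();

// Hypothetical page: a script adds the #content-loaded marker after 500 ms.
const string html = @"<html><body><div id='app'>Loading...</div>
<script>
  setTimeout(function () {
    document.getElementById('app').textContent = 'Report ready';
    var marker = document.createElement('div');
    marker.id = 'content-loaded';
    document.body.appendChild(marker);
  }, 500);
</script></body></html>";

await page.SetContentAsync(html);

try
{
    // Fail after 5 seconds instead of waiting for the default timeout.
    await page.WaitForSelectorAsync(
        "#content-loaded",
        new WaitForSelectorOptions { Timeout = 5000 });
}
catch (PuppeteerException ex)
{
    Console.WriteLine("Content did not finish loading: " + ex.Message);
    throw;
}

// Only now is the dynamic content guaranteed to be in the DOM.
await page.PdfAsync("report.pdf");
```

Without the wait, the PDF would most likely capture the "Loading..." placeholder rather than the finished report.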

How to Use a PDF API to Automate PDF Creation at Scale

For large-scale PDF generation, these practices help keep performance predictable, whether you run PuppeteerSharp yourself or integrate a dedicated PDF API:

Asynchronous Processing: Queue PDF generation tasks.

Resource Management: Reuse browser instances to save memory.

Error Handling: Implement robust logging and exception management.
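The queueing and resource-management ideas above can be sketched with a SemaphoreSlim that caps how many render jobs run at once. This example is deliberately library-free: RenderAsync is a hypothetical stand-in that only simulates work, and the limit of 2 is an arbitrary assumption. In a real PuppeteerSharp setup, that method is where a page would be opened on one shared browser instance, rendered, and closed.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical limit: at most this many render jobs run at the same time.
const int maxConcurrentJobs = 2;
var gate = new SemaphoreSlim(maxConcurrentJobs);
var sync = new object();
int running = 0, peak = 0;

// Stand-in for real rendering work (not a PuppeteerSharp call).
async Task<byte[]> RenderAsync(string html)
{
    await Task.Delay(50); // simulate rendering time
    return Encoding.UTF8.GetBytes(html);
}

// Queues a job: callers wait at the gate until a slot is free.
async Task<byte[]> RenderWithLimitAsync(string html)
{
    await gate.WaitAsync();
    try
    {
        lock (sync) { running++; peak = Math.Max(peak, running); }
        return await RenderAsync(html);
    }
    catch (Exception ex)
    {
        // Log and rethrow so one failed document does not go unnoticed.
        Console.WriteLine("Render failed: " + ex.Message);
        throw;
    }
    finally
    {
        lock (sync) { running--; }
        gate.Release();
    }
}

var jobs = Enumerable.Range(1, 6)
    .Select(i => RenderWithLimitAsync($"<h1>Invoice {i}</h1>"));
byte[][] results = await Task.WhenAll(jobs);

Console.WriteLine($"Rendered {results.Length} documents, peak concurrency {peak}");
```

Capping concurrency this way trades some throughput for a bounded memory footprint, since every open page in a shared browser consumes resources.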

For SaaS platforms, automating PDF generation at scale might require offloading the heavy lifting to a PDF API. By integrating an API like pdforge, you can handle high-volume PDF generation, complex formatting, and post-processing from a single backend call.

Implementation Example in C#:

using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
namespace PdfApiIntegration
{
    class Program
    {
        static async Task Main(string[] args)
        {
            var client = new HttpClient();
            client.DefaultRequestHeaders.Add("Authorization", "Bearer your-api-key");
            var requestBody = new
            {
                templateId = "your-template",
                data = new { html = "your-html" }
            };
            var content = new StringContent(
                Newtonsoft.Json.JsonConvert.SerializeObject(requestBody),
                Encoding.UTF8,
                "application/json"
            );
            var response = await client.PostAsync("https://api.pdforge.com/v1/pdf/sync", content);
            if (response.IsSuccessStatusCode)
            {
                var pdfBytes = await response.Content.ReadAsByteArrayAsync();
                File.WriteAllBytes("invoice.pdf", pdfBytes);
                Console.WriteLine("PDF generated using the pdforge API.");
            }
            else
            {
                Console.WriteLine("Error generating PDF: " + response.ReasonPhrase);
            }
        }
    }
}

This code sends a POST request to the pdforge API, receives the generated PDF, and saves it locally.

Conclusion

Choosing the right tool for PDF generation depends on your project’s requirements:

Opt for PuppeteerSharp when accurate HTML to PDF conversion is crucial, especially with complex CSS.

Use iTextSharp or PdfSharp when you need granular control over PDF elements without HTML rendering.

Consider third-party PDF APIs like pdforge for scalable and high-performance PDF generation without managing infrastructure.
