Introduction to PDF Generation with Playwright
Playwright, Microsoft's browser automation library, can convert HTML to PDF from C# in a .NET application by rendering the page in headless Chromium and printing it. Generating PDFs from HTML is a critical feature for many SaaS applications, especially for dynamic reports, invoices, and user-generated content. This article shows how to use Playwright for high-quality PDF generation.
Playwright also has thorough official documentation that you can consult if you run into trouble.
Comparison Between Playwright and Other C# PDF Libraries
When it comes to PDF generation in C#, several libraries are available. Here’s how Playwright compares to some popular options:
PdfSharp: Ideal for creating PDFs from scratch but lacks direct HTML to PDF conversion capabilities.
PuppeteerSharp: A .NET port of Puppeteer that controls Chrome or Chromium. It handles HTML to PDF conversion, but it runs a full browser process, so it is resource-intensive, and it supports fewer browsers than Playwright.
iTextSharp: A comprehensive library for PDF manipulation with extensive features, but it has a steep learning curve, and it is licensed under the AGPL, so closed-source commercial use requires a paid commercial license.
Playwright stands out due to:
Modern Web Rendering: Offers high-fidelity rendering of complex, modern web pages, including JavaScript execution and CSS styling.
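Cross-Browser Support: Drives Chromium, Firefox, and WebKit for automation and testing. Note that PDF generation (page.PdfAsync) is supported only in Chromium, so the examples in this article all use Chromium.
Performance: A single browser instance can be launched once and reused across many pages and contexts, which keeps per-document overhead low in both development and production.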
Setting Up Playwright for HTML to PDF Conversion
Installation Guide: How to Install Playwright for .NET
Before starting, ensure that the .NET SDK is installed on your system. Playwright for .NET ships with its own Node.js-based driver, so a separate Node.js installation is not required.
1. Install the .NET SDK: Download and install it from the official .NET website.
2. Create a New .NET Project:
Open a terminal and run:
dotnet new console -o HtmlToPdfExample
cd HtmlToPdfExample
3. Add Playwright Package:
dotnet add package Microsoft.Playwright
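4. Install Playwright Browsers: Build the project once, then run the generated install script. The script requires PowerShell (pwsh), and net8.0 should match your project's target framework:
dotnet build
pwsh bin/Debug/net8.0/playwright.ps1 install chromium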
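Browsers can also be installed from C# code, which is convenient in CI pipelines or container builds where PowerShell may not be available. The sketch below calls the Playwright command-line entry point through Microsoft.Playwright.Program.Main; passing only "chromium" limits the download to the one browser that supports PDF output. The BrowserSetup class and its method name are illustrative, not part of Playwright.

```csharp
using System;

// Illustrative helper (not part of Playwright): installs Playwright's Chromium
// build from code instead of running playwright.ps1 by hand.
public static class BrowserSetup
{
    public static void EnsureChromiumInstalled()
    {
        // Equivalent to running "playwright install chromium" on the command line.
        int exitCode = Microsoft.Playwright.Program.Main(new[] { "install", "chromium" });
        if (exitCode != 0)
        {
            throw new InvalidOperationException(
                $"Playwright browser install failed with exit code {exitCode}.");
        }
    }
}
```

Call BrowserSetup.EnsureChromiumInstalled() once at startup, before Playwright.CreateAsync(); if the browser is already present, the installer does not download it again.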
Configuring Your Environment for HTML to PDF Conversion
Set up your Program.cs with the necessary using directives and an asynchronous Main method:
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Playwright;
class Program
{
    public static async Task Main(string[] args)
    {
        // The PDF generation code from the following sections goes here.
    }
}
Basic Example: Generating PDF from an HTML File
Creating a Robust HTML Template
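Create an HTML file named invoice.html in the project directory. It uses {{Placeholder}} tokens for single values, and the {{#Items}} and {{/Items}} markers surround the repeating line-item row: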
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Invoice #{{InvoiceNumber}}</title>
<style>
body { font-family: Arial, sans-serif; margin: 0; padding: 20px; }
.invoice-box { max-width: 800px; margin: auto; border: 1px solid #eee; padding: 20px; }
.invoice-header { text-align: center; margin-bottom: 50px; }
.invoice-details { width: 100%; margin-bottom: 30px; }
.invoice-details th, .invoice-details td { padding: 10px; border-bottom: 1px solid #eee; }
.invoice-items { width: 100%; border-collapse: collapse; }
.invoice-items th, .invoice-items td { border: 1px solid #eee; padding: 10px; text-align: left; }
.total { text-align: right; margin-top: 20px; font-size: 18px; }
</style>
</head>
<body>
<div class="invoice-box">
<div class="invoice-header">
<h1>Invoice #{{InvoiceNumber}}</h1>
<p>Date: {{Date}}</p>
<p>Due Date: {{DueDate}}</p>
</div>
<table class="invoice-details">
<tr>
<th>Billed To:</th>
<td>{{ClientName}}</td>
</tr>
<tr>
<th>Email:</th>
<td>{{ClientEmail}}</td>
</tr>
<tr>
<th>Address:</th>
<td>{{ClientAddress}}</td>
</tr>
</table>
<table class="invoice-items">
<tr>
<th>Description</th>
<th>Quantity</th>
<th>Unit Price</th>
<th>Amount</th>
</tr>
{{#Items}}
<tr>
<td>{{Description}}</td>
<td>{{Quantity}}</td>
<td>{{UnitPrice}}</td>
<td>{{Amount}}</td>
</tr>
{{/Items}}
</table>
<div class="total">
<strong>Total: ${{TotalAmount}}</strong>
</div>
</div>
</body>
</html>
Generating PDF from the HTML File
Add the following code to Program.cs:
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Playwright;
class Program
{
    public static async Task Main(string[] args)
    {
        // Load the template and fill in the single-value placeholders
        string htmlPath = Path.GetFullPath("invoice.html");
        string htmlContent = File.ReadAllText(htmlPath);
        htmlContent = htmlContent
            .Replace("{{InvoiceNumber}}", "INV-1001")
            .Replace("{{Date}}", DateTime.Now.ToString("yyyy-MM-dd"))
            .Replace("{{DueDate}}", DateTime.Now.AddDays(30).ToString("yyyy-MM-dd"))
            .Replace("{{ClientName}}", "Acme Corp")
            .Replace("{{ClientEmail}}", "contact@acme.com")
            .Replace("{{ClientAddress}}", "123 Business Rd, Business City, BC 54321")
            .Replace("{{TotalAmount}}", "1500.00");
        // Pre-rendered line-item rows
        string itemsHtml = @"
<tr>
<td>Web Development Services</td>
<td>1</td>
<td>$1500.00</td>
<td>$1500.00</td>
</tr>
";
        // Replace the whole {{#Items}}...{{/Items}} block (markers plus the row template) with the rendered rows
        const string startMarker = "{{#Items}}";
        const string endMarker = "{{/Items}}";
        int start = htmlContent.IndexOf(startMarker, StringComparison.Ordinal);
        int end = htmlContent.IndexOf(endMarker, StringComparison.Ordinal) + endMarker.Length;
        htmlContent = htmlContent.Substring(0, start) + itemsHtml + htmlContent.Substring(end);
        // Save the filled-in HTML to a temporary file so Chromium can load it
        string tempHtmlPath = Path.Combine(Path.GetTempPath(), "temp_invoice.html");
        File.WriteAllText(tempHtmlPath, htmlContent);
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync(new BrowserTypeLaunchOptions
        {
            Headless = true
        });
        var context = await browser.NewContextAsync();
        var page = await context.NewPageAsync();
        // new Uri(...).AbsoluteUri builds a valid file:/// URL on both Windows and Linux
        await page.GotoAsync(new Uri(tempHtmlPath).AbsoluteUri);
        await page.WaitForLoadStateAsync(LoadState.NetworkIdle);
        await page.PdfAsync(new PagePdfOptions
        {
            Path = "invoice.pdf",
            Format = "A4",
            PrintBackground = true,
            Margin = new Margin { Top = "20mm", Bottom = "20mm", Left = "10mm", Right = "10mm" }
        });
        Console.WriteLine("PDF generated successfully.");
    }
}
Ensure your Program.cs contains the complete code as above, which reads the HTML, replaces placeholders, saves a temporary HTML file, and generates a PDF.
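Plain string replacement inserts values verbatim, so a client name such as "Smith & Sons", or any text containing <, would produce invalid or unsafe HTML. The minimal sketch below HTML-encodes each value with System.Net.WebUtility.HtmlEncode before substituting it, and renders the result with page.SetContentAsync instead of writing a temporary file. The InvoiceRenderer class and its method names are illustrative. Because SetContentAsync loads the HTML into a blank page, relative CSS or image URLs in the template will not resolve there.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;
using Microsoft.Playwright;

// Illustrative helper (not part of Playwright): fills {{Key}} placeholders with
// HTML-encoded values and renders the result to PDF bytes in memory.
public static class InvoiceRenderer
{
    public static string FillPlaceholders(string template, IReadOnlyDictionary<string, string> values)
    {
        foreach (var pair in values)
        {
            // Encode &, <, > and quotes so user data cannot break the markup.
            template = template.Replace("{{" + pair.Key + "}}", WebUtility.HtmlEncode(pair.Value));
        }
        return template;
    }

    public static async Task<byte[]> RenderPdfAsync(IBrowser browser, string html)
    {
        // browser.NewPageAsync creates the page in its own new context.
        var page = await browser.NewPageAsync();
        try
        {
            // Load the HTML directly; no temporary file is needed.
            await page.SetContentAsync(html, new PageSetContentOptions { WaitUntil = WaitUntilState.NetworkIdle });
            // Without Path, nothing is written to disk; PdfAsync just returns the bytes.
            return await page.PdfAsync(new PagePdfOptions { Format = "A4", PrintBackground = true });
        }
        finally
        {
            // Closing the page also closes the context NewPageAsync created for it.
            await page.CloseAsync();
        }
    }
}
```

For example, FillPlaceholders("<td>{{ClientName}}</td>", new Dictionary<string, string> { ["ClientName"] = "Smith & Sons" }) returns "<td>Smith &amp; Sons</td>". Pre-rendered markup such as the item rows must still be inserted without encoding.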
Generating PDF from a URL
To generate a PDF from a live website, use the following code:
using System;
using System.Threading.Tasks;
using Microsoft.Playwright;
class Program
{
    public static async Task Main(string[] args)
    {
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync(new BrowserTypeLaunchOptions
        {
            Headless = true
        });
        var context = await browser.NewContextAsync();
        var page = await context.NewPageAsync();
        await page.GotoAsync("https://www.example.com");
        await page.WaitForLoadStateAsync(LoadState.NetworkIdle);
        await page.PdfAsync(new PagePdfOptions
        {
            Path = "website.pdf",
            Format = "A4",
            PrintBackground = true,
            Margin = new Margin { Top = "20mm", Bottom = "20mm", Left = "10mm", Right = "10mm" }
        });
        Console.WriteLine("Website PDF generated successfully.");
    }
}
This complete code initializes Playwright, navigates to the specified URL, waits for the page to load, and generates a PDF.
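By default, PdfAsync renders the page with its print stylesheet (CSS @media print rules). If a site looks wrong in the PDF, emulating screen media before printing may help. The sketch below does that with page.EmulateMediaAsync, and it omits Path so the PDF comes back as bytes you can store or send elsewhere. The UrlPdf class and its method name are illustrative.

```csharp
using System.Threading.Tasks;
using Microsoft.Playwright;

// Illustrative helper (not part of Playwright): prints a live URL to PDF
// using the site's screen styles instead of its print styles.
public static class UrlPdf
{
    public static async Task<byte[]> RenderAsync(string url)
    {
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync();
        var page = await browser.NewPageAsync();
        // PdfAsync uses @media print rules by default; switch to screen styles.
        await page.EmulateMediaAsync(new PageEmulateMediaOptions { Media = Media.Screen });
        await page.GotoAsync(url, new PageGotoOptions { WaitUntil = WaitUntilState.NetworkIdle });
        // No Path: the PDF is returned as bytes rather than written to disk.
        return await page.PdfAsync(new PagePdfOptions { Format = "A4", PrintBackground = true });
    }
}
```

A caller would write byte[] pdf = await UrlPdf.RenderAsync("https://www.example.com"); and then save the bytes with File.WriteAllBytes or return them from an HTTP endpoint.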
Writing Playwright Script for PDF Generation
HTML Template Engines
To manage dynamic data in your HTML templates more efficiently, consider using an HTML template engine. Popular options in C# include:
• Razor: The default view engine for ASP.NET MVC, allowing you to embed C# code within HTML.
• Scriban: A fast and lightweight templating language with simple syntax.
• DotLiquid: A .NET port of the popular Ruby Liquid templating engine.
Using Scriban to Populate the HTML Template
Install the Scriban package:
dotnet add package Scriban
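Scriban uses its own loop syntax rather than Mustache-style sections, so first change the items block in invoice.html: replace {{#Items}} with {{ for item in Items }}, replace {{/Items}} with {{ end }}, and write the row cells as {{ item.Description }}, {{ item.Quantity }}, {{ item.UnitPrice }}, and {{ item.Amount }}. Then modify your Program.cs to use Scriban: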
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Playwright;
using Scriban;
class Program
{
    public static async Task Main(string[] args)
    {
        // Parse the template and fail fast on syntax errors
        string htmlTemplate = File.ReadAllText("invoice.html");
        var template = Template.Parse(htmlTemplate);
        if (template.HasErrors)
        {
            throw new InvalidOperationException(string.Join(Environment.NewLine, template.Messages));
        }
        var model = new
        {
            InvoiceNumber = "INV-1001",
            Date = DateTime.Now.ToString("yyyy-MM-dd"),
            DueDate = DateTime.Now.AddDays(30).ToString("yyyy-MM-dd"),
            ClientName = "Acme Corp",
            ClientEmail = "contact@acme.com",
            ClientAddress = "123 Business Rd, Business City, BC 54321",
            Items = new[]
            {
                // Prices are preformatted strings so the PDF shows two decimal places
                new { Description = "Web Development Services", Quantity = 1, UnitPrice = "$1500.00", Amount = "$1500.00" }
            },
            TotalAmount = "1500.00"
        };
        // member => member.Name keeps the C# property names (InvoiceNumber, Items, ...) as written
        string htmlContent = template.Render(model, member => member.Name);
        string tempHtmlPath = Path.Combine(Path.GetTempPath(), "temp_invoice.html");
        File.WriteAllText(tempHtmlPath, htmlContent);
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync();
        var context = await browser.NewContextAsync();
        var page = await context.NewPageAsync();
        await page.GotoAsync(new Uri(tempHtmlPath).AbsoluteUri);
        await page.WaitForLoadStateAsync(LoadState.NetworkIdle);
        await page.PdfAsync(new PagePdfOptions
        {
            Path = "invoice.pdf",
            Format = "A4",
            PrintBackground = true,
            Margin = new Margin { Top = "20mm", Bottom = "20mm", Left = "10mm", Right = "10mm" }
        });
        Console.WriteLine("PDF generated successfully using Scriban template.");
    }
}
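By using Scriban, placeholders in the HTML template (e.g., {{InvoiceNumber}} and the cells inside the for loop over Items) are filled with data from the model, making the process more maintainable and scalable. Note that Scriban's default member renamer exposes a C# property such as InvoiceNumber as invoice_number, so pass member => member.Name as the second argument to Render when the template uses the C# names as written.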
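To see Scriban's for loop and the member renamer in isolation, the sketch below renders a one-row table from an inline template string, assuming the Scriban package installed above. The ScribanRows class and its method name are illustrative, and the amount is passed as a preformatted string so the output does not depend on how Scriban formats numbers.

```csharp
using Scriban;

// Illustrative helper: renders invoice rows with a Scriban for loop.
public static class ScribanRows
{
    public static string Render()
    {
        var template = Template.Parse(
            "<table>{{ for item in Items }}<tr><td>{{ item.Description }}</td>" +
            "<td>{{ item.Quantity }}</td><td>{{ item.Amount }}</td></tr>{{ end }}</table>");
        var model = new
        {
            Items = new[]
            {
                new { Description = "Web Development Services", Quantity = 1, Amount = "$1500.00" }
            }
        };
        // member => member.Name keeps PascalCase names; the default renamer
        // would expose Items as "items" and Description as "description".
        return template.Render(model, member => member.Name);
    }
}
```

Without the renamer argument, the template would have to say {{ for item in items }} and {{ item.description }} instead.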
Working with Dynamic Content: Converting Web Forms and Interactive Elements
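When dealing with dynamic content or JavaScript-rendered pages, make sure the page has fully loaded before generating the PDF. LoadState.NetworkIdle waits until there have been no network connections for at least 500 ms:
await page.GotoAsync(new Uri(tempHtmlPath).AbsoluteUri);
await page.WaitForLoadStateAsync(LoadState.NetworkIdle);
await page.PdfAsync(new PagePdfOptions { Path = "invoice.pdf", Format = "A4", PrintBackground = true });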
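For web forms, and for pages that signal when their client-side rendering is done, you can interact with the page before printing. The sketch below fills a text input with page.FillAsync and then waits for a JavaScript condition with page.WaitForFunctionAsync instead of relying on network activity alone. The inline HTML, the #customer selector, the window.reportReady flag, and the FormPdf class are all hypothetical stand-ins for your own page and code.

```csharp
using System.Threading.Tasks;
using Microsoft.Playwright;

// Illustrative helper: fills a form field and waits for a page-defined
// readiness flag before printing.
public static class FormPdf
{
    // Hypothetical page: sets window.reportReady after simulated async rendering.
    private const string Html = @"<html><body>
<label>Customer <input id=""customer""></label>
<div id=""report""></div>
<script>
  setTimeout(() => {
    document.getElementById('report').textContent = 'Report rendered';
    window.reportReady = true;
  }, 500);
</script>
</body></html>";

    public static async Task<byte[]> RenderAsync(string customerName)
    {
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync();
        var page = await browser.NewPageAsync();
        await page.SetContentAsync(Html);
        // Type a value into the form field so it is part of the rendered page.
        await page.FillAsync("#customer", customerName);
        // Block until the page reports that its client-side rendering is done.
        await page.WaitForFunctionAsync("() => window.reportReady === true");
        return await page.PdfAsync(new PagePdfOptions { Format = "A4", PrintBackground = true });
    }
}
```

Waiting on an explicit flag ties the PDF to the page's own notion of "ready", which is more dependable than a fixed delay when rendering time varies.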
Adding Headers, Footers, and Page Numbers with Playwright
Customize headers and footers with HTML templates in PagePdfOptions. Set DisplayHeaderFooter to true; Chromium fills elements with the classes pageNumber and totalPages at print time, and the top and bottom margins must be large enough to leave room for the header and footer:
await page.PdfAsync(new PagePdfOptions
{
    Path = "invoice_with_header_footer.pdf",
    DisplayHeaderFooter = true,
    HeaderTemplate = "<div style='font-size:12px; text-align:center; width:100%;'>My SaaS Application</div>",
    FooterTemplate = "<div style='font-size:12px; text-align:center; width:100%;'>Page <span class=\"pageNumber\"></span> of <span class=\"totalPages\"></span></div>",
    Margin = new Margin { Top = "40mm", Bottom = "20mm", Left = "10mm", Right = "10mm" },
    Format = "A4",
    PrintBackground = true
});
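Besides pageNumber and totalPages, Chromium also fills elements with the classes date, title, and url in these templates. In practice the templates are rendered separately from the page, so the document's own stylesheet does not apply and styles should be set inline. The sketch below builds options with the document title in the header and the print date plus page count in the footer; the FooterOptions class name is illustrative.

```csharp
using Microsoft.Playwright;

// Illustrative helper: PDF options with a title header and a date/page-count footer.
public static class FooterOptions
{
    public static PagePdfOptions Create(string path)
    {
        // Inline styles only: the page's stylesheets are not applied to these templates.
        const string style = "font-size:10px; width:100%; padding:0 10mm; display:flex; justify-content:space-between;";
        return new PagePdfOptions
        {
            Path = path,
            Format = "A4",
            PrintBackground = true,
            DisplayHeaderFooter = true,
            // Chromium replaces the contents of elements with these class names at print time.
            HeaderTemplate = $"<div style='{style}'><span class='title'></span></div>",
            FooterTemplate = $"<div style='{style}'><span class='date'></span>" +
                             "<span>Page <span class='pageNumber'></span> of <span class='totalPages'></span></span></div>",
            // Top and bottom margins leave room for the header and footer.
            Margin = new Margin { Top = "25mm", Bottom = "25mm", Left = "10mm", Right = "10mm" }
        };
    }
}
```

Calling await page.PdfAsync(FooterOptions.Create("invoice_with_footer.pdf")) on the invoice page would print its <title> (for example, "Invoice #INV-1001") in the header of every page.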
Troubleshooting and Best Practices for Playwright PDF Generation
Common Issues in Playwright PDF Generation and How to Fix Them
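• Resources Not Loading: Ensure all resources (CSS, images, fonts) are reachable from the page. Use absolute URLs or embed styles directly, because relative URLs resolve against the page's own location: they point into the temporary directory when the HTML is copied there, and they cannot resolve at all when the HTML is loaded with page.SetContentAsync.
• Timing Issues: If the page has animations or delayed content, wait before printing. A fixed delay is the simplest option but is brittle and slows every render, so waiting for a specific element is usually more reliable (#chart is a placeholder selector):
await page.WaitForTimeoutAsync(5000);       // fixed delay
await page.WaitForSelectorAsync("#chart");  // or: wait for a specific element
• Large PDFs: For large documents, control page breaks with CSS (for example, break-inside: avoid on table rows) and compress or resize images to reduce file size.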
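A more targeted alternative to a fixed delay, which also addresses resources that are still loading, is to wait until web fonts and images have actually finished loading. The sketch below uses page.EvaluateAsync and page.WaitForFunctionAsync; document.fonts.ready and HTMLImageElement.complete are standard browser APIs, while the PrintReadiness class name is illustrative.

```csharp
using System.Threading.Tasks;
using Microsoft.Playwright;

// Illustrative helper: waits for fonts and images before printing.
public static class PrintReadiness
{
    public static async Task WaitForAssetsAsync(IPage page)
    {
        // Resolves once the document's pending web font loads have finished.
        await page.EvaluateAsync<bool>("async () => { await document.fonts.ready; return true; }");
        // Polls until every <img> has finished loading (or failed to load).
        await page.WaitForFunctionAsync("() => Array.from(document.images).every(img => img.complete)");
    }
}
```

Call it between GotoAsync (or SetContentAsync) and PdfAsync. Note that img.complete is also true for broken images, so this avoids hanging on a missing file but does not detect it.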
Creating a Serverless Service to Scale Your PDF Generation
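Scaling PDF generation can be challenging at high volumes, since each headless Chromium instance consumes significant memory and CPU. Deploying your PDF generation logic as a serverless function on a provider such as AWS Lambda or Google Cloud Functions lets the platform add capacity as requests arrive. On AWS Lambda, the handler has the following signature:
public class Function
{
    public async Task<APIGatewayProxyResponse> FunctionHandler(APIGatewayProxyRequest request, ILambdaContext context)
    {
        // Render request.Body to PDF with Playwright and return it Base64-encoded (IsBase64Encoded = true).
        throw new NotImplementedException();
    }
}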
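A fuller sketch of the handler follows. It assumes a container-image Lambda deployment with Playwright's Chromium build and its system libraries installed in the image, the Amazon.Lambda.Core, Amazon.Lambda.APIGatewayEvents, and Amazon.Lambda.Serialization.SystemTextJson packages, and a request body that contains the HTML to render. The browser is kept in a static field so warm invocations reuse it instead of paying the launch cost each time.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.Lambda.APIGatewayEvents;
using Amazon.Lambda.Core;
using Microsoft.Playwright;

[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]

public class Function
{
    // Launched once per execution environment and reused across warm invocations.
    // The Playwright instance is intentionally kept alive for the lifetime of the environment.
    private static readonly Lazy<Task<IBrowser>> SharedBrowser = new Lazy<Task<IBrowser>>(async () =>
    {
        var playwright = await Playwright.CreateAsync();
        return await playwright.Chromium.LaunchAsync(new BrowserTypeLaunchOptions { Headless = true });
    });

    public async Task<APIGatewayProxyResponse> FunctionHandler(APIGatewayProxyRequest request, ILambdaContext context)
    {
        if (string.IsNullOrWhiteSpace(request.Body))
        {
            return new APIGatewayProxyResponse { StatusCode = 400, Body = "Request body must contain HTML." };
        }

        var browser = await SharedBrowser.Value;
        // A fresh context per request keeps cookies and storage isolated between callers.
        await using var browserContext = await browser.NewContextAsync();
        var page = await browserContext.NewPageAsync();
        await page.SetContentAsync(request.Body);
        byte[] pdf = await page.PdfAsync(new PagePdfOptions { Format = "A4", PrintBackground = true });

        context.Logger.LogLine($"Generated PDF of {pdf.Length} bytes.");
        return new APIGatewayProxyResponse
        {
            StatusCode = 200,
            // Depending on your API Gateway configuration, a Base64 body is returned to the client as binary.
            IsBase64Encoded = true,
            Body = Convert.ToBase64String(pdf),
            Headers = new Dictionary<string, string> { ["Content-Type"] = "application/pdf" }
        };
    }
}
```

Reusing one browser while creating a new context per request is the design choice that keeps warm invocations fast without letting one caller's state leak into another's PDF.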
Benefits of Serverless PDF Generation
Scalability: Automatically handles increased load without manual intervention.
Cost-Effective: Pay only for the actual compute time used.
Maintenance: Reduced operational overhead as there’s no need to manage servers.
By deploying to AWS Lambda or Google Cloud Functions, your application can efficiently handle spikes in PDF generation requests, ensuring reliable performance for your users.
We have a full guide on how to deploy Playwright on AWS Lambda here.
How to Use a PDF API to Automate PDF Creation at Scale
For SaaS platforms, automating PDF generation at scale might require offloading the heavy lifting to a PDF API. By integrating an API such as pdforge, you can handle high-volume PDF generation, complex formatting, and post-processing, all from a single backend call.
Implementation Example in C#:
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
namespace PdfApiIntegration
{
    class Program
    {
        static async Task Main(string[] args)
        {
            using var client = new HttpClient();
            client.DefaultRequestHeaders.Add("Authorization", "Bearer your-api-key");
            var requestBody = new
            {
                templateId = "your-template",
                data = new { html = "your-html" }
            };
            // System.Text.Json is built into .NET, so no extra JSON package is needed
            var content = new StringContent(
                JsonSerializer.Serialize(requestBody),
                Encoding.UTF8,
                "application/json"
            );
            var response = await client.PostAsync("https://api.pdforge.com/v1/pdf/sync", content);
            if (response.IsSuccessStatusCode)
            {
                var pdfBytes = await response.Content.ReadAsByteArrayAsync();
                File.WriteAllBytes("invoice.pdf", pdfBytes);
                Console.WriteLine("PDF generated using the pdforge API.");
            }
            else
            {
                Console.WriteLine("Error generating PDF: " + response.ReasonPhrase);
            }
        }
    }
}
This code sends a POST request to the pdforge API, receives the generated PDF, and saves it locally.
Conclusion
When deciding on a method for generating PDFs from HTML in your SaaS application:
• Use Playwright when you need precise control over rendering and can manage the browser infrastructure yourself. It excels at generating PDFs from complex, dynamic web content.
• Consider Other Libraries like PuppeteerSharp or iTextSharp if they better align with your project’s requirements or if you’re already familiar with them.
• Opt for Third-Party PDF APIs like pdforge when scalability, ease of use, and reduced maintenance are priorities. These services handle the heavy lifting and can integrate seamlessly with your application.