pdf libraries

Ruby on rails

How to Generate PDF from HTML with Puppeteer-Ruby

Marcelo Abreu, founder of pdforge

Nov 2, 2024

Introduction to Puppeteer-Ruby for HTML to PDF Conversion

Puppeteer-Ruby is a flexible library for generating PDFs from HTML using Chromium, much like the more popular Grover gem.

Both libraries render HTML in headless Chrome, but they reach it differently: Grover calls the Node.js Puppeteer package, while Puppeteer-Ruby is a Ruby port of the Puppeteer API that controls the browser directly from Ruby. This makes it a good fit for SaaS applications that need precisely styled PDF output and want Puppeteer-style control without leaving Ruby.

You can check the documentation here.

Comparison Between Puppeteer-Ruby and Other Ruby PDF Libraries

[Figure: puppeteer-ruby download counts from BestGems]

Ruby offers a range of PDF generation libraries, each with unique features:

  • WickedPDF and PDFKit - Both wrap wkhtmltopdf and convert server-rendered HTML and CSS into PDFs. They handle basic styling but may struggle with complex or interactive layouts, so they are better suited to simpler documents.

  • Prawn and HexaPDF - Focus on manual layout control without HTML-to-PDF capabilities. They are powerful choices for custom-built PDFs but lack native HTML rendering, making them less suitable for styling-heavy documents.

  • Grover - A widely adopted alternative to Puppeteer-Ruby. Grover also uses Chromium, offering similar high-fidelity HTML and CSS rendering in PDFs with broader Rails integration.

[Image: Guide to generating PDFs from HTML with puppeteer-ruby in Ruby on Rails]

Setting Up the Puppeteer-Ruby Environment

Prerequisites for Puppeteer-Ruby Installation

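Puppeteer-Ruby is a Ruby port of Puppeteer that talks to the browser directly, so the gem itself needs a Chrome or Chromium binary on the machine rather than a Node.js runtime. Installing Node.js and the Puppeteer npm package is still a convenient way to obtain a compatible Chromium build, and it is required if you also want to try Grover. The steps below install Node.js, Puppeteer, and the Puppeteer-Ruby gem.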

Install Node.js:

$ node -v

# Install Node.js if not installed
$ sudo

Then, install Puppeteer-Ruby:

$ gem
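Because the gem drives an existing browser, it can help to confirm that a Chrome or Chromium binary is actually discoverable before launching. The sketch below is one possible check using only the standard library; the list of binary names and the helper name `find_chrome_executable` are assumptions for illustration, not part of Puppeteer-Ruby.

```ruby
# Candidate binary names are assumptions; adjust them for your platform.
CHROME_BINARY_NAMES = %w[
  google-chrome google-chrome-stable chromium chromium-browser chrome
].freeze

# Returns the first executable Chrome/Chromium binary found on the given
# search path, or nil when none is present.
def find_chrome_executable(search_path: ENV.fetch('PATH', ''), names: CHROME_BINARY_NAMES)
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    names.each do |name|
      candidate = File.join(dir, name)
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

# Possible usage (requires the puppeteer-ruby gem and a real browser):
#   path = find_chrome_executable
#   Puppeteer.launch(executable_path: path) { |browser| ... } if path
```

If the helper returns nil, installing Chromium (for example through the Puppeteer npm package) is the likely next step.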

Installing Puppeteer and Setting Up Node in Ruby Projects

Inside your project, initialize Node.js and install Puppeteer. Below is an example structure, which includes folders for controllers, views, and PDFs for easier organization.

$ mkdir pdf_project && cd pdf_project
$ npm init -y
$ npm

For a Rails project, this setup might look like this:



Integrating Puppeteer-Ruby with Rails

Within Rails, integrate Puppeteer-Ruby by rendering a PDF directly from your HTML views. Rails makes it easy to pass dynamic data into these templates, allowing each PDF to be customized per user or request.

# Gemfile
gem 'puppeteer-ruby'

# Install dependencies
$ bundle install

In app/controllers/pdf_controller.rb, use Puppeteer-Ruby to generate a PDF:

require 'puppeteer-ruby'

class PdfController < ApplicationController
  def generate_invoice
    customer = Customer.find(params[:id]) # Dynamic data from Rails model

    # Render the Rails template to an HTML string. layout: false because the
    # template is already a complete HTML document.
    html = render_to_string(template: 'pdf/invoice', layout: false, locals: { customer: customer })

    pdf_data = nil
    Puppeteer.launch do |browser|
      page = browser.new_page
      # Load the rendered HTML into the headless browser
      page.set_content(html)
      # Without a path: option, page.pdf returns the PDF as a binary string
      pdf_data = page.pdf(format: 'A4')
    end

    # Send the in-memory PDF to the client without writing it to disk
    send_data pdf_data, filename: "invoice_#{customer.id}.pdf", type: 'application/pdf', disposition: 'inline'
  end
end

The send_data method sends the PDF bytes straight from memory to the user, so nothing has to be written to disk or cleaned up afterwards. Skipping the file round-trip keeps responses fast, which suits applications handling real-time PDF requests.
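As the controller grows, the browser calls can be moved into a small service object so they are easier to reuse and to test without Chromium. The following is a minimal sketch under that assumption: the class name `HtmlToPdf` and the injectable `launcher` argument are hypothetical, and the default launcher is the `Puppeteer` module, loaded only when it is actually needed.

```ruby
# Hypothetical wrapper around the browser calls used in the controller.
class HtmlToPdf
  DEFAULT_PDF_OPTIONS = { format: 'A4' }.freeze

  # launcher: any object that responds to #launch and yields a browser.
  # Defaults to the Puppeteer module (assumes the puppeteer-ruby gem).
  def initialize(launcher: nil)
    @launcher = launcher
  end

  # Renders the given HTML string and returns the PDF as a binary string.
  def call(html, **overrides)
    options = DEFAULT_PDF_OPTIONS.merge(overrides)
    pdf = nil
    launcher.launch do |browser|
      page = browser.new_page
      page.set_content(html)
      pdf = page.pdf(**options)
    end
    pdf
  end

  private

  # Load the gem lazily so a fake launcher can be injected in tests.
  def launcher
    @launcher ||= begin
      require 'puppeteer-ruby'
      Puppeteer
    end
  end
end
```

In a controller this could be called as `send_data HtmlToPdf.new.call(html), type: 'application/pdf'`, while tests pass a fake launcher instead of starting a browser.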

Converting HTML to PDF with Puppeteer-Ruby

Structuring HTML for PDF Rendering

Using Rails variables within the HTML template allows dynamic data to flow easily. Here’s an example invoice template with embedded Rails variables:

<!-- app/views/pdf/invoice.html.erb -->
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Invoice</title>
  <style>
    body { font-family: Arial, sans-serif; }
    .invoice-box { max-width: 800px; margin: auto; padding: 30px; border: 1px solid #eee; }
    .header, .footer { text-align: center; margin: 20px 0; }
  </style>
</head>
<body>
  <div class="invoice-box">
    <h1>Invoice for <%= customer.name %></h1>
    <p>Date: <%= Time.now.strftime("%d/%m/%Y") %></p>
    <p>Customer ID: <%= customer.id %></p>
    <p>Customer Email: <%= customer.email %></p>
    <table>
      <tr><th>Item</th><th>Price</th></tr>
      <% customer.orders.each do |order| %>
        <tr><td><%= order.item_name %></td><td>$<%= order.price %></td></tr>
      <% end %>
      <tr><td>Total</td><td>$<%= customer.orders.sum(:price) %></td></tr>
    </table>
  </div>
</body>
</html>
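To see how local variables flow into a template like this one, the sketch below renders a shortened version of the invoice with Ruby's standard-library ERB instead of Rails. It is only a stand-in for illustration: `DemoCustomer`, `INVOICE_TEMPLATE`, and `render_invoice_html` are hypothetical names, and unlike Rails views, plain ERB does not escape output automatically, so the escaping is written out explicitly.

```ruby
require 'erb'

# Hypothetical stand-in for the Rails Customer model.
DemoCustomer = Struct.new(:id, :name, :email, keyword_init: true)

# Shortened version of the invoice template for illustration.
INVOICE_TEMPLATE = <<~HTML
  <h1>Invoice for <%= ERB::Util.html_escape(customer.name) %></h1>
  <p>Customer ID: <%= customer.id %></p>
  <p>Customer Email: <%= ERB::Util.html_escape(customer.email) %></p>
HTML

# Renders the template with `customer` available as a local variable,
# similar in spirit to passing locals: { customer: customer } in Rails.
def render_invoice_html(customer)
  ERB.new(INVOICE_TEMPLATE).result_with_hash(customer: customer)
end

customer = DemoCustomer.new(id: 7, name: 'Ada & Co', email: 'ada@example.com')
puts render_invoice_html(customer)
```

The resulting string is ordinary HTML, which is the form the headless browser expects when content is loaded into a page.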

Adding Page Numbers, Headers, and Footers

You can add headers, footers, and page numbers by configuring them within page.pdf options:

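page.pdf(
  path: "pdfs/invoice.pdf",
  format: 'A4',
  display_header_footer: true,
  header_template: "<span style='font-size:12px; margin-left: 20px;'>Company Header</span>",
  footer_template: "<span style='font-size:12px; margin-left: 20px;'>Page <span class='pageNumber'></span> of <span class='totalPages'></span></span>",
  # Chromium draws the header and footer inside the page margins,
  # so leave room for them (the 20mm values are an example)
  margin: { top: '20mm', bottom: '20mm' }
)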
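When several documents share the same header and footer, the options can be built in one place. The helper below is a sketch only: `invoice_pdf_options` is a hypothetical name, the margin values are arbitrary, and the `pageNumber` and `totalPages` classes are the placeholders Chromium fills in when printing.

```ruby
require 'cgi'

# Builds an options hash for page.pdf with a shared header and footer.
# company_name is escaped because it ends up inside an HTML template.
def invoice_pdf_options(company_name:, path: nil)
  style = 'font-size:12px; margin-left:20px;'
  options = {
    format: 'A4',
    display_header_footer: true,
    header_template: "<span style='#{style}'>#{CGI.escapeHTML(company_name)}</span>",
    footer_template: "<span style='#{style}'>Page <span class='pageNumber'></span> " \
                     "of <span class='totalPages'></span></span>",
    # Assumed margins so the header and footer have room to render
    margin: { top: '20mm', bottom: '20mm' }
  }
  options[:path] = path if path # Only write to disk when a path is given
  options
end

# Possible usage with a page object from Puppeteer-Ruby:
#   page.pdf(**invoice_pdf_options(company_name: 'Acme & Sons'))
```

Leaving out `path:` keeps the PDF in memory, which matches the controller approach of sending the data directly to the user.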

Error Handling and Troubleshooting Common Issues in Puppeteer-Ruby

To troubleshoot issues with Puppeteer-Ruby, use debugging options like disabling headless mode or adding event listeners for console logs. This can provide insights into asset loading or rendering issues.

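Puppeteer.launch(headless: false) do |browser|
  page = browser.new_page

  # Register listeners before loading content so no events are missed
  # Enable console logging
  page.on('console') { |msg| puts "PAGE LOG: #{msg.text}" }

  # Track network requests
  page.on('request') { |req| puts "Request: #{req.url}" }
  page.on('response') { |res| puts "Response: #{res.status} #{res.url}" }

  # Chromium cannot evaluate ERB, so render the template to HTML first.
  # Customer.first is only a sample record for debugging.
  html = ApplicationController.render(template: 'pdf/invoice', layout: false, locals: { customer: Customer.first })
  page.set_content(html)

  # Some Chrome versions only support PDF output in headless mode; if this
  # call fails, inspect the page visually here and switch back to headless
  pdf_data = page.pdf(format: 'A4')
end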

With these configurations, you can track requests, responses, and console messages, making it easier to identify issues with asset paths, variable rendering, or template accessibility.
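Printing every response gets noisy on pages with many assets. One option is to collect responses and report only the failures. The class below is a plain-Ruby sketch of that idea; `AssetLog` is a hypothetical name, and wiring it to the `response` event is shown only as a comment because it needs a running browser.

```ruby
# Collects response statuses so failed asset loads can be listed afterwards.
class AssetLog
  Entry = Struct.new(:status, :url)

  def initialize
    @entries = []
  end

  # Call this from a response listener with the status code and URL.
  def record(status, url)
    @entries << Entry.new(status, url)
  end

  # Responses with a 4xx or 5xx status, the usual sign of a broken asset path.
  def failures
    @entries.select { |entry| entry.status >= 400 }
  end

  # One human-readable line per failed response.
  def summary
    failures.map { |entry| "#{entry.status} #{entry.url}" }
  end
end

# Possible wiring with Puppeteer-Ruby:
#   log = AssetLog.new
#   page.on('response') { |res| log.record(res.status, res.url) }
#   ... load the page, then:
#   puts log.summary
```

An empty summary suggests that missing styles or images are caused by something other than failed requests, such as the template itself.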

How to Use a PDF API to Automate PDF Creation at Scale

For SaaS platforms, automating PDF generation at scale might require offloading the heavy lifting to a PDF API.

Another option is to integrate a third-party API such as pdforge, which lets you handle high-volume PDF generation, complex formatting, and post-processing from a single backend call.

Here’s an example of how to integrate pdforge in Rails to convert HTML content into a PDF via an API call:

require 'net/http'
require 'json'
require 'uri'

class PdfApiService
  def self.generate_pdf(html_content)
    uri = URI("https://api.pdforge.com/v1/pdf/sync")
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    request = Net::HTTP::Post.new(uri.path, {
      'Content-Type' => 'application/json',
      'Authorization' => "Bearer your-api-key"
    })
    request.body = {
      templateId: 'your-template',
      data: { html: html_content }
    }.to_json
    response = http.request(request)
    response.body if response.is_a?(Net::HTTPSuccess)
  end
end

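This code sends a POST request to the pdforge API and returns the response body when the request succeeds, or nil otherwise. The caller decides what to do with the result, such as saving it locally or passing it on to the user.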
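For production use, it is usually worth separating request construction from sending, adding timeouts, and failing loudly instead of returning nil. The sketch below reuses the endpoint and payload shape from the example above; the helper names, the timeout values, and the choice to raise on failure are assumptions, not pdforge requirements.

```ruby
require 'net/http'
require 'json'
require 'uri'

PDFORGE_SYNC_URI = URI('https://api.pdforge.com/v1/pdf/sync')

# Builds the POST request without sending it, which keeps it easy to test.
def build_pdf_request(html:, template_id:, api_key:, uri: PDFORGE_SYNC_URI)
  request = Net::HTTP::Post.new(uri.path, {
    'Content-Type' => 'application/json',
    'Authorization' => "Bearer #{api_key}"
  })
  request.body = { templateId: template_id, data: { html: html } }.to_json
  request
end

# Sends the request with explicit timeouts and raises on a non-2xx response.
def send_pdf_request(request, uri: PDFORGE_SYNC_URI, open_timeout: 5, read_timeout: 30)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https',
                  open_timeout: open_timeout, read_timeout: read_timeout) do |http|
    response = http.request(request)
    raise "PDF API request failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    response.body
  end
end

# Possible usage, reading the key from the environment instead of the source:
#   request = build_pdf_request(html: html, template_id: 'your-template',
#                               api_key: ENV.fetch('PDFORGE_API_KEY'))
#   body = send_pdf_request(request)
```

The environment variable name `PDFORGE_API_KEY` is likewise only an example; any secret store that keeps the key out of version control would serve the same purpose.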

Conclusion

While Puppeteer-Ruby offers detailed, flexible HTML-to-PDF rendering, Grover remains a robust option for most Rails applications due to its simplicity and wider adoption. Puppeteer-Ruby suits more specialized applications that need low-level browser control or advanced debugging directly from Ruby.

For simpler projects, consider alternatives like WickedPDF or PDFKit, while complex scaling needs may benefit from using a third-party PDF API like pdforge to streamline large-scale PDF generation across high-demand environments.
