PDF

Purpose

Generate PDFs from HTML files, merge PDFs, and compress PDFs.

Methods

Binding name: p6.pdf


fromHtml

Generates a PDF from an HTML string and writes it to the location specified by targetUri. Metadata can optionally be added to the PDF.

More information about the arguments

  • html (required) – The HTML content to be converted into a PDF.
  • targetUri (optional, default: null) – The destination URI where the generated PDF should be saved. If null, a temporary file is created.
  • metadata (optional, default: null) – A map containing metadata such as title, author, subject and keywords for the generated PDF.

Returns the URI written to.

Syntax

String p6.pdf.fromHtml(String html, String targetUri = null, Map<String, String> metadata = null)

Warning

The CSS used in the HTML must be CSS 2.1 at most.

The targetUri must point to a local file (e.g. protocol file: only).

Tip

To use a temporary file, pass null as the targetUri parameter.

Example

Temporary file

p6.pdf.fromHtml('<div><b>Bold</b> text</div>')

Incorrect HTML passed as the string results in an error

p6.pdf.fromHtml('<b>Bold</b> text')

Specify the target

p6.pdf.fromHtml('<div><b>Bold</b> text</div>', 'p6file://${P6_DATA}/path/to.pdf')

Specify the target with metadata

def metadataMap = [
    "Title"   : "ABC",
    "Keywords": "p6",
    "Author"  : "author",
    "Subject" : "Invoice"
]
p6.pdf.fromHtml('<div><b>Bold</b> text</div>', 'p6file://${P6_DATA}/path/to.pdf', metadataMap)

Specify the metadata without target

def metadataMap = [
    "Title"   : "ABC",
    "Keywords": "p6",
    "Author"  : "author",
    "Subject" : "Invoice"
]
p6.pdf.fromHtml('<div><b>Bold</b> text</div>', metadataMap)
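
Because incorrect HTML results in an error, it can help to escape dynamic values before embedding them in the markup. The following is a minimal sketch and not part of the p6.pdf API: escapeHtml and buildHtml are hypothetical helper names, and wrapping the content in a single root div simply mirrors the working examples above.

```groovy
// Hypothetical helper: escape the characters that would otherwise break the markup.
String escapeHtml(String s) {
    s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
}

// Hypothetical helper: build a small HTML fragment wrapped in a single root <div>.
String buildHtml(String title, List<String> items) {
    def rows = items.collect { "<li>${escapeHtml(it)}</li>" }.join('')
    "<div><h1>${escapeHtml(title)}</h1><ul>${rows}</ul></div>"
}

def html = buildHtml('Fees & charges', ['Shipping < 5kg', 'Tax'])
println html
```

Inside a Platform 6 script, the resulting string could then be passed to p6.pdf.fromHtml(html).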


merge

Merges the PDFs listed in sourceUris and writes the result to targetUri. Returns the URI written to.

Syntax

String p6.pdf.merge(List<String> sourceUris[, String targetUri])

Warning

The targetUri must point to a local file (e.g. protocol file: only).

Tip

To use a temporary file, pass null as the targetUri parameter.

Example

Temporary file

p6.pdf.merge(['p6file://${P6_DATA}/path/pdf1.pdf', 'p6file://${P6_DATA}/path/pdf2.pdf'])

Specify the target

p6.pdf.merge(['p6file://${P6_DATA}/path/pdf1.pdf', 'p6file://${P6_DATA}/path/pdf2.pdf'], 'p6file://${P6_DATA}/path/to.pdf')
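
When the source list is built dynamically, sorting it first keeps the input order predictable. This is a sketch under assumptions: toUris is a hypothetical helper, the base path is the one used in the examples above, and it is assumed (not stated here) that the merged output follows the order of the list.

```groovy
// Hypothetical helper: build p6file URIs for the given file names,
// sorted by name so that the list order is deterministic.
List<String> toUris(List<String> fileNames) {
    // Single quotes keep ${P6_DATA} literal, as in the examples above.
    fileNames.toSorted().collect { 'p6file://${P6_DATA}/path/' + it }
}

def uris = toUris(['pdf2.pdf', 'pdf1.pdf'])
uris.each { println it }
```

Inside a Platform 6 script, the list could then be passed to p6.pdf.merge(uris).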


parse

Parses the PDF file specified in the configuration map and calls the given closure with each row processed.

Syntax

void p6.pdf.parse(Map<String, Object> configuration, Closure rowNotify)

Example

def cnf = [
    area0: '402.89,17.24,550.29,64.89',
    area1: '30.6,346.29,195.95,150.07',
    pages: '1,2',
    uri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]

p6.pdf.parse(cnf) { pageNumber, row ->
    p6.log.debug( pageNumber + ': ' + row )
    if ( pageNumber == 2) false         // Returning false will halt page iteration
    else true
}

def cnf = [
    columns0: '0,25.0,71.3,180.53,462.91,504.42,535.45,585.68,643.15,714.6',
    uri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]

p6.pdf.parse(cnf) { pageNumber, row ->
    p6.log.debug( pageNumber + ': ' + row )
}

parseToList

Parses the PDF file specified in the configuration map returning the processed values as a List of Tuples (pageNumber, row).

Syntax

List<Tuple> p6.pdf.parseToList(Map<String, Object> configuration)

Parameter: configuration

| Configuration Name | Description |
| --- | --- |
| password | (Optional) Password used to decrypt the PDF. |
| spreadsheetDisabled | (Optional) Force the PDF not to be extracted using spreadsheet-style extraction (if there are ruling lines separating each cell, as in a PDF of an Excel spreadsheet). The default is true. |
| areaFail | (Optional) If a configured area does not select text on a page, a P6Exception is thrown unless this value is false. The default is true. |
| areaN | (Optional) N is a zero-based number. If no areas are given, the whole of each page is used as the bounding area. All defined areas are applied to each specified page. The area is expressed in points and can be identified using OSX Preview in 'Rectangular Selection' mode. A comma-separated string is required: '{top},{left},{width},{height}' |
| columnsN | (Optional) N is a zero-based number. A comma-separated list of X coordinates of column boundaries. |
| uri | (Mandatory) The URI of the source PDF file to parse. |
| pages | (Optional) A comma-separated string of page numbers. If not specified, all pages in the source file are processed. |

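
Since areas are expressed in points, measurements taken in millimetres need converting first (1 inch = 25.4 mm = 72 points). The sketch below shows one way to build an areaN string; mmToPt and areaOf are hypothetical helper names, and rounding to two decimals is an arbitrary choice made for this example.

```groovy
import java.math.RoundingMode

// Convert millimetres to PDF points (1 inch = 25.4 mm = 72 points),
// rounded to two decimals.
BigDecimal mmToPt(BigDecimal mm) {
    (mm * 72 / 25.4).setScale(2, RoundingMode.HALF_UP)
}

// Format an area as '{top},{left},{width},{height}', all in points.
String areaOf(BigDecimal topMm, BigDecimal leftMm, BigDecimal widthMm, BigDecimal heightMm) {
    [topMm, leftMm, widthMm, heightMm].collect { mmToPt(it) }.join(',')
}

def cnf = [
    area0: areaOf(25.4, 12.7, 50.8, 25.4),
    uri  : 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]
println cnf.area0
```

The resulting map has the same shape as the configuration maps used in the examples of this section.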

Example

def cnf = [
    area0: '402.89,17.24,550.29,64.89',
    areaFail: false,
    uri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]

def lstTuples = p6.pdf.parseToList(cnf)

lstTuples.each { tup ->
    p6.log.debug( tup.get(0) + ": " + tup.get(1) )
}
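
The list of (pageNumber, row) tuples can be regrouped per page with plain Groovy collection methods. The sketch below uses hypothetical sample data in place of a real parseToList result; the actual row type is not specified here, so plain strings stand in for rows, and rowsByPage is a hypothetical helper name.

```groovy
// Hypothetical sample standing in for the result of p6.pdf.parseToList(cnf).
def lstTuples = [
    new Tuple2(1, 'Invoice 00140'),
    new Tuple2(1, 'Total 120.00'),
    new Tuple2(2, 'Page 2 footer')
]

// Group the rows by page number: [pageNumber: [row, row, ...]]
Map<Integer, List> rowsByPage(List tuples) {
    tuples.groupBy { it.get(0) }.collectEntries { page, tups ->
        [(page): tups.collect { it.get(1) }]
    }
}

def byPage = rowsByPage(lstTuples)
byPage.each { page, rows -> println "page ${page}: ${rows.size()} row(s)" }
```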

split

Copies pages from a source PDF file to a destination PDF file.

Syntax

void p6.pdf.split(Map<String, Object> configuration)

Parameter: configuration

| Configuration Name | Description |
| --- | --- |
| password | (Optional) Password used to decrypt the PDF. |
| keepAnnotations | (Optional) true to retain any annotations in the destination (default: false). |
| startPage | (Mandatory) A one-based number specifying the first page to copy to the new destination. |
| endPage | (Mandatory) A one-based number specifying the last page to copy to the new destination (all pages in between are copied as well). |
| sourceUri | (Mandatory) The URI of the source PDF file. |
| destinationUri | (Mandatory) The URI of the destination PDF file. The destination is always overwritten. |

Example

def cnf = [
    startPage: 3,
    endPage: 4,
    sourceUri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf',
    destinationUri: 'file:/tmp/page4.pdf'
]

p6.pdf.split(cnf)
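
To cut a document into several files of a fixed number of pages, one configuration map per chunk can be prepared up front. This is a sketch under assumptions: pageRanges and splitConfigs are hypothetical helpers, the destination naming scheme is invented for the example, and the total page count must already be known to the caller.

```groovy
// Compute consecutive one-based page ranges of at most chunkSize pages.
// Assumes totalPages >= 1 and chunkSize >= 1.
List<Map> pageRanges(int totalPages, int chunkSize) {
    (1..totalPages).step(chunkSize).collect { int start ->
        [startPage: start, endPage: Math.min(start + chunkSize - 1, totalPages)]
    }
}

// Turn each range into a configuration map with the keys expected by split.
List<Map> splitConfigs(String sourceUri, String targetPrefix, int totalPages, int chunkSize) {
    pageRanges(totalPages, chunkSize).collect { range ->
        range + [
            sourceUri     : sourceUri,
            destinationUri: targetPrefix + range.startPage + '-' + range.endPage + '.pdf'
        ]
    }
}

def configs = splitConfigs('p6file://${P6_DATA}/00140_Facture Alfa.pdf', 'file:/tmp/pages', 5, 2)
configs.each { println it }
```

Inside a Platform 6 script, each map could then be passed to p6.pdf.split in turn.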

sign

Signs the PDF file specified in the configuration map and writes the result to the targetUri. Returns the URI written to.

Syntax

String p6.pdf.sign(Map<String, Object> configuration)

Warning

The targetUri must point to a local file (e.g. protocol file: only).

Tip

To use a temporary file, set targetUri to null in the configuration map.

Parameter: configuration

| Configuration Name | Description |
| --- | --- |
| keyStoreUri | (Mandatory) The URI of the KeyStore file (PKCS12). |
| keyStorePassword | (Optional) Password used to open the KeyStore. |
| keyStoreAlias | (Optional) Alias to use in the KeyStore. The first one is used by default. |
| uri | (Mandatory) The URI of the source PDF file to sign. |
| password | (Optional) Password used to decrypt the PDF. |
| tsa | (Optional) URL of the TSA server used to timestamp the signed file. |
| reason | (Optional) The signature reason. |
| targetUri | (Optional) The URI of the target signed PDF file. |

Example

def cnf = [
    keyStoreUri: 'file://${P6_DATA}/keystore.p12',
    keyStorePassword: '123456',

    uri: 'file://${P6_DATA}/source.pdf',
    reason: 'Signed on Platform6',
    targetUri: 'file://${P6_DATA}/signed.pdf'
]

p6.log.debug "Signed PDF path:" + p6.pdf.sign(cnf)
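
Checking a configuration map for its mandatory keys before calling sign gives a clearer message than a failure deep inside the call. The sketch below is illustrative only: missingKeys is a hypothetical helper, and the list of mandatory keys is copied from the table above.

```groovy
// Hypothetical helper: return the mandatory keys that are absent (or null) in the map.
List<String> missingKeys(Map<String, Object> cnf, List<String> mandatory) {
    mandatory.findAll { cnf[it] == null }
}

// Incomplete configuration: 'uri' is deliberately left out.
def signCnf = [
    keyStoreUri: 'file://${P6_DATA}/keystore.p12',
    reason     : 'Signed on Platform6'
]

def missing = missingKeys(signCnf, ['keyStoreUri', 'uri'])
if (missing) {
    println "Missing mandatory keys: ${missing.join(', ')}"
}
```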

Tip

You can generate a p12 file for your tests using the command line:

openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365
openssl pkcs12 -export -out keyStore.p12 -inkey key.pem -in cert.pem -name test

fromXml

Generates a PDF from the given XML and XSLT, each supplied either as a string or as a file path.

Returns the URI written to.

Optional parameters:

  • targetUri – The destination URI where the generated PDF should be saved. By default, a temporary file is created.
  • metadata – A map containing metadata such as title, author, subject and keywords for the generated PDF.

The targetUri must point to a local file (e.g. protocol file: only).

Syntax

String p6.pdf.fromXml xml schema xsl 

Input XML

<?xml version="1.0" encoding="UTF-8"?>
<employees>
    <employee>
        <name>Alice</name>
        <role>Developer</role>
    </employee>
</employees>

Input XSL

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <html>
            <body>
                <h2>Employee Details</h2>
                <table border="1">
                    <tr>
                        <th>Name</th>
                        <th>Role</th>
                    </tr>
                    <xsl:for-each select="employees/employee">
                        <tr>
                            <td><xsl:value-of select="name"/></td>
                            <td><xsl:value-of select="role"/></td>
                        </tr>
                    </xsl:for-each>
                </table>
            </body>
        </html>
    </xsl:template>

</xsl:stylesheet>

Example

Temporary file

p6.pdf.fromXml xml schema xsl

Specify the target

p6.pdf.fromXml xml schema xsl, {targetUri 'p6file://${P6_DATA}/path/to.pdf'}

Specify the target with metadata

def metadataMap = [
    "Title"   : "ABC",
    "Keywords": "p6",
    "Author"  : "author",
    "Subject" : "Invoice"
]
p6.pdf.fromXml xml schema xsl, {targetUri 'p6file://${P6_DATA}/to.pdf'; metadata metadataMap}

Specify the metadata without target

def metadataMap = [
    "Title"   : "ABC",
    "Keywords": "p6",
    "Author"  : "author",
    "Subject" : "Invoice"
]
p6.pdf.fromXml xml schema xsl, {metadata metadataMap}

Specify the XML and XSL as file paths

p6.pdf.fromXml 'p6file://${P6_DATA}/xml.xml' schema 'p6file://${P6_DATA}/xslt.xml'

compress

New Feature

Since 6.10.11

Compresses a PDF file using a pure-Java pipeline that preserves text as vectors (no rasterization).

  • Annotation removal (optional) — removes comments, highlights, and form field annotations from each page.
  • Image recompression (optional) — recompresses images as JPEG with configurable quality, DPI downscaling, and optional greyscale conversion.

Since 6.0.20:

  • Font substitution — replaces embedded TrueType fonts (Arial, Calibri, Times New Roman, Georgia, Verdana, Courier New, etc.) with the 14 standard PDF fonts (Helvetica, Times, Courier). Standard fonts need no embedding, eliminating the main source of bloat in Office-generated PDFs.

  • Stream deduplication — detects identical Form XObjects and images that appear on multiple pages (e.g. headers, logos) and replaces all copies with a single shared reference.
  • Unused resource cleanup — removes fonts, images, and graphics states declared in page resources but never referenced in the page content.
  • Object garbage collection — deep-copies pages into a fresh document, discarding all unreachable objects (orphaned font streams, stale XRef entries) left over from the previous steps.

Expected compression

On a typical multi-page document produced by p6.pdf.merge (text + repeated header/logo), this pipeline achieves 90–94% size reduction while keeping text fully readable.

It returns statistics about the compression:

| Key | Description | Type |
| --- | --- | --- |
| source.path | Path of the source file | String |
| source.size | Size of the source file in bytes | Long |
| source.size.pretty | Size of the source file in human-readable form (e.g. 3 MB) | String |
| compression.success | true if the output is smaller than the source | Boolean |
| compression.duration | Time taken (e.g. 1s 96ms) | String |
| font.substituted | Number of embedded font references replaced by standard fonts (since 6.0.20) | Int |
| stream.deduplicated | Number of duplicate Form XObjects / images merged into a shared reference (since 6.0.20) | Int |
| image.count | Total number of images processed, only when compressImages is true (since 6.0.20) | Int |
| image.size.original | Sum of image stream sizes before recompression in bytes, only when compressImages is true (since 6.0.20) | Long |
| image.size.compressed | Sum of image stream sizes after recompression in bytes, only when compressImages is true (since 6.0.20) | Long |

Extra keys returned only when compression.success is true:

| Key | Description | Type |
| --- | --- | --- |
| compression.level | Percentage reduction (e.g. 93.77%) | String |
| target.path | Path of the compressed output file | String |
| target.size | Size of the output file in bytes | Long |
| target.size.pretty | Size of the output file in human-readable form (e.g. 201 KB) | String |

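
Because the target.* and compression.level keys are only present on success, code reading the statistics should check compression.success first. The sketch below shows one way to do that; summarize is a hypothetical helper, and the sample map is made up from the example values in the tables above rather than taken from a real run.

```groovy
// Hypothetical helper: build a one-line summary from the statistics map.
String summarize(Map<String, Object> stats) {
    if (stats['compression.success']) {
        // These keys are only returned when compression.success is true.
        return "${stats['source.size.pretty']} -> ${stats['target.size.pretty']} " +
               "(${stats['compression.level']}) in ${stats['compression.duration']}"
    }
    "no gain for ${stats['source.path']} (${stats['source.size.pretty']})"
}

// Made-up sample; a real map would come from p6.pdf.compress.
def sampleStats = [
    'source.path'         : '/path/to/source.pdf',
    'source.size.pretty'  : '3 MB',
    'compression.success' : true,
    'compression.duration': '1s 96ms',
    'compression.level'   : '93.77%',
    'target.size.pretty'  : '201 KB'
]
println summarize(sampleStats)
```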

Syntax

Map<String, Object> p6.pdf.compress '/path/to/source.pdf'
Map<String, Object> p6.pdf.compress '/path/to/source.pdf', {
    destination null
    threshold null
    silent false
    replace true
    compressFonts true
    compressImages true
    quality 0.3f
    dpi 150
    greyscale true
    removeAnnotations false
}

Parameters

All parameters are optional. Defaults are shown above.

  • destination (String) — output file path. If empty or null, a temporary file is used.
  • replace (Boolean) — if true and no destination is set, the source file is overwritten. Ignored when destination is set.
  • threshold (Long) — maximum allowed size in bytes for the output. Throws a P6Exception if exceeded (unless silent is true).
  • silent (Boolean) — if true, no exception is thrown when compression produces a larger file or exceeds threshold. The source file is kept unchanged in both cases.
  • compressFonts (Boolean) — replaces embedded TrueType fonts with standard PDF fonts.
  • compressImages (Boolean) — enables image deduplication and JPEG recompression.
  • quality (Float) — JPEG compression quality between 0.0 (smallest) and 1.0 (best quality). Default: 0.3.
  • dpi (Integer) — maximum DPI for image downscaling. Images below this DPI are not upscaled. Default: 150.
  • greyscale (Boolean) — converts colour images to greyscale. Default: true.
  • removeAnnotations (Boolean) — removes annotations (comments, highlights, form fields) from all pages.

Warning

Font substitution trade-off: Arial and similar fonts are replaced by metrically close equivalents (Arial → Helvetica, Times New Roman → Times, etc.). Glyph shapes are nearly identical but metrics differ slightly, which may cause minor text reflow in some documents.

A P6Exception is thrown if an unexpected error occurs during compression.

Example

// Overwrite the source file with the compressed version
p6.pdf.compress '/path/to/source.pdf'

// Save to a temporary file instead of overwriting
p6.pdf.compress '/path/to/source.pdf', { replace false }

// Save to a specific destination
p6.pdf.compress '/path/to/source.pdf', { destination '/path/to/target.pdf' }

// Higher quality images, keep colour
p6.pdf.compress '/path/to/source.pdf', { quality 0.7f; dpi 96; greyscale false }

// Maximum compression: low quality, greyscale, remove annotations
p6.pdf.compress '/path/to/source.pdf', { quality 0.1f; dpi 72; greyscale true; removeAnnotations true }

// Skip font substitution (preserves original fonts, less compression)
p6.pdf.compress '/path/to/source.pdf', { compressFonts false }

// Enforce a size limit
try {
    p6.pdf.compress '/path/to/source.pdf', { threshold 500000 }
} catch (P6Exception e) {
    p6.log.error "Compression failed: ${e.message}"
}