PDF

Purpose

Generate PDFs from HTML or from XML with XSLT, and merge, split, parse, sign, and compress PDF files.

Methods

Binding name: p6.pdf

fromHtml

Generates a PDF from an HTML string and writes it to the location specified by targetUri. Metadata can optionally be added to the PDF.

More information about the arguments

html (required) – The HTML content to be converted into a PDF.
targetUri (optional, default: null) – The destination URI where the generated PDF should be saved. If null, a temporary file is created.
metadata (optional, default: null) – A map containing metadata such as title, author, subject and keywords for the generated PDF.

Returns the URI written to.

Syntax

String p6.pdf.fromHtml(String html, String targetUri = null, Map<String, String> metadata = null)

Warning

The HTML may only use CSS up to version 2.1.

The targetUri must point to a local file (i.e. the file: protocol only).

Tip

To write to a temporary file, omit targetUri or set it to null.

Example

Temporary file

p6.pdf.fromHtml('<div><b>Bold</b> text</div>')

Malformed HTML results in an error. For example, this fragment has no enclosing root element:

p6.pdf.fromHtml('<b>Bold</b> text')
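The two examples above differ only in whether the markup has a single root element. Assuming the converter requires well-formed, XML-like markup (which is what those examples suggest, but is not stated explicitly), a minimal sketch can check an HTML string with the JDK's XML parser before calling fromHtml. The helper name isWellFormed is hypothetical and not part of the p6 API. Note that this check also rejects named HTML entities such as &nbsp;, because XML predefines only five entities; whether p6.pdf.fromHtml accepts them is not documented here.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.InputSource
import org.xml.sax.SAXException
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper, not part of p6.pdf: returns true when the markup parses as XML,
// i.e. it is well-formed and has exactly one root element.
boolean isWellFormed(String html) {
    def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    // DefaultHandler ignores warnings and recoverable errors and rethrows fatal ones,
    // so malformed input surfaces as a SAXException without "[Fatal Error]" on stderr
    builder.setErrorHandler(new DefaultHandler())
    try {
        builder.parse(new InputSource(new StringReader(html)))
        return true
    } catch (SAXException ignored) {
        return false
    }
}
```

Inside a Platform 6 script, p6.pdf.fromHtml(html) would then be called only when isWellFormed(html) returns true.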

Specify the target

p6.pdf.fromHtml('<div><b>Bold</b> text</div>', 'p6file://${P6_DATA}/path/to.pdf')

Specify the target with metadata

def metadataMap = [
    "Title"   : "ABC",
    "Keywords": "p6",
    "Author"  : "author",
    "Subject" : "Invoice"
]
p6.pdf.fromHtml('<div><b>Bold</b> text</div>', 'p6file://${P6_DATA}/path/to.pdf', metadataMap)

Specify the metadata without target

def metadataMap = [
    "Title"   : "ABC",
    "Keywords": "p6",
    "Author"  : "author",
    "Subject" : "Invoice"
]
p6.pdf.fromHtml('<div><b>Bold</b> text</div>', metadataMap)

merge

Merges the PDFs listed in sourceUris and writes the result to targetUri. Returns the URI written to.

Syntax

String p6.pdf.merge(List<String> sourceUris[, String targetUri])

Warning

The targetUri must point to a local file (i.e. the file: protocol only).

Tip

To write to a temporary file, omit targetUri or set it to null.

Example

Temporary file

p6.pdf.merge(['p6file://${P6_DATA}/path/pdf1.pdf', 'p6file://${P6_DATA}/path/pdf2.pdf'])

Specify the target

p6.pdf.merge(['p6file://${P6_DATA}/path/pdf1.pdf', 'p6file://${P6_DATA}/path/pdf2.pdf'], 'p6file://${P6_DATA}/path/to.pdf')
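The examples write ${P6_DATA} inside single-quoted Groovy strings, so Groovy leaves the placeholder as literal text (presumably for Platform 6 to resolve; that resolution step is an assumption). When sourceUris is built programmatically, a double-quoted GString would instead make Groovy look up P6_DATA as a variable. A minimal sketch that keeps the placeholder literal; the helper name toP6Uris and the file names are hypothetical:

```groovy
// Hypothetical helper: builds p6file URIs under ${P6_DATA}/path for the given file names.
// Single quotes keep '${P6_DATA}' as literal text; in a double-quoted GString Groovy
// would try to resolve P6_DATA as a variable instead.
List<String> toP6Uris(List<String> fileNames) {
    fileNames.collect { String name -> 'p6file://${P6_DATA}/path/' + name }
}

def sourceUris = toP6Uris(['pdf1.pdf', 'pdf2.pdf'])
// Inside Platform 6: p6.pdf.merge(sourceUris, 'p6file://${P6_DATA}/path/to.pdf')
println sourceUris
```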

parse

Parses the PDF file specified in the configuration map and calls the rowNotify closure for each processed row, passing the page number and the row. Returning false from the closure halts page iteration.

Syntax

void p6.pdf.parse(Map<String, Object> configuration, Closure rowNotify)

Example

Using areas on selected pages

def cnf = [
    area0: '402.89,17.24,550.29,64.89',
    area1: '30.6,346.29,195.95,150.07',
    pages: '1,2',
    uri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]

p6.pdf.parse(cnf) { pageNumber, row ->
    p6.log.debug( pageNumber + ': ' + row )
    if ( pageNumber == 2) false         // Returning false will halt page iteration
    else true
}

Using column boundaries

def cnf = [
    columns0: '0,25.0,71.3,180.53,462.91,504.42,535.45,585.68,643.15,714.6',
    uri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]

p6.pdf.parse(cnf) { pageNumber, row ->
    p6.log.debug( pageNumber + ': ' + row )
}

parseToList

Parses the PDF file specified in the configuration map returning the processed values as a List of Tuples (pageNumber, row).

Syntax

List<Tuple> p6.pdf.parseToList(Map<String, Object> configuration)

Parameter: configuration

Configuration Name	Description
`password`	(Optional) Password used to decrypt the PDF
`spreadsheetDisabled`	(Optional) If true, spreadsheet-style extraction is not used. Spreadsheet-style extraction suits PDFs with ruling lines separating each cell, as in a PDF of an Excel spreadsheet. The default is true.
`areaFail`	(Optional) If true, a P6Exception is thrown when a configured area selects no text on a page. The default is true.
`areaN`	(Optional) `N` is a zero-based index. If no area is given, the whole of each page is used as the bounding area. All defined areas are applied to each specified page. Areas are expressed in points and can be measured with macOS Preview in ‘Rectangular Selection’ mode. The value is a comma-separated string: `'{top},{left},{width},{height}'`
`columnsN`	(Optional) `N` is a zero-based index. A comma-separated list of X coordinates of column boundaries.
`uri`	(Mandatory) The URI of the source PDF file to parse.
`pages`	(Optional) A comma-separated string of page numbers to process, e.g. `'1,2'`. If not specified, all pages in the source file are processed.
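Area values are in PDF points, where one point is 1/72 inch (so one millimetre is 72/25.4 points). If a region is measured in millimetres instead, for example from a print layout, a small helper can convert it and build the areaN string in the `'{top},{left},{width},{height}'` order described above. A minimal sketch; the helper name toArea and the sample measurements are hypothetical:

```groovy
// Hypothetical helper: converts a rectangle measured in millimetres into the
// '{top},{left},{width},{height}' string expected by areaN, in points (1 pt = 1/72 in).
String toArea(double topMm, double leftMm, double widthMm, double heightMm) {
    final double PT_PER_MM = 72d / 25.4d
    def values = [topMm, leftMm, widthMm, heightMm]
    // Locale.ROOT guarantees a '.' decimal separator regardless of the JVM locale
    values.collect { String.format(Locale.ROOT, '%.2f', it * PT_PER_MM) }.join(',')
}

def cnf = [
    area0: toArea(10, 20, 50, 25),   // hypothetical region: 10 mm from the top, 20 mm from the left
    uri  : 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]
println cnf.area0
```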

Example

def cnf = [
    area0: '402.89,17.24,550.29,64.89',
    areaFail: false,
    uri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]

def lstTuples = p6.pdf.parseToList(cnf)

lstTuples.each { tup ->
    p6.log.debug( tup.get(0) + ": " + tup.get(1) )
}
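Because each tuple carries its page number at index 0, the flat list can be regrouped per page with Groovy's groupBy. A minimal sketch, in which a hand-written list stands in for the result of parseToList; the row contents are hypothetical and the real row type returned by p6 may differ:

```groovy
// Stand-in for p6.pdf.parseToList(cnf): tuples of (pageNumber, row); values are hypothetical
def lstTuples = [
    new Tuple(1, ['Invoice', '00140']),
    new Tuple(1, ['Date', '2024-01-31']),
    new Tuple(2, ['Total', '120.00'])
]

// Map of pageNumber -> rows on that page; groupBy keeps the original order
Map rowsByPage = lstTuples.groupBy { it.get(0) }.collectEntries { page, tuples ->
    [(page): tuples.collect { it.get(1) }]
}

rowsByPage.each { page, rows ->
    // Inside Platform 6 this would typically be p6.log.debug
    println "Page ${page}: ${rows.size()} row(s)"
}
```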

split

Copies a range of pages from a source PDF file to a destination PDF file.

Syntax

void p6.pdf.split(Map<String, Object> configuration)

Parameter: configuration

Configuration Name	Description
`password`	(Optional) Password used to decrypt the PDF
`keepAnnotations`	(Optional) true to retain any annotations in the destination (default: false)
`startPage`	(Mandatory) One-based number of the first page to copy to the destination
`endPage`	(Mandatory) One-based number of the last page to copy; all pages from startPage to endPage inclusive are copied
`sourceUri`	(Mandatory) The URI of the source PDF file
`destinationUri`	(Mandatory) The URI of the destination PDF file. An existing file at this URI is always overwritten.

Example

def cnf = [
    startPage: 3,
    endPage: 4,
    sourceUri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf',
    destinationUri: 'file:/tmp/pages3-4.pdf'
]

p6.pdf.split(cnf)
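Since startPage and endPage are one-based and inclusive, splitting a document into fixed-size chunks amounts to computing one configuration map per chunk and calling split for each. No method documented here returns the page count, so this sketch assumes it is already known; the helper name splitConfigs, the chunk size, and the destination naming are hypothetical:

```groovy
// Hypothetical helper: builds one p6.pdf.split configuration per chunk of chunkSize pages.
// totalPages must be known in advance; page numbers are one-based and endPage is inclusive.
List<Map<String, Object>> splitConfigs(String sourceUri, int totalPages, int chunkSize) {
    if (totalPages < 1 || chunkSize < 1) {
        return []
    }
    (1..totalPages).step(chunkSize).collect { int start ->
        int end = Math.min(start + chunkSize - 1, totalPages)
        return [
            startPage     : start,
            endPage       : end,
            sourceUri     : sourceUri,
            destinationUri: "file:/tmp/pages-${start}-${end}.pdf".toString()
        ]
    }
}

def configs = splitConfigs('p6file://${P6_DATA}/00140_Facture Alfa.pdf', 5, 2)
// Inside Platform 6: configs.each { p6.pdf.split(it) }
configs.each { println it }
```

With 5 pages and a chunk size of 2, the last chunk holds a single page (startPage and endPage both 5).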

sign

Signs the PDF file specified in the configuration map and writes the result to targetUri. Returns the URI written to.

Syntax

String p6.pdf.sign(Map<String, Object> configuration)

Warning

The targetUri must point to a local file (i.e. the file: protocol only).

Tip

To write to a temporary file, omit targetUri or set it to null.

Parameter: configuration

Configuration Name	Description
`keyStoreUri`	(Mandatory) The URI of the KeyStore file (PKCS12)
`keyStorePassword`	(Optional) Password to open the KeyStore
`keyStoreAlias`	(Optional) Alias to use in the KeyStore. (First one will be used by default)
`uri`	(Mandatory) The URI of the source PDF file to sign.
`password`	(Optional) Password used to decrypt the PDF
`tsa`	(Optional) URL of the TSA server to timestamp the signed file
`reason`	(Optional) The signature reason.
`targetUri`	(Optional) The URI of the target signed PDF file.

Example

def cnf = [
    keyStoreUri: 'file://${P6_DATA}/keystore.p12',
    keyStorePassword: '123456',

    uri: 'file://${P6_DATA}/source.pdf',
    reason: 'Signed on Platform6',
    targetUri: 'file://${P6_DATA}/signed.pdf'
]

p6.log.debug "Signed PDF path:" + p6.pdf.sign(cnf)

Tip

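You can generate a PKCS12 (.p12) file for your tests with the commands below. The export password you enter becomes keyStorePassword, and -name test sets the alias to use as keyStoreAlias: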

openssl req -x509 -newkey rsa:1024 -keyout key.pem -out cert.pem -days 365
openssl pkcs12 -export -out keyStore.p12 -inkey key.pem -in cert.pem -name test

fromXml

Generates a PDF from the given XML and XSLT, each supplied either as a string or as a file URI.

Returns the URI written to.

Optional parameters, set in the options closure:

targetUri – The destination URI where the generated PDF should be saved. By default, a temporary file is created.
metadata – A map containing metadata such as title, author, subject and keywords for the generated PDF.

The targetUri must point to a local file (i.e. the file: protocol only).

Syntax

String p6.pdf.fromXml xml schema xsl[, Closure options]

Input XML

<?xml version="1.0" encoding="UTF-8"?>
<employees>
    <employee>
        <name>Alice</name>
        <role>Developer</role>
    </employee>
</employees>

Input XSL

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <html>
            <body>
                <h2>Employee Details</h2>
                <table border="1">
                    <tr>
                        <th>Name</th>
                        <th>Role</th>
                    </tr>
                    <xsl:for-each select="employees/employee">
                        <tr>
                            <td><xsl:value-of select="name"/></td>
                            <td><xsl:value-of select="role"/></td>
                        </tr>
                    </xsl:for-each>
                </table>
            </body>
        </html>
    </xsl:template>

</xsl:stylesheet>
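When a transformation does not produce the expected PDF, it can help to inspect the intermediate HTML first. Assuming fromXml applies a standard XSLT 1.0 transformation (the stylesheet above declares version 1.0, but the internals of fromXml are not documented here), the JDK's built-in javax.xml.transform API can run the same XML and XSL outside Platform 6. The helper name previewHtml is hypothetical:

```groovy
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.StreamResult
import javax.xml.transform.stream.StreamSource

// Hypothetical helper: applies an XSLT stylesheet to an XML string and returns the result,
// e.g. to inspect the HTML before passing the same inputs to p6.pdf.fromXml.
String previewHtml(String xml, String xsl) {
    def factory = TransformerFactory.newInstance()
    def transformer = factory.newTransformer(new StreamSource(new StringReader(xsl)))
    def out = new StringWriter()
    transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out))
    out.toString()
}

// The XML declaration must be the very first thing in the string (no leading whitespace)
def xml = '''<?xml version="1.0" encoding="UTF-8"?>
<employees>
    <employee>
        <name>Alice</name>
        <role>Developer</role>
    </employee>
</employees>'''

def xsl = '''<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <html><body>
            <h2>Employee Details</h2>
            <table border="1">
                <tr><th>Name</th><th>Role</th></tr>
                <xsl:for-each select="employees/employee">
                    <tr><td><xsl:value-of select="name"/></td><td><xsl:value-of select="role"/></td></tr>
                </xsl:for-each>
            </table>
        </body></html>
    </xsl:template>
</xsl:stylesheet>'''

println previewHtml(xml, xsl)
```

Inside Platform 6, the same xml and xsl strings can then be passed to p6.pdf.fromXml xml schema xsl.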

Example

Temporary file

p6.pdf.fromXml xml schema xsl

Specify the target

p6.pdf.fromXml xml schema xsl, {targetUri 'p6file://${P6_DATA}/path/to.pdf'}

Specify the target with metadata

def metadataMap = [
    "Title"   : "ABC",
    "Keywords": "p6",
    "Author"  : "author",
    "Subject" : "Invoice"
]
p6.pdf.fromXml xml schema xsl, {targetUri 'p6file://${P6_DATA}/to.pdf'; metadata metadataMap}

Specify the metadata without target

def metadataMap = [
    "Title"   : "ABC",
    "Keywords": "p6",
    "Author"  : "author",
    "Subject" : "Invoice"
]
p6.pdf.fromXml xml schema xsl, {metadata metadataMap}

Specify the XML and XSL using file URIs

p6.pdf.fromXml 'p6file://${P6_DATA}/xml.xml' schema 'p6file://${P6_DATA}/xslt.xml'

compress

New Feature

Since 6.10.11

Compresses a PDF file using a pure-Java pipeline that preserves text as vectors (no rasterization). The pipeline applies the following steps:

Annotation removal (optional) — removes comments, highlights, and form field annotations from each page.
Image recompression (optional) — recompresses images as JPEG with configurable quality, DPI downscaling, and optional greyscale conversion.
Since 6.0.20
Font substitution — replaces embedded TrueType fonts (Arial, Calibri, Times New Roman, Georgia, Verdana, Courier New, etc.) with the 14 standard PDF fonts (Helvetica, Times, Courier). Standard fonts need no embedding, eliminating the main source of bloat in Office-generated PDFs.
Stream deduplication — detects identical Form XObjects and images that appear on multiple pages (e.g. headers, logos) and replaces all copies with a single shared reference.
Unused resource cleanup — removes fonts, images, and graphics states declared in page resources but never referenced in the page content.
Object garbage collection — deep-copies pages into a fresh document, discarding all unreachable objects (orphaned font streams, stale XRef entries) left over from the previous steps.

Expected compression

On a typical multi-page document produced by p6.pdf.merge (text + repeated header/logo), this pipeline achieves 90–94% size reduction while keeping text fully readable.

It returns a map of statistics about the compression:

Key	Description	Type
`source.path`	Path of the source file	String
`source.size`	Size of the source file in bytes	Long
`source.size.pretty`	Size of the source file in human-readable form (e.g. 3 MB)	String
`compression.success`	`true` if the output is smaller than the source	Boolean
`compression.duration`	Time taken (e.g. `1s 96ms`)	String
`font.substituted`	Number of embedded font references replaced by standard fonts (since 6.0.20)	Int
`stream.deduplicated`	Number of duplicate Form XObjects / images merged into a shared reference (since 6.0.20)	Int
`image.count`	Total number of images processed (only when `compressImages` is `true`; since 6.0.20)	Int
`image.size.original`	Sum of image stream sizes before recompression, in bytes (only when `compressImages` is `true`; since 6.0.20)	Long
`image.size.compressed`	Sum of image stream sizes after recompression, in bytes (only when `compressImages` is `true`; since 6.0.20)	Long

Additional keys, returned only when `compression.success` is true:

Key	Description	Type
`compression.level`	Percentage reduction (e.g. `93.77%`)	String
`target.path`	Path of the compressed output file	String
`target.size`	Size of the output file in bytes	Long
`target.size.pretty`	Size of the output file in human-readable form (e.g. 201 KB)	String
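The keys of the returned map contain dots, so in Groovy they are read with subscript notation such as stats['source.size']; an unquoted stats.source.size would be parsed as two nested property lookups. A minimal sketch that turns the map into a log line; the helper names summarize and reductionPercent and all sample values are hypothetical, and the percentage formula is an assumption about how compression.level relates to the byte sizes:

```groovy
// Hypothetical helper: builds a one-line summary from the map returned by p6.pdf.compress.
// Dotted keys must be quoted, e.g. stats['source.size'].
String summarize(Map<String, Object> stats) {
    if (stats['compression.success']) {
        return "${stats['source.size.pretty']} -> ${stats['target.size.pretty']} (${stats['compression.level']} smaller) in ${stats['compression.duration']}"
    }
    return "No size reduction for ${stats['source.path']}"
}

// Assumption: recomputes a percentage comparable to compression.level from the byte sizes;
// the exact formula and rounding used by p6 are not documented here.
String reductionPercent(long sourceSize, long targetSize) {
    String.format(Locale.ROOT, '%.2f%%', (1d - targetSize / (double) sourceSize) * 100)
}

// Hypothetical sample values shaped like the statistics tables above
def stats = [
    'source.path'         : '/path/to/source.pdf',
    'source.size'         : 3000000L,
    'source.size.pretty'  : '3 MB',
    'compression.success' : true,
    'compression.duration': '1s 96ms',
    'compression.level'   : '93.77%',
    'target.path'         : '/path/to/target.pdf',
    'target.size'         : 186900L,
    'target.size.pretty'  : '183 KB'
]

println summarize(stats)
println reductionPercent(stats['source.size'] as long, stats['target.size'] as long)
```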

Syntax

Basic

Map<String, Object> p6.pdf.compress '/path/to/source.pdf'

Advanced

Map<String, Object> p6.pdf.compress '/path/to/source.pdf', {
    destination null
    threshold null
    silent false
    replace true
    compressFonts true
    compressImages true
    quality 0.3f
    dpi 150
    greyscale true
    removeAnnotations false
}

Parameters

All parameters are optional. Defaults are shown above.

destination (String) — output file path. If empty or null, a temporary file is used.
replace (Boolean) — if true and no destination is set, the source file is overwritten. Ignored when destination is set.
threshold (Long) — maximum allowed size in bytes for the output. Throws a P6Exception if exceeded (unless silent is true).
silent (Boolean) — if true, no exception is thrown when compression produces a larger file or exceeds threshold. The source file is kept unchanged in both cases.
compressFonts (Boolean) — replaces embedded TrueType fonts with standard PDF fonts.
compressImages (Boolean) — enables image deduplication and JPEG recompression.
quality (Float) — JPEG compression quality between 0.0 (smallest) and 1.0 (best quality). Default: 0.3.
dpi (Integer) — maximum DPI for image downscaling. Images below this DPI are not upscaled. Default: 150.
greyscale (Boolean) — converts colour images to greyscale. Default: true.
removeAnnotations (Boolean) — removes annotations (comments, highlights, form fields) from all pages.

Warning

Font substitution trade-off: Arial and similar fonts are replaced by metrically close equivalents (Arial → Helvetica, Times New Roman → Times, etc.). Glyph shapes are nearly identical but metrics differ slightly, which may cause minor text reflow in some documents.

A P6Exception is thrown if an unexpected error occurs during compression.

Example

// Overwrite the source file with the compressed version
p6.pdf.compress '/path/to/source.pdf'

// Save to a temporary file instead of overwriting
p6.pdf.compress '/path/to/source.pdf', { replace false }

// Save to a specific destination
p6.pdf.compress '/path/to/source.pdf', { destination '/path/to/target.pdf' }

// Higher quality images, keep colour
p6.pdf.compress '/path/to/source.pdf', { quality 0.7f; dpi 96; greyscale false }

// Maximum compression: low quality, greyscale, remove annotations
p6.pdf.compress '/path/to/source.pdf', { quality 0.1f; dpi 72; greyscale true; removeAnnotations true }

// Skip font substitution (preserves original fonts, less compression)
p6.pdf.compress '/path/to/source.pdf', { compressFonts false }

// Enforce a size limit
try {
    p6.pdf.compress '/path/to/source.pdf', { threshold 500000 }
} catch (P6Exception e) {
    p6.log.error "Compression failed: ${e.message}"
}