PDF

Purpose

Generate PDFs from HTML or XML, and merge, parse, split, sign and compress PDF files.

Methods

Binding name: p6.pdf

fromHtml

Generates a PDF from an HTML string and writes it to the location specified by targetUri. Metadata can also be added to the PDF.

More information about the arguments

html (required) – The HTML content to be converted into a PDF.
targetUri (optional, default: null) – The destination URI where the generated PDF should be saved. If null, a temporary file is created.
metadata (optional, default: null) – A map containing metadata such as title, author, subject and keywords for the generated PDF.

Returns the URI written to.

Syntax

String p6.pdf.fromHtml(String html, String targetUri = null, Map<String, String> metadata = null)

Warning

The CSS of the HTML must be version 2.1 max.

The targetUri must point to a local file (e.g. protocol file: only).

Tip

To use a temporary file, set the targetUri parameter to null.

Example

Temporary file

p6.pdf.fromHtml('<div><b>Bold</b> text</div>')

Malformed HTML passed as the string results in an error

p6.pdf.fromHtml('<b>Bold</b> text')

Specify the target

p6.pdf.fromHtml('<div><b>Bold</b> text</div>', 'p6file://${P6_DATA}/path/to.pdf')

Specify the target with metadata

def metadataMap = [
    "Title"   : "ABC",
    "Keywords": "p6",
    "Author"  : "author",
    "Subject" : "Invoice"
]
p6.pdf.fromHtml('<div><b>Bold</b> text</div>', 'p6file://${P6_DATA}/path/to.pdf', metadataMap)

Specify the metadata without target

def metadataMap = [
    "Title"   : "ABC",
    "Keywords": "p6",
    "Author"  : "author",
    "Subject" : "Invoice"
]
p6.pdf.fromHtml('<div><b>Bold</b> text</div>', metadataMap)
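As a rough sketch of preparing input for fromHtml, the helpers below escape text values and wrap a fragment in one enclosing div, as the working examples above do. The names `escapeHtml` and `wrapHtml` are hypothetical and not part of the p6 DSL; the final call is left as a comment because it needs the Platform 6 binding.

```groovy
// Hypothetical helpers, not part of the p6.pdf binding.

// Escape the characters that would otherwise be read as markup.
String escapeHtml(String text) {
    text.replace('&', '&amp;')
        .replace('<', '&lt;')
        .replace('>', '&gt;')
        .replace('"', '&quot;')
}

// Wrap a fragment in one enclosing <div>, like the working examples above.
String wrapHtml(String fragment) {
    "<div>${fragment}</div>"
}

def html = wrapHtml("<b>${escapeHtml('Fees & taxes')}</b> text")

def metadataMap = [
    "Title" : "ABC",
    "Author": "author"
]

// Inside a Platform 6 script, the result could then be passed on,
// using null as targetUri to get a temporary file:
// p6.pdf.fromHtml(html, null, metadataMap)
```

Escaping the ampersand first matters: doing it last would re-escape the entities produced by the other replacements.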

merge

Merges the PDFs listed in sourceUris and writes the result to targetUri. Returns the URI written to.

Syntax

String p6.pdf.merge(List<String> sourceUris[, String targetUri])

Warning

The targetUri must point to a local file (e.g. protocol file: only).

Tip

To use a temporary file, set the targetUri parameter to null.

Example

Temporary file

p6.pdf.merge(['p6file://${P6_DATA}/path/pdf1.pdf', 'p6file://${P6_DATA}/path/pdf2.pdf'])

Specify the target

p6.pdf.merge(['p6file://${P6_DATA}/path/pdf1.pdf', 'p6file://${P6_DATA}/path/pdf2.pdf'], 'p6file://${P6_DATA}/path/to.pdf')

parse

Parses the PDF file specified in the configuration map and calls the given closure with each row processed.

Syntax

void p6.pdf.parse(Map<String, Object> configuration, Closure rowNotify)

Example

def cnf = [
    area0: '402.89,17.24,550.29,64.89',
    area1: '30.6,346.29,195.95,150.07',
    pages: '1,2',
    uri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]

p6.pdf.parse(cnf) { pageNumber, row ->
    p6.log.debug( pageNumber + ': ' + row )
    if ( pageNumber == 2) false         // Returning false will halt page iteration
    else true
}

def cnf = [
    columns0: '0,25.0,71.3,180.53,462.91,504.42,535.45,585.68,643.15,714.6',
    uri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]

p6.pdf.parse(cnf) { pageNumber, row ->
    p6.log.debug( pageNumber + ': ' + row )
}

parseToList

Parses the PDF file specified in the configuration map returning the processed values as a List of Tuples (pageNumber, row).

Syntax

List<Tuple> p6.pdf.parseToList(Map<String, Object> configuration)

Parameter: configuration

Configuration Name	Description
`password`	(Optional) Password to use to decrypt the PDF
`spreadsheetDisabled`	(Optional) Force PDF not to be extracted using spreadsheet-style extraction (if there are ruling lines separating each cell, as in a PDF of an Excel spreadsheet). The default is true.
`areaFail`	(Optional) If a configured area does not select text on a page, a P6Exception is thrown unless this value is false. The default is true.
`areaN`	(Optional) `N` is a zero-based index. If no areas are given, the whole of each page is used as the bounding area. All defined areas are applied to each specified page. Areas are expressed in points and can be identified using macOS Preview in ‘Rectangular Selection’ mode. A comma-separated string is required: `'{top},{left},{width},{height}'`
`columnsN`	(Optional) `N` is a zero-based index. A comma-separated list of X coordinates of column boundaries.
`uri`	(Mandatory) The URI of the source PDF file to parse.
`pages`	(Optional) A comma-separated list of page numbers. If not specified, all pages in the source file are processed.

Example

def cnf = [
    area0: '402.89,17.24,550.29,64.89',
    areaFail: false,
    uri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]

def lstTuples = p6.pdf.parseToList(cnf)

lstTuples.each { tup ->
    p6.log.debug( tup.get(0) + ": " + tup.get(1) )
}
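To make the areaN format concrete, here is a minimal sketch that assembles a configuration map from numeric values. The helper names `areaString` and `buildParseConfig` are hypothetical and not part of the p6 DSL; only the map keys come from the configuration table above.

```groovy
// Hypothetical helpers; only the configuration keys come from the table above.

// Format one area as '{top},{left},{width},{height}' (values in points).
String areaString(Number top, Number left, Number width, Number height) {
    [top, left, width, height].join(',')
}

// Build a configuration map with area0, area1, ... and an optional page list.
Map<String, Object> buildParseConfig(String uri, List<String> areas, List<Integer> pages = []) {
    Map<String, Object> cnf = [uri: uri]
    areas.eachWithIndex { area, i -> cnf["area${i}".toString()] = area }
    if (pages) {
        cnf.pages = pages.join(',')
    }
    cnf
}

def cnf = buildParseConfig(
    'p6file://${P6_DATA}/00140_Facture Alfa.pdf',
    [areaString(402.89, 17.24, 550.29, 64.89), areaString(30.6, 346.29, 195.95, 150.07)],
    [1, 2]
)

// Inside a Platform 6 script: def lstTuples = p6.pdf.parseToList(cnf)
```

Numbering the keys from the list index keeps the areaN suffixes zero-based and contiguous without writing them by hand.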

split

Copies pages from a source PDF file to a destination PDF file.

Syntax

void p6.pdf.split(Map<String, Object> configuration)

Parameter: configuration

Configuration Name	Description
`password`	(Optional) Password to use to decrypt the PDF
`keepAnnotations`	(Optional) true to retain any annotations in the destination (default: false)
`startPage`	(Mandatory) The one-based number of the first page to copy to the destination
`endPage`	(Mandatory) The one-based number of the last page to copy to the destination; all pages between startPage and endPage are copied as well
`sourceUri`	(Mandatory) The URI of the source PDF file
`destinationUri`	(Mandatory) The URI of the destination PDF file. Destination will always be overwritten

Example

def cnf = [
    startPage: 3,
    endPage: 4,
    sourceUri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf',
    destinationUri: 'file:/tmp/page4.pdf'
]

p6.pdf.split(cnf)
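Because split copies one inclusive page range per call, cutting a document into several files needs one configuration map per range. The sketch below computes those maps for fixed-size chunks. The helper `chunkConfigs`, the destination naming scheme and the total page count are assumptions made for illustration; the page count has to be known beforehand, since this page documents no method that returns it.

```groovy
// Hypothetical helper: one split configuration per chunk of pages.
List<Map<String, Object>> chunkConfigs(String sourceUri, String destinationPrefix, int totalPages, int chunkSize) {
    assert chunkSize > 0 : 'chunkSize must be positive'
    List<Map<String, Object>> configs = []
    int start = 1                                                // startPage is one-based
    while (start <= totalPages) {
        int end = Math.min(start + chunkSize - 1, totalPages)    // endPage is inclusive
        configs << [
            startPage     : start,
            endPage       : end,
            sourceUri     : sourceUri,
            destinationUri: "${destinationPrefix}-${start}-${end}.pdf".toString()
        ]
        start = end + 1
    }
    configs
}

def configs = chunkConfigs('p6file://${P6_DATA}/00140_Facture Alfa.pdf', 'file:/tmp/part', 5, 2)

// Inside a Platform 6 script: configs.each { p6.pdf.split(it) }
```

With five pages and a chunk size of two, this yields the ranges 1-2, 3-4 and 5-5, each with its own destination file.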

sign

Signs the PDF file specified in the configuration map and writes the result to the targetUri. Returns the URI written to.

Syntax

String p6.pdf.sign(Map<String, Object> configuration)

Warning

The targetUri must point to a local file (e.g. protocol file: only).

Tip

To use a temporary file, set the targetUri parameter to null.

Parameter: configuration

Configuration Name	Description
`keyStoreUri`	(Mandatory) The URI of the KeyStore file (PKCS12)
`keyStorePassword`	(Optional) Password to open the KeyStore
`keyStoreAlias`	(Optional) Alias to use in the KeyStore. (First one will be used by default)
`uri`	(Mandatory) The URI of the source PDF file to sign.
`password`	(Optional) Password to use to decrypt the PDF
`tsa`	(Optional) URL of the TSA server to timestamp the signed file
`reason`	(Optional) The signature reason.
`targetUri`	(Optional) The URI of the target signed PDF file.

Example

def cnf = [
    keyStoreUri: 'file://${P6_DATA}/keystore.p12',
    keyStorePassword: '123456',

    uri: 'file://${P6_DATA}/source.pdf',
    reason: 'Signed on Platform6',
    targetUri: 'file://${P6_DATA}/signed.pdf'
]

p6.log.debug "Signed PDF path:" + p6.pdf.sign(cnf)

Tip

You can generate a p12 file for your tests using the command line:

openssl req -x509 -newkey rsa:1024 -keyout key.pem -out cert.pem -days 365
openssl pkcs12 -export -out keyStore.p12 -inkey key.pem -in cert.pem -name test

fromXml

Generates a PDF from the given XML and XSLT, each passed either as a string or as a file path.

Returns the URI written to.

Optional params:

targetUri – The destination URI where the generated PDF should be saved. By default, a temporary file is created.
metadata – A map containing metadata such as title, author, subject and keywords for the generated PDF.

The targetUri must point to a local file (e.g. protocol file: only).

Syntax

String p6.pdf.fromXml xml schema xsl

Input XML

<?xml version="1.0" encoding="UTF-8"?>
<employees>
    <employee>
        <name>Alice</name>
        <role>Developer</role>
    </employee>
</employees>

Input XSL

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <html>
            <body>
                <h2>Employee Details</h2>
                <table border="1">
                    <tr>
                        <th>Name</th>
                        <th>Role</th>
                    </tr>
                    <xsl:for-each select="employees/employee">
                        <tr>
                            <td><xsl:value-of select="name"/></td>
                            <td><xsl:value-of select="role"/></td>
                        </tr>
                    </xsl:for-each>
                </table>
            </body>
        </html>
    </xsl:template>

</xsl:stylesheet>

Example

Temporary file

p6.pdf.fromXml xml schema xsl

Specify the target

p6.pdf.fromXml xml schema xsl, {targetUri 'p6file://${P6_DATA}/path/to.pdf'}

Specify the target with metadata

def metadataMap = [
    "Title"   : "ABC",
    "Keywords": "p6",
    "Author"  : "author",
    "Subject" : "Invoice"
]
p6.pdf.fromXml xml schema xsl, {targetUri 'p6file://${P6_DATA}/to.pdf'; metadata metadataMap}

Specify the metadata without target

def metadataMap = [
    "Title"   : "ABC",
    "Keywords": "p6",
    "Author"  : "author",
    "Subject" : "Invoice"
]
p6.pdf.fromXml xml schema xsl, {metadata metadataMap}

Specify xml and xsl using filepath

p6.pdf.fromXml 'p6file://${P6_DATA}/xml.xml' schema 'p6file://${P6_DATA}/xslt.xml'

compress

Since 6.10.11

Compresses a PDF file using different approaches:

Image compression
Font compression
Image manipulation (quality, dpi, greyscale)
Remove annotations

It returns statistics about the compression:

Key	Description	Type
`source.path`	The path of the source file	String
`source.size`	The size of the source file in bytes	Long
`source.size.pretty`	The size of the source file in human readable size (e.g. 1.2 MB)	String
`compression.success`	True if the compression is successful	Boolean

Extra parameters are returned if the compression is successful:

Key	Description	Type
`compression.level`	Percentage of compression of the PDF	String
`compression.duration`	Compression duration	String
`target.path`	The path of the target file	String
`target.size`	The size of the target file in bytes	Long
`target.size.pretty`	The size of the target file in human readable size (e.g. 1.2 MB)	String
`annotation.compression.enable`	True if PDF annotations are removed	Boolean
`font.compression.enable`	True if PDF fonts are compressed	Boolean
`image.compression.enable`	True if PDF images are compressed	Boolean
`image.compression.parameters`	The compression parameters used for PDF images	String
`image.count`	Number of images found in the PDF	Int
`image.cached`	True if the duplicate images should be cached	Boolean
`image.source.size`	Total size of all the source images	Long
`image.target.size`	Total size of all the compressed images	Long
`image.compression.level`	Percentage of compression of the PDF images	String

Syntax

Basic

Map<String, Object> p6.pdf.compress '/path/to/source.pdf'

Advanced

Map<String, Object> p6.pdf.compress '/path/to/source.pdf', {
    destination null
    threshold null
    silent false
    replace true
    compressFonts true
    compressImages true
    quality 0.5f 
    dpi 72
    greyscale false
    removeAnnotations false
}

Note

The values shown above are the defaults, and all parameters are optional.

destination (String) - If not set or empty, a temporary file is created.
threshold (Long) - The maximum size in bytes for the compressed file. If the result is larger than the threshold, a P6Exception is thrown (unless silent mode is used).
silent - Set to false by default. If set to true, no exception is thrown when the result of the compression does not meet expectations.
replace - Ignored if a destination is specified. Otherwise, the source file is replaced by the compressed one.

Warning

The compression DSL method can throw a P6Exception if an unexpected error occurs during the compression process.

Example

// The compressed file will overwrite the source file
p6.pdf.compress '/path/to/source.pdf'

// The compressed file will be saved to a temporary file
p6.pdf.compress '/path/to/source.pdf', { destination '' }
p6.pdf.compress '/path/to/source.pdf', { replace false }

// The compressed file will be saved to the destination filepath
p6.pdf.compress '/path/to/source.pdf', { destination '/path/to/target.pdf' }

// Compression options
p6.pdf.compress '/path/to/source.pdf', { quality 0.1f; dpi 36; greyscale true }
p6.pdf.compress '/path/to/source.pdf', { removeAnnotations true; compressFonts false }

// Compression with threshold
try {
    p6.pdf.compress '/path/to/source.pdf', { threshold 1000000 }
} catch (P6Exception e) {
    p6.log.error "Compression failed: ${e.message}"
}
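The statistics map returned by compress uses dotted keys, so in Groovy they have to be read with the subscript form (`stats['source.size']`) and not with property access. The sketch below shows one way to turn such a map into a log line. The helper `summarize` is hypothetical, and the sample values are made up to mirror the shape of the tables above; they are not real output.

```groovy
// Hypothetical helper: turn the statistics map into a one-line summary.
String summarize(Map<String, Object> stats) {
    if (!stats['compression.success']) {
        return "Compression failed for ${stats['source.path']}"
    }
    long saved = (stats['source.size'] as long) - (stats['target.size'] as long)
    "${stats['source.path']}: saved ${saved} bytes (${stats['source.size.pretty']} -> ${stats['target.size.pretty']})"
}

// Sample map with made-up values, shaped like the tables above.
def sample = [
    'source.path'        : '/path/to/source.pdf',
    'source.size'        : 2000000L,
    'source.size.pretty' : '2 MB',
    'compression.success': true,
    'target.path'        : '/path/to/target.pdf',
    'target.size'        : 500000L,
    'target.size.pretty' : '500 KB'
]

def line = summarize(sample)

// Inside a Platform 6 script:
// p6.log.debug summarize(p6.pdf.compress('/path/to/source.pdf'))
```

The target keys are only read after checking `compression.success`, since they are returned only when the compression succeeds.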