Purpose
Generate PDFs from HTML or XML, and merge, parse, split, sign, and compress PDF files.
Methods
Binding name: p6.pdf
fromHtml
Generates a PDF from an HTML string and writes it to the location specified by targetUri.
Metadata can optionally be attached to the PDF.
More information about the arguments
- html (required) – The HTML content to be converted into a PDF.
- targetUri (optional, default: null) – The destination URI where the generated PDF should be saved. If null, a temporary file is created.
- metadata (optional, default: null) – A map containing metadata such as title, author, subject and keywords for the generated PDF.
Returns the URI written to.
Syntax
String p6.pdf.fromHtml(String html, String targetUri = null, Map<String, String> metadata = null)
Warning
The HTML may only use CSS features up to version 2.1.
The targetUri must point to a local file (e.g. protocol file: only).
Tip
To write to a temporary file, pass null as targetUri.
Example
Temporary file
p6.pdf.fromHtml('<div><b>Bold</b> text</div>')
Passing invalid HTML results in an error
p6.pdf.fromHtml('<b>Bold</b> text')
Specify the target
p6.pdf.fromHtml('<div><b>Bold</b> text</div>', 'p6file://${P6_DATA}/path/to.pdf')
Specify the target with metadata
def metadataMap = [
"Title" : "ABC",
"Keywords": "p6",
"Author" : "author",
"Subject" : "Invoice"
]
p6.pdf.fromHtml('<div><b>Bold</b> text</div>', 'p6file://${P6_DATA}/path/to.pdf', metadataMap)
Temporary file with metadata
def metadataMap = [
"Title" : "ABC",
"Keywords": "p6",
"Author" : "author",
"Subject" : "Invoice"
]
p6.pdf.fromHtml('<div><b>Bold</b> text</div>', metadataMap)
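The failing example above has text after its only element, which suggests the converter expects markup with a single root element. The following is a minimal sketch of a pre-flight check, assuming (this page does not state it) that XML well-formedness is the relevant criterion. `isWellFormed` is a hypothetical helper built on the JDK XML parser, not part of p6.pdf.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.InputSource
import org.xml.sax.SAXException
import org.xml.sax.helpers.DefaultHandler

// Hypothetical helper: returns true when the markup parses as well-formed XML.
boolean isWellFormed(String html) {
    def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    // DefaultHandler rethrows fatal errors instead of printing them to stderr
    builder.setErrorHandler(new DefaultHandler())
    try {
        builder.parse(new InputSource(new StringReader(html)))
        return true
    } catch (SAXException ignored) {
        return false
    }
}

assert isWellFormed('<div><b>Bold</b> text</div>')   // single root element
assert !isWellFormed('<b>Bold</b> text')             // trailing text after the root
```

A check like this could run before calling fromHtml so that malformed input is reported with a clearer message. Note that it would also reject HTML constructs that are not valid XML, such as unclosed `<br>` tags.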
merge
Merges the PDFs listed in sourceUris and writes the result to targetUri.
Returns the URI written to.
Syntax
String p6.pdf.merge(List<String> sourceUris[, String targetUri])
Warning
The targetUri must point to a local file (e.g. protocol file: only).
Tip
To write to a temporary file, pass null as targetUri.
Example
Temporary file
p6.pdf.merge(['p6file://${P6_DATA}/path/pdf1.pdf', 'p6file://${P6_DATA}/path/pdf2.pdf'])
Specify the target
p6.pdf.merge(['p6file://${P6_DATA}/path/pdf1.pdf', 'p6file://${P6_DATA}/path/pdf2.pdf'], 'p6file://${P6_DATA}/path/to.pdf')
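The URI examples on this page use single-quoted strings. In Groovy, single-quoted strings are not interpolated, so `${P6_DATA}` reaches the method verbatim; presumably the platform resolves the placeholder itself (an assumption, not stated here). The sketch below only illustrates the quoting difference, using a made-up local variable named `P6_DATA`.

```groovy
// Single quotes: a plain java.lang.String, the placeholder is kept as-is
def literalUri = 'p6file://${P6_DATA}/path/pdf1.pdf'
assert literalUri.contains('${P6_DATA}')

// Double quotes: a GString, so Groovy substitutes the variable itself.
// P6_DATA is a hypothetical local variable used for illustration only.
def P6_DATA = '/opt/p6/data'
def interpolatedUri = "p6file://${P6_DATA}/path/pdf1.pdf".toString()
assert interpolatedUri == 'p6file:///opt/p6/data/path/pdf1.pdf'
```

With double quotes and no `P6_DATA` variable in scope, a Groovy script would fail with a missing-property error, which is one reason to keep the single quotes used in the examples.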
parse
Parses the PDF file specified in the configuration map and calls the given closure with each row processed.
Syntax
void p6.pdf.parse(Map<String, Object> configuration, Closure rowNotify)
Example
def cnf = [
area0: '402.89,17.24,550.29,64.89',
area1: '30.6,346.29,195.95,150.07',
pages: '1,2',
uri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]
p6.pdf.parse(cnf) { pageNumber, row ->
p6.log.debug( pageNumber + ': ' + row )
if ( pageNumber == 2) false // Returning false will halt page iteration
else true
}
def cnf = [
columns0: '0,25.0,71.3,180.53,462.91,504.42,535.45,585.68,643.15,714.6',
uri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]
p6.pdf.parse(cnf) { pageNumber, row ->
p6.log.debug( pageNumber + ': ' + row )
}
parseToList
Parses the PDF file specified in the configuration map returning the processed values as a List of Tuples (pageNumber, row).
Syntax
List<Tuple> p6.pdf.parseToList(Map<String, Object> configuration)
Parameter: configuration
| Configuration Name | Description |
|---|---|
| password | (Optional) Password used to decrypt the PDF. |
| spreadsheetDisabled | (Optional) Disables spreadsheet-style extraction (the mode suited to PDFs with ruling lines separating each cell, such as a PDF of an Excel spreadsheet). The default is true. |
| areaFail | (Optional) If a configured area does not select any text on a page, a P6Exception is thrown unless this value is false. The default is true. |
| areaN | (Optional) N is a zero-based number. If no areas are given, the whole of each page is used as the bounding area. Every area defined is applied to each page specified. Areas are expressed in points and can be identified using OSX Preview in 'Rectangular Selection' mode. A comma-separated string is required: '{top},{left},{width},{height}' |
| columnsN | (Optional) N is a zero-based number. A comma-separated list of X coordinates of column boundaries. |
| uri | (Mandatory) The URI of the source PDF file to parse. |
| pages | (Optional) A comma-separated list of page numbers. If not specified, all pages in the source file are processed. |
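Because areas are expressed in points (1 pt = 1/72 inch), measurements taken in millimetres need converting before they go into an areaN string. The following is a small sketch of that conversion; `mmToPt` and `area` are hypothetical helpers and the sample measurements are made up.

```groovy
import java.math.RoundingMode

// 1 inch = 25.4 mm = 72 pt, rounded to two decimals
BigDecimal mmToPt(BigDecimal mm) {
    (mm * 72 / 25.4).setScale(2, RoundingMode.HALF_UP)
}

// Builds the '{top},{left},{width},{height}' string described in the table
String area(BigDecimal top, BigDecimal left, BigDecimal width, BigDecimal height) {
    [top, left, width, height].join(',')
}

def areaCnf = [
    area0: area(mmToPt(10.0), mmToPt(20.0), mmToPt(100.0), mmToPt(50.0)),
    pages: [1, 2].join(','),
    uri  : 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]

assert areaCnf.area0 == '28.35,56.69,283.46,141.73'
assert areaCnf.pages == '1,2'
```

The resulting map has the same shape as the configuration maps used in the examples and could be passed to parse or parseToList.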
Example
def cnf = [
area0: '402.89,17.24,550.29,64.89',
areaFail: false,
uri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf'
]
def lstTuples = p6.pdf.parseToList(cnf)
lstTuples.each { tup ->
p6.log.debug( tup.get(0) + ": " + tup.get(1) )
}
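Since each tuple carries the page number first and the row second, the list can be regrouped per page with plain Groovy collection methods. The sketch below uses hand-built sample tuples standing in for a real parseToList result; the row shape (a list of cell strings) is an assumption.

```groovy
// Sample data standing in for p6.pdf.parseToList(cnf); the row shape is assumed
def sampleTuples = [
    new Tuple([1, ['Item', 'Qty']] as Object[]),
    new Tuple([1, ['Bolt', '12']] as Object[]),
    new Tuple([2, ['Nut', '7']] as Object[])
]

// Map of pageNumber -> list of rows found on that page
def rowsByPage = sampleTuples.groupBy { it.get(0) }.collectEntries { page, tuples ->
    [(page): tuples.collect { it.get(1) }]
}

assert rowsByPage[1] == [['Item', 'Qty'], ['Bolt', '12']]
assert rowsByPage[2] == [['Nut', '7']]
```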
split
Copies a range of pages from a source PDF file to a destination PDF file.
Syntax
void p6.pdf.split(Map<String, Object> configuration)
Parameter: configuration
| Configuration Name | Description |
|---|---|
| password | (Optional) Password used to decrypt the PDF. |
| keepAnnotations | (Optional) Set to true to retain any annotations in the destination (default: false). |
| startPage | (Mandatory) One-based number of the first page to copy to the destination. |
| endPage | (Mandatory) One-based number of the last page to copy to the destination. All pages in between are copied as well. |
| sourceUri | (Mandatory) The URI of the source PDF file. |
| destinationUri | (Mandatory) The URI of the destination PDF file. The destination is always overwritten. |
Example
def cnf = [
startPage: 3,
endPage: 4,
sourceUri: 'p6file://${P6_DATA}/00140_Facture Alfa.pdf',
destinationUri: 'file:/tmp/page4.pdf'
]
p6.pdf.split(cnf)
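Since each call copies one contiguous page range, cutting a document into several files means one configuration per range. The sketch below only builds the configuration maps. `chunkConfigs`, the page count, and the destination naming scheme are hypothetical; on the platform, each map would then be passed to p6.pdf.split.

```groovy
// Builds one split configuration per block of `size` pages (pages are one-based)
List<Map<String, Object>> chunkConfigs(String sourceUri, int pageCount, int size, String destPrefix) {
    if (pageCount < 1) return []   // avoids the reverse range 1..0
    (1..pageCount).collate(size).collect { pages ->
        [
            startPage     : pages.first(),
            endPage       : pages.last(),
            sourceUri     : sourceUri,
            destinationUri: "${destPrefix}-${pages.first()}-${pages.last()}.pdf".toString()
        ]
    }
}

def configs = chunkConfigs('p6file://${P6_DATA}/00140_Facture Alfa.pdf', 5, 2, 'file:/tmp/part')
assert configs*.startPage == [1, 3, 5]
assert configs*.endPage == [2, 4, 5]
assert configs[0].destinationUri == 'file:/tmp/part-1-2.pdf'
// On the platform: configs.each { p6.pdf.split(it) }
```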
sign
Signs the PDF file specified in the configuration map and writes the result to targetUri.
Returns the URI written to.
Syntax
String p6.pdf.sign(Map<String, Object> configuration)
Warning
The targetUri must point to a local file (e.g. protocol file: only).
Tip
To write to a temporary file, leave targetUri null in the configuration map.
Parameter: configuration
| Configuration Name | Description |
|---|---|
| keyStoreUri | (Mandatory) The URI of the KeyStore file (PKCS12). |
| keyStorePassword | (Optional) Password used to open the KeyStore. |
| keyStoreAlias | (Optional) Alias to use in the KeyStore. The first one is used by default. |
| uri | (Mandatory) The URI of the source PDF file to sign. |
| password | (Optional) Password used to decrypt the PDF. |
| tsa | (Optional) URL of the TSA server used to timestamp the signed file. |
| reason | (Optional) The signature reason. |
| targetUri | (Optional) The URI of the target signed PDF file. |
Example
def cnf = [
keyStoreUri: 'file://${P6_DATA}/keystore.p12',
keyStorePassword: '123456',
uri: 'file://${P6_DATA}/source.pdf',
reason: 'Signed on Platform6',
targetUri: 'file://${P6_DATA}/signed.pdf'
]
p6.log.debug "Signed PDF path: " + p6.pdf.sign(cnf)
Tip
You can generate a p12 file for your tests using the command line:
openssl req -x509 -newkey rsa:1024 -keyout key.pem -out cert.pem -days 365
openssl pkcs12 -export -out keyStore.p12 -inkey key.pem -in cert.pem -name test
fromXml
Generates a PDF from the given XML and XSLT, each supplied either as a string or as a file URI.
Returns the URI written to.
Optional params:
- targetUri – The destination URI where the generated PDF should be saved. By default, a temporary file is created.
- metadata – A map containing metadata such as title, author, subject and keywords for the generated PDF.
The targetUri must point to a local file (e.g. protocol file: only).
Syntax
String p6.pdf.fromXml xml schema xsl
Input XML
<?xml version="1.0" encoding="UTF-8"?>
<employees>
<employee>
<name>Alice</name>
<role>Developer</role>
</employee>
</employees>
Input XSLT
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>Employee Details</h2>
<table border="1">
<tr>
<th>Name</th>
<th>Role</th>
</tr>
<xsl:for-each select="employees/employee">
<tr>
<td><xsl:value-of select="name"/></td>
<td><xsl:value-of select="role"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Example
Temporary file
p6.pdf.fromXml xml schema xsl
Specify the target
p6.pdf.fromXml xml schema xsl, {targetUri 'p6file://${P6_DATA}/path/to.pdf'}
Specify the target with metadata
def metadataMap = [
"Title" : "ABC",
"Keywords": "p6",
"Author" : "author",
"Subject" : "Invoice"
]
p6.pdf.fromXml xml schema xsl, {targetUri 'p6file://${P6_DATA}/to.pdf'; metadata metadataMap}
Temporary file with metadata
def metadataMap = [
"Title" : "ABC",
"Keywords": "p6",
"Author" : "author",
"Subject" : "Invoice"
]
p6.pdf.fromXml xml schema xsl, {metadata metadataMap}
Specify the XML and XSLT by file path
p6.pdf.fromXml 'p6file://${P6_DATA}/xml.xml' schema 'p6file://${P6_DATA}/xslt.xml'
compress
New Feature
Since 6.10.11
Compresses a PDF file using a pure-Java pipeline that preserves text as vectors (no rasterization).
- Annotation removal (optional) – removes comments, highlights, and form field annotations from each page.
- Image recompression (optional) – recompresses images as JPEG with configurable quality, DPI downscaling, and optional greyscale conversion.
- Since 6.0.20
- Font substitution – replaces embedded TrueType fonts (Arial, Calibri, Times New Roman, Georgia, Verdana, Courier New, etc.) with the 14 standard PDF fonts (Helvetica, Times, Courier). Standard fonts need no embedding, eliminating the main source of bloat in Office-generated PDFs.
- Stream deduplication – detects identical Form XObjects and images that appear on multiple pages (e.g. headers, logos) and replaces all copies with a single shared reference.
- Unused resource cleanup – removes fonts, images, and graphics states declared in page resources but never referenced in the page content.
- Object garbage collection – deep-copies pages into a fresh document, discarding all unreachable objects (orphaned font streams, stale XRef entries) left over from the previous steps.
Expected compression
On a typical multi-page document produced by p6.pdf.merge (text + repeated header/logo), this pipeline achieves 90–94% size reduction while keeping text fully readable.
It returns statistics about the compression:
| Key | Description | Type |
|---|---|---|
| source.path | Path of the source file | String |
| source.size | Size of the source file in bytes | Long |
| source.size.pretty | Size of the source file in human-readable form (e.g. 3 MB) | String |
| compression.success | true if the output is smaller than the source | Boolean |
| compression.duration | Time taken (e.g. 1s 96ms) | String |
| font.substituted | Number of embedded font references replaced by standard fonts (since 6.0.20) | Int |
| stream.deduplicated | Number of duplicate Form XObjects / images merged into a shared reference (since 6.0.20) | Int |
| image.count | Total number of images processed, only when compressImages is true (since 6.0.20) | Int |
| image.size.original | Sum of image stream sizes before recompression in bytes, only when compressImages is true (since 6.0.20) | Long |
| image.size.compressed | Sum of image stream sizes after recompression in bytes, only when compressImages is true (since 6.0.20) | Long |
Extra keys returned only when compression.success is true:
| Key | Description | Type |
|---|---|---|
| compression.level | Percentage reduction (e.g. 93.77%) | String |
| target.path | Path of the compressed output file | String |
| target.size | Size of the output file in bytes | Long |
| target.size.pretty | Size of the output file in human-readable form (e.g. 201 KB) | String |
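Because the target.* keys are present only on success, code that reads the statistics should check compression.success first. The sketch below uses a hand-written sample map standing in for a real compress result; all values are made up and `summarize` is a hypothetical helper. Keys containing dots must be accessed with the subscript syntax.

```groovy
// Sample result standing in for p6.pdf.compress(...); all values are made up
def sampleStats = [
    'source.size'        : 1000000L,
    'compression.success': true,
    'target.size'        : 62300L,
    'target.path'        : '/path/to/target.pdf'
]

// Hypothetical helper: one-line summary that tolerates missing target.* keys
String summarize(Map<String, Object> s) {
    if (!s['compression.success']) {
        return "no size reduction (source is ${s['source.size']} bytes)".toString()
    }
    // Long / Long yields a BigDecimal in Groovy, so no integer truncation occurs
    BigDecimal saved = (1 - s['target.size'] / s['source.size']) * 100
    String.format(Locale.ROOT, '%s: %d -> %d bytes (%.2f%% smaller)',
        s['target.path'], s['source.size'], s['target.size'], saved)
}

assert summarize(sampleStats) == '/path/to/target.pdf: 1000000 -> 62300 bytes (93.77% smaller)'
```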
Syntax
Map<String, Object> p6.pdf.compress '/path/to/source.pdf'
Map<String, Object> p6.pdf.compress '/path/to/source.pdf', {
destination null
threshold null
silent false
replace true
compressFonts true
compressImages true
quality 0.3f
dpi 150
greyscale true
removeAnnotations false
}
Parameters
All parameters are optional. Defaults are shown above.
- destination (String) – output file path. If empty or null, a temporary file is used.
- replace (Boolean) – if true and no destination is set, the source file is overwritten. Ignored when destination is set.
- threshold (Long) – maximum allowed size in bytes for the output. Throws a P6Exception if exceeded (unless silent is true).
- silent (Boolean) – if true, no exception is thrown when compression produces a larger file or exceeds threshold. The source file is kept unchanged in both cases.
- compressFonts (Boolean) – replaces embedded TrueType fonts with standard PDF fonts.
- compressImages (Boolean) – enables image deduplication and JPEG recompression.
- quality (Float) – JPEG compression quality between 0.0 (smallest) and 1.0 (best quality). Default: 0.3.
- dpi (Integer) – maximum DPI for image downscaling. Images below this DPI are not upscaled. Default: 150.
- greyscale (Boolean) – converts colour images to greyscale. Default: true.
- removeAnnotations (Boolean) – removes annotations (comments, highlights, form fields) from all pages.
Warning
Font substitution trade-off: Arial and similar fonts are replaced by metrically close equivalents (Arial → Helvetica, Times New Roman → Times, etc.). Glyph shapes are nearly identical but metrics differ slightly, which may cause minor text reflow in some documents.
A P6Exception is thrown if an unexpected error occurs during compression.
Example
// Overwrite the source file with the compressed version
p6.pdf.compress '/path/to/source.pdf'
// Save to a temporary file instead of overwriting
p6.pdf.compress '/path/to/source.pdf', { replace false }
// Save to a specific destination
p6.pdf.compress '/path/to/source.pdf', { destination '/path/to/target.pdf' }
// Higher quality images, keep colour
p6.pdf.compress '/path/to/source.pdf', { quality 0.7f; dpi 96; greyscale false }
// Maximum compression: low quality, greyscale, remove annotations
p6.pdf.compress '/path/to/source.pdf', { quality 0.1f; dpi 72; greyscale true; removeAnnotations true }
// Skip font substitution (preserves original fonts, less compression)
p6.pdf.compress '/path/to/source.pdf', { compressFonts false }
// Enforce a size limit
try {
p6.pdf.compress '/path/to/source.pdf', { threshold 500000 }
} catch (P6Exception e) {
p6.log.error "Compression failed: ${e.message}"
}