Adding Metadata and Full-Text Indexing to Rich Documents and Assets in CrafterCMS

Russ Danner

There are many use cases where specific types of assets, such as rich documents, videos, and high-resolution images, need to be treated as first-class content objects in the CMS, with their own custom metadata and with the metadata stored inside the file indexed as well. CrafterCMS lets you "jacket" an asset with a content type to support these scenarios.
In this video blog we cover:
- What is a document/asset jacket
- How document full-text and custom metadata indexing works
- How to configure your Crafter Studio project and deployer to support custom metadata for rich documents
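To make the jacket idea concrete, the Python sketch below parses a hypothetical jacket descriptor and pulls out its custom metadata along with the reference to the underlying binary. The element names `title_s` and `attachment_o`, the content type `/component/document`, and the file path are assumptions for illustration only. The `item/key` reference shape and the `internal-name` and `content-type` fields mirror the XPaths used in the target configuration in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical jacket descriptor: an XML content item that wraps a binary file.
# Field names such as title_s and attachment_o are illustrative assumptions.
JACKET_XML = """<component>
  <content-type>/component/document</content-type>
  <internal-name>Annual Report</internal-name>
  <title_s>Annual Report</title_s>
  <attachment_o>
    <item>
      <key>/static-assets/documents/annual-report.pdf</key>
    </item>
  </attachment_o>
</component>"""


def read_jacket(xml_text):
    """Return the jacket's basic metadata and the binaries it references."""
    root = ET.fromstring(xml_text)
    # ElementTree's ".//item/key" is its closest equivalent of the XPath //item/key.
    references = root.findall(".//item/key") + root.findall(".//item/url")
    return {
        "content_type": root.findtext("content-type"),
        "internal_name": root.findtext("internal-name"),
        "binaries": [ref.text for ref in references],
    }


if __name__ == "__main__":
    print(read_jacket(JACKET_XML))
```

The point of the sketch is the relationship: the descriptor carries the searchable metadata, and the reference inside it tells the indexer which binary that metadata belongs to.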
Important Documentation Links:
- Deployer Documentation:
- Default Deployer Target Configuration:
- Permission Configuration:
- Studio Configuration:
- UI Configuration:
Sample Configuration:
Below you will find the configuration examples covered in the video.
Project Config (site-config.xml):
<folder name="Pages" path="/website" read-direct-children="false" attach-root-prefix="true"/>
<folder name="Components" path="/components" read-direct-children="false" attach-root-prefix="true"/>
<folder name="Documents" path="/documents" read-direct-children="false" attach-root-prefix="true"/>
<folder name="Taxonomy" path="/taxonomy" read-direct-children="false" attach-root-prefix="true"/>
<folder name="Assets" path="/static-assets" read-direct-children="false" attach-root-prefix="false"/>
<folder name="Templates" path="/templates" read-direct-children="false" attach-root-prefix="false"/>
<folder name="Scripts" path="/scripts" read-direct-children="false" attach-root-prefix="false"/>
<pattern-group name="component">
Permissions Config (permissions.xml):
<role name="author">
<rule regex="/site/website/.*">
<rule regex="/site/components|/site/components/.*">
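The rule regexes above are ordinary regular expressions tested against content paths. The Python sketch below shows which paths the two rules cover, assuming the whole path must match the expression (that matching mode is an assumption, not something the excerpt confirms). The `/site/documents` rule is a hypothetical addition that follows the same shape as the components rule; it is not taken from the excerpt above.

```python
import re

# Rule regexes copied from the permissions excerpt.
AUTHOR_RULES = [
    r"/site/website/.*",
    r"/site/components|/site/components/.*",
]

# Hypothetical rule for a Documents folder, modeled on the components rule.
DOCUMENT_RULE = r"/site/documents|/site/documents/.*"


def is_covered(path, rules):
    """Return True if any rule matches the entire path (assumed semantics)."""
    return any(re.fullmatch(rule, path) for rule in rules)


if __name__ == "__main__":
    for path in ["/site/website/index.xml", "/site/documents/report.xml"]:
        print(path, is_covered(path, AUTHOR_RULES))
```

Under these assumptions, a jacket stored under `/site/documents` is not covered by the two rules shown, which is why a role needs a rule for the new folder before its members can work with jacketed documents.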
Target YAML:
# The list of binary file mime types that should be indexed
- application/pdf
- application/msword
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/
- application/
- application/vnd.openxmlformats-officedocument.presentationml.presentation
# The regex path patterns for the metadata ("jacket") files of binary/document files
- ^/?site/documents/.+\.xml$
# The regex path patterns for binary/document files that are stored remotely
remoteBinaryPathPatterns: &remoteBinaryPathPatterns
# HTTP/HTTPS URLs are only indexed if they contain the protocol (http:// or https://). Protocol relative
# URLs (like //mydoc.pdf) are not supported since the protocol is unknown to the back-end indexer.
- ^(http:|https:)//.+$
- ^/remote-assets/.+$
# The regex path patterns for binary/document files that should be associated to just one metadata file and are
# dependent on that parent metadata file, so if the parent is deleted the binary should be deleted from the index
childBinaryPathPatterns: *remoteBinaryPathPatterns
# The XPaths of the binary references in the metadata files
- //item/key
- //item/url
# Settings specific to authoring indexes
# XPath for the internal name field
xpath: '*/internal-name'
- ^/?site/.+$
- ^/?static-assets/.+$
- ^/?remote-assets/.+$
- ^/?scripts/.+$
- ^/?templates/.+$
xpath: '*/content-type'
# Same as for delivery but include images and videos
- application/pdf
- application/msword
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/
- application/
- application/vnd.openxmlformats-officedocument.presentationml.presentation
- application/x-subrip
- image/*
- video/*
- audio/*
- text/x-freemarker
- text/x-groovy
- text/javascript
- text/css
# The regex path patterns for the metadata ("jacket") files of binary/document files
- ^/?site/documents/.+\.xml$
- ^/?static-assets/.+$
- ^/?remote-assets/.+$
- ^/?scripts/.+$
- ^/?templates/.+$
# Look into all XML descriptors to index all binary files referenced
- ^/?site/.+\.xml$
# Additional metadata such as contentLength, content-type specific metadata
- ^/?site/.+$
- ^/?config/.*$
# Include all fields marked as remote resources (S3, Box, CMIS)
- //item/key
- //item/url
- //*[@remote="true"]
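The path patterns in the target configuration can be checked against sample paths before deploying. The Python sketch below applies the delivery-side jacket pattern and the remote binary patterns from the listing above. The `classify` helper, the constant names, and the sample paths are hypothetical, and Python's `re` module stands in for the deployer's own regex engine, which may differ in edge cases.

```python
import re

# Pattern for jacket (metadata) files, copied from the delivery section.
JACKET_PATH_PATTERNS = [r"^/?site/documents/.+\.xml$"]

# Patterns for remotely stored binaries, copied from remoteBinaryPathPatterns.
REMOTE_BINARY_PATH_PATTERNS = [
    r"^(http:|https:)//.+$",
    r"^/remote-assets/.+$",
]


def matches_any(path, patterns):
    """Return True if at least one pattern matches the path."""
    return any(re.search(pattern, path) for pattern in patterns)


def classify(path):
    """Label a path as a jacket, a remote binary, or neither."""
    if matches_any(path, JACKET_PATH_PATTERNS):
        return "jacket"
    if matches_any(path, REMOTE_BINARY_PATH_PATTERNS):
        return "remote-binary"
    return "other"


if __name__ == "__main__":
    for sample in ["/site/documents/report.xml", "//mydoc.pdf"]:
        print(sample, "->", classify(sample))
```

A protocol-relative URL such as `//mydoc.pdf` matches neither remote pattern, which is consistent with the comment in the configuration that such URLs are not supported.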