ElGigi opened a new pull request, #27:
URL: https://github.com/apache/tika-helm/pull/27
### Summary
This PR enhances the Tika Helm chart by allowing users to inject multiple
configuration files into the Tika configuration directory. It introduces a new
`additionalConfigs` dictionary in `values.yaml`; each key is a file name and
each value is that file's content, and every entry is added to the Tika
ConfigMap.
### Motivation
The current chart implementation is limited to a single `tika-config.xml`
file. Advanced Tika deployments often require supplemental XML files.
A primary use case is overriding default MIME type detection for malformed
documents. For instance, handling PDF files with "noise" or "garbage" bytes
before the `%PDF-` header requires a custom `mimetypes.xml` referenced via
`<mime-table-path>` in the main `tika-config.xml`. Both files must therefore
be present in the `/tika-config` volume.
### Changes
- **templates/configmap.yaml**: Modified to include `additionalConfigs` data
  using `toYaml`. This allows for a flexible number of additional configuration
  files.
- **values.yaml**: Added documentation and commented examples showing how to:
  - Configure a custom MIME detection rule (handling shifted PDF headers).
  - Reference these files within the main `tikaConfig`.
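
As a rough illustration of the `templates/configmap.yaml` change, the template could look something like the sketch below. This is not the actual diff: the `tika-helm.fullname` helper, the `-config` name suffix, and the exact layout are assumptions; only `tikaConfig`, `additionalConfigs`, `tika-config.xml`, and the use of `toYaml` come from the description above.

```yml
# Hypothetical sketch of templates/configmap.yaml, not the chart's real template.
apiVersion: v1
kind: ConfigMap
metadata:
  # Assumed helper name and suffix; the chart may name this differently.
  name: {{ include "tika-helm.fullname" . }}-config
data:
  # Main Tika configuration, taken from .Values.tikaConfig.
  tika-config.xml: |
{{ .Values.tikaConfig | indent 4 }}
  {{- with .Values.additionalConfigs }}
  {{- /* Each key of additionalConfigs becomes one more file in the ConfigMap. */}}
  {{- toYaml . | nindent 2 }}
  {{- end }}
```

Wrapping the `toYaml` call in `with` keeps the rendered ConfigMap unchanged when `additionalConfigs` is unset or empty, so existing deployments would not be affected under this layout.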
### Testing performed
- Validated that `additionalConfigs` keys are correctly rendered as
individual files in the resulting ConfigMap.
- Verified that the deployment successfully mounts the directory, making all
files available to the Tika Java process.
### Example
```yml
additionalConfigs:
  custom-mimetypes.xml: |
    <?xml version="1.0" encoding="UTF-8"?>
    <mime-info>
      <mime-type type="application/pdf">
        <magic priority="80">
          <match value="%PDF-" type="string" offset="0:8192"/>
        </magic>
        <glob pattern="*.pdf"/>
      </mime-type>
    </mime-info>
tikaConfig: |
  <?xml version="1.0" encoding="UTF-8"?>
  <properties>
    <mtrandata>
      <mime-table-path>/tika-config/custom-mimetypes.xml</mime-table-path>
    </mtrandata>
  </properties>
```
```
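
With values like the example above, the rendered ConfigMap would be expected to look roughly as follows. This is a hand-written sketch rather than captured `helm template` output; the ConfigMap name `tika-config` is an assumption, and the exact string quoting chosen by `toYaml` may differ.

```yml
# Hypothetical rendered output; the metadata.name value is assumed.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tika-config
data:
  # From tikaConfig: the main configuration file.
  tika-config.xml: |
    <?xml version="1.0" encoding="UTF-8"?>
    <properties>
      <mtrandata>
        <mime-table-path>/tika-config/custom-mimetypes.xml</mime-table-path>
      </mtrandata>
    </properties>
  # From additionalConfigs: one key per additional file.
  custom-mimetypes.xml: |
    <?xml version="1.0" encoding="UTF-8"?>
    <mime-info>
      <mime-type type="application/pdf">
        <magic priority="80">
          <match value="%PDF-" type="string" offset="0:8192"/>
        </magic>
        <glob pattern="*.pdf"/>
      </mime-type>
    </mime-info>
```

Because the whole ConfigMap is mounted at `/tika-config`, each `data` key appears there as a file of the same name, which is what lets the path in `<mime-table-path>` resolve.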