ElGigi opened a new pull request, #27:
URL: https://github.com/apache/tika-helm/pull/27

   ### Summary
   This PR enhances the Tika Helm chart by allowing users to inject multiple 
configuration files into the Tika configuration directory. It introduces a new 
`additionalConfigs` map in `values.yaml`: each key is a file name, each value 
is that file's content, and every entry is rendered into the Tika ConfigMap.
   
   ### Motivation
   The current chart implementation is limited to a single `tika-config.xml` 
file. Advanced Tika deployments often require supplemental XML files. 
   
   A primary use case is overriding default MIME type detection for malformed 
documents. For instance, handling PDF files with "noise" or "garbage" bytes 
before the `%PDF-` header requires a custom `mimetypes.xml` referenced via 
`<mime-table-path>` in the main `tika-config.xml`, so both files must be 
present in the `/tika-config` volume.
   
   ### Changes
   - **templates/configmap.yaml**: Now renders the `additionalConfigs` entries 
into the ConfigMap data using `toYaml`, so any number of additional 
configuration files can be supplied.
   - **values.yaml**: Added documentation and commented examples showing how to:
       - Configure a custom MIME detection rule (handling shifted PDF headers).
       - Reference these files within the main `tikaConfig`.
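   
   As a rough illustration of the `templates/configmap.yaml` change described 
above, the template could look something like the sketch below. This is not 
the actual diff: the ConfigMap name, the `tika-config.xml` key, and the exact 
indentation helpers are assumptions made for the example. Only `tikaConfig`, 
`additionalConfigs`, and the use of `toYaml` come from this PR.
   
   ```yml
   # Hypothetical sketch, not the chart's real template.
   apiVersion: v1
   kind: ConfigMap
   metadata:
     # Assumed naming scheme; the chart may use a helper template instead.
     name: {{ .Release.Name }}-config
   data:
     # Main Tika configuration (key name assumed).
     tika-config.xml: |
   {{ .Values.tikaConfig | indent 4 }}
     # Each additionalConfigs entry becomes one more key, i.e. one more file.
     # The `with` guard skips the block when the map is empty or unset.
     {{- with .Values.additionalConfigs }}
     {{- toYaml . | nindent 2 }}
     {{- end }}
   ```
   
   The 3-space left margin above is only this message's indentation. In a real 
template file, the `indent 4` line would start at column 0 so that the 
rendered XML sits under the block scalar.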
   
   ### Testing performed
   - Validated that `additionalConfigs` keys are correctly rendered as 
individual files in the resulting ConfigMap.
   - Verified that the deployment successfully mounts the directory, making all 
files available to the Tika Java process.
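   
   For context, mounting a ConfigMap as a volume without `subPath` exposes 
every key as a separate file in the mount directory, which is why no per-file 
mount is needed. A minimal sketch of such a pod spec fragment follows. The 
container, volume, and ConfigMap names are hypothetical; only the 
`/tika-config` mount path is taken from this PR.
   
   ```yml
   # Hypothetical excerpt of the Deployment pod spec.
   spec:
     containers:
       - name: tika                     # assumed container name
         volumeMounts:
           - name: tika-config          # assumed volume name
             mountPath: /tika-config    # directory referenced by <mime-table-path>
             readOnly: true
     volumes:
       - name: tika-config
         configMap:
           name: my-release-config      # assumed ConfigMap name
   ```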
   
   ### Example
   
   ```yml
   additionalConfigs:
     custom-mimetypes.xml: |
       <?xml version="1.0" encoding="UTF-8"?>
       <mime-info>
         <mime-type type="application/pdf">
           <magic priority="80">
             <match value="%PDF-" type="string" offset="0:8192"/>
           </magic>
           <glob pattern="*.pdf"/>
         </mime-type>
       </mime-info>
   
   tikaConfig: |
     <?xml version="1.0" encoding="UTF-8"?>
     <properties>
       <mtrandata>
         <mime-table-path>/tika-config/custom-mimetypes.xml</mime-table-path>
       </mtrandata>
     </properties>
   ```

