basix86 opened a new pull request, #2842:
URL: https://github.com/apache/tika/pull/2842
## TIKA-4698: Fix CompositeExternalParser failing to load in standalone OSGi bundle
### Summary
When `tika-core` and `tika-parsers-standard-package` are deployed as
**separate OSGi bundles**, `CompositeExternalParser` was silently missing from
the parser list resolved by `DefaultParser`. This PR fixes that with a single
explicit `Import-Package` entry for `org.apache.tika.parser.external`.
### Root Cause
`CompositeExternalParser` lives in `tika-core` (package
`org.apache.tika.parser.external`) but is registered in
`tika-parsers-standard-package` only via the SPI text file
`META-INF/services/org.apache.tika.parser.Parser` — never in compiled bytecode.
`maven-bundle-plugin` (bnd) auto-detects `Import-Package` by scanning
bytecode only. Since the reference exists solely in a plain-text SPI file, bnd
silently omits `org.apache.tika.parser.external` from the generated
`MANIFEST.MF`. In a standalone OSGi deployment the bundle classloader of
`tika-parsers-standard-package` cannot wire to that package exported by
`tika-core`, so the SPI-loaded `CompositeExternalParser` fails to resolve and
is silently dropped.
This does not affect `tika-bundle-standard` (where everything is shaded into
a single fat bundle).
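
The failure mode described above can be illustrated with a minimal sketch. It is a simplified stand-in for SPI loading, not Tika's actual `ServiceLoader`: the class `SpiLoadSketch`, the method `loadVisible`, and the provider name `example.hypothetical.NotImportedParser` are all made up for illustration. Each provider name is resolved against one class loader, and any name that loader cannot see is skipped without an error, which is roughly what happens to `CompositeExternalParser` when its package is not imported.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for SPI loading; this is not Tika's ServiceLoader. */
final class SpiLoadSketch {

    /**
     * Resolves each provider class name against the given class loader and
     * silently skips names the loader cannot see.
     */
    static List<Class<?>> loadVisible(List<String> providerNames, ClassLoader loader) {
        List<Class<?>> resolved = new ArrayList<>();
        for (String name : providerNames) {
            try {
                // 'false' avoids running static initializers during the lookup.
                resolved.add(Class.forName(name, false, loader));
            } catch (ClassNotFoundException e) {
                // Ignored on purpose: mirrors the "silently dropped" behaviour.
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // The second name is hypothetical and stands for a class in a package
        // that the bundle's class loader was never wired to.
        List<String> names = List.of(
                "java.util.ArrayList",
                "example.hypothetical.NotImportedParser");
        System.out.println(loadVisible(names, SpiLoadSketch.class.getClassLoader()));
    }
}
```

Only the visible class ends up in the result; nothing reports that the second entry was declared but never loaded.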
### Fix
One line added to the `maven-bundle-plugin` `<Import-Package>` instruction
in `tika-parsers-standard-package/pom.xml`:
```xml
<Import-Package>
  org.w3c.dom,
  org.apache.tika.parser.external, <!-- referenced only via SPI, invisible to bnd -->
  org.apache.tika.*,
  *;resolution:=optional
</Import-Package>
```
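
One way to confirm that the instruction took effect is to inspect the `Import-Package` header of the built bundle. The following is a rough sketch using only the JDK: the class `ImportPackageCheck` and its methods are hypothetical helpers, and the parsing is deliberately simplified (it splits clauses on commas outside double quotes and treats any `;`-separated segment without `=` as a package name), so it is not a full OSGi header parser.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper: checks an Import-Package header for a package name. */
final class ImportPackageCheck {

    /** Splits the header on commas that are outside double quotes. */
    static List<String> splitClauses(String header) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : header.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (c == ',' && !inQuotes) {
                clauses.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            clauses.add(current.toString().trim());
        }
        return clauses;
    }

    /** True if any clause names exactly the given package. */
    static boolean importsPackage(String header, String pkg) {
        for (String clause : splitClauses(header)) {
            for (String segment : clause.split(";")) {
                // Segments containing '=' are attributes or directives.
                if (!segment.contains("=") && segment.trim().equals(pkg)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the built bundle jar (supplied by the caller).
        try (JarFile jar = new JarFile(args[0])) {
            Manifest manifest = jar.getManifest();
            String header = manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue("Import-Package");
            boolean found = header != null
                    && importsPackage(header, "org.apache.tika.parser.external");
            System.out.println("imports org.apache.tika.parser.external: " + found);
        }
    }
}
```

Run against the jar produced by this branch, the check should report `true`; against a 3.3.1 artifact it would be expected to report `false`, assuming the root-cause analysis above holds.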
### Steps to reproduce
- Use `tika-core:3.3.1` and `tika-parsers-standard-package:3.3.1` as
separate OSGi bundles
- Set `-Dorg.apache.tika.service.error.warn=true` to make load failures
visible (by default Tika silently ignores them)
- On bundle start, the Activator's call to
`new DefaultParser(Activator.class.getClassLoader())` triggers a
`ClassNotFoundException`:
```
WARN Could not load org.apache.tika.parser.external.CompositeExternalParser
java.lang.ClassNotFoundException: org.apache.tika.parser.external.CompositeExternalParser not found by org.apache.tika.parsers-standard-package
```
- `CompositeExternalParser` is absent from
`DefaultParser.getAllComponentParsers()`
**To verify the fix:** build this branch, install it to the local Maven
repository with `mvn install -DskipTests`, and repeat the steps above;
`CompositeExternalParser` is then present in the parser list.
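
The presence check in the last step can be made explicit with a small comparison. This is a sketch under stated assumptions: `MissingProviderCheck` and `missingProviders` are hypothetical names, and the comparison uses the concrete class name of each loaded instance, so it would not see through any wrapper or decorator objects. In a real bundle, the `loaded` argument would be the result of `DefaultParser.getAllComponentParsers()` and `declared` would hold the class names listed in the SPI file.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: reports declared providers with no loaded instance. */
final class MissingProviderCheck {

    /**
     * Returns the declared class names for which no object of exactly that
     * class appears in the loaded collection, preserving declaration order.
     */
    static List<String> missingProviders(List<String> declared, Collection<?> loaded) {
        Set<String> loadedNames = new HashSet<>();
        for (Object instance : loaded) {
            loadedNames.add(instance.getClass().getName());
        }
        List<String> missing = new ArrayList<>();
        for (String name : declared) {
            if (!loadedNames.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in data: a String instance plays the role of a loaded parser.
        List<String> declared = List.of("java.lang.String", "java.lang.Integer");
        System.out.println(missingProviders(declared, List.of("loaded")));
    }
}
```

Before the fix, `org.apache.tika.parser.external.CompositeExternalParser` would be expected in the returned list; after the fix the list should no longer contain it.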
### Testing
We were unable to include an automated integration test for this fix. The
natural place would be `BundleIT` in `tika-bundle-standard`, which uses
JUnit 4 and PaxExam to exercise a real OSGi container. However, the module's
`pom.xml` already notes: _"pax doesn't yet work with junit4; checkstyle forbids
junit4"_, and the project's CI pipeline (`mvn clean test install -Pci`) runs
only surefire unit tests, so `BundleIT` is not executed in CI. We would welcome
guidance from maintainers on the preferred approach for testing OSGi bundle
behaviour.
### Related Issues
- Fixes **[TIKA-4698](https://issues.apache.org/jira/browse/TIKA-4698)**
- Unblocks **[OAK-9752](https://issues.apache.org/jira/browse/OAK-9752)** —
Apache Jackrabbit OAK upgrade from Tika 2.x to Tika 3.x, which deploys Tika as
separate OSGi bundles and is directly blocked by TIKA-4698.