basix86 opened a new pull request, #2842:
URL: https://github.com/apache/tika/pull/2842

   ## TIKA-4698: Fix CompositeExternalParser failing to load in standalone OSGi bundle
   
   ### Summary
   
   When `tika-core` and `tika-parsers-standard-package` are deployed as 
**separate OSGi bundles**, `CompositeExternalParser` was silently missing from 
the parser list resolved by `DefaultParser`. This PR fixes that with a single 
explicit `Import-Package` entry for `org.apache.tika.parser.external`.
   
   ### Root Cause
   
   `CompositeExternalParser` lives in `tika-core` (package `org.apache.tika.parser.external`), but `tika-parsers-standard-package` registers it only through the SPI text file `META-INF/services/org.apache.tika.parser.Parser`; no compiled class in that bundle references it.
   
   `maven-bundle-plugin` (bnd) computes `Import-Package` by scanning bytecode only. Because the reference exists solely in a plain-text SPI file, bnd silently omits `org.apache.tika.parser.external` from the generated `MANIFEST.MF`. In a standalone OSGi deployment, the bundle classloader of `tika-parsers-standard-package` therefore cannot wire to that package, which is exported by `tika-core`, so the SPI-listed `CompositeExternalParser` fails to load and is silently dropped.
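
The wiring failure can be illustrated without an OSGi framework. The following is a deliberately simplified sketch, not how Felix or Equinox actually load classes: a hypothetical `ImportOnlyClassLoader` always delegates `java.*` to its parent, serves other classes only if their package is in an explicit import set, and throws `ClassNotFoundException` for everything else. That is roughly the situation of a bundle whose manifest lacks an `Import-Package` entry.

```java
import java.util.Set;

/**
 * Hypothetical, simplified model of an OSGi bundle class loader: it can see
 * java.* plus the packages it explicitly imports, nothing else. Real
 * frameworks also handle bundle-private classes, Require-Bundle,
 * DynamicImport-Package, boot delegation settings, and more.
 */
public class ImportOnlyClassLoader extends ClassLoader {
    private final String bundleName;
    private final Set<String> importedPackages;

    public ImportOnlyClassLoader(String bundleName, Set<String> importedPackages, ClassLoader parent) {
        super(parent);
        this.bundleName = bundleName;
        this.importedPackages = Set.copyOf(importedPackages);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // java.* is always delegated to the parent loader.
        if (name.startsWith("java.")) {
            return super.loadClass(name, resolve);
        }
        int lastDot = name.lastIndexOf('.');
        String pkg = lastDot < 0 ? "" : name.substring(0, lastDot);
        if (importedPackages.contains(pkg)) {
            // The package is "wired": delegate to the exporter (modeled here by the parent).
            return super.loadClass(name, resolve);
        }
        // No import for this package: the class is invisible to this "bundle".
        throw new ClassNotFoundException(name + " not found by " + bundleName);
    }
}
```

For example, a loader built with `Set.of("javax.xml.parsers")` can load `javax.xml.parsers.DocumentBuilderFactory` but rejects `javax.xml.transform.Transformer`, in the same way the parsers bundle rejects `CompositeExternalParser` when `org.apache.tika.parser.external` is not imported.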
   
   This does not affect `tika-bundle-standard` (where everything is shaded into 
a single fat bundle).
   
   ### Fix
   
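   One line is added to the `<Import-Package>` instruction of `maven-bundle-plugin` in `tika-parsers-standard-package/pom.xml`. The existing `org.apache.tika.*` wildcard does not cover the package, because bnd matches wildcard patterns only against packages it has found referenced in bytecode; a literal package name, by contrast, is imported even when no class refers to it: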
   
   ```xml
   <Import-Package>
     org.w3c.dom,
     org.apache.tika.parser.external,   <!-- referenced only via SPI, invisible to bnd -->
     org.apache.tika.*,
     *;resolution:=optional
   </Import-Package>
   ```
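
The pattern-versus-literal rule can be modeled in a few lines. This is an assumption-laden sketch, not bnd's real algorithm (which also handles negated patterns, version ranges, exported-package substitution and more): the hypothetical `ImportPackageModel.computeImports` takes the packages "found in bytecode", lets the first matching clause decide each one, and then adds literal clauses unconditionally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical, heavily simplified model of bnd's Import-Package computation. */
public class ImportPackageModel {

    /** One clause, e.g. "*;resolution:=optional" -> pattern "*", attrs ";resolution:=optional". */
    record Clause(String pattern, String attrs) {
        static Clause parse(String text) {
            String t = text.trim();
            int semi = t.indexOf(';');
            return semi < 0 ? new Clause(t, "") : new Clause(t.substring(0, semi).trim(), t.substring(semi));
        }

        boolean isLiteral() {
            return !pattern.contains("*");
        }

        boolean matches(String pkg) {
            if (pattern.equals("*")) {
                return true;
            }
            if (pattern.endsWith(".*")) {
                // "a.b.*" is treated as matching a.b and all of its subpackages.
                String prefix = pattern.substring(0, pattern.length() - 2);
                return pkg.equals(prefix) || pkg.startsWith(prefix + ".");
            }
            return pkg.equals(pattern);
        }
    }

    /**
     * @param referenced  external packages found by scanning bytecode; SPI text files add nothing here
     * @param instruction comma-separated clauses (naive split: quoted values containing commas are not supported)
     * @return generated header entries, sorted by package name
     */
    public static List<String> computeImports(Set<String> referenced, String instruction) {
        List<Clause> clauses = new ArrayList<>();
        for (String part : instruction.split(",")) {
            if (!part.isBlank()) {
                clauses.add(Clause.parse(part));
            }
        }
        TreeMap<String, String> result = new TreeMap<>();
        // Each referenced package is imported with the attributes of the first matching clause.
        for (String pkg : referenced) {
            for (Clause c : clauses) {
                if (c.matches(pkg)) {
                    result.put(pkg, c.attrs());
                    break;
                }
            }
        }
        // Literal clauses are imported even when nothing in the bytecode refers to them.
        for (Clause c : clauses) {
            if (c.isLiteral()) {
                result.putIfAbsent(c.pattern(), c.attrs());
            }
        }
        List<String> header = new ArrayList<>();
        result.forEach((pkg, attrs) -> header.add(pkg + attrs));
        return header;
    }
}
```

With an invented referenced set `{org.w3c.dom, org.apache.tika.parser, org.apache.commons.io}`, the instruction without the new line yields `[org.apache.commons.io;resolution:=optional, org.apache.tika.parser, org.w3c.dom]`, while the patched instruction additionally yields `org.apache.tika.parser.external`.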
   
   ### Steps to reproduce
   
   - Use `tika-core:3.3.1` and `tika-parsers-standard-package:3.3.1` as 
separate OSGi bundles
   - Set `-Dorg.apache.tika.service.error.warn=true` to make load failures 
visible (by default Tika silently ignores them)
   - On bundle start, the Activator's `new DefaultParser(Activator.class.getClassLoader())` triggers a `ClassNotFoundException`:
     ```
     WARN Could not load org.apache.tika.parser.external.CompositeExternalParser
     java.lang.ClassNotFoundException: org.apache.tika.parser.external.CompositeExternalParser
         not found by org.apache.tika.parsers-standard-package
     ```
   - `CompositeExternalParser` is absent from 
`DefaultParser.getAllComponentParsers()`
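
The behaviour in these steps, where a failed class load either vanishes or appears only as a warning, can be modeled in plain Java. This is a hypothetical sketch of SPI-style loading with an ignore-or-warn policy, not Tika's own `ServiceLoader`; the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of SPI-style loading with an "ignore or warn" error policy (not Tika's code). */
public class SpiLoadSketch {

    /** Parses META-INF/services content: one class name per line, '#' starts a comment. */
    public static List<String> parseServiceFile(String content) {
        return content.lines()
                .map(line -> {
                    int hash = line.indexOf('#');
                    return (hash >= 0 ? line.substring(0, hash) : line).trim();
                })
                .filter(name -> !name.isEmpty())
                .collect(Collectors.toList());
    }

    /**
     * Loads each named class through the given loader. A class that cannot be
     * loaded is dropped from the result; it is reported on stderr only when
     * warn is true, otherwise it disappears without a trace.
     */
    public static List<Class<?>> loadAll(List<String> classNames, ClassLoader loader, boolean warn) {
        List<Class<?>> loaded = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: load the class without running its static initializers
                loaded.add(Class.forName(name, false, loader));
            } catch (ClassNotFoundException | LinkageError e) {
                if (warn) {
                    System.err.println("WARN Could not load " + name + ": " + e);
                }
            }
        }
        return loaded;
    }
}
```

Passing `Boolean.getBoolean("org.apache.tika.service.error.warn")` as `warn` mirrors the switch described in the steps; with a loader that cannot see `org.apache.tika.parser.external`, `CompositeExternalParser` is simply absent from the returned list.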
   
   **To verify the fix:** build this branch, install it to your local Maven repository with `mvn install -DskipTests`, and repeat the steps above; `CompositeExternalParser` is now present.
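
Besides repeating the OSGi run, a quicker check is to inspect the built jar itself. The sketch below uses a hypothetical `SpiImportCheck` class (the jar path is whatever your build produced, for example under `tika-parsers-standard-package/target/`). It reads the Parser SPI file and the `Import-Package` header from the jar's manifest and reports every SPI-listed package that the jar neither contains nor imports. Before the fix one would expect `org.apache.tika.parser.external` in the result, and after it an empty set. Header parsing is simplified and assumes one package per clause.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical diagnostic: SPI-listed packages that a jar neither contains nor imports. */
public class SpiImportCheck {

    public static final String SPI = "META-INF/services/org.apache.tika.parser.Parser";

    public static Set<String> missingImports(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            // Packages that ship inside the jar itself need no import.
            Set<String> own = new TreeSet<>();
            for (Enumeration<JarEntry> e = jar.entries(); e.hasMoreElements(); ) {
                String entry = e.nextElement().getName();
                if (entry.endsWith(".class") && entry.contains("/")) {
                    own.add(entry.substring(0, entry.lastIndexOf('/')).replace('/', '.'));
                }
            }
            Manifest mf = jar.getManifest();
            Set<String> imported = parseHeaderPackages(
                    mf == null ? null : mf.getMainAttributes().getValue("Import-Package"));

            Set<String> missing = new TreeSet<>();
            JarEntry spi = jar.getJarEntry(SPI);
            if (spi == null) {
                return missing;
            }
            String content;
            try (InputStream in = jar.getInputStream(spi)) {
                content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            for (String line : content.split("\\R")) {
                int hash = line.indexOf('#');
                String className = (hash >= 0 ? line.substring(0, hash) : line).trim();
                int dot = className.lastIndexOf('.');
                if (dot <= 0) {
                    continue; // blank line, comment, or class in the default package
                }
                String pkg = className.substring(0, dot);
                if (!own.contains(pkg) && !imported.contains(pkg)) {
                    missing.add(pkg);
                }
            }
            return missing;
        }
    }

    /** Extracts package names from an Import-Package value; commas inside quotes are not separators. */
    public static Set<String> parseHeaderPackages(String header) {
        Set<String> packages = new TreeSet<>();
        if (header == null) {
            return packages;
        }
        StringBuilder clause = new StringBuilder();
        boolean inQuotes = false;
        // Iterate one position past the end so the last clause is flushed.
        for (int i = 0; i <= header.length(); i++) {
            char c = i < header.length() ? header.charAt(i) : ',';
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (c == ',' && !inQuotes) {
                String text = clause.toString();
                int semi = text.indexOf(';');
                String pkg = (semi < 0 ? text : text.substring(0, semi)).trim();
                if (!pkg.isEmpty()) {
                    packages.add(pkg);
                }
                clause.setLength(0);
            } else {
                clause.append(c);
            }
        }
        return packages;
    }
}
```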
   
   ### Testing
   
   We were unable to include an automated integration test for this fix. The natural place would be `BundleIT` in `tika-bundle-standard`, which uses JUnit 4 and PaxExam to exercise a real OSGi container. However, the module's `pom.xml` already notes _"pax doesn't yet work with junit4; checkstyle forbids junit4"_, and the project's CI pipeline (`mvn clean test install -Pci`) runs only Surefire unit tests, so `BundleIT` is not executed in CI. We would welcome guidance from maintainers on the preferred approach for testing OSGi bundle behaviour.
   
   ### Related Issues
   
   - Fixes **[TIKA-4698](https://issues.apache.org/jira/browse/TIKA-4698)**
   - Unblocks **[OAK-9752](https://issues.apache.org/jira/browse/OAK-9752)**: the Apache Jackrabbit Oak upgrade from Tika 2.x to Tika 3.x, which deploys Tika as separate OSGi bundles and is directly blocked by TIKA-4698.