[ https://issues.apache.org/jira/browse/TIKA-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086591#comment-18086591 ]
ASF GitHub Bot commented on TIKA-4750:
--------------------------------------
Copilot commented on code in PR #2879:
URL: https://github.com/apache/tika/pull/2879#discussion_r3367368855
##########
docs/modules/ROOT/pages/configuration/index.adoc:
##########
@@ -97,7 +97,7 @@ JSON uses the backslash as an escape character, so path options (e.g. `tesseract
 * xref:configuration/parsers/pdf-parser.adoc[PDFParser] — PDF parsing options
 * xref:configuration/parsers/tesseract-ocr-parser.adoc[TesseractOCRParser] — OCR options for image-based text extraction
-* xref:configuration/parsers/tess4j-parser.adoc[Tess4J OCR Parser] — in-process OCR via tess4j JNI bindings
+* xref:configuration/parsers/tess4j-parser.adoc[Tess4J OCR Parser] — in-process OCR via tess4j JNI bindings (advanced users only; most users should prefer the TesseractOCRParser above)
Review Comment:
This entry describes Tess4J as using “JNI bindings”, but Tess4JParser/Tess4J
use JNA (and the tess4j-parser page below also says JNA). Keeping “JNI” here is
factually incorrect and may mislead readers about the integration model.
> tika-4.0.0-alpha1 - tess4j-parser not available
> -----------------------------------------------
>
> Key: TIKA-4750
> URL: https://issues.apache.org/jira/browse/TIKA-4750
> Project: Tika
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Adrian Bird
> Priority: Major
>
> I've tried to use the 'tess4j-parser' but get the following error:
>
> {noformat}
> DEBUG [main] 09:09:06,858 org.apache.tika.config.loader.TikaObjectMapperFactory Loaded component registry: parse-context
> Exception in thread "main" org.apache.tika.exception.TikaConfigException: Unknown component type: 'tess4j-parser'
> at org.apache.tika.config.loader.ComponentInstantiator.instantiate(ComponentInstantiator.java:179)
> at org.apache.tika.config.loader.LoaderContext.instantiate(LoaderContext.java:110)
> at org.apache.tika.config.loader.ParserLoader.loadComponent(ParserLoader.java:61)
> at org.apache.tika.config.loader.ParserLoader.loadComponent(ParserLoader.java:46)
> at org.apache.tika.config.loader.AbstractSpiComponentLoader.load(AbstractSpiComponentLoader.java:107)
> at org.apache.tika.config.loader.TikaLoader.loadComponent(TikaLoader.java:683)
> at org.apache.tika.config.loader.TikaLoader.get(TikaLoader.java:647)
> at org.apache.tika.config.loader.TikaLoader.loadParsers(TikaLoader.java:247)
> at org.apache.tika.config.loader.TikaLoader.loadAutoDetectParser(TikaLoader.java:379)
> at org.apache.tika.cli.TikaCLI.configure(TikaCLI.java:901)
> at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:532)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:267)
> Caused by: java.lang.ClassNotFoundException: Component 'tess4j-parser' is not registered. Components must be registered via @TikaComponent annotation or .idx file. Arbitrary class names are not allowed for security reasons.
> at org.apache.tika.serialization.ComponentNameResolver.resolveClass(ComponentNameResolver.java:116)
> at org.apache.tika.config.loader.ComponentInstantiator.instantiate(ComponentInstantiator.java:176)
> ... 11 more
> {noformat}
> FYI I've probably done all the testing I'm going to with this version.
>
>