[
https://issues.apache.org/jira/browse/SOLR-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291935#comment-16291935
]
Tim Allison commented on SOLR-11701:
------------------------------------
I merged [[email protected]]'s mods and made a few updates for Tika
1.17.
I ran an integration test against 643 files in Apache Tika's unit test docs,
and the number of documents indexed in Solr matched the number that
tika-app.jar parsed without exceptions.
{noformat}
public static void main(String[] args) throws Exception {
    Path extracts = Paths.get("C:\\data\\tika_unit_tests_extracts");
    SolrClient client = new HttpSolrClient.Builder(
            "http://localhost:8983/solr/fileupload_passt/").build();
    for (File f : extracts.toFile().listFiles()) {
        try (Reader r = Files.newBufferedReader(f.toPath(), StandardCharsets.UTF_8)) {
            List<Metadata> metadataList = JsonMetadataList.fromJson(r);
            // Only check files whose extract records no runtime exception
            String ex = metadataList.get(0).get(
                    TikaCoreProperties.TIKA_META_EXCEPTION_PREFIX + "runtime");
            if (ex == null) {
                // The Solr id is the extract's file name without ".json"
                SolrQuery q = new SolrQuery("id:" + f.getName().replace(".json", ""));
                QueryResponse response = client.query(q);
                SolrDocumentList results = response.getResults();
                // Report any cleanly parsed file not indexed exactly once
                if (results.getNumFound() != 1) {
                    System.err.println(f.getName() + " " + results.getNumFound());
                }
            }
        }
    }
}
{noformat}
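One caveat with the check above: `replace(".json", "")` removes every occurrence of ".json" in the file name, not just the trailing one, and the name goes into the query unescaped. The following is a minimal, dependency-free sketch of a stricter way to build the id query. The class and method names (`ExtractIds`, `toDocId`, `escape`, `idQuery`) are hypothetical and not part of the patch, and the escaped character set is an assumption based on the standard query parser's syntax characters. In SolrJ code, `ClientUtils.escapeQueryChars` would be the natural replacement for the hand-rolled `escape`.

```java
final class ExtractIds {
    private static final String SUFFIX = ".json";
    // Assumed set of characters with special meaning in the standard query parser.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    /** Strips only a trailing ".json"; earlier occurrences in the name are kept. */
    public static String toDocId(String extractFileName) {
        return extractFileName.endsWith(SUFFIX)
                ? extractFileName.substring(0, extractFileName.length() - SUFFIX.length())
                : extractFileName;
    }

    /** Backslash-escapes query syntax characters and whitespace. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds an id query for the document that produced the given extract file. */
    public static String idQuery(String extractFileName) {
        return "id:" + escape(toDocId(extractFileName));
    }

    public static void main(String[] args) {
        System.out.println(idQuery("testPDF.pdf.json"));
        System.out.println(idQuery("test file (1).json.json"));
    }
}
```

With this, an extract named `test file (1).json.json` maps to the id `test file (1).json` rather than `test file (1)`, and the spaces and parentheses are escaped before the name reaches the query parser.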
I did the usual dance:
{noformat}
ant clean-jars jar-checksums
ant precommit
{noformat}
[~erickerickson], this _should_ be good to go.
> Upgrade to Tika 1.17 when available
> -----------------------------------
>
> Key: SOLR-11701
> URL: https://issues.apache.org/jira/browse/SOLR-11701
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Tim Allison
>
> Kicking off release process for Tika 1.17 in the next few days. Please let
> us know if you have any requests.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)