[ https://issues.apache.org/jira/browse/LUCENE-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dawid Weiss reassigned LUCENE-8380:
-----------------------------------
Resolution: Fixed
Assignee: Dawid Weiss
Thanks, Ruslan!
> UTF8TaxonomyWriterCache inconsistency
> -------------------------------------
>
> Key: LUCENE-8380
> URL: https://issues.apache.org/jira/browse/LUCENE-8380
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/facet
> Affects Versions: 7.1
> Reporter: Ruslan Torobaev
> Assignee: Dawid Weiss
> Priority: Minor
> Fix For: 7.5
>
> Attachments: LUCENE-8380.patch, lucene-taxonomy-cache-report.tar.gz,
> taxonomy-cache.json.gz, taxonomy.tar.gz
>
>
> I’m facing a problem with taxonomy writer cache inconsistency. At some point
> in time, UTF8TaxonomyWriterCache starts to return the wrong ord for some facet
> labels. As a result, wrong ords are written to the documents' facet fields,
> and wrong (too low) counts are returned during search. This bug manifests on
> different servers with different index contents (we have several separate
> indexes with unique data).
> Unfortunately, I can’t reproduce this behaviour in tests.
> I've dumped a "broken" UTF8TaxonomyWriterCache instance and created an app
> that loads it and compares it with the real taxonomy. The dumps and the app
> are attached. To run the demo, extract the archives and execute:
> {code}
> mvn compile
> mvn exec:java \
>   -Dexec.mainClass="me.torobaev.lucene.taxonomy.cache.TaxonomyCacheCheck" \
>   -DtaxonomyDir=../taxonomy/ -DcacheDump=../taxonomy-cache.json
> {code}
> As you can see, the labels [frametype, 7] and [modification_id, 682] have
> the same ord in the cache.
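The inconsistency reported here breaks a simple invariant: two distinct facet labels must never map to the same ordinal. A minimal sketch of such a check in Java, assuming the cache dump has already been loaded into a plain `Map` from label path to ordinal. `DuplicateOrdinalCheck` and `findOrdinalCollisions` are hypothetical names used only for illustration; they are not part of Lucene or of the attached app, and the ordinals in `main` are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: given a dump of a label -> ordinal cache (modelled as a
 * plain Map, not a Lucene class), report every ordinal claimed by more than one
 * distinct label. For a consistent cache the result is empty.
 */
public class DuplicateOrdinalCheck {

    public static Map<Integer, List<List<String>>> findOrdinalCollisions(
            Map<List<String>, Integer> labelToOrd) {
        // Invert the map: ordinal -> every label that claims it (sorted by ordinal).
        Map<Integer, List<List<String>>> ordToLabels = new TreeMap<>();
        for (Map.Entry<List<String>, Integer> e : labelToOrd.entrySet()) {
            ordToLabels.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // Keep only ordinals shared by two or more labels.
        ordToLabels.values().removeIf(labels -> labels.size() < 2);
        return ordToLabels;
    }

    public static void main(String[] args) {
        // Made-up ordinals for illustration; NOT taken from the attached dump.
        Map<List<String>, Integer> cache = new LinkedHashMap<>();
        cache.put(Arrays.asList("frametype", "7"), 1234);
        cache.put(Arrays.asList("modification_id", "682"), 1234);
        cache.put(Arrays.asList("frametype", "8"), 1235);
        System.out.println(findOrdinalCollisions(cache));
    }
}
```

Inverting the map makes any collision visible in a single pass; a non-empty result for the real dump would correspond to the [frametype, 7] / [modification_id, 682] case described in the report.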
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)