Backward compatibility of FST50 and UniformSplit formats

2021-04-18 Thread Dmitry Emets
Hi! I cannot open indexes created with Lucene 8.5 using Lucene master. I get an error: Exception in thread "main" org.apache.lucene.index.CorruptIndexException: codec mismatch: actual codec=Lucene84PostingsWriterDoc vs expected codec=Lucene90PostingsWriterDoc (resource=MMapIndexInput(path="C:\data\luc

Re: Deduplication of search result with custom with custom sort

2020-10-13 Thread Dmitry Emets
ck up and ask what the use-case is. Returning 6.5M docs to a user is useless, so are you doing some kind of analytics maybe? In which case, and again assuming you’re using Solr, Streaming Aggregation might be a better

Re: Deduplication of search result with custom with custom sort

2020-10-12 Thread Dmitry Emets
e you doing some kind of analytics maybe? In which case, and again assuming you’re using Solr, Streaming Aggregation might be a better option. This really sounds like an XY problem. You’re trying to solve problem X

Re: Deduplication of search result with custom with custom sort

2020-10-09 Thread Dmitry Emets
problem X and asking how to accomplish it with Y. What I’m questioning is whether Y (grouping) is a good approach or not. Perhaps if you explained X there’d be a better suggestion. Best, Erick > On Oct 9, 2020, at 8:19 AM, Dmitry Emets wrote:

Re: Deduplication of search result with custom with custom sort

2020-10-09 Thread Dmitry Emets
I have 12_000_000 documents and 6_500_000 groups. With sort: it takes around 1 sec without grouping, 2 sec with grouping, and 12 sec with setAllGroups(true). Without sort: it takes around 0.2 sec without grouping, 0.6 sec with grouping, and 10 sec with setAllGroups(true). Thank you, Erick, I will look in

Re: Deduplication of search result with custom with custom sort

2020-10-09 Thread Dmitry Emets
Yes, it is. On Fri, Oct 9, 2020 at 14:25, Diego Ceccarelli (BLOOMBERG/ LONDON) <dceccarel...@bloomberg.net> wrote: > Is the field that you are using to dedupe stored as a docvalue? > > From: java-user@lucene.apache.org At: 10/09/20 12:18:04 To: java-user@lucene.apache.org > Subject: Deduplication of sea

Deduplication of search result with custom with custom sort

2020-10-09 Thread Dmitry Emets
Hi, I need to deduplicate search results by a specific field and I have no idea how to implement this properly. I have tried grouping with setGroupDocsLimit(1) and it gives me the expected results, but performance is not very good. I think that I need something like DiversifiedTopDocsCollector, but suit
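
The result the question asks for (one hit per value of the dedup field, in sort order) can be illustrated with a minimal plain-Java sketch. It does not use the Lucene API and says nothing about how Lucene grouping is implemented or how fast it is. The names DedupSketch, Hit, dedupKey and dedupe are hypothetical and exist only for this illustration. It assumes the hits are already ordered by the custom sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {

    /** Hypothetical stand-in for a search hit: a doc id plus the value of the dedup field. */
    static final class Hit {
        final int docId;
        final String dedupKey;

        Hit(int docId, String dedupKey) {
            this.docId = docId;
            this.dedupKey = dedupKey;
        }
    }

    /**
     * Keeps only the first hit seen for each dedup key, preserving the incoming
     * (already sorted) order, and stops once topN unique hits are collected.
     */
    static List<Hit> dedupe(List<Hit> sortedHits, int topN) {
        Set<String> seenKeys = new HashSet<>();
        List<Hit> unique = new ArrayList<>();
        for (Hit hit : sortedHits) {
            if (unique.size() >= topN) {
                break; // enough unique hits collected
            }
            // Set.add returns false when the key was already present
            if (seenKeys.add(hit.dedupKey)) {
                unique.add(hit);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // Assumed to be in custom sort order already
        List<Hit> hits = Arrays.asList(
                new Hit(7, "a"), new Hit(3, "b"), new Hit(9, "a"), new Hit(1, "c"));
        for (Hit hit : dedupe(hits, 2)) {
            System.out.println(hit.docId + " " + hit.dedupKey);
        }
    }
}
```

With the sample data in main, the hit with docId 9 shares the key "a" with docId 7 and would be dropped; because topN is 2, iteration stops after the hits 7 and 3 have been collected.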