CorruptIndexException after failed segment merge caused by No space left on device

2021-03-23 Thread Alexander Lukyanchikov
Hello everyone, Recently we had a failed segment merge caused by "No space left on device". After a restart, Lucene failed with a CorruptIndexException. The expectation was that Lucene automatically recovers in such a case, because there was no successful commit. Is it a correct assumption, or I am mi

Re: CorruptIndexException after failed segment merge caused by No space left on device

2021-03-24 Thread Alexander Lukyanchikov
> dless
>
> http://blog.mikemccandless.com
>
> On Wed, Mar 24, 2021 at 12:55 PM Robert Muir wrote:
>
> > On Wed, Mar 24, 2021 at 1:41 AM Alexander Lukyanchikov <alexanderlukyanchi...@gmail.com> wrote:
> >
> > > Hello everyone,
> > >

Taxonomy vs SSDVFF for faceted search

2021-04-28 Thread Alexander Lukyanchikov
Hello everyone, We are trying to choose between Taxonomy and SortedSetDocValuesFacetField implementations for faceted search, and based on available information and our quick tests, the difference is the following:
- Taxonomy is faster at query time (on our test workload, the difference sometime

Re: Taxonomy vs SSDVFF for faceted search

2021-04-28 Thread Alexander Lukyanchikov
> covid can be found here (https://icite.od.nih.gov/covid19/search/#search:searchId=6089a5b7218c6902d422e907). If you click on the facet tab you can see how we use facets. I believe the use case might largely drive the choice.
>
> Thanks,
> Matt

Re: Taxonomy vs SSDVFF for faceted search

2021-04-29 Thread Alexander Lukyanchikov
> > ne. The framework is currently there for routing queries and such but the actual copying of the index has not been implemented yet so I can't speak to that. Hope this helps some.
> >
> > Thanks,
> > Matt
> >
> > On Wed, Apr 28, 2021 at 5

Re: Taxonomy vs SSDVFF for faceted search

2021-04-30 Thread Alexander Lukyanchikov
> shards there are. It should be append-only where new ordinals are created when they're first seen, and then stay stable through merges. Or am I misunderstanding your use-case and you're actually doing some shard management on top of what Lucene is doing?
>
> C
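
The append-only behaviour described in the quoted reply (an ordinal is created the first time a label is seen and never changes afterwards) can be illustrated with a small sketch. This is not Lucene's taxonomy implementation; the class `AppendOnlyOrdinals`, its methods, and the sample labels are hypothetical names chosen only to show the idea.

```python
class AppendOnlyOrdinals:
    """Hypothetical illustration: give each label an integer ordinal on first sight."""

    def __init__(self):
        self._ordinal_by_label = {}   # label -> ordinal
        self._label_by_ordinal = []   # ordinal -> label (only ever appended to)

    def get_or_add(self, label):
        """Return the existing ordinal for label, or append a new one."""
        ordinal = self._ordinal_by_label.get(label)
        if ordinal is None:
            ordinal = len(self._label_by_ordinal)
            self._ordinal_by_label[label] = ordinal
            self._label_by_ordinal.append(label)
        return ordinal

    def label(self, ordinal):
        """Look up the label that was assigned the given ordinal."""
        return self._label_by_ordinal[ordinal]

    def __len__(self):
        return len(self._label_by_ordinal)


# Sample labels are made up for the illustration.
ords = AppendOnlyOrdinals()
first_pass = [ords.get_or_add(x) for x in ["Author/Bob", "Author/Lisa", "Author/Bob"]]
second_pass = [ords.get_or_add(x) for x in ["Author/Susan", "Author/Lisa"]]
```

Because entries are only ever appended, an ordinal handed out earlier still refers to the same label after later additions, which is the stability property the reply refers to.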

NRT readers and overall indexing/querying throughput

2021-08-03 Thread Alexander Lukyanchikov
Hello everyone, We are considering switching from regular to NRT readers, hoping it would improve overall indexing/querying throughput and also optimize the turnaround time. I did some benchmarks, mostly to understand how much benefit we can get and make sure I'm implementing everything correctly.

Returning large resultset is slow and resource intensive

2022-03-08 Thread Alexander Lukyanchikov
Hello everyone, For our use case, we need to run queries which return the full matched result set. In some cases, this result set can be large (50k+ results out of 4 million total documents). A perf test showed that just 4 threads running random queries returning 50k results make Lucene utilize 100%