Re: Taxonomy vs SSDVFF for faceted search

2021-04-29 Thread Alexander Lukyanchikov
Hi Greg, Matt, Thank you for the responses; they are very helpful, and it's great to hear that Taxonomy is successfully used for large-scale products! Our biggest concern with it right now is future complications related to index split and merge, which we are most likely going to use to implement sharding a

Re: Negation search help

2021-04-29 Thread amitesh116
During this change I had to change the way I store indexes. The change results in far more .cfs and .fdt files being generated than before: previously there were 5-7 files in the index folder, and now there are 40+. Does it affect having change in the way how indexes are stored internally with this ch

Re: Negation search help

2021-04-29 Thread amitesh116
// Method to create a document
private static Document createDocumentTextField(HashMap<String, String> fields) {
    Document document = new Document();
    for (String key : fields.keySet()) {
        String val = fields.get(key);
        Field f = new TextField(key, val, Field.Store.YES);

Re: Negation search help

2021-04-29 Thread Michael Wechner
Yes, it would be great if you could share code snippets. Maybe it will help others or maybe someone will have a suggestion to improve or an alternative. All the best Michael On 29.04.21 at 14:35, amitesh116 wrote: Thank you Michael! I solved this requirement by setting the tokenStream at t

Re: Taxonomy vs SSDVFF for faceted search

2021-04-29 Thread Greg Miller
Hi Alex- Amazon's product search engine, a fairly large-scale application (w.r.t. index size, traffic, and use-case complexity), is built on top of Lucene. We have found taxonomy-based faceting to work well for us generally, and haven't needed to do much to optimize beyond what's alrea

Re: Negation search help

2021-04-29 Thread amitesh116
Thank you Michael! I solved this requirement by setting the tokenStream at the field level and not leaving it to the analyzer. This gives control over altering the full text before tokenization using custom methods. This has a memory overhead, which is handled by writing the documents one at a time

Re: Taxonomy vs SSDVFF for faceted search

2021-04-29 Thread Matt Davis
Alex, We did consider trying to optimize Taxonomy indexing performance but we never really got around to it. The sidecar index is annoying to deal with and we have had occasional issues with it. Zulia has sharding implemented. The main issue here is not the taxonomy but rather just getting exact

Who can help my contact understand low level Lucene bottlenecks?

2021-04-29 Thread Charlie Hull
Hi all, I've been contacted by someone from the hardware arena who wants to understand which areas of Lucene could benefit from performance acceleration, removal of bottlenecks etc. I've asked a few of the usual suspects to no avail, so I'm putting out a wider call for anyone (probably a comm