Re: FSTs to drive type ahead search?

2013-11-23 Thread Michael McCandless
Try using one of the FST-based suggesters and then compare? E.g., WFSTCompletionLookup, AnalyzingSuggester, FuzzySuggester. Mike McCandless http://blog.mikemccandless.com On Sat, Nov 23, 2013 at 6:17 AM, Gili Nachum wrote: > Hello! I've implemented a type ahead search by indexing all possible term

Re: Lucene multithreaded indexing problems

2013-11-23 Thread Daniel Penning
G1 and CMS are both tuned primarily for low pauses, which is typically preferred for searching an index. In this case I guess that indexing throughput is preferred, in which case using ParallelGC might be the better choice. On 23.11.2013 17:15, Uwe Schindler wrote: Hi, Maybe your heap size is

Re: Lucene multithreaded indexing problems

2013-11-23 Thread Daniel Penning
Maybe you should turn on garbage collection logging to confirm that you are running into some kind of memory problem (start the JVM with -verbose:gc). If the GC is running very often as soon as your indexing process slows down, I would suggest you create a heap dump and check what the memory is us
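As a complement to -verbose:gc, GC activity can also be sampled from inside the indexing process through the standard java.lang.management API. The following is only a minimal sketch: the class name GcStats and its helper methods are hypothetical and not part of Lucene or of the original message. Sampling the totals before and after a slow indexing phase gives a rough idea of how much of that time went to collection.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not from the thread): reads cumulative GC counters of the running JVM.
class GcStats {

    // Total number of collections across all collectors.
    // A collector may report -1 when the value is undefined; those are skipped.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector; names depend on the JVM and the GC selected at startup.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

If the totals climb sharply at the moment indexing slows down, that points in the same direction as frequent entries in the GC log.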

RE: Lucene multithreaded indexing problems

2013-11-23 Thread Uwe Schindler
Hi, Maybe your heap size is just too big, so your JVM spends too much time in GC? The setup you described in your last email is the "officially supported" setup :-) Lucene has no problem with that setup and can index. Be sure: - Don't give too much heap to your indexing app. Larger heaps create m

Re: Lucene multithreaded indexing problems

2013-11-23 Thread Igor Shalyminov
So we return to the initially described setup: multiple parallel workers, each doing "parse + indexWriter.addDocument()" for single documents with no synchronization on my side. This setup was also bad on memory consumption and thread blocking, as I reported. Or did I misunderstand you? -- I

FSTs to drive type ahead search?

2013-11-23 Thread Gili Nachum
Hello! I've implemented a type ahead search by indexing all possible terms' prefixes as fields on the docs. The resulting index is about 1 GB in size and fits in the filesystem cache. Would implementing this differently, over FSTs instead of prefixes, bring any performance/size/features advantag
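For readers unfamiliar with the prefix approach described in this message, the sketch below shows what "indexing all possible terms' prefixes" could look like at the string level. It is an illustration under assumptions, not the poster's code: the class PrefixTerms and the method prefixes are hypothetical names, and the step that adds the prefixes to Lucene documents is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the thread): expands a term into its prefixes.
class PrefixTerms {

    // Returns every non-empty prefix of the term, shortest first.
    // Works on UTF-16 chars, so a term containing surrogate pairs would need
    // code-point-aware handling instead.
    static List<String> prefixes(String term) {
        List<String> out = new ArrayList<>();
        for (int end = 1; end <= term.length(); end++) {
            out.add(term.substring(0, end));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(prefixes("fst")); // prints [f, fs, fst]
    }
}
```

A term of length n produces n indexed values this way, which is one reason such an index grows large compared with the original term dictionary.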