Re: Using lucene for substring matching

2010-07-28 Thread Ian Lea
You could also look at MemoryIndex or InstantiatedIndex, both in Lucene's contrib area. I was also wondering whether you might gain from using TermDocs or TermVectors or something similar directly. -- Ian. On Tue, Jul 27, 2010 at 9:34 PM, Geir Gullestad Pettersen wrote: > Thanks for your fee

Re: Using lucene for substring matching

2010-07-27 Thread William Newport
RAMDirectory instances seem useful, but as the index gets larger, Java heap sizes can become a problem in terms of garbage collection pauses. Some customers are looking to use data grid products such as IBM WebSphere eXtreme Scale or Oracle Coherence to act as the directory for the index. This stores the ind

Re: Using lucene for substring matching

2010-07-27 Thread Geir Gullestad Pettersen
Thanks for your feedback, Ian. I have written a first implementation of this service that works well. You mentioned something about technologies for speeding up Lucene, which I would like to know more about. Would you, or anyone, mind elaborating a bit, or giving me some pointers?

Re: Using lucene for substring matching

2010-07-23 Thread Ian Lea
So, if I've understood this correctly, you've got some text and want to loop through a list of words and/or phrases, and see which of those match the text, e.g.

text
  "some random article about something or other of some random length"

words
  some - matches
  many - no match
  article - matches
  word
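The loop described above can be sketched in plain Java, without Lucene. This is only a minimal illustration, assuming that "matches" means case-insensitive substring containment; the class name DictionaryMatcher and the method findMatches are made up for this example and do not come from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not code from the thread.
class DictionaryMatcher {

    // Returns the dictionary entries that occur in the text as substrings.
    // Assumption: matching is case-insensitive and ignores word boundaries.
    static List<String> findMatches(String text, List<String> dictionary) {
        String haystack = text.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String wordOrPhrase : dictionary) {
            if (haystack.contains(wordOrPhrase.toLowerCase(Locale.ROOT))) {
                matches.add(wordOrPhrase);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "some random article about something or other of some random length";
        List<String> dictionary = Arrays.asList("some", "many", "article");
        System.out.println(findMatches(text, dictionary)); // prints [some, article]
    }
}
```

Note that plain substring containment also reports "art" as a match for "article", and that every dictionary entry triggers its own scan of the text, so the cost grows with the size of the dictionary. Those two limits are the usual reasons to look at a token-based index instead.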

Using lucene for substring matching

2010-07-22 Thread Geir Gullestad Pettersen
Hi, I'm about to write an application that does very simple text analysis, namely dictionary-based entity extraction. The alternative is to do in-memory matching with substring:

String text; // could be any size, but normally "news paper length"
List matches;
for( String wordOrPhrase : dictionary