RE: Is creating an analyzer expensive?

2012-08-17 Thread Uwe Schindler
You have to consume the TokenStream retrieved from the Analyzer in the specified order; otherwise it will not work correctly and will behave as you describe: reset(), while (incrementToken()), end(), close(). You have to call reset() even when using the stream for the first time! That's specified in the specs. If you do
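
The call order described above can be sketched as follows. This is a minimal illustration, not code from the thread: it assumes a recent Lucene release on the classpath (one where StandardAnalyzer has a no-argument constructor and Analyzer.tokenStream accepts a String), and the class name TokenStreamDemo, the helper tokenize, and the field name "body" are made up for the example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class; only the Lucene calls are real API.
public class TokenStreamDemo {

    // Collects the terms an analyzer produces for one string.
    static List<String> tokenize(Analyzer analyzer, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        // try-with-resources performs the final close() step.
        try (TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // required, even on first use
            while (ts.incrementToken()) {    // advance one token at a time
                terms.add(term.toString());
            }
            ts.end();                        // signal end of stream
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // The same analyzer instance is reused for both strings.
        try (Analyzer analyzer = new StandardAnalyzer()) {
            System.out.println(tokenize(analyzer, "First string"));
            System.out.println(tokenize(analyzer, "Second string"));
        }
    }
}
```

Because every call goes through the full reset(), incrementToken(), end(), close() cycle, the analyzer can be reused for the second string without any special handling.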

Re: Is creating an analyzer expensive?

2012-08-17 Thread rrs
Hi Simon, I'm trying to reuse a custom analyzer and it's not working unless I manually call reset() on the TokenStream. Basically the analyzer will work on the first string, but completely fail on any string after that. The weird part is that this is only necessary when using the SynonymFilter. I

Re: Find documents contained in search term

2012-08-17 Thread davidbrai
I was hoping I didn't have to iterate through the short documents. I have about 1M of them currently and this process needs to be very fast. So I understand there is no such functionality available in Lucene. -- View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-c

Re: Find documents contained in search term

2012-08-17 Thread Ian Lea
Can't see how you could do it with standard queries, but you could reverse the process and use a MemoryIndex. Add the single target phrase to the memory index then loop round all docs executing a search for each one. Maybe use PrefixQuery although I'd worry about performance. Try it and see. Bu
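
The reversed approach (index the single target phrase, then query it once per short document) might look roughly like this. It is a sketch under several assumptions: a recent Lucene release with the lucene-memory module on the classpath, a PhraseQuery per short document rather than the PrefixQuery mentioned above, and short documents that can be split into terms by lowercasing and whitespace. The class name ContainedDocs, the method containedIn, and the field name "text" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.PhraseQuery;

// Hypothetical helper class; only the Lucene calls are real API.
public class ContainedDocs {

    // Returns the short documents that occur as an exact phrase in target.
    static List<String> containedIn(String target, List<String> shortDocs) {
        List<String> hits = new ArrayList<>();
        try (Analyzer analyzer = new StandardAnalyzer()) {
            // Index the single target phrase in memory.
            MemoryIndex index = new MemoryIndex();
            index.addField("text", target, analyzer);

            for (String doc : shortDocs) {
                // Assumption: lowercasing plus whitespace splitting matches
                // what the analyzer did to the target phrase.
                String[] terms = doc.toLowerCase(Locale.ROOT).trim().split("\\s+");
                PhraseQuery query = new PhraseQuery("text", terms);
                // MemoryIndex.search returns a score; above zero is a match.
                if (index.search(query) > 0.0f) {
                    hits.add(doc);
                }
            }
        }
        return hits;
    }
}
```

This still loops over every short document, so it does not remove the iteration cost raised earlier in the thread; it only makes each individual check cheap.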

Re: new segments and merged segments

2012-08-17 Thread Michael McCandless
Hmm, actually, we only warm newly merged (not newly flushed) segments, today. We don't warm flushed segments today because, in an NRT setting, it's just an added latency on turning around updates to the index (vs merging which is purely replacing old segments with new ones). But one hack you coul