RE: expensive post filtering of a query's result

2013-11-25 Thread Uwe Schindler
Hi, Lucene Filters are always executed first, on the full index. This is done inside getDocIdSet(), which is similar to scorer() in Queries. Most filters return a bitset in this method, so they calculate the whole bitset on the full index - this is what your filter is doing. The strategy only ap
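
To make the cost difference described in this reply concrete, here is a minimal plain-Java sketch. It does not use Lucene at all: the class FilterOrderSketch, the methods filterFirst and queryFirst, and the expensiveAccept check are hypothetical names invented for illustration, and the "index" is just the doc id range 0..maxDoc-1. It only shows why building a bitset over the whole index costs maxDoc expensive checks, while checking only the query's matches costs one check per match.

```java
import java.util.BitSet;

public class FilterOrderSketch {
    // Counts how often the (hypothetical) expensive check is invoked.
    static int calls = 0;

    // Stand-in for an expensive per-document test; not a Lucene API.
    static boolean expensiveAccept(int docId) {
        calls++;
        return docId % 3 == 0;
    }

    // Filter first: compute a bitset over every document, then intersect
    // it with the query's matches.
    static BitSet filterFirst(int maxDoc, int[] queryMatches) {
        BitSet allowed = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (expensiveAccept(doc)) {
                allowed.set(doc);
            }
        }
        BitSet result = new BitSet(maxDoc);
        for (int doc : queryMatches) {
            if (allowed.get(doc)) {
                result.set(doc);
            }
        }
        return result;
    }

    // Query first: run the expensive check only on documents the query matched.
    static BitSet queryFirst(int maxDoc, int[] queryMatches) {
        BitSet result = new BitSet(maxDoc);
        for (int doc : queryMatches) {
            if (expensiveAccept(doc)) {
                result.set(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 1000;
        int[] queryMatches = {3, 4, 9, 500};

        calls = 0;
        BitSet a = filterFirst(maxDoc, queryMatches);
        int callsFilterFirst = calls;

        calls = 0;
        BitSet b = queryFirst(maxDoc, queryMatches);
        int callsQueryFirst = calls;

        System.out.println(a.equals(b));                          // prints "true"
        System.out.println(callsFilterFirst + " vs " + callsQueryFirst); // prints "1000 vs 4"
    }
}
```

Both orders return the same documents; only the number of expensive checks differs. How to get the second behaviour from an actual Lucene Filter is what the rest of the thread discusses and is not shown here.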

expensive post filtering of a query's result

2013-11-25 Thread Andreas Brandl
Hi, I have a Query that is fast and cheap to answer compared to a Filter implementation that is quite expensive (* for background see below). I was under the impression that when combining Query and Filter, Lucene is able to calculate matches based on the query and *for these matches* applies

RE: Lucene multithreaded indexing problems

2013-11-25 Thread Uwe Schindler
Hi, > But here's what I have. > > Today I looked at the indexer in the VisualVM, and I can definitely say that > the problem is in the memory: the resources (which mostly are Document > fields) just don't go away. > I tried different GCs (Parallel, CMS, the default one), and every time the > beha

Revolution writeup

2013-11-25 Thread Michael Sokolov
I just posted a writeup of the Lucene/Solr Revolution Dublin conference. I've been waiting for videos to become available, but I got impatient. Most of the slides are there, though. Sorry if I missed your talk -- I'm hoping to catch up when the videos are posted... http://blog.safariflow.com/201

Re: Lucene multithreaded indexing problems

2013-11-25 Thread Desidero
Apparently writing emails first thing in the morning isn't always a great idea. I forgot to address the questions you had at the end: 1) How many threads are used by the IndexWriter? A maximum of IndexWriterConfig.maxThreadStates threads can write at once. 2) When does it flush segments to d

Re: Lucene multithreaded indexing problems

2013-11-25 Thread Desidero
Providing your system specs, parallelism (# of writer threads in a specific example), and any other special values you set (RAMBufferSizeMB, maxThreadStates, etc) would be helpful. -- I do most of my development on a 64 core machine. When doing a full reindex, I have 64 worker threads doing conc

Scanning through inverted index

2013-11-25 Thread Michael Berkovsky
Hello! I wonder if there is a fast way to scan through the entire inverted index to collect words and documents they belong to. Thanks, mb

Re: Lucene multithreaded indexing problems

2013-11-25 Thread Igor Shalyminov
Thank you! But here's what I have. Today I looked at the indexer in the VisualVM, and I can definitely say that the problem is in the memory: the resources (which mostly are Document fields) just don't go away. I tried different GCs (Parallel, CMS, the default one), and every time the behaviou

Re: Help in Lucene Postings Highlighter..

2013-11-25 Thread VIGNESH S
Hi Mike, I indexed a 1 GB document with PostingsHighlighter and FastVectorHighlighter. To my surprise, PostingsHighlighter took almost 1.6 times as much as FastVectorHighlighter. I thought storing document offsets would take less space than storing term vectors. On Mon, Nov 25, 2013 at 7:04 PM, M

Re: Help in Lucene Postings Highlighter..

2013-11-25 Thread Michael McCandless
Yes, you need to store it; this is where PH gets the "original" content from for highlighting. Alternatively you can store/retrieve this content yourself and pass it to PH. But, what NPE did you hit? We should improve that if we can... Mike McCandless http://blog.mikemccandless.com On Mon, N

Help in Lucene Postings Highlighter..

2013-11-25 Thread VIGNESH S
Hi, I tried indexing for PostingsHighlighter with TextField.TYPE_NOT_STORED and used the postings highlighter, but I am getting a NullPointerException. If I use TextField.TYPE_STORED it works properly. Can't I use PostingsHighlighter without storing? Please help. Below is t