Re: Lucene in-memory index

2013-10-18 Thread Igor Shalyminov
But why is it so costly? In a regular query we walk postings and match document numbers; in a SpanQuery we match position numbers (or position segments). What's the principal difference? I think it's just that #documents << #positions. For "A,sg" and "A,pl" I use unordered SpanNearQueries with
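To illustrate the distinction raised in this message, here is a minimal sketch of a toy positional index in plain Python. It is not Lucene code and does not use any Lucene API: the names `conjunction` and `unordered_near`, the sample postings, and the slop rule (at most `slop` positions between the two terms, in either order) are all assumptions made for illustration. It only shows that a conjunction compares document numbers, while a proximity match additionally walks the position lists inside every candidate document.

```python
from typing import Dict, List

# Toy positional postings: doc_id -> sorted list of term positions.
# This layout is a hypothetical model, not Lucene's on-disk format.
Postings = Dict[int, List[int]]


def conjunction(a: Postings, b: Postings) -> List[int]:
    """Return docs containing both terms; only doc IDs are compared."""
    return sorted(a.keys() & b.keys())


def unordered_near(a: Postings, b: Postings, slop: int) -> List[int]:
    """Return docs where the two terms occur with at most `slop`
    positions between them, in either order (assumed slop rule)."""
    hits = []
    for doc in conjunction(a, b):
        pa, pb = a[doc], b[doc]
        i = j = 0
        # Two-pointer walk over both position lists: always advance the
        # pointer at the smaller position to look for a closer pair.
        while i < len(pa) and j < len(pb):
            if abs(pa[i] - pb[j]) - 1 <= slop:
                hits.append(doc)
                break
            if pa[i] < pb[j]:
                i += 1
            else:
                j += 1
    return hits


# Hypothetical sample data for two terms.
term_a: Postings = {1: [0, 10], 2: [5], 3: [7]}
term_b: Postings = {1: [11, 30], 2: [50], 4: [1]}

both = conjunction(term_a, term_b)
adjacent = unordered_near(term_a, term_b, slop=0)
```

With this data, `both` contains documents 1 and 2, while `adjacent` contains only document 1, because the terms in document 2 are 44 positions apart. The extra inner loop is the per-document position work that a plain conjunction never does.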

Re: Lucene in-memory index

2013-10-18 Thread Michael McCandless
On Fri, Oct 18, 2013 at 1:19 PM, Igor Shalyminov wrote:
> OK, it turns out that DirectPostingsFormat is really an extreme thing: 8GB of index couldn't fit into 20+ java heap.
> I wonder if there is a postings format that works from disk the standard way but uses no compression?
Yes, it's v

Re: Lucene in-memory index

2013-10-18 Thread Michael McCandless
Unfortunately, SpanNearQuery is a very costly query. What slop are you passing? You might want to check out https://issues.apache.org/jira/browse/LUCENE-5288 ... it adds proximity boosting to queries, but it's still very early in the iterating, and if you need a precise count of only those docume

Re: Lucene in-memory index

2013-10-18 Thread Igor Shalyminov
Hello! OK, it turns out that DirectPostingsFormat is really an extreme thing: 8GB of index couldn't fit into 20+ java heap. I wonder if there is a postings format that works from disk the standard way but uses no compression?
-- Best Regards, Igor
18.10.2013, 02:06, "Igor Shalyminov" :
> Mik

Re: external file stored field codec

2013-10-18 Thread Michael Sokolov
On 10/18/2013 1:08 AM, Shai Erera wrote: The codec intercepts merges in order to clean up files that are no longer referenced What happens if a document is deleted while there's a reader open on the index, and the segments are merged? Maybe I misunderstand what you meant by this statement, but