Re: Lucene in-memory index

2013-10-31 Thread Michael McCandless
On Fri, Oct 25, 2013 at 9:58 AM, Igor Shalyminov wrote: > What is ProxBooleanTermQuery? > I couldn't find it in the trunk and in that ticket's > (https://issues.apache.org/jira/browse/LUCENE-2878) patch. Sorry, this is on https://issues.apache.org/jira/browse/LUCENE-5288 Next time try searchin

Re: Lucene in-memory index

2013-10-25 Thread Igor Shalyminov
What is ProxBooleanTermQuery? I couldn't find it in the trunk and in that ticket's (https://issues.apache.org/jira/browse/LUCENE-2878) patch. And for now it's very fuzzy to me how the searching/scoring works. Are there any tutorials or talks on how Queries, Scorers, and Collectors interoperate?

Re: Lucene in-memory index

2013-10-23 Thread Michael McCandless
On Tue, Oct 22, 2013 at 9:43 AM, Igor Shalyminov wrote: > Thanks for the link, I'll definitely dig into SpanQuery internals very soon. You could also just make a custom query. If you start from the ProxBooleanTermQuery on that issue, but change it so that it rejects hits that didn't have terms

Re: Lucene in-memory index

2013-10-22 Thread Igor Shalyminov
Hello Mike! 19.10.2013, 14:54, "Michael McCandless" : > On Fri, Oct 18, 2013 at 5:50 PM, Igor Shalyminov > wrote: > >>  But why is it so costly? > > I think because the matching is inherently complex?  But also because > it does high-cost things like allocating new List and Set for every > match

Re: Lucene in-memory index

2013-10-19 Thread Michael McCandless
On Fri, Oct 18, 2013 at 5:50 PM, Igor Shalyminov wrote: > But why is it so costly? I think because the matching is inherently complex? But also because it does high-cost things like allocating new List and Set for every matched doc (e.g. NearSpansOrdered.shrinkToAfterShortestMatch) to hold all p

Re: Lucene in-memory index

2013-10-18 Thread Igor Shalyminov
But why is it so costly? In a regular query we walk postings and match document numbers, in a SpanQuery we match position numbers (or position segments), what's the principal difference? I think it's just that #documents << #positions. For "A,sg" and "A,pl" I use unordered SpanNearQueries with

Re: Lucene in-memory index

2013-10-18 Thread Michael McCandless
On Fri, Oct 18, 2013 at 1:19 PM, Igor Shalyminov wrote: > OK, it turns out that DirectPostingsFormat is really an extreme thing: 8GB of > index couldn't fit into 20+ java heap. > I wonder if there is a postings format that works from disk the standard way > but uses no compression? Yes, it's v

Re: Lucene in-memory index

2013-10-18 Thread Michael McCandless
Unfortunately, SpanNearQuery is a very costly query. What slop are you passing? You might want to check out https://issues.apache.org/jira/browse/LUCENE-5288 ... it adds proximity boosting to queries, but it's still very early in the iterating, and if you need a precise count of only those docume

Re: Lucene in-memory index

2013-10-18 Thread Igor Shalyminov
Hello! OK, it turns out that DirectPostingsFormat is really an extreme thing: 8GB of index couldn't fit into 20+ java heap. I wonder if there is a postings format that works from disk the standard way but uses no compression? -- Best Regards, Igor 18.10.2013, 02:06, "Igor Shalyminov" : > Mik

Re: Lucene in-memory index

2013-10-17 Thread Igor Shalyminov
Mike, For now I'm using just a SpanQuery over a ~600MB index segment single-threadedly (one segment - one thread, the complete setup is 30 segments with the total of 20GB). I'm trying to use Lucene for the morphologically annotated text corpus (namely, Russian National Corpus). The main query

Re: Lucene in-memory index

2013-10-17 Thread Michael McCandless
DirectPostingsFormat holds all postings in RAM, uncompressed, as simple java arrays. But it's quite RAM heavy... The hotspots may also be in the queries you are running ... maybe you can describe more how you're using Lucene? Mike McCandless http://blog.mikemccandless.com On Thu, Oct 17, 2013
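
To make the RAM-versus-decoding trade-off mentioned above concrete, here is a minimal sketch. It is an illustration only and not Lucene's actual on-disk format: the class `PostingsSketch` and its methods are hypothetical names, and the encoding (gaps between sorted doc IDs written as variable-length ints) is a generic textbook scheme assumed for the example. Keeping postings as a plain `int[]` costs 4 bytes per entry but needs no decoding; the compressed form is smaller but must be decoded on every traversal.

```java
import java.io.ByteArrayOutputStream;

/** Toy comparison of plain-array vs. delta + variable-byte postings; not Lucene's real format. */
final class PostingsSketch {

    /** Delta-encodes sorted, non-negative doc IDs, writing each gap as 7-bit groups. */
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int gap = docId - previous;
            previous = docId;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80); // high bit set: more bytes follow
                gap >>>= 7;
            }
            out.write(gap); // final byte of this gap, high bit clear
        }
        return out.toByteArray();
    }

    /** Decodes count doc IDs; this per-posting work is what a plain int[] avoids. */
    static int[] decode(byte[] encoded, int count) {
        int[] docIds = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = encoded[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap; // undo the delta encoding
            docIds[i] = previous;
        }
        return docIds;
    }
}
```

For example, `PostingsSketch.encode(new int[] {3, 7, 130, 100000})` yields a much shorter byte array than the 16 bytes the `int[]` occupies, at the price of running `decode` before the doc IDs can be compared.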

Re: Lucene in-memory index

2013-10-17 Thread Igor Shalyminov
Hello! I've tried two approaches: 1) RAMDirectory, 2) MMapDirectory + tmpfs. Both perform the same for me (equally badly :( ). Thus, I think my problem is not disk access (although I always see getPayload() at the top in VisualVM). So, maybe the hard part of the postings traversal is decompression? Are

Re: Lucene in-memory index

2013-10-09 Thread Vitaly Funstein
I don't think you want to load indexes of this size into a RAMDirectory. The reasons have been listed multiple times here... in short, just use MMapDirectory. On Wed, Oct 9, 2013 at 3:17 PM, Igor Shalyminov wrote: > Hello! > > I need to perform an experiment of loading the entire index in RAM an
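
As background for the MMapDirectory advice above: MMapDirectory builds on the JDK's memory-mapped file support, so the operating system's page cache, rather than the Java heap, holds the index bytes. The sketch below shows only that underlying mechanism using `java.nio`; it does not use Lucene, and `MmapSketch` and `sumBytes` are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Reads a file through a read-only memory mapping instead of copying it onto the heap. */
final class MmapSketch {

    /** Maps the whole file read-only and sums its bytes (as unsigned values). */
    static long sumBytes(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes,
            // so a large file would have to be mapped in several chunks.
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (mapped.hasRemaining()) {
                sum += mapped.get() & 0xFF; // pages are faulted in by the OS on access
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the mapped pages live outside the Java heap, repeated reads of a hot file are served from the OS cache without growing the heap or adding garbage-collection pressure, which is the usual argument for preferring this over a heap-resident copy of a multi-gigabyte index.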

Lucene in-memory index

2013-10-09 Thread Igor Shalyminov
Hello! I need to perform an experiment of loading the entire index in RAM and seeing how the search performance changes. My index has TermVectors with payload and position info, StoredFields, and DocValues. It takes ~30GB on disk (the server has 48). _indexDirectoryReader = DirectoryReader.open

Re: How to create a Lucene in-memory index at webapp deployment time

2012-09-10 Thread Jack Krupansky
Follow the instructions here: http://lucene.apache.org/core/discussion.html -- Jack Krupansky -Original Message- From: Noopur Julka Sent: Monday, September 10, 2012 12:43 PM To: java-user@lucene.apache.org Cc: Dhananjeyan Balaretnaraja Subject: Re: How to create a Lucene in-memory

Re: How to create a Lucene in-memory index at webapp deployment time

2012-09-10 Thread Noopur Julka
unsubscribe me pls Regards, Noopur Julka On Fri, Sep 7, 2012 at 10:40 AM, Kasun Perera wrote: > I have a web java/jsp application running on Apache Tomcat server. In this > web application I have used Lucene to index and calculate similarity > between some PDF documents (PDF documents are in

Re: How to create a Lucene in-memory index at webapp deployment time

2012-09-07 Thread Ian Lea
You can do stuff with scopes and contexts and web.xml and whatever (google something like tomcat application scope). Or use some static classes or singletons to look after the single index. -- Ian. On Fri, Sep 7, 2012 at 6:10 AM, Kasun Perera wrote: > I have a web java/jsp application running
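
The singleton suggestion above could look roughly like the following sketch. It is a minimal illustration under stated assumptions, not the poster's code: `IndexHolder` is a hypothetical name, the hard-coded documents are made up, and a tiny in-memory map of term to document IDs stands in for the real Lucene index so that the example runs with the JDK alone. The point is the initialization-on-demand holder idiom, which builds the index exactly once, lazily and thread-safely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Application-wide holder that builds a (toy) in-memory index once and shares it. */
final class IndexHolder {
    // Stand-in for the real index: term -> sorted list of document IDs.
    private final Map<String, List<Integer>> postings;

    private IndexHolder() {
        // Hypothetical documents; a real web app would load them from its database.
        String[] docs = {"lucene in memory index", "in memory database", "lucene query"};
        Map<String, List<Integer>> map = new HashMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            for (String term : docs[docId].split(" ")) {
                List<Integer> ids = map.computeIfAbsent(term, t -> new ArrayList<>());
                if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                    ids.add(docId); // record each document once per term
                }
            }
        }
        postings = map;
    }

    /** The JVM initializes this nested class on first use, exactly once. */
    private static final class Lazy {
        static final IndexHolder INSTANCE = new IndexHolder();
    }

    static IndexHolder get() {
        return Lazy.INSTANCE;
    }

    /** Returns the IDs of documents containing the term, or an empty list. */
    List<Integer> lookup(String term) {
        return Collections.unmodifiableList(
                postings.getOrDefault(term, Collections.emptyList()));
    }
}
```

Any servlet or JSP can then call `IndexHolder.get().lookup("lucene")` without rebuilding the index. In a Tomcat deployment, the same one-time construction could instead be triggered from a `ServletContextListener` registered in web.xml, which is the "application scope" route mentioned above.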

How to create a Lucene in-memory index at webapp deployment time

2012-09-06 Thread Kasun Perera
I have a Java/JSP web application running on an Apache Tomcat server. In this web application I have used Lucene to index and calculate similarity between some PDF documents (the PDF documents are in the database). My live server doesn't allow the web app to access files, so I have created the in-memory lucen