Re: external file stored field codec

2013-10-17 Thread Shai Erera
> > The codec intercepts merges in order to clean up files that are no longer referenced
> What happens if a document is deleted while there's a reader open on the index, and the segments are merged? Maybe I misunderstand what you meant by this statement, but if the external file is deleted, sin

Re: external file stored field codec

2013-10-17 Thread Michael Sokolov
On 10/13/13 8:09 PM, Michael Sokolov wrote: On 10/13/2013 1:52 PM, Adrien Grand wrote: Hi Michael, I'm not aware enough of operating system internals to know what exactly happens when a file is open but it sounds to me like having separate files per document or field adds levels of indirection

Re: How to get Total Result count using searchAfter approach

2013-10-17 Thread Michael McCandless
You can still use TopDocs.totalHits from searchAfter; that will be correct. Providing "Last" with searchAfter is not really possible; it's also somewhat strange (does anybody really use that?). Maybe you could reverse your sort, take page 1, reverse its hits? Mike McCandless http://blog.mikemcc
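The "reverse your sort, take page 1, reverse its hits" idea above can be sketched in plain Java. This is only an illustration on an in-memory list, not Lucene API code: the class `LastPageSketch`, the method `lastPage`, and the sample data are hypothetical, and `hits.size()` stands in for `TopDocs.totalHits`. It also assumes the last page should hold the leftover `total % pageSize` hits rather than a full page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class LastPageSketch {

    /**
     * Returns the final page of hits as it would appear under the given
     * order, without paging through everything that comes before it.
     */
    public static <T> List<T> lastPage(List<T> hits, Comparator<T> order, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int total = hits.size(); // stands in for TopDocs.totalHits
        if (total == 0) {
            return new ArrayList<>();
        }
        // The last page holds the leftover hits, or a full page if it divides evenly.
        int remainder = total % pageSize;
        int lastPageSize = (remainder == 0) ? pageSize : remainder;

        // 1) Run the same search with the sort reversed.
        List<T> reversed = new ArrayList<>(hits);
        reversed.sort(order.reversed());

        // 2) Take "page 1" of the reversed results.
        List<T> page = new ArrayList<>(reversed.subList(0, lastPageSize));

        // 3) Reverse those hits so they read in the original order again.
        Collections.reverse(page);
        return page;
    }

    public static void main(String[] args) {
        List<Integer> hits = Arrays.asList(5, 3, 9, 1, 7, 2, 8);
        // Ascending pages of 3 are [1, 2, 3], [5, 7, 8], [9].
        System.out.println(lastPage(hits, Comparator.<Integer>naturalOrder(), 3)); // prints [9]
    }
}
```

In a real index the two sorts would be two separate searches, so the cost is one extra query for the last page instead of a walk through every preceding page.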

How to get Total Result count using searchAfter approach

2013-10-17 Thread raghavendra.k.rao
Hi, In my current implementation of Lucene 4.3 where there are millions of indexed records, I do a regular search() and get the topDocs.totalHits as the count of results. As part of this, I store all the results in the session and then let the user paginate through the results. With this, I am

RE: QueryParser stripping off Hyphen from query

2013-10-17 Thread raghavendra.k.rao
Ian - Thank you for your input. Regards, Raghu -----Original Message----- From: Ian Lea [mailto:ian@gmail.com] Sent: Tuesday, October 15, 2013 11:43 AM To: java-user@lucene.apache.org Subject: Re: QueryParser stripping off Hyphen from query If you want to keep hyphens you could try Whites

Re: Lucene in-memory index

2013-10-17 Thread Igor Shalyminov
Mike, For now I'm just using a SpanQuery over a ~600MB index segment in a single thread (one segment - one thread, the complete setup is 30 segments with a total of 20GB). I'm trying to use Lucene for a morphologically annotated text corpus (namely, the Russian National Corpus). The main query

Re: Detect index changes

2013-10-17 Thread Alice Wong
Mike, you are right. I used StringField, but id_to_delete had a typo and thus a mismatch. Still good to confirm the understanding is correct. Thanks for your help. On Thu, Oct 17, 2013 at 3:54 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Your understanding is correct, and afte

Re: Lucene in-memory index

2013-10-17 Thread Michael McCandless
DirectPostingsFormat holds all postings in RAM, uncompressed, as simple Java arrays. But it's quite RAM heavy... The hotspots may also be in the queries you are running ... maybe you can describe more how you're using Lucene? Mike McCandless http://blog.mikemccandless.com On Thu, Oct 17, 2013

Re: Lucene in-memory index

2013-10-17 Thread Igor Shalyminov
Hello! I've tried two approaches: 1) RAMDirectory, 2) MMapDirectory + tmpfs. Both perform the same for me (equally badly :( ). Thus, I think my problem is not disk access (although I always see getPayload() at the top in VisualVM). So, maybe the hard part in the postings traversal is decompression? Are

Re: PhraseQuery boost doesn't affect ScoreDoc.score

2013-10-17 Thread Ian Lea
Boosting query clauses means more "this clause is more important than that clause" rather than "make the score for this search higher". I use it for biblio searching when I want to search across multiple fields and want matches in titles to be more important than matches in blurbs. Amended version

Re: Detect index changes

2013-10-17 Thread Michael McCandless
Your understanding is correct, and after reopen you should see the document deleted, so I'm not sure offhand why you aren't. BTW it's w.deleteDocuments not w.removeDocuments. And you don't need to commit in order to see changes in the reopened NRT reader (this is the whole point: commit is very c

Re: Optimizing Filters

2013-10-17 Thread Ian Lea
Yes, I think you should have a play. But on an index that is as realistic as you can make it - there may be variations in performance of the different queries and filters depending on term frequencies and loads of other stuff I don't understand. General point being simply that YMMV. -- Ian. On

Re: Search sentence from document based on keyword as input using lucene

2013-10-17 Thread Ian Lea
If you're using Solr you'd be better off asking this on the Solr list: http://lucene.apache.org/solr/discussion.html. You might also like to clarify what you want with regard to sentence vs document. If you want to display the sentences of a matched doc, surely you just do it: store what you need

Search sentence from document based on keyword as input using lucene

2013-10-17 Thread Avni Sompura
Hi Team, I have a requirement to display the sentences of a valid document if the keyword (input string) is found in that document. I am wondering whether a parent-child relation would work. DocBean { int doc_id; String doc_path; String content_id; } ContentBean { int content_id; String content; } Need y