Re: Help needed ordering search results

2009-10-01 Thread Karl Wettin
Not quite sure what you're asking for, but I think you want to use a span near query (for adding boost to phrases) in a disjunction max query (to define weights of the different fields). karl On 1 Oct 2009 at 02:40, mitu2009 wrote: Hi, I've 3 records in Lucene index. Record 1 contains healt

Help needed bubbling up relevant records with most recent date

2009-10-01 Thread mitu2009
Hi, I've got 5 records in a Lucene index.
a. Record 1 contains: tax analysis. Date field value is March 2009
b. Record 2 contains: Senior tax analyst. Date field value is Aug 2009
c. Record 3 contains: Senior tax analyst. Date field value is July 2009
d. Record 4 contains: tax analyst. Date field value

Efficiently reopening remotely-distributed indexes in 2.9?

2009-10-01 Thread Nigel
I have a question about the reopen functionality in Lucene 2.9. As I understand it, since FieldCaches are now per-segment, it can avoid reloading everything when the index is reopened, and instead just load the new segments. For background, like many people we have a distributed architecture wher

TermPositions with custom Tokenizer

2009-10-01 Thread Christopher Tignor
Hello, I have created a custom Tokenizer and am trying to set and extract my own positions for each Token using: reusableToken.reinit(word.getWord(),tokenStart,tokenEnd); later when querying my index using a SpanTermQuery the start() and end() tags don't correspond to these values but seem to co

Re: document diversity

2009-10-01 Thread Tricia Williams
Hi Mike, The first thing that comes to mind is to run a query for each document type (assuming that you have a field that stores the type) and qualify the document type: for example type:pdf. Then you would have to write something to combine the query results drawing an equal number of hits
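The combining step suggested in this reply (one result list per document type, then drawing hits from each in turn) could be sketched roughly as below. This is only an illustration in plain Java with no Lucene calls: the class name DiverseMerge, the method interleave, and the string hit ids are hypothetical, and each per-type list is assumed to be already ranked by its own query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: merges per-type result lists round-robin so that
// each document type contributes an equal share while it still has hits.
final class DiverseMerge {

    public static <T> List<T> interleave(List<List<T>> perType, int max) {
        List<Iterator<T>> cursors = new ArrayList<Iterator<T>>();
        for (List<T> hits : perType) {
            cursors.add(hits.iterator());
        }
        List<T> merged = new ArrayList<T>();
        boolean progressed = true;
        // Keep taking one hit per type per round until the page is full
        // or every list is exhausted.
        while (merged.size() < max && progressed) {
            progressed = false;
            for (Iterator<T> cursor : cursors) {
                if (merged.size() >= max) {
                    break;
                }
                if (cursor.hasNext()) {
                    merged.add(cursor.next());
                    progressed = true;
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up hit ids standing in for the results of type:pdf, type:txt, type:html.
        List<List<String>> perType = Arrays.asList(
                Arrays.asList("p1", "p2", "p3"),
                Arrays.asList("t1"),
                Arrays.asList("h1", "h2"));
        System.out.println(interleave(perType, 5));
    }
}
```

A type that runs out of hits simply drops out of later rounds, so the remaining slots go to the other types; a weighted distribution would need a different draw order.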

Re: document diversity

2009-10-01 Thread Phil Whelan
Hi Mike, I'd simply store a field "doctype" with values "pdf", "txt", "html" and perform a separate search for each type. Although, I'd be interested if anyone has a cooler way of doing this. Cheers, Phil On Thu, Oct 1, 2009 at 9:56 AM, Michael Masters wrote: > I was wondering if there is any w

Re: Filtering on two date fields simultaneously

2009-10-01 Thread Dragan Jotanovic
Thanks, I will try NumericRangeQuery On Thu, Oct 1, 2009 at 4:01 PM, Grant Ingersoll wrote: > > On Sep 29, 2009, at 11:30 AM, Dragan Jotanovic wrote: > >> Hi, I was thinking a long time how to implement this kind of >> functionality but couldn't figure out anything appropriate. >> In my lucene doc

document diversity

2009-10-01 Thread Michael Masters
I was wondering if there is any way to control what kind of documents are returned from a search. For example, let's say we have an index built from different types of documents (pdf, txt, html, etc.). Is there a way to have the first x results have a specified distribution of document types? It wou

Re: [ANN] Luke 0.9.9 release

2009-10-01 Thread Andrzej Bialecki
Andrzej Bialecki wrote: Hi all, I'm happy to announce the new release of Luke - the Lucene Index Toolbox. There's a bug in this version in that it doesn't show TermVectors for a field. I'll fix it in a few days - I'm waiting for other potential bugs to show up. So if you find something that

Re: Filtering on two date fields simultaneously

2009-10-01 Thread Grant Ingersoll
On Sep 29, 2009, at 11:30 AM, Dragan Jotanovic wrote: Hi, I was thinking a long time how to implement this kind of functionality but couldn't figure out anything appropriate. In my lucene document, I have two date fields: start and end date. As a search input I have current date (NOW). I need t

Re: Implement SpanScorer on 2.9 lucene lib!

2009-10-01 Thread Mark Miller
Felipe Lobo wrote: > Here's the code: > -- > Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), new > QueryScorer(query)); > > highlighter.setTextFragmenter(new SimpleFragmenter(9)); > > String fieldName = "Title"; > > St

Re: Implement SpanScorer on 2.9 lucene lib!

2009-10-01 Thread Felipe Lobo
Here's the code:
Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(query));
highlighter.setTextFragmenter(new SimpleFragmenter(9));
String fieldName = "Title";
String text = document.getField(fieldN

Re: Implement SpanScorer on 2.9 lucene lib!

2009-10-01 Thread Mark Miller
Felipe Lobo wrote: > Hi, thanks for the answer but it didn't work. > I stopped rewriting the query and used the QueryScorer but it doesn't > highlight. > The part of the query I'm doing a wildcard on is the number part, like this: > "HC 100930027253" > The HC is highlighted but the numbers aren't: > "Ha

How to test if an IndexReader is still open?

2009-10-01 Thread Chris Bamford
Hi, In an attempt to balance searching efficiency against the number of open file descriptors on my system, I cache IndexSearchers with a "last used" timestamp. A background cache manager thread then periodically checks the cache for any that haven't been used in a while and removes them from
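The caching scheme described in this message (entries stamped with a "last used" time and swept periodically) might look roughly like the sketch below. It is plain Java with no Lucene types: IdleCache, evictIdle, and the explicit `now` parameter are hypothetical names chosen for illustration, and the caller is assumed to close whatever evictIdle returns. It does not answer the question of whether a reader is still open.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache that remembers when each value was last handed out.
final class IdleCache<K, V> {

    private static final class Entry<V> {
        final V value;
        volatile long lastUsed;

        Entry(V value, long now) {
            this.value = value;
            this.lastUsed = now;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<K, Entry<V>>();

    public void put(K key, V value, long now) {
        entries.put(key, new Entry<V>(value, now));
    }

    // Returns the cached value and refreshes its timestamp, or null if absent.
    public V get(K key, long now) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        entry.lastUsed = now;
        return entry.value;
    }

    // Removes every entry idle for longer than maxIdle and returns the
    // removed values so the caller can release their resources.
    public List<V> evictIdle(long now, long maxIdle) {
        List<V> evicted = new ArrayList<V>();
        Iterator<Entry<V>> it = entries.values().iterator();
        while (it.hasNext()) {
            Entry<V> entry = it.next();
            if (now - entry.lastUsed > maxIdle) {
                it.remove();
                evicted.add(entry.value);
            }
        }
        return evicted;
    }

    public int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        IdleCache<String, String> cache = new IdleCache<String, String>();
        cache.put("indexA", "searcherA", 0L);
        cache.put("indexB", "searcherB", 0L);
        cache.get("indexA", 50L);
        System.out.println(cache.evictIdle(100L, 60L));
    }
}
```

Passing the clock value in makes the sweep easy to test; a real background thread would pass System.currentTimeMillis(). Note that a value can still be in use by a search when it is evicted, so some form of reference counting would be needed before closing it.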

RE: Pagination and Sorting

2009-10-01 Thread Uwe Schindler
Hi Anshum, That is exactly the same code he is using (only that he does not instantiate the collector; IndexSearcher.search(query, int) does exactly that internally :-) His problem was that if offset+limit is large, or Integer.MAX_VALUE, he runs out of memory. - Uwe Schindler H.-H.-Meie

Re: Pagination and Sorting

2009-10-01 Thread Anshum
@Christian : Which version of Lucene are you using? For lucene 2.9 this would work. *__code snippet__* IndexReader r = IndexReader.open("/home/anshum/index/indexname", true); IndexSearcher s = new IndexSearcher(r); QueryParser qp = new QueryParser("testfield",new StopAnalyzer()); Query q = qp.par

RE: Pagination and Sorting

2009-10-01 Thread Uwe Schindler
I forgot to mention: Because of this, e.g. even Google (who do not use Lucene :-]) does not let you go beyond a limit to a very large page number. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Uwe Schin

Re: How does the term infos file (.tis) works?

2009-10-01 Thread Michael McCandless
On Thu, Oct 1, 2009 at 8:21 AM, iron light wrote: > The reason is I wanna dig deeply. OK :) That's fun! > I just read the code. And found that the index namespace (IndexWriter!) is so tough for me. > Is there any document, resource or blog about the code? In general there's no separate doc

RE: Pagination and Sorting

2009-10-01 Thread Uwe Schindler
Hi Chris, > Uwe, > > > You are using TopDocs incorrectly. Normally you do *not* use Integer.MAX_VALUE, > > but the upper bound of your pagination window, as the number of documents. So if > > the user wants to display documents 90 to 100, just set the number to 100 docs. > > If the user then goes to docs

Re: Implement SpanScorer on 2.9 lucene lib!

2009-10-01 Thread Felipe Lobo
Hi, thanks for the answer but it didn't work. I stopped rewriting the query and used the QueryScorer but it doesn't highlight. The part of the query I'm doing a wildcard on is the number part, like this: "HC 100930027253" The HC is highlighted but the numbers aren't: "Habeas Corpus HC 100930027253 ES 10

Re: How does the term infos file (.tis) works?

2009-10-01 Thread iron light
Thanks, Mike. The reason is I wanna dig deeply. I just read the code. And found that the index namespace (IndexWriter!) is so tough for me. Is there any document, resource or blog about the code? IL On Thu, Oct 1, 2009 at 8:53 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > It's b

RE: Pagination and Sorting

2009-10-01 Thread Uwe Schindler
But a collector will not output the documents in sorted order... - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Anshum [mailto:ansh...@gmail.com] > Sent: Thursday, October 01, 2009 1:58 PM > To: java-use

Re: Pagination and Sorting

2009-10-01 Thread Christian Robert
Anshum, > You could get the hits in a collector and pass the sort to the > collector as it would be the collect function that handles the > sorting. > > searcherObject.search(query,collector); > > Hope that gives you some headway. :) Not quite (yet?) ;-) What do you mean by passing the Sort t

Re: Pagination and Sorting

2009-10-01 Thread Anshum
Hey Christian, Try what I wrote in the last reply. Would work absolutely fine. Have tested that for very large datasets. -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw On Thu,

Re: How does the term infos file (.tis) works?

2009-10-01 Thread Michael McCandless
It's better to use the TermEnum API (IndexReader.terms()) to step through the terms, than to directly access the raw file (unless you have some reason to do so...). Mike On Wed, Sep 30, 2009 at 6:29 AM, iron light wrote: > I try to traverse all the term text in one tis files. And it failed. the

Re: Pagination and Sorting

2009-10-01 Thread Christian Robert
Uwe, > You are using TopDocs incorrectly. Normally you do *not* use Integer.MAX_VALUE, > but the upper bound of your pagination window, as the number of documents. So if > the user wants to display documents 90 to 100, just set the number to 100 docs. > If the user then goes to docs 100 to 110, just reexecute t

Re: Results of setting LogMergePolicy "calibrateSizeByDeletes=true"

2009-10-01 Thread Michael McCandless
Can you turn on IndexWriter's infoStream and post the resulting output? Enabling calibrateSizeByDeletes doesn't automatically mean that segments with many deletes will be merged. EG if your mergeFactor is high relative to the number of segments you have at each level, then no merging will take pl

Re: Pagination and Sorting

2009-10-01 Thread Anshum
You could get the hits in a collector and pass the sort to the collector as it would be the collect function that handles the sorting. searcherObject.search(query,collector); Hope that gives you some headway. :) -- Anshum Gupta Naukri Labs! http://ai-cafe.blogspot.com The facts expressed here b

RE: Pagination and Sorting

2009-10-01 Thread Uwe Schindler
Hello Chris, You are using TopDocs incorrectly. Normally you do *not* use Integer.MAX_VALUE, but the upper bound of your pagination window, as the number of documents. So if the user wants to display documents 90 to 100, just set the number to 100 docs. If the user then goes to docs 100 to 110, just reexecute t
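The windowing idea in this reply (collect only the top offset+limit hits in sort order, then keep the last `limit` of them) can be illustrated without any Lucene classes. The sketch below is an assumption-laden stand-in, not the library's implementation: PageWindow and page are hypothetical names, and a bounded java.util.PriorityQueue plays the role of the internal hit queue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration: memory use is bounded by offset + limit,
// not by the total number of matching documents.
final class PageWindow {

    public static <T> List<T> page(Iterable<T> hits, Comparator<T> order,
                                   int offset, int limit) {
        int n = offset + limit;
        if (offset < 0 || limit <= 0 || n <= 0) {
            return new ArrayList<T>();
        }
        // Reversed order puts the worst of the kept hits at the head,
        // so it can be replaced cheaply when a better hit arrives.
        PriorityQueue<T> kept = new PriorityQueue<T>(n, order.reversed());
        for (T hit : hits) {
            if (kept.size() < n) {
                kept.add(hit);
            } else if (order.compare(hit, kept.peek()) < 0) {
                kept.poll();
                kept.add(hit);
            }
        }
        List<T> top = new ArrayList<T>(kept);
        Collections.sort(top, order);
        // Drop the first `offset` hits; what remains is the requested page.
        int from = Math.min(offset, top.size());
        return new ArrayList<T>(top.subList(from, top.size()));
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int i = 200; i >= 1; i--) {
            hits.add(i);
        }
        // Page of 10 starting after the first 90 hits in ascending order.
        System.out.println(page(hits, Comparator.<Integer>naturalOrder(), 90, 10));
    }
}
```

Because the queue is sized to offset+limit, asking for Integer.MAX_VALUE hits defeats the purpose, which is the memory problem discussed in this thread; each later page re-runs the search with a larger window.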

Pagination and Sorting

2009-10-01 Thread Christian Robert
Hello everybody, I'm looking at quite an interesting challenge right now, so I hope that somebody out there will be able to assist me. What I'm trying to do is return search results both sorted and paginated. So far I haven't been able to come up with a working solution. Pagination without so

Re: Lucene 2.9 and performance of readers per segment.

2009-10-01 Thread Mark Miller
Per-segment search over many segments is actually a bit faster for non-sort cases and many sort cases, but an optimized index will still be fastest. The speed benefit of many segments comes when reopening, say for realtime search; in that case you may want to sacrifice the optimized performance for a segment

Lucene 2.9 and performance of readers per segment.

2009-10-01 Thread Marc Sturlese
Hey there, Until now when using Lucene 2.4 I was always optimizing my index using compound file after updating it. I was doing that because if not I could feel a lot of performance loss in search responses. Now in Lucene 2.9 there are per-segment readers and I have read something about it performs b

Why it doesn't work about IndexWriter deleteDocuments

2009-10-01 Thread Bon
Hi all, I've a problem using IndexWriter#deleteDocuments to delete more than one document at once. The following is my code: Try 1: StringBuffer query_values = new StringBuffer(); query_values.append(UNIQUEID_FIELD_NAME); query_values.append(":(");