querying without hits

2008-10-13 Thread David Massart
Dear all, Could one of you point me to an example of code for querying without using the deprecated class Hits? Thank you, David

RE: Detecting why a collection of documents matched a query

2008-10-13 Thread Michael Garski
I've seen this question come up a few times on the list in the past with the potential solutions of: 1. Parsing out the results of the Explain() method 2. Performing a regex on the data post-search to determine which field contained the match 3. Searching each field independently and removing duplic

Re: Searching sets of documents

2008-10-13 Thread 叶双明
I don't understand your problem. Do you index files but want to search the folders that contain them? If you want to search folders, you can index each folder, with its data being all the files under it. 2008/10/13 <[EMAIL PROTECTED]> > The docs are already indexed. > > > -Original Message- > > From: ??

Custom Sorting Based on Input Value

2008-10-13 Thread Ravis
Hi, I have a sorting requirement where I need to bubble up documents that exactly match a particular value passed in the sort criteria. For example: hypothetically, say I am sorting on field A. I want all values matching value '5' on top and then regular sorting for the other values. So I would like to
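The ordering described in this question can be sketched in plain Java, independent of Lucene. This is only an illustration of the comparison rule (exact matches first, then regular ordering); the class and method names are hypothetical, the field is assumed to hold integers, and wiring such a rule into Lucene's own sort machinery is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class PinnedValueSort {

    /** Comparator that places values equal to {@code pinned} first, then sorts the rest ascending. */
    static Comparator<Integer> pinnedFirst(final int pinned) {
        return (a, b) -> {
            boolean aHit = a.intValue() == pinned;
            boolean bHit = b.intValue() == pinned;
            if (aHit != bHit) {
                return aHit ? -1 : 1;          // the exact match bubbles up
            }
            return Integer.compare(a, b);      // regular sorting otherwise
        };
    }

    /** Returns a sorted copy; the input list is left untouched. */
    static List<Integer> sortWithPinned(List<Integer> values, int pinned) {
        List<Integer> copy = new ArrayList<>(values);
        copy.sort(pinnedFirst(pinned));
        return copy;
    }

    public static void main(String[] args) {
        // Values of the hypothetical field A, with '5' as the value to bubble up.
        System.out.println(sortWithPinned(Arrays.asList(3, 5, 1, 9, 5, 7), 5));
        // prints [5, 5, 1, 3, 7, 9]
    }
}
```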

Re: search with accent not match

2008-10-13 Thread lekamm
Does this: http://www.blardone.org/2008/10/12/lucene-query-accented-character/ solve your problem? Cheers, lekamm Christophe from paris wrote: > > Hello > > I use FrenchAnalyzer for indexing > > IndexWriter writer = new IndexWriter(pathOfIndex, new FrenchAnalyzer(), > true); > Document

Re: custom tag scoring question

2008-10-13 Thread Chris Hostetter
You'll probably want to take a look at using Payloads and BoostingTermQuery ... I believe the combination of the two will solve your problem perfectly -- just set the payload to be the "relevancy" value for your entity. -Hoss

Re: Wildcard query ...

2008-10-13 Thread Chris Hostetter
BooleanQuery picks a Scorer based on the number of clauses and what their options are ... all of the scorers it might pick from are smart enough to continuously reorder the clauses having them "skip ahead" to the next document they match, beyond whatever docIds it already knows can't match (ba

Re: is there an histogram feature in lucene ak Magelan

2008-10-13 Thread Julien Nioche
Hi Thomas, Have a look at SOLR (lucene.apache.org/solr). It is based on Lucene and provides additional functionality, including faceted search. Best, Julien 2008/10/13 Thomas Birnbaum <[EMAIL PROTECTED]> > hi... > > currently we are using a proprietary search engine which supports a > histor

is there an histogram feature in lucene ak Magelan

2008-10-13 Thread Thomas Birnbaum
hi... currently we are using a proprietary search engine which supports a histogram. It looks like this... if I search for audi a4 I get the search results including how many red, blue, or black cars are in the result: result total 400, red 50, blue 50, black 100, green 200, private seller 50, commercial 35
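The kind of histogram described here amounts to counting, for each distinct value of a field, how many result documents carry it. A minimal plain-Java sketch of that counting step follows, assuming the field values of the matching documents have already been collected into a list; the class and method names are hypothetical and no Lucene or Solr API is used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene or Solr.
final class ResultHistogram {

    /** Counts how often each value occurs; keys come back in alphabetical order. */
    static Map<String, Integer> countByValue(List<String> fieldValues) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : fieldValues) {
            counts.merge(value, 1, Integer::sum);  // start at 1, otherwise add 1
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up colour values standing in for the matching documents.
        List<String> colours = Arrays.asList("red", "blue", "red", "black", "green", "green");
        System.out.println(countByValue(colours));
        // prints {black=1, blue=1, green=2, red=2}
    }
}
```

Counting this way touches every matching document, so for large result sets a purpose-built faceting implementation is likely to be the better fit.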

highlighter / fragmenter performance for large fields

2008-10-13 Thread Beard, Brian
We index some documents which have an "all" field containing all of the data which can be searched on. One of the problems we're having is that when this field is, say, 10 MB, the highlighter takes about a second to calculate the best fragments. The search only takes 30 milliseconds. I've accommodated t

Re: Sorting posting lists before intersection

2008-10-13 Thread Renaud Delbru
Hi, Paul Elschot wrote: This could be done, but since not all scorers will be TermScorers it will be necessary to add a method to Scorer (or perhaps even to its DocIdSetIterator superclass): public abstract int estimatedDocFreq(); and implement this for all existing instances. TermScorer co
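The general idea under discussion in this thread, ordering posting lists by an estimated document frequency before intersecting them, can be sketched in plain Java. This is an illustration under stated assumptions and not Lucene's scorer code: posting lists are modelled as sorted int arrays, the array length stands in for the estimatedDocFreq() value proposed above, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical model of the idea; Lucene's scorers work on iterators, not arrays.
final class PostingIntersection {

    /** Intersects sorted doc-id lists, starting with the shortest (rarest) one. */
    static int[] intersect(int[][] postings) {
        if (postings.length == 0) {
            return new int[0];
        }
        int[][] ordered = postings.clone();
        // Array length plays the role of an estimated document frequency.
        Arrays.sort(ordered, Comparator.comparingInt((int[] p) -> p.length));
        int[] result = ordered[0].clone();
        for (int i = 1; i < ordered.length && result.length > 0; i++) {
            result = intersectTwo(result, ordered[i]);
        }
        return result;
    }

    /** Classic two-pointer merge intersection of two ascending arrays. */
    static int[] intersectTwo(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[n++] = a[i];
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[][] postings = {
            {1, 3, 5, 7, 9, 11},
            {3, 9},
            {2, 3, 4, 9, 10}
        };
        System.out.println(Arrays.toString(intersect(postings)));
        // prints [3, 9]
    }
}
```

Starting from the shortest list keeps the intermediate result small, which is the motivation for sorting by frequency; this sketch does not model skip lists or the sparseness-based ordering also mentioned in the thread.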

Re: Sorting posting lists before intersection

2008-10-13 Thread Renaud Delbru
Andrzej Bialecki wrote: Renaud Delbru wrote: Hi Andrzej, sorry for the late reply. I have looked at the code. As far as I understand, you sort the posting lists based on the first doc skip. The first posting list will be the one that has the first biggest document skip. Is the sparseness of

Re: Modification of positional information encoding

2008-10-13 Thread Renaud Delbru
Hi, Michael McCandless wrote: This looks right, though you would also need to modify SegmentMerger to read & write your new format when merging segments. Another thing you could do is grep for "omitTf" which should touch exactly the same places you need to touch. Ok, thanks for the pointers.

Re: Sorting posting lists before intersection

2008-10-13 Thread Paul Elschot
On Monday 13 October 2008 17:00:06, Andrzej Bialecki wrote: > Renaud Delbru wrote: > > Hi Andrzej, > > > > sorry for the late reply. > > > > I have looked at the code. As far as I understand, you sort the > > posting lists based on the first doc skip. The first posting list > > will be the one that

Re: Sorting posting lists before intersection

2008-10-13 Thread Andrzej Bialecki
Renaud Delbru wrote: Hi Andrzej, sorry for the late reply. I have looked at the code. As far as I understand, you sort the posting lists based on the first doc skip. The first posting list will be the one that has the first biggest document skip. Is the sparseness of posting lists a good p

Re: Modification of positional information encoding

2008-10-13 Thread Michael McCandless
Renaud Delbru wrote: Hi, We are trying to modify the positional encoding of a term occurrence for experimentation purposes. One solution we adopt is to use payloads to store our own positional information encoding, but with this solution, it becomes difficult to measure the increase or

Re: Sorting posting lists before intersection

2008-10-13 Thread Renaud Delbru
Hi Andrzej, sorry for the late reply. I have looked at the code. As far as I understand, you sort the posting lists based on the first doc skip. The first posting list will be the one that has the first biggest document skip. Is the sparseness of posting lists a good predictor for sampling

Modification of positional information encoding

2008-10-13 Thread Renaud Delbru
Hi, We are trying to modify the positional encoding of a term occurrence for experimentation purposes. One solution we adopt is to use payloads to store our own positional information encoding, but with this solution, it becomes difficult to measure the increase or decrease of index size. It

Re: Question regarding sorting and memory consumption in lucene

2008-10-13 Thread Ganesh
Hello Mark, I am also facing the same sorting issue. In my case there will be only addition and deletion of data [no modification of existing records]. Can I rely on the indexed order for sorting? Is "SortField.FIELD_DOC" the one that sorts in indexed order? Regards Ganesh

Re: Single searcher vs Multi Searcher

2008-10-13 Thread Ganesh
Hello Anshum, My criterion for sharding would be date. I am planning to maintain 10 days of data in one DB. I think the memory required for single searcher and multi searcher would be more or less the same. In your case you may perform search on one DB but you might have all searcher objects of all

RE: Searching sets of documents

2008-10-13 Thread spring
The docs are already indexed. > -Original Message- > From: ??? [mailto:[EMAIL PROTECTED] > Sent: Monday, 13 October 2008 02:28 > To: java-user@lucene.apache.org > Subject: Re: Searching sets of documents > > all folders which match "A AND Y", do you search for file name? > If yes, A or

Is there a way to get numTokens via search?

2008-10-13 Thread Andrew Rimmer
I am doing a search using Lucene, and when I get the search results (hits), I want to be able to get the number of tokens in a certain field. Is this possible? Where is this sort of information stored? I know the IndexSearcher.Explain can get some information, but it seems mostly in free text and