Re: setting position value at indexing time

2007-11-19 Thread John Wang
Yes, I am. The UID example Michael gave provides a way for us not to branch from the Lucene code base. I am trying to improve on it by storing the uid using position (since position info is not used for ids), which would buy us quite a bit in load time. -John On Nov 19, 2007 4:28 PM, Yonik Seeley <

Re: setting position value at indexing time

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 6:39 PM, John Wang <[EMAIL PROTECTED]> wrote: > oh, is there a way of opening that? Well, you can keep track of position increments yourself and then choose the correct position increment so that the position you want is indexed. AFAIK, position increments must be positive, so y

Re: setting position value at indexing time

2007-11-19 Thread John Wang
oh, is there a way of opening that? In the UID example Mike gave, it seems that uid can be stored in the position part of the data. It would be very efficient in both load time and index size to be able to do that. Thanks -john On Nov 19, 2007 1:22 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: >

Re: Scoring for all the documents in the index relative to a query

2007-11-19 Thread Paul Elschot
Gentlefolk, Well, the javadocs as patched at LUCENE-584 try to change all the cases of zero scoring to 'non matching'. I'm happily bracing for a minor conflict with that patch. In case someone wants to take another look at the javadocs as patched there, don't let me stop you... Regards, Paul Els

Re: Time of processing hits.doc()

2007-11-19 Thread Haroldo Nascimento
The TestSort sample illustrates my problem: from the data below I need to get the list of "contents" values that contain "x" (A,C,E,G,I) and another list of index values (5,2,3) that does not contain duplicates. I get the first list using a query such as: query = new TermQuery (new Term ("contents", "x"))

Re: Scoring for all the documents in the index relative to a query

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 5:03 PM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > (I'm not actually sure how the Hits class treats negative values All Lucene search methods except ones that take a HitCollector filter out final scores <= 0. Solr does allow scores <= 0 through since it had different collection m

Re: Scoring for all the documents in the index relative to a query

2007-11-19 Thread Chris Hostetter
: My simplistic view has been that all the docs returned via Hits : or HitCollector have scores > 0, and all the rest have scores of 0, : and this view is supported by the explanation of : HitCollector.collect : : " Called once for every non-zero scoring document, with the : document number and i

Re: setting position value at indexing time

2007-11-19 Thread Yonik Seeley
On Nov 19, 2007 4:14 PM, John Wang <[EMAIL PROTECTED]> wrote: > What is the right way of setting customized position value on a > token at indexing time. You set the positionIncrement, and the Lucene indexing code determines the absolute position. You can't set an absolute position yourself.
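
The bookkeeping described in this reply can be sketched in plain Java, without any Lucene dependency. The class and method names below are hypothetical, and the sketch assumes that the position before the first token is -1, so that a first increment of 1 yields position 0. Following the thread, it rejects any sequence that would need an increment below 1; the exact lower bound accepted by a given Lucene version is not confirmed here.

```java
import java.util.Arrays;

public class PositionIncrements {

    /**
     * Converts desired absolute positions into per-token position increments.
     * Assumption of this sketch: the position before the first token is -1.
     */
    public static int[] toIncrements(int[] positions) {
        int[] increments = new int[positions.length];
        int previous = -1; // assumed starting point, see lead-in
        for (int i = 0; i < positions.length; i++) {
            int increment = positions[i] - previous;
            if (increment < 1) {
                // Positions must strictly increase under the "positive increment" rule.
                throw new IllegalArgumentException(
                        "position " + positions[i] + " is not greater than " + previous);
            }
            increments[i] = increment;
            previous = positions[i];
        }
        return increments;
    }

    public static void main(String[] args) {
        // Desired absolute positions for four consecutive tokens.
        int[] wanted = {0, 1, 5, 42};
        System.out.println(Arrays.toString(toIncrements(wanted))); // prints [1, 1, 4, 37]
    }
}
```

Under the same assumption, a field holding a single token whose position should equal a uid u would need an increment of u + 1.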

setting position value at indexing time

2007-11-19 Thread John Wang
Hi: What is the right way of setting customized position value on a token at indexing time. Thanks -John

Re: Time of processing hits.doc()

2007-11-19 Thread Haroldo Nascimento
German, how would that work? Do you have 2 indexes, one for the main (keyword) search and another for location? Do you run 2 searches, the first being the main search and the second the location search, but with the filter applied? What type of Filter do you use? I have the bitset of the main (keyword) search, but I

Re: neither IndexWriter nor IndexReader would delete documents

2007-11-19 Thread flateric
Hallo Daniel; the number returned by delete is 0, but the "uid" shows up in Luke so it is there. I close the reader after every delete and then re-open it for the next delete (see my code snippets below). Eric Daniel Naber-10 wrote: > > On Sonntag, 18. November 2007, flateric wrote: > >> Ind

Re: Scoring for all the documents in the index relative to a query

2007-11-19 Thread HAIDUC SONIA
Thank you guys for your prompt answers. I'm a beginner with Lucene and I still had some unclarities regarding its scoring function. Your answers really cleared things up for me. I guess a direct comparison with LSI is not possible after all, only the comparison between LSI and the pure VSM. Th

Re: Scoring for all the documents in the index relative to a query

2007-11-19 Thread Donna L Gresh
I could be mistaken, but I think the earlier answer was right; a document with no terms matching has a score of 0, so you can assume that all documents NOT returned by the query have a score of 0. If you look at the scoring formula on this page, it is hard to see how you can get a negative scor

Re: Scoring for all the documents in the index relative to a query

2007-11-19 Thread Grant Ingersoll
Lucene only scores those documents that have at least one matching term; it doesn't implement a pure vector space model whereby all documents are scored (it uses a combination of the Boolean Model and VSM). Thus, I am not sure you can do a pure comparison. I suppose you could simulate the

Re: Scoring for all the documents in the index relative to a query

2007-11-19 Thread Michael Busch
Hi Sonia, I agree with Erick here. Negative scores don't make sense and Lucene never computes scores for documents that don't match a query. E. g. if your query is: "term1 OR term2", then every document that contains term1 or term2 or both will have a score greater than 0. But if two docs don't c

Re: Scoring for all the documents in the index relative to a query

2007-11-19 Thread HAIDUC SONIA
I am trying to order all the documents in the index according to their similarity to a given query. I am interested in having a complete list of *all* the documents in the index with their score. From what I understood by reading some documentation, Lucene internally assigns scores to all the do

Re: Scoring for all the documents in the index relative to a query

2007-11-19 Thread Erick Erickson
Could you explain a bit more what problem you're trying to solve? The reason I ask is that your question doesn't make sense to me, since I have no idea what you expect by the term "negative score". My simplistic view has been that all the docs returned via Hits or HitCollector have scores > 0, and

Re: neither IndexWriter nor IndexReader would delete documents

2007-11-19 Thread Erick Erickson
Also, are you re-opening the reader underlying your *searcher* before you query and still get the deleted docs? Also, look with Luke to see if the specific uid you *think* you've deleted is really gone. Best Erick On Nov 19, 2007 6:42 AM, Daniel Naber <[EMAIL PROTECTED]> wrote: > On Sonntag, 18

Re: Time of processing hits.doc()

2007-11-19 Thread German Kondolf
A facet is a grouping condition; it could be a single value of the doc or a set of filters. On Nov 19, 2007 1:09 PM, Haroldo Nascimento <[EMAIL PROTECTED]> wrote: > German, > > When You said: > "I collect every facet's bitset ... " > what is a facet ? Is there the each option of filter of your site ?

Re: Time of processing hits.doc()

2007-11-19 Thread German Kondolf
I have already defined a Lucene Filter for every "id" of "ubicacion". I just create the bitset for every value, and count it against the result. One possible optimization is to read the terms of the field you're trying to "group"; that's the optimization we'll be working on soon in our app. I never

Re: Time of processing hits.doc()

2007-11-19 Thread Haroldo Nascimento
German, when you said "I collect every facet's bitset ... ", what is a facet? Is it each filter option on your site? How do you get all the facets? On Nov 19, 2007 1:05 PM, Haroldo Nascimento <[EMAIL PROTECTED]> wrote: > German, > > What I need is similar to the your site > ht

Re: Time of processing hits.doc()

2007-11-19 Thread Haroldo Nascimento
German, what I need is similar to your site http://listados.deremate.com.ar/panaderia . I have many search results, but I show only some of them (for example, the first 10 on the first page); to create the location filter options, however, I need to read all the search results. The problem of performa

Scoring for all the documents in the index relative to a query

2007-11-19 Thread HAIDUC SONIA
Hi everyone, I am trying to obtain the score for each document in the index relative to a given query. For example, if I have the query "search file", I am trying to get the list of all documents in the index and their scores relative to the given query. I tried first using Hits, which gave me

Re: Time of processing hits.doc()

2007-11-19 Thread Grant Ingersoll
I think, based on your previous question, that you just need to use the search() method that returns TopDocs, not the lower-level HitCollector method. From the TopDocs, you can then access the ScoreDoc, which will give you info about the doc and the score. See http://www.lucenebootcamp.com/

Re: Time of processing hits.doc()

2007-11-19 Thread German Kondolf
Why do you need the doc's info? If you're grouping you may not need detail on each group condition. Here is a sample of faceted (grouped) search: http://listados.deremate.com.ar/mp3 (Sorry, it's in Spanish) I simply collect every facet's bitset and intersect it against the result's bitset (keywo
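
The intersect-and-count idea in this reply can be illustrated with java.util.BitSet alone. This is only a sketch under stated assumptions: the class name, the facet labels and the document numbers are made up, and in a real application the bitsets would come from Lucene Filters and from the keyword search rather than being built by hand.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class FacetCounts {

    /** Counts, for each facet, how many documents in the result set also belong to the facet. */
    public static Map<String, Integer> count(BitSet results, Map<String, BitSet> facets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> facet : facets.entrySet()) {
            // Clone first: BitSet.and() modifies the receiver in place.
            BitSet intersection = (BitSet) facet.getValue().clone();
            intersection.and(results);
            counts.put(facet.getKey(), intersection.cardinality());
        }
        return counts;
    }

    /** Helper for the demo: builds a BitSet with the given document numbers set. */
    public static BitSet bits(int... docs) {
        BitSet set = new BitSet();
        for (int doc : docs) {
            set.set(doc);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet results = bits(0, 2, 3, 5); // documents matching the keyword search (made up)
        Map<String, BitSet> facets = new LinkedHashMap<>();
        facets.put("Buenos Aires", bits(0, 1, 2)); // hypothetical location facets
        facets.put("Cordoba", bits(3, 4));
        facets.put("Mendoza", bits(6));
        System.out.println(count(results, facets)); // prints {Buenos Aires=2, Cordoba=1, Mendoza=0}
    }
}
```

No stored document is loaded at any point; only document numbers are compared, which is why this approach avoids the cost of calling hits.doc() for every result.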

Re: Problem in Running Lucene Demo

2007-11-19 Thread Doron Cohen
Try "java -verbose" to see more info on class loading. Also try "java -classpath yourClassPath" from the command line. Note that separators in the classpath may differ between operating systems - e.g. ";" in Windows but ":" in Linux... Doron Liaqat Ali <[EMAIL PROTECTED]> wrote on 19/11/2007 15:43:30
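
As a small illustration of the separator point, the sketch below builds a classpath string with File.pathSeparator, which the JVM sets to ";" on Windows and ":" on Unix-like systems. The class name is hypothetical and the jar file names are placeholders; substitute the files actually present in your installation.

```java
import java.io.File;

public class ClasspathJoin {

    /** Joins classpath entries with the given separator. */
    public static String join(String separator, String... entries) {
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        // Placeholder jar names; adjust to the jars in your Lucene directory.
        String classpath = join(File.pathSeparator, "lucene-core.jar", "lucene-demos.jar");
        // The separator in the output depends on the operating system.
        System.out.println("java -classpath " + classpath + " ...");
    }
}
```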

Re: Time of processing hits.doc()

2007-11-19 Thread Haroldo Nascimento
Mark, how can I get the Document's information? I think that happens in the implementation of the abstract collect method. How can I get it? Below is the example from the Lucene javadoc. Searcher searcher = new IndexSearcher(indexReader); final BitSet bits = new BitSet(indexReader.maxDoc()); se

Re: Lucene Setting

2007-11-19 Thread Grant Ingersoll
Ah, I see. This just means change into the directory via the command line where you unpacked the installation. HTH, Grant On Nov 19, 2007, at 8:34 AM, Liaqat Ali wrote: I m new to lucene and want to clear about some questions. When I unpacked the Lucene, which i downloaded from Apache site

RE: Lucene Setting

2007-11-19 Thread Chhabra, Kapil
Liaqat, What exactly are you looking for? Are you sure you want to build the source of lucene and then use it? Alternatively you could simply use the lucene jar file (ie. already built for you) and start playing around with it. This jar file is bundled in the archive that you might have downloaded.

Problem in Running Lucene Demo

2007-11-19 Thread Liaqat Ali
Hi All, I'm new to Lucene. I'm facing a problem while running the Lucene demo to index the Lucene source code. I downloaded the 2.1.0 version of Lucene and extracted the binary to C:\lucene-2.1.0. I also set up the CLASSPATH to include the Lucene core and Lucene demo jar files. But when I execute the following co

Lucene Setting

2007-11-19 Thread Liaqat Ali
I'm new to Lucene and want to clear up some questions. I unpacked Lucene, which I downloaded from the Apache site, and went through the BUILD.txt file, which lists five steps to set up Lucene. Lucene Build Instructions $Id: BUILD.txt 476955 2006-11-19 22:28:41Z hossman $ Basic steps: 0) Instal

Re: Lucene setting

2007-11-19 Thread Grant Ingersoll
Can you provide more details? Are you actually using Lucene or some third party product that uses Lucene? What steps did you take to get this? -Grant On Nov 19, 2007, at 5:42 AM, Liaqat Ali wrote: Hi All, Can someone explain this line to me? I encountered this line while setting up Lucene

Re: neither IndexWriter nor IndexReader would delete documents

2007-11-19 Thread Daniel Naber
On Sonntag, 18. November 2007, flateric wrote: > IndexReader ir = IndexReader.open(fsDir); > ir.deleteDocuments(new Term("uid", uid)); > ir.close(); > > Has absolutely no effect. What number does ir.deleteDocuments return? If it's 0, the uid cannot be found. If it's > 0: note that you need to re

Re: neither IndexWriter nor IndexReader would delete documents

2007-11-19 Thread flateric
Hallo Daniel; thank you for your quick reply. The "uid" field exists (UN_TOKENIZED and stored). The IndexWriter is also closed while I'm using the IndexReader to delete. Thanks, Eric Daniel Naber-10 wrote: > > On Sonntag, 18. November 2007, flateric wrote: > >> Has absolutely no effect. I al

RE: neither IndexWriter nor IndexReader would delete documents

2007-11-19 Thread flateric
Hallo Kapil; thanks for your quick answer: * "An IndexReader can be opened on a directory for which an IndexWriter is opened already, but it cannot be used to delete documents from the index then. " sounded like a match, but I checked that and the IndexWriter is definitely closed. Regards, Eric

Re: Time of processing hits.doc()

2007-11-19 Thread German Kondolf
You should never use the hits for anything other than retrieving a group of results (usually a page of 10-20-30 docs). You could look at Apache Solr's implementation of faceted search. I've used that code as a guide to group & count different facets (or conditions, or fields, whatever you want to call them); it is pretty fast

Lucene setting

2007-11-19 Thread Liaqat Ali
Hi All, Can someone explain this line to me? I encountered this line while setting up Lucene... Connect to the top-level of your Lucene installation Kindly guide me in this regard. Liaqat Ali

Re: XML parsing using Lucene in Java

2007-11-19 Thread Catalin Mititelu
Hi Fayyaz, I recommend using SAX or, maybe, a custom parser for large XML files. It should be faster than using Digester. The main difference between those XML parsers is that Digester needs to load the entire XML document in memory when it creates those objects, whereas you can parse the doc
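
A minimal sketch of the SAX approach recommended here, using only the JDK's built-in parser. The class name and the element name "title" are assumptions made for illustration, not taken from the original question. The handler receives the document as a stream of events and keeps only the text it is interested in, so memory use does not grow with the size of the input beyond the values collected.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTitleCollector {

    /** Streams through the XML and returns the text of every <title> element (hypothetical name). */
    public static List<String> titles(InputStream in) {
        final List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <title> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so append rather than assign.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    titles.add(text.toString().trim());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (Exception e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String xml = "<docs><doc><title>First</title></doc><doc><title>Second</title></doc></docs>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(titles(in)); // prints [First, Second]
    }
}
```

In an indexing application, endElement would be the natural place to hand each collected value to whatever builds the Lucene document, instead of accumulating everything in a list.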