Re: Generalized proximity query performance

2007-10-07 Thread Chris Hostetter
: If I could intelligently rewrite queries, this would be better formulated
: as:
: title:"harry potter"~5 genre:books
:
: Instead, since I don't have that knowledge, I should perhaps rewrite several
: guesses, and take the dismax. These guesses are equivalent to passing the right.

okay. the b…
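
One way to express the "several guesses, take the dismax" idea in plain Lucene is a DisjunctionMaxQuery over sloppy PhraseQueries. A minimal sketch, assuming hypothetical field names ("title", "all", "genre") rather than the automatic query rewriting discussed above:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.*;

    // Guess #1: the phrase lives in the "title" field.
    PhraseQuery titleGuess = new PhraseQuery();
    titleGuess.add(new Term("title", "harry"));
    titleGuess.add(new Term("title", "potter"));
    titleGuess.setSlop(5);

    // Guess #2: the same phrase against a catch-all "all" field.
    PhraseQuery allGuess = new PhraseQuery();
    allGuess.add(new Term("all", "harry"));
    allGuess.add(new Term("all", "potter"));
    allGuess.setSlop(5);

    // "dismax" of the guesses: score by the best-matching guess, not the sum.
    DisjunctionMaxQuery guesses = new DisjunctionMaxQuery(0.0f);
    guesses.add(titleGuess);
    guesses.add(allGuess);

    BooleanQuery query = new BooleanQuery();
    query.add(guesses, BooleanClause.Occur.MUST);
    query.add(new TermQuery(new Term("genre", "books")), BooleanClause.Occur.SHOULD);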

Re: norms(String field, byte[] bytes, int offset)

2007-10-07 Thread Karl Wettin
On 7 Oct 2007, at 19:26, Michael McCandless wrote:

> I guess we could change the code to only load up until the end of the
> byte array that's passed in, but, that weakens the error checking? Ie
> if the intent is to "load all norms", it's nice to catch the error
> (that you passed in a too-small byte array)…

Re: Use of Field(String name, TokenStream tokenStream)

2007-10-07 Thread Chris Hostetter
: I am observing that a Field constructed using a TokenStream, i.e. Field fl =
: new Field(String name, TokenStream tokenStream), is not converted to
: lower case when stored in the index.
: The terms in the index are exactly the same as those in the tokenStream.
: When I do a phrase search, the PhraseQuery…
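
Since Field(String, TokenStream) bypasses the Analyzer entirely, the stream has to be lower-cased before it reaches the Field. A minimal sketch, assuming myTokenStream is the stream that was being passed in already:

    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    // Wrap the existing stream so every token is lower-cased at index time.
    TokenStream lowercased = new LowerCaseFilter(myTokenStream);

    Document doc = new Document();
    doc.add(new Field("content", lowercased));

The query side has to match, of course: a PhraseQuery built by QueryParser with a lower-casing analyzer will then line up with the indexed terms.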

Re: Help with Lucene Indexer crash recovery

2007-10-07 Thread Chris Hostetter
: That said, it should never in fact cause index corruption, as far as I
: know. Lucene is "semi-transactional": at any & all moments you should
: be able to destroy the JVM and the index will be unharmed. I would
: really like to get to the bottom of why this is not the case here.

At any point…

Re: Group of documents.

2007-10-07 Thread Chris Hostetter
: The only solution that we have in our minds now is to have two indexes, one
: for articles and one for feeds. There are two problems with this approach:
: 1) redundancy

this isn't really a "problem" ... a lucene index is designed to make searching
fast, not to be a normalized data store -- there are…

Re: got stuck in running lucene demo

2007-10-07 Thread Chris Hostetter
: java org.apache.lucene.demo.IndexFiles {full-path-to-lucene}/src
:
: I don't know from which directory I have to execute this command.

it doesn't matter which directory you run it from -- that's why it says
"{full-path-to-lucene}". as long as you set your CLASSPATH up like the previous step…

Re: norms(String field, byte[] bytes, int offset)

2007-10-07 Thread Yonik Seeley
On 10/7/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Actually, MultiReader & MultiSegmentReader make use of this method, to
> load all norms from each sub-reader into a single byte array.

Right. Karl, see MultiSegmentReader.norms(String field) for how this method is used. You want to be a…

Use of Field(String name, TokenStream tokenStream)

2007-10-07 Thread Developer Developer
Hello Friends, I am observing that a Field constructed using a TokenStream, i.e. Field fl = new Field(String name, TokenStream tokenStream), is not converted to lower case when stored in the index. The terms in the index are exactly the same as those in the tokenStream. When I do a phrase search, the PhraseQuery…

Re: norms(String field, byte[] bytes, int offset)

2007-10-07 Thread Michael McCandless
Actually, MultiReader & MultiSegmentReader make use of this method, to load all norms from each sub-reader into a single byte array. I guess we could change the code to only load up until the end of the byte array that's passed in, but that weakens the error checking? Ie if the intent is to "load all norms", it's nice to catch the error (that you passed in a too-small byte array)…
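
To make the contract concrete, a minimal sketch of the single-reader call (index path and field name are placeholders): the array must be at least maxDoc() bytes long, otherwise the copy runs past its end and triggers the exception being discussed.

    import org.apache.lucene.index.IndexReader;

    IndexReader reader = IndexReader.open("/path/to/index");

    // One norm byte per document in this reader.
    byte[] norms = new byte[reader.maxDoc()];
    reader.norms("title", norms, 0);   // copy this reader's norms starting at offset 0

    reader.close();

A MultiReader does the same per sub-reader, bumping the offset by each sub-reader's maxDoc() so that all norms land in one shared array.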

Re: norms(String field, byte[] bytes, int offset)

2007-10-07 Thread Karl Wettin
On 7 Oct 2007, at 13:47, Michael McCandless wrote:

> I think the intention of that method is to load all norms for that
> reader into the byte array, so I think it's reasonable that an
> exception is thrown if you provide a byte array that's too small.

Hmm, OK. But I don't understand why there is s…

Re: Lucene newbie question - Term Positions

2007-10-07 Thread Developer Developer
Hi Erick, Thanks for the quick reply. My index does not return any hits when I search for certain phrases. I am very sure that the indexed documents do have those phrases in them. Therefore I want to just list all the terms and their positions for a given document, just to make sure that the index…

Re: Lucene newbie question - Term Positions

2007-10-07 Thread Karl Wettin
On 7 Oct 2007, at 18:38, Erick Erickson wrote:

> I suspect that this is more work than you think, not to mention very
> slow. This is just due to the nature of an inverted index. To see what
> I mean, get a copy of Luke and have it reconstruct one of your
> documents and you'll see what the performance is like…

Re: Lucene newbie question - Term Positions

2007-10-07 Thread Erick Erickson
I suspect that this is more work than you think, not to mention very slow. This is just due to the nature of an inverted index. To see what I mean, get a copy of Luke and have it reconstruct one of your documents and you'll see what the performance is like. I think Luke has all the example code…

Lucene newbie question - Term Positions

2007-10-07 Thread Developer Developer
Hello, I have a simple Lucene 2.2 index created. I want to list all the terms and their positions in a document. How can I do it? Can you please provide some sample code? Thanks!
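
For what it's worth, a rough sketch along the lines Erick describes above: walk every term in the index and ask its TermPositions for the one document of interest. This is the slow, brute-force route, it only sees what was actually indexed, and the index path and document number are placeholders:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;
    import org.apache.lucene.index.TermPositions;

    public class ListTermPositions {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(args[0]);  // path to the index
            int docId = Integer.parseInt(args[1]);           // internal Lucene doc number

            TermEnum terms = reader.terms();
            TermPositions positions = reader.termPositions();
            while (terms.next()) {
                Term term = terms.term();
                positions.seek(term);
                // skipTo() may stop on a later document, so confirm we really hit docId.
                if (positions.skipTo(docId) && positions.doc() == docId) {
                    StringBuffer line = new StringBuffer(term.field() + ":" + term.text() + " ->");
                    for (int i = 0; i < positions.freq(); i++) {
                        line.append(' ').append(positions.nextPosition());
                    }
                    System.out.println(line);
                }
            }
            positions.close();
            terms.close();
            reader.close();
        }
    }

If the field was indexed with term vectors including positions, IndexReader.getTermFreqVector(docId, field) is a far cheaper way to get the same information for a single document.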

Re: Group of documents.

2007-10-07 Thread Yonik Seeley
On 10/6/07, Raghu Ram <[EMAIL PROTECTED]> wrote:
> But then how can I search for feeds???

I'm not quite sure what you mean by "search for feeds"... but assuming you want a list of feeds that contain articles with the search terms, you could do faceting on the "feeds" field. That would let you know…
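
Plain Lucene (unlike Solr) has no built-in faceting, but one common way to approximate it in 2.2 is to intersect the query's BitSet with a per-feed filter BitSet and count the survivors. A rough sketch, assuming the "feeds" field mentioned above plus an already-open reader, the article query, and an array of feed names:

    import java.util.BitSet;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.QueryFilter;
    import org.apache.lucene.search.TermQuery;

    // Documents matching the user's article search.
    BitSet matches = new QueryFilter(articleQuery).bits(reader);

    for (int i = 0; i < feedNames.length; i++) {
        BitSet feedDocs = new QueryFilter(
                new TermQuery(new Term("feeds", feedNames[i]))).bits(reader);
        feedDocs.and(matches);  // intersect; feedDocs is a fresh copy each time
        System.out.println(feedNames[i] + ": " + feedDocs.cardinality());
    }

Caching the per-feed bitsets (they only change when the reader changes) keeps the per-search cost down to the intersections themselves.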

Re: norms(String field, byte[] bytes, int offset)

2007-10-07 Thread Michael McCandless
I think the intention of that method is to load all norms for that reader into the byte array, so I think it's reasonable that an exception is thrown if you provide a byte array that's too small. Though maybe it would be friendlier to throw an IllegalArgumentException that says "the byte array is…

Re: Group of documents.

2007-10-07 Thread Alf Eaton
Make a separate index of feeds?

alf

Raghu Ram wrote:
> But then how can I search for feeds???
>
> On 10/6/07, Alf Eaton <[EMAIL PROTECTED]> wrote:
>> Raghu Ram wrote:
>>> Hi,
>>> We have an application in which we want to index feeds. Each feed is a
>>> collection of articles and some…