Re: Indexing with SnowballAnalyzer and multiple languages in a single index

2006-04-25 Thread Daniel Noll
[EMAIL PROTECTED] wrote: You can have multiple languages in the same index. Just make sure that your language identification process is consistent. You might still get some false positives, for example, if there's a German root that has the same letters as a French root, but means something dif

Re: Alphanumeric model ids

2006-04-25 Thread Jeremy Hanna
Thanks Chris, it works like a champ now. I had thought I looked at the queries themselves with toString but in any case, the queries actually work now. I didn't realize that Lucene was customizable on so many levels - when you create the analyzer, when you create the index, when you perfo

performance differences between 1.4.3 and 1.9.1

2006-04-25 Thread RONALD MANTAY
Hi chaps, I ran the same search code with lucene-1.4.3.jar and then with lucene-core-1.9.1.jar. The good news is there appeared to be a performance improvement with 1.9.1 with single index searching in both exact and fuzzy mode. However, when searching multiple indexes with mul

Re: Alphanumeric model ids

2006-04-25 Thread Chris Hostetter
I bet that if you look at the toString() of the query you get back from your query parser, you'll see that the non-numeric part numbers have been stemmed. You took the right steps when you indexed the field as UN_TOKENIZED, but at query time your query parser doesn't know about that -- take a loo

Alphanumeric model ids

2006-04-25 Thread Jeremy Hanna
I am trying to search by a number of fields including an alphanumeric model id. This is just the model id that comes from manufacturers. I've tried to use a StandardAnalyzer and a SnowballAnalyzer to index the data. Then I search with the associated analyzer using a MultiFieldQueryParse

Re: Lucene Eclipse Integration

2006-04-25 Thread Otis Gospodnetic
Also, this question may be better for one of the Eclipse groups, because Eclipse already uses Lucene for indexing (of help? code?), so they will be able to tell you how to integrate Eclipse and Lucene. Otis - Original Message From: Chris Hostetter <[EMAIL PROTECTED]> To: Lucene Users

Clucene document parsers

2006-04-25 Thread John Paige
Hello, Can somebody tell me what document parsers are available that can be used with CLucene? I know that for Lucene, XML->Text, pdf->Text, doc->Text, html->Text and RTF->Text parsers are all available. Have all of these been ported to CLucene? Thanks, John

Re: search problem

2006-04-25 Thread Chris Hostetter
: Problem: while there is a hit, only the timestamp and ip of the very first : line in the logfile are shown, but not the "matching" ip and timestamp later : in the logfile. Any suggestions how to get to the "right entries" ? It sounds like you are creating one Document per logfile, and then usin

Re: Lucene Eclipse Integration

2006-04-25 Thread Chris Hostetter
First off: I've changed the reply to be the [EMAIL PROTECTED] list ... that is the appropriate place to ask questions about using the Lucene APIs. Second: once you have a test index built, and you can do some test searches to verify it contains what you think it does (take a look at Luke to be su

Re: search problem

2006-04-25 Thread karl wettin
On 25 Apr 2006, at 17:54, April06 wrote: We indexed several logfiles which contain for example a timestamp, an ip and additional information (all defined as a field) all in one line. A logfile itself contains many of these lines. We used a BooleanQuery (timestamp / ip) to search for an ip betwe

Search algo for the postings ( or TermFreqs)

2006-04-25 Thread Prasenjit Mukherjee
Given a term "myterm", what kind of search algorithm does Lucene use to get to the postings list (i.e. the term-frequency location in the .frq file)? From what I understood by looking into the Lucene file format, it keeps the whole of the .tii file in memory and does a skipped linear search o

search problem

2006-04-25 Thread April06
We indexed several logfiles which contain for example a timestamp, an ip and additional information (all defined as a field) all in one line. A logfile itself contains many of these lines. We used a BooleanQuery (timestamp / ip) to search for an ip between a defined range of time. Problem: while

Re: Lucene, TREC, and WT10G

2006-04-25 Thread Grant Ingersoll
It is up to you to create a program to do this, but it is relatively easy. You may want to search the web; chances are someone has posted code to do this, as a number of people have used Lucene in TREC in the past. Good luck, Grant thanh nguyen wrote: Hi trupti, Thanks for your response. I h

Re: Accessing fields during search

2006-04-25 Thread Yonik Seeley
On 4/25/06, Oskar Berger <[EMAIL PROTECTED]> wrote: > What is the most efficient approach to access field values during a > search? If you are implementing a hit collector, and want to know a field value for each document coming in, you can use the FieldCache if the field is indexed and not tokeni

Re: Lucene, TREC, and WT10G

2006-04-25 Thread thanh nguyen
Hi trupti, Thanks for your response. I have another question. Can Lucene receive a topic file like " 1 abc def " and produce a result_file which we can use with the trec_eval program (trec_eval relevant_file result_file, where relevant_file is the judgement file of TREC for these topics)? Th

Accessing fields during search

2006-04-25 Thread Oskar Berger
Hello masters, What is the most efficient approach to access field values during a search? I am only interested in accessing a couple of fields for counting, and am thus not in need of storing the values as if sorting on the fields. Where to look? Regards, /oskar