Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-13 Thread Paul Elschot
On Thursday 14 February 2008 02:11:24, Cedric Ho wrote: > I am using Lucene's built-in query classes: TermQuery, PhraseQuery, > WildcardQuery, BooleanQuery and many of the SpanQueries. > > The info I am going to pass in is just some weightings for different > parts of the indexed contents. For exa

Re: matching products with suggest feature

2008-02-13 Thread Shai Erera
Is this Speller class a Lucene class? I didn't find it in the main code stream; maybe it's part of contrib? Anyway, it still depends on how it is implemented (OR or AND). For example, someone indexed a document with the word "abcde" and the index keeps the ngrams "abc", "bcd" and "cde". Then somebody

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-13 Thread Grant Ingersoll
The only Query that currently utilizes the scorePayload functionality is the BoostingTermQuery. I guess I would have a look at that as a starting point. On Feb 13, 2008, at 8:11 PM, Cedric Ho wrote: I am using Lucene's built-in query classes: TermQuery, PhraseQuery, WildcardQuery, Boolean

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-13 Thread Cedric Ho
I am using Lucene's built-in query classes: TermQuery, PhraseQuery, WildcardQuery, BooleanQuery and many of the SpanQueries. The info I am going to pass in is just some weightings for different parts of the indexed contents. For example if the payload indicates that a term is in the 2nd paragraph, t

[OT][ANN] Nanoki

2008-02-13 Thread Petite Abeille
[Not even remotely related to Lucene, Java, Apache or anything] Nanoki, a sweet little wiki engine implemented in Lua [1]. http://alt.textdrive.com/nanoki/ Online demo: http://svr225.stepx.com:3388/nanoki Kind regards, PA. [1] http://www.lua.org/about.html -

Question .. advanced query

2008-02-13 Thread Abeba Tensai
Hi, I am trying to perform a query that will enable me to run the following logic against the index: "4 3 [>=7] [>=4]" .. [>=7] means the third number must be greater than or equal to 7 and [>=4] means that the fourth number must be greater than or equal to 4 .. I have reviewed the query syntax and it seems

Re: matching products with suggest feature

2008-02-13 Thread Cam Bazz
Hello Shai, The class that does the matching is Speller. It is not query-based; instead there is a method, suggestSimilar(String word, int numSug), where numSug is the number of suggestions. The words are kept in the index as ngrams. For example abcde is kept as abc bcd cde. So th
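
The n-gram storage described in this thread (abcde kept as abc, bcd, cde) can be illustrated with a minimal sketch. It is plain Java and does not use Lucene or the Speller class; the class name NGrams and the method split are hypothetical names chosen for illustration, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

public class NGrams {
    // Hypothetical helper: returns every contiguous substring of length n,
    // left to right. A word shorter than n yields an empty list.
    public static List<String> split(String word, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // The example from the thread: trigrams of "abcde".
        System.out.println(split("abcde", 3)); // prints [abc, bcd, cde]
    }
}
```

Whether a lookup then requires all n-grams of the query word to match (AND) or only some of them (OR) is the implementation question raised in the replies; the sketch covers only the splitting step.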

RE: design: merging resultset from RDBMS with lucene search results

2008-02-13 Thread spring
The metadata is quite often altered and there are millions of documents. Also document access is secured by complex SQL statements which Lucene might not support. So this is not an option I think. > -Original Message- > From: John Byrne [mailto:[EMAIL PROTECTED] > Sent: Wednesday, 13. Febr

Re: design: merging resultset from RDBMS with lucene search results

2008-02-13 Thread John Byrne
Hi, You might consider avoiding this problem altogether, by simply adding the metadata to your Lucene index. Lucene can handle untokenized fields, which is ideal for metadata. It might not be as quick as the RDB, but you could perhaps optimize by only searching in the RDB when you only need

design: merging resultset from RDBMS with lucene search results

2008-02-13 Thread spring
Hi, I have the following scenario: RDBMS which contains the metadata for documents (ID, customer number, doctype etc.). Now I want to add fulltext search support. So I will index the documents' content in Lucene and add the document's ID as a stored field in Lucene. Now somebody wants to search l

Re: matching products with suggest feature

2008-02-13 Thread Shai Erera
What is the default operator of your QueryParser? Is it AND_OPERATOR or OR_OPERATOR? If it's OR ... then it's strange. If it's AND, then once you add more terms than exist, it won't find anything. On Feb 13, 2008 6:54 PM, Cam Bazz <[EMAIL PROTECTED]> wrote: > Hello; > > I am trying to make

matching products with suggest feature

2008-02-13 Thread Cam Bazz
Hello; I am trying to make a product matcher based on Lucene's ngram-based suggest. I made some changes so that instead of giving the speller a dictionary I feed it with a List. For example let's say I have "HP NC4400 EY605EA CORE 2 DUO T5600 1.83GHz/512MB/80GB/12.1'' NOTEBOOK" and I index it with

Retrieving parsed query string terms

2008-02-13 Thread Cesar Ronchese
Hey, it's me again :P I've been looking for a way to retrieve all parsed terms from a given query string, but no success till now. For example, I need to convert: word01 "word02 word03" AND word04 theField:(xyz) mod_date:[20020101 TO 20030101] (abc OR xyp) into: word01 "word02 word03"

Re: Lucene multiple field search performance

2008-02-13 Thread Erick Erickson
Have you looked at the query.toString()? In particular, is your date being split up into pieces on the slashes? But why it's working today, I have no clue. Unless you were seeing results on a freshly-opened reader yesterday Erick On Feb 13, 2008 7:12 AM, Cesar Ronchese <[EMAIL PROTECTED]> wr

Re: Stored Field vs "offset plus external file"?

2008-02-13 Thread Andrzej Bialecki
eks dev wrote: 2. We use a Lucene index with MMAP directory now, so the concern is that the index could grow too large for MMAP with a stored field like that. Is there a way to say, "do not use MMAP Directory for stored Fields, rather FSDirectory". I think not, but it is worth asking as I think this cou

Stored Field vs "offset plus external file"?

2008-02-13 Thread eks dev
I would like to try to replace our external storage of documents with a Lucene stored field, so a few questions before we proceed: Background: We currently store complete documents in a simple binary file and only keep offsets into this file as a stored field in the Lucene index. Documents (compre

Re: Using Lucene 2.3.0 with PDFBox

2008-02-13 Thread Jan Peter Stotz
Naman Gupta wrote: PDF Box uses a particular function of the Object 'Field' which is only there in Lucene 1.4.3. Field.UnIndexed("path", file.getPath() ) This statement should be a good replacement: new Field("path", file.getPath(), Field.Store.YES, Field.Index.UN_TOK

Re: Lucene multiple field search performance

2008-02-13 Thread Cesar Ronchese
Yes, it is optimized already. But today, when I tested again, it looks quick. :S I can't understand why. Michael Stoppelman wrote: > > Did your index size increase drastically? > > As a first step I would recommend optimizing your index if you haven't > already. > > -M > > On

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-13 Thread Grant Ingersoll
Are you writing your own Query? What kind of info did you have in mind? scorePayload is called from the query scoring class, so I am not sure how you would pass in info to it unless you were writing your own Query class. -Grant On Feb 13, 2008, at 4:31 AM, Cedric Ho wrote: Hi all, My

Using Lucene 2.3.0 with PDFBox

2008-02-13 Thread Naman Gupta
Hey, I am having a problem using PDF Box for parsing PDFs and converting them to Lucene Documents using the following statement. Document doc = LucenePDFDocument.getDocument( file ); PDF Box uses a particular function of the Object 'Field' which is only there in Lucene 1.4.3.

RE: Inverted letters

2008-02-13 Thread Ulrich Vachon
Thank you for your responses. But I'm very surprised to see that the FuzzyQuery used in my JUnit test works 100%. I only need to determine whether this simple algorithm works with the full data. RAMDirectory directory = new RAMDirectory(); IndexWriter writer = new IndexWriter(directory, new MyAnalyze

How to pass additional information into Similarity.scorePayload(...)

2008-02-13 Thread Cedric Ho
Hi all, My problem is I have some additional weighting info that comes with each search. And I need to take both the weighting info and the payload to calculate scores. So how do I access the weighting info in Similarity.scorePayload(String,byte[],int,int) ? I've thought about using a ThreadLocal,

Re: Retain the index

2008-02-13 Thread Michael McCandless
You could create your own SimpleFSLockFactory, specifying a different lock dir, and pass that to FSDirectory.getDirectory. This way you can control where the lock file is created and hopefully put it in a dir that you're able to delete from. Or, update to a newer version of Lucene, which

Re: Retain the index

2008-02-13 Thread anjana m
Hey, I am running my indexer on an application server / production server. I can't delete the files. Please give me a good solution. Changing true and false does not work when I have to add new docs. Can anyone help me, please? Regards, anjana On Jan 31, 2008 3:02 PM, Michael McCandless <[EMAIL PROT