RE: Getting Payload from Hits

2008-02-11 Thread Allahbaksh Mohammedali Asadullah
Hi Karl, thanks for the reply, but I am not able to follow you. Should I extend the Query class, and how should I get the matching term? Can you please elaborate? Regards, Allahbakhs -Original Message- From: Karl Wettin [mailto:[EMAIL PROTECTED] Sent: Monday, February 11, 2008 9:53 PM To: ja

Re: Lucene syntax query matched against a string content

2008-02-11 Thread Nilesh Bansal
Excellent. MemoryIndex solves the problem. I didn't know about this index. Thanks. -Nilesh On Feb 8, 2008 8:23 AM, Erick Erickson <[EMAIL PROTECTED]> wrote: > You might want to check out MemoryIndex before rejecting putting a single > doc in memory and searching against it. It's quite fast, altho

Re: update field boost

2008-02-11 Thread Chris Hostetter
: I read the doc for the API IndexReader.setNorm() after I posted the question : earlier. To use that setNorm() to modify the field boost, it seems to me that : one has to know how the boost is folded into the norm (in the default impl, it's : boost * lengthNorm) and has to know the old norm value wh

Re: update field boost

2008-02-11 Thread yu
Thanks, Hoss! I read the doc for the API IndexReader.setNorm() after I posted the question earlier. To use that setNorm() to modify the field boost, it seems to me that one has to know how the boost is folded into the norm (in the default impl, it's boost * lengthNorm) and has to know the old norm
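
The arithmetic behind that observation can be sketched in plain Java. This is only an illustration, not Lucene code: it assumes the default formula mentioned in the message (norm = boost * lengthNorm) with a length normalization of 1 / sqrt(number of terms), and the class and method names (NormRescaleSketch, norm, rescale) are hypothetical. It also ignores that Lucene stores norms in a lossy single-byte encoding, so a value read back from an index would only be approximate.

```java
class NormRescaleSketch {
    // Assumed default-style length normalization: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Index-time norm as described in the thread: boost * lengthNorm.
    static float norm(float boost, int numTerms) {
        return boost * lengthNorm(numTerms);
    }

    // Rescale an existing norm from oldBoost to newBoost.
    // The field length is not needed, but the old boost and old norm are.
    static float rescale(float oldNorm, float oldBoost, float newBoost) {
        return oldNorm / oldBoost * newBoost;
    }

    public static void main(String[] args) {
        float oldNorm = norm(1.0f, 16);               // 1.0 * 1/sqrt(16)
        float newNorm = rescale(oldNorm, 1.0f, 2.0f); // divide out the old boost, apply the new one
        System.out.println(oldNorm + " -> " + newNorm);
    }
}
```

A rescaled value like this is what one would presumably hand to IndexReader.setNorm() for the document and field in question; whether the result is accurate enough after the byte encoding is something to check against a real index.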

Re: update field boost

2008-02-11 Thread Chris Hostetter
: It's clear that there is no easy way to do "in-place" doc update in the Lucene : index, but I think it should be theoretically possible to update the field and : doc boostings in place, that is, without deleting and re-adding the doc and : its fields. Does anyone know how? boosts are folded in

Re: How to promote an unstemmed match over a stemmed match in an index that's stemmed...

2008-02-11 Thread Jake Mannix
The way I've always done this was to index two fields: say, "contents" and "contents_unstemmed", (using a PerFieldAnalyzer) and then query on both of them. This has the double effect of a) boosting unstemmed hits, because every unstemmed match is also a stemmed one, so the BooleanQuery combining

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Cesar Ronchese
Well, it is done now. As a final result, I surrendered myself to "double-storing". This way, I have indexed the original text with COMPRESSED option to save some space. And to highlight the results correctly, I made some matching between unaccented-words and original words by regular expressions, an

corruption issue with 2.3 and autoCommit=false

2008-02-11 Thread Michael McCandless
Heads up! We are working through what looks like an index corruption issue when you use autoCommit=false with IndexWriter, in Lucene 2.3, so please try to avoid doing so if you can... Details are here: https://issues.apache.org/jira/browse/LUCENE-1173 Mike --

update field boost

2008-02-11 Thread Jay
Hi, It's clear that there is no easy way to do "in-place" doc update in the Lucene index, but I think it should be theoretically possible to update the field and doc boostings in place, that is, without deleting and re-adding the doc and its fields. Does anyone know how? Thanks! Jay -

RE: Delete problems O.O

2008-02-11 Thread Cesar Ronchese
Cool man. The Hits.id(int) worked fine. Thanks for the detailed info. And hopefully your answer is going to be useful for future Google searches. ;) Cesar Steven A Rowe wrote: > > Hi Cesar, > > On 02/11/2008 at 2:19 PM, Cesar Ronchese wrote: >> I'm running into problems with document deletion.

Re: Distributed Indexes

2008-02-11 Thread Ruslan Sivak
Basically the index is big because there is a large number of documents, but each individual document is very small. There is also a lot of redundancy, which, I believe is also why the index size is fairly small. Basically I am using the index to store the user's profile information, and

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Petite Abeille
On Feb 11, 2008, at 4:00 PM, Cesar Ronchese wrote: For example: Indexed word: usuário Terms typed by the user, to find the word above: usuário or usuario or usuãrio, etc. If you feel ambitious, you can try something along the lines of Sean M. Burke's Unidecode!: http://interglacial.com/~s

RE: Delete problems O.O

2008-02-11 Thread Steven A Rowe
Hi Cesar, On 02/11/2008 at 2:19 PM, Cesar Ronchese wrote: > I'm running into problems with document deletion. > [...] > This simply doesn't delete anything from the Index. > > //see the code sample: > //"theFieldName" was previously stored as Field.Store.YES and > Field.Index.TOKENIZED. > Term t =

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Cesar Ronchese
Oops! Found a situation here, Karl: If the content is stored without accents, everything is OK. But, as my content is stored with accents, and I noticed the ISOFilter just removes the accent from the search terms, it is not returning to my Hits collection. Any idea how to fix it? -- View this mes

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Cesar Ronchese
Woot, Karl. It worked like a charm! It even worked with the Highlighter. THANKS! karl wettin-3 wrote: > > > On 11 Feb 2008 at 18:16, Cesar Ronchese wrote: > >> I don't know how to set that filter to Query object. > > It is a TokenStream you filter, not the Query. In your case the > TokenStr

Delete problems O.O

2008-02-11 Thread Cesar Ronchese
Hey All. I'm running into problems with document deletion. I tried to use the DeleteDocuments() and DeleteDocument() methods; both have problems, as explained below: 1) DeleteDocuments(term) This simply doesn't delete anything from the Index. //see the code sample: //"theFieldName" was pre

Re: How to promote an unstemmed match over a stemmed match in an index that's stemmed...

2008-02-11 Thread Michael Stoppelman
Ah, very cool. Thanks for the tip. -M On Feb 11, 2008 10:58 AM, Erick Erickson <[EMAIL PROTECTED]> wrote: > You have to be a bit clever. You can certainly inject the original with > an > increment of 0. See SynonymAnalyzer in Lucene In Action. This will not > break phrase queries since your two

Re: How to promote an unstemmed match over a stemmed match in an index that's stemmed...

2008-02-11 Thread Erick Erickson
You have to be a bit clever. You can certainly inject the original with an increment of 0. See SynonymAnalyzer in Lucene In Action. This will not break phrase queries since your two tokens occupy the same position. But you'll have to do something like add a $ to the original at index time. That w
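
The token-injection idea can be sketched without Lucene. The following is a minimal illustration under stated assumptions, not the SynonymAnalyzer from the book and not a Lucene TokenStream: the Token class, the toyStem() stemmer (it only strips a trailing "s") and the analyze() method are all hypothetical. For every word it emits the stemmed form with a position increment of 1 and the original form, marked with a trailing $, with an increment of 0, so both tokens share one position.

```java
import java.util.ArrayList;
import java.util.List;

class UnstemmedInjectionSketch {
    // A term plus its position increment relative to the previous token.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Hypothetical toy stemmer: strips one trailing "s" from words longer than three characters.
    static String toyStem(String word) {
        return (word.length() > 3 && word.endsWith("s"))
                ? word.substring(0, word.length() - 1)
                : word;
    }

    // Emit the stem at a new position, then the marked original at the same position.
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<Token>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            tokens.add(new Token(toyStem(word), 1)); // advances the position
            tokens.add(new Token(word + "$", 0));    // stacked on the stem's position
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : analyze("green olives")) {
            System.out.println(t.term + " (+" + t.positionIncrement + ")");
        }
    }
}
```

With tokens laid out like this, a query for the marked form (olives$) can only hit documents containing the exact original word, while the stemmed form still matches as before; the two can then be weighted differently at query time.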

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Karl Wettin
On 11 Feb 2008 at 18:16, Cesar Ronchese wrote: I don't know how to set that filter to Query object. It is a TokenStream you filter, not the Query. In your case the TokenStream is produced by the QueryParser invoking analyzer.tokenStream(field, new StringReader(input)). So what you have to
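
The principle in this reply, that the same folding has to be applied to the indexed text and to the query text, can be shown with a small JDK-only sketch. It is a stand-in, not ISOLatin1AccentFilter and not a Lucene analyzer: the AccentFoldSketch class and its fold() and matches() methods are hypothetical, and the folding uses java.text.Normalizer to decompose characters and then drops the combining marks.

```java
import java.text.Normalizer;

class AccentFoldSketch {
    // Decompose accented characters (NFD) and remove the combining marks that carry the accents.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Both sides go through the same folding, mirroring "filter at index time and at query time".
    static boolean matches(String indexedWord, String userTerm) {
        return fold(indexedWord).equalsIgnoreCase(fold(userTerm));
    }

    public static void main(String[] args) {
        String indexed = "usu\u00e1rio"; // the accented word from the thread, written with an escape
        System.out.println(fold(indexed));
        System.out.println(matches(indexed, "usuario"));
        System.out.println(matches(indexed, "usu\u00e3rio"));
    }
}
```

If only the query side were folded, the unaccented query term would be compared against accented indexed terms and find nothing, which is the symptom reported elsewhere in this thread.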

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Erick Erickson
See below... On Feb 11, 2008 12:17 PM, Cesar Ronchese <[EMAIL PROTECTED]> wrote: > > Hey, Erick. You inferred right. > > I analyzed your code and it looks like a common Indexing and Searching > code. > Are you sure you pasted the correct code? :P > Did you try to run it? It's just a self-contai

How to promote an unstemmed match over a stemmed match in an index that's stemmed...

2008-02-11 Thread Michael Stoppelman
Hi all, I've got an index with tokens that are stemmed. Sometimes I really need to boost the unstemmed version of a query word to get the most relevant documents. Example: Query: [olives]. I don't want to match documents with the words: oliver, oliver's, etc... Since I'm stemming when creating t

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Cesar Ronchese
Hey, Erick. You inferred right. I analyzed your code and it looks like a common Indexing and Searching code. Are you sure you pasted the correct code? :P Anyways, is the concept about double-storing the data, one content with accents and the other without? If yes, I did it earlier, but once I search i

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Cesar Ronchese
> One more thing, > are you aware of that you are supposed to apply that filter on the > query too? I don't know how to set that filter to Query object. I've searched to see if it is possible, but I can't find references. If it is possible, do you have a quick example? I'm searching this way:

Re: Getting Payload from Hits

2008-02-11 Thread Grant Ingersoll
Right now there is not a good way, other than to use the TermPositions. See https://issues.apache.org/jira/browse/LUCENE-1001 for some thoughts on adding the ability. Unfortunately, I ran into a roadblock, and haven't been able to get back to it. If you feel you can submit a patch, it w

Re: Distributed Indexes

2008-02-11 Thread Ruslan Sivak
Cedric Ho wrote: On Feb 9, 2008 12:07 AM, Ruslan Sivak <[EMAIL PROTECTED]> wrote: The app does other things than search the index. I'm basically using ColdFusion for the website and have four instances running on two servers for load balancing. Each app does the searches, and the search tim

Re: Getting Payload from Hits

2008-02-11 Thread Karl Wettin
You would have to collect the payloads from matching terms by extending a query. See this recent thread: http://www.nabble.com/Faceting-with-payloads-td15322956.html#a15322956 Are you sure this is what you want to do? What is it you store in the payloads, and how do you plan to use this info

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Cesar Ronchese
Hey Karl. Thanks for the response. I have some more doubts: 1) About the ISOLatin1AccentFilter class: > What is the problem you have with this? Are they not unique enough? I need to store the words the way they were written. So, if the text to be indexed contains the word "usuário", my user expe

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Erick Erickson
I'm inferring that you need the original text for display purposes or some such, but want to search a "canonical" form. So the following may be totally irrelevant if my inference is wrong. Indexed and stored are two very distinct things in Lucene. If you create a field that is both stored and

RE: Getting Payload from Hits

2008-02-11 Thread Allahbaksh Mohammedali Asadullah
Hi, Thanks for the reply. But is there any way I can get the Payload from the search result? My requirement is that when the user searches on some field, I also want to display additional data which is stored as Payload. Regards, Allahbaksh -Original Message- From: Karl Wettin [mailto:[E

IndexDeletionPolicy

2008-02-11 Thread Robert . Hastings
Has anyone contributed an IndexDeletionPolicy that has been tested on an NFS system? Bob Hastings Ancept Inc.

Re: Getting Payload from Hits

2008-02-11 Thread Karl Wettin
On 11 Feb 2008 at 14:46, Allahbaksh Mohammedali Asadullah wrote: d.add(new Field("f1", "This field has no payloads", Field.Store.NO, Field.Index.TOKENIZED)); d.add(new Field("f2", "This field has payloads in all docs", Field.Store.YES, Field.Index.TOKENIZED)); Document doc = hits.doc(i); He

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Karl Wettin
On 11 Feb 2008 at 16:08, Karl Wettin wrote: All I could find is about the ISOLatin1AccentFilter class, which, as far as I could understand, just removes the accented chars so I can store it in its unaccented form. What is the problem you have with this? Are they not unique enough? One more

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Karl Wettin
On 11 Feb 2008 at 16:00, Cesar Ronchese wrote: Hello, guys. I've been searching Google for how to make Lucene perform accent-insensitive searches. All I could find is about the ISOLatin1AccentFilter class, which, as far as I could understand, just removes the accented chars so I can store it

Re: large term vectors

2008-02-11 Thread Karl Wettin
http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/document/Field.Index.html#NO_NORMS ? On 11 Feb 2008 at 15:55, <[EMAIL PROTECTED]> wrote: Hi Grant, Lucene 2.2.0 I'm not actually explicitly storing term vectors. It seems the huge amount of byte arrays is actually coming from SegmentR

Indexing accented characters, then searching by any form

2008-02-11 Thread Cesar Ronchese
Hello, guys. I've been searching Google for how to make Lucene perform accent-insensitive searches. All I could find is about the ISOLatin1AccentFilter class, which, as far as I could understand, just removes the accented chars so I can store it in its unaccented form. What I would like to know is,

RE: large term vectors

2008-02-11 Thread marc.dumontier
Hi Grant, Lucene 2.2.0 I'm not actually explicitly storing term vectors. It seems the huge amount of byte arrays is actually coming from SegmentReader.norms. Maybe that cache constantly grows as I read somewhere that it's on-demand. I'm not using any field or document boosting. Is there some way

RE: large term vectors

2008-02-11 Thread marc.dumontier
No, it's split into about 100 individual indexes. But I'm running my 64-bit JVM with around 10gb max memory in order to avoid running out of memory after running all my unit tests (I have some other indexes as well running as part of this application). Upon further investigation, it seems to have

Getting Payload from Hits

2008-02-11 Thread Allahbaksh Mohammedali Asadullah
Hi, I have saved payloads in my index. When the user types the query I get a HIT document. From the HIT document, how can I get the value of the Payload for a particular tree? For example _analyzer = new PayloadAnalyzer(); _writer = new IndexWriter(new File("d:/test1"), _analyz

Re: Distributed Indexes

2008-02-11 Thread Grant Ingersoll
Solr has a strategy using rsync that makes it relatively easy to copy an index around to other servers. It uses rsync to just copy the diffs, so you could easily mirror this in your application. There is no SQL backend for Lucene, but at 4mb you could certainly serialize it as a blob to a

Re: large term vectors

2008-02-11 Thread Grant Ingersoll
Hi Marc, Can you give more info about what your field properties are? Your subject line implies you are storing term vectors, is that the case? Also, what version of Lucene are you using? Cheers, Grant On Feb 8, 2008, at 10:51 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED] > wrote: Hi,