Re: IndexOutOfBoundsException

2008-08-14 Thread Yonik Seeley
(switching to java-user) OK, that's great that it's so reproducible. To rule out a JVM bug, it would be great if you could try out Sun's 1.6.0_03 to see if it still happens. -Yonik On Thu, Aug 14, 2008 at 10:18 PM, Ian Connor <[EMAIL PROTECTED]> wrote: > I seem to be able to reproduce this very e

Re: Payloads and tokenizers

2008-08-14 Thread Antony Bowesman
Thanks for your comments Doron. I found the earlier discussions on the dev list (21/12/06), where this issue is discussed - my use case is similar to Nadav Har'El. Implementing payloads via Tokens explicitly prevents the use of payloads for untokenized fields, as they only support field.string

Re: Lucene search for OR

2008-08-14 Thread Daniel Noll
AlexElba wrote: Hello I am trying to search for or(Oregon) even when it is not capitalized it is not returning any results. How to search for 'or' ? It sounds like you might have indexed with English stop words, as "or" is certainly in that list. Trivial way to check is to search for "the".
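
A minimal sketch of why a stop list would make "or" unsearchable. It is plain Java, not Lucene code: the class `StopWordSketch`, its `analyze` method, and the abbreviated stop list are all hypothetical and only illustrate the effect Daniel describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; this is not Lucene's analyzer code.
class StopWordSketch {
    // Hypothetical, abbreviated stop list; a real English stop list is longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "and", "or", "the", "to"));

    // Lowercase each whitespace-separated token and drop the stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "OR" is lowercased to "or" and then removed as a stop word.
        System.out.println(analyze("Portland OR"));
        // A query made only of stop words leaves nothing to search for.
        System.out.println(analyze("the"));
    }
}
```

If the same filtering runs at index time and at query time, the term never reaches the index and the query ends up empty, which would match the symptom in the original question.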

Re: Case Sensitivity

2008-08-14 Thread Andre Rubin
Sergey, Based on a recent discussion I posted: http://www.nabble.com/Searching-Tokenized-x-Un_tokenized-td18882569.html , you cannot use Un_Tokenized because you can't have any analyzer run through it. My suggestion, use a tokenized field and a custom-made Analyzer. Haven't figured out all the det

Lucene search for OR

2008-08-14 Thread AlexElba
Hello I am trying to search for or(Oregon) even when it is not capitalized it is not returning any results. How to search for 'or' ?

lucene/nutch question...

2008-08-14 Thread bruce
Hi. Got a very basic lucene/nutch question. Assume I have a page that has a form. Within the form are a number of select/drop-down boxes/etc... In this case, each object would comprise a variable which would form part of the query string as defined in the form action. Is there a way for lucene/nu

test

2008-08-14 Thread bruce
test

Re: Do doc-ids always increment by 1 if no deletion is done?

2008-08-14 Thread Michael McCandless
This is true of Lucene today, with one exception: if an add/updateDocument call hits an exception it's possible it consumed a docID (which is immediately marked as deleted). This will cause your index to have deletions. I don't think this behavior is guaranteed in future releases of Lucene. Mike

Re: Issue while creating Regular Expressions

2008-08-14 Thread Erick Erickson
What are you trying to do with the regex? And why is it appropriate to the Lucene list? What is a segment and how does it relate to Lucene? It would really help if you showed us some example input and what transformation you are trying to implement with your regex. If it's a pure regex question yo

Issue while creating Regular Expressions

2008-08-14 Thread Hareesh
I have a small problem. I will describe the problem first. I am working on a search engine in which the crawling is done using Heritrix, and the crawled data is the input for my logic. While trying to index the ARC files from Heritrix, it is not creating indexes in the desired format. The Ex

Re: Case Sensitivity

2008-08-14 Thread Erick Erickson
Be aware that StandardAnalyzer lowercases all the input, both at index and query times. Field.Store.YES will store the original text without any transformations, so doc.get() will return the original text. However, no matter what the Field.Store value, the *indexed* tokens (using TOKENIZED as you F
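
A rough plain-Java sketch of the distinction Erick draws between what is stored and what is indexed. No Lucene classes are used; `StoredVersusIndexedSketch` and `lowercaseTokens` are hypothetical stand-ins for a stored field and a lowercasing analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; no Lucene classes are used.
class StoredVersusIndexedSketch {
    final String stored;          // what a stored field would hand back unchanged
    final List<String> indexed;   // what a lowercasing analyzer would make searchable

    StoredVersusIndexedSketch(String text) {
        this.stored = text;
        this.indexed = lowercaseTokens(text);
    }

    // Hypothetical stand-in for a lowercasing analyzer: split on whitespace, lowercase.
    static List<String> lowercaseTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // The query goes through the same lowercasing, so case differences disappear.
    boolean matches(String query) {
        return indexed.containsAll(lowercaseTokens(query));
    }

    public static void main(String[] args) {
        StoredVersusIndexedSketch doc = new StoredVersusIndexedSketch("Hello Lucene");
        System.out.println(doc.stored);            // original casing is preserved
        System.out.println(doc.indexed);           // searchable tokens are lowercased
        System.out.println(doc.matches("LUCENE")); // case of the query does not matter
    }
}
```

The point of the sketch is that retrieval and matching read two different copies of the text, so seeing the original casing in a retrieved document says nothing about how the terms were indexed.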

Do doc-ids always increment by 1 if no deletion is done?

2008-08-14 Thread Gunjan Juyal
Hi. I am creating an index where there are no deletions, just additions. After the index creation is done I need to create another mapping of doc-ids to some data. If there are only additions and no deletions then can we assume that the doc-ids will be in the same order in which the documents we

Re: Case Sensitivity

2008-08-14 Thread Doron Cohen
> > In example I want to show what I stored field as Field.Index.NO_NORMS > > As I understand it means what field contains original string > despite what analyzer I chose(StandardAnalyzer by default). > This would be achieved by UN_TOKENIZED. The NO_NORMS just guides Lucene to avoid normalizin

Re: Case Sensitivity

2008-08-14 Thread Sergey Kabashnyuk
Thanks for your reply, Erick. About the only way to do this that I know of is to index the data three times, once without any case changing, once uppercased and once lowercased. You'll have to watch your analyzer, probably making up your own (easily done, see the synonym analyzer in Lucene in Ac

Re: Case Sensitivity

2008-08-14 Thread Erick Erickson
About the only way to do this that I know of is to index the data three times, once without any case changing, once uppercased and once lowercased. You'll have to watch your analyzer, probably making up your own (easily done, see the synonym analyzer in Lucene in Action). Your example doesn't tell
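
One way to picture the "index it three times" idea, as a plain-Java sketch. The field-name suffixes `_exact`, `_upper` and `_lower` and the class `TripleCaseFieldsSketch` are assumptions made for illustration; the thread does not name them, and a real index would need an analyzer that leaves each variant's case untouched.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Plain-Java illustration only; the field-name suffixes are hypothetical.
class TripleCaseFieldsSketch {
    // Build the three values that would be indexed under three separate fields.
    static Map<String, String> variants(String fieldName, String value) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(fieldName + "_exact", value);                           // no case change
        fields.put(fieldName + "_upper", value.toUpperCase(Locale.ROOT));  // uppercased copy
        fields.put(fieldName + "_lower", value.toLowerCase(Locale.ROOT));  // lowercased copy
        return fields;
    }

    public static void main(String[] args) {
        // Each entry would become its own field on the same document.
        for (Map.Entry<String, String> e : variants("title", "Lucene in Action").entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

A query would then pick the field that matches the kind of search wanted, at the cost of roughly tripling the indexed data for that field.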

Re: Payloads and tokenizers

2008-08-14 Thread Doron Cohen
IIRC first versions of patches that added payloads support had this notion of payload by field rather than by token, but later it was modified to be by token only. I have seen two code patterns to add payloads to tokens. The first one created the field text with a reserved separator/delimiter whi
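
A hedged sketch of the first pattern Doron mentions, where the field text carries a reserved delimiter. The `|` delimiter, the `term|payload` layout, and the class `DelimitedPayloadSketch` are assumptions for illustration only; the snippet does not say which delimiter or layout the original code used, and no Lucene API is shown here.

```java
// Plain-Java illustration only; the '|' delimiter and the layout are assumptions.
class DelimitedPayloadSketch {
    static final char DELIMITER = '|';

    // Split "term|payload" into {term, payload}; a token without the delimiter has an empty payload.
    static String[] split(String raw) {
        int at = raw.lastIndexOf(DELIMITER);
        if (at < 0) {
            return new String[] { raw, "" };
        }
        return new String[] { raw.substring(0, at), raw.substring(at + 1) };
    }

    public static void main(String[] args) {
        for (String raw : "quick|adj fox|noun jumps".split(" ")) {
            String[] parts = split(raw);
            System.out.println("term=" + parts[0] + " payload=" + parts[1]);
        }
    }
}
```

In a real analyzer chain, a token filter would do this split per token, keep the term text, and attach the remainder as the token's payload.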

Re: CheckIndex possibly not detecting/fixing all corruptions?

2008-08-14 Thread Michael McCandless
OK thanks for bringing closure to this John, and good luck tracking it down. Mike John O'Brien <[EMAIL PROTECTED]> wrote: > Hi Mike, >Apologies for the delay in getting back. > I have since figured out that the reason Luke gave an error when we searched > on the "fixed" index was (possibl

Re: Case Sensitivity

2008-08-14 Thread Sergey Kabashnyuk
Hello. I have a similar question. I need to implement 1. Case-sensitive search. 2. Lowercase search for a specific field. 3. Uppercase search for a specific field. For now I use new Field("PROPERTIES", content, Field.Store.NO, Field.Index