RE: question on custom filter

2009-07-20 Thread OBender
Never mind, I think I got it. -Original Message- From: OBender [mailto:osya_ben...@hotmail.com] Sent: Monday, July 20, 2009 4:42 PM To: java-user@lucene.apache.org Subject: RE: question on custom filter No, it got reversed in the e-mail. Funny though, when I insert it into Excel it

RE: question on custom filter

2009-07-20 Thread OBender
tom filter Obender, does the following text appear like the image in the link, or not? שומר אחי http://farm1.static.flickr.com/3/10445435_75b4546703.jpg?v=0 On Mon, Jul 20, 2009 at 3:34 PM, OBender wrote: > I've checked, and it appears to be enabled. > > -Original Message-

RE: question on custom filter

2009-07-20 Thread OBender
u :) ? -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, July 20, 2009 3:34 PM To: java-user@lucene.apache.org Subject: Re: question on custom filter Obender, I think your input is incorrect. The hebrew text you pasted in your example appears incorrect. Its gonna be h

RE: question on custom filter

2009-07-20 Thread OBender
I've checked, and it appears to be enabled. -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, July 20, 2009 3:18 PM To: java-user@lucene.apache.org Subject: Re: question on custom filter Obender, based on your previous comments (that you see text display

RE: question on custom filter

2009-07-20 Thread OBender
3:03 PM To: java-user@lucene.apache.org Subject: Re: question on custom filter Obender, i ran your code and it did what I expected (but not what you pasted): First token is: (טוֹב,0,4) Second token is: (עֶרֶב,5,10) I also loaded up your SimpleWhitespaceAnalyzer in Luke, with the same results. On M

RE: question on custom filter

2009-07-20 Thread OBender
che.org Subject: Re: question on custom filter Obender, I think something in your environment / display environment might be causing some confusion. Are you using microsoft windows? If so, please verify that support for right-to-left languages is enabled [control panel/regional and language options].

RE: question on custom filter

2009-07-20 Thread OBender
: Re: question on custom filter Obender, This is not true. the text you pasted is the following in unicode: \N{HEBREW LETTER TET} \N{HEBREW LETTER VAV} \N{HEBREW POINT HOLAM} \N{HEBREW LETTER BET} \N{SPACE} \N{HEBREW LETTER AYIN} \N{HEBREW POINT SEGOL} \N{HEBREW LETTER RESH} \N{HEBREW POINT SEGOL
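
A code-point listing like the one quoted above can be reproduced without relying on how a terminal or mail client renders right-to-left text. The following is only a sketch: the class and method names are made up for illustration, and the second word is taken from the token (עֶרֶב,5,10) shown in another snippet of this thread, since the listing here is cut off. It uses the standard Character.getName method (Java 7 and later) and writes the text with \u escapes so the source file encoding does not matter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists the Unicode name of every
// code point of a string in logical (stored) order.
final class CodePointNames {

    static List<String> names(String text) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            result.add(Character.getName(cp)); // e.g. "HEBREW LETTER TET"
            i += Character.charCount(cp);      // step over surrogate pairs correctly
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed input: the two words from the tokens shown elsewhere in the thread.
        String text = "\u05D8\u05D5\u05B9\u05D1 \u05E2\u05B6\u05E8\u05B6\u05D1";
        for (String name : names(text)) {
            System.out.println(name);
        }
    }
}
```

Because the listing follows the stored order of the characters, it is unaffected by whether the display reverses the words on screen.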

RE: question on custom filter

2009-07-20 Thread OBender
Hold on a second, the phrase you linked to does not have the words in the correct order! -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, July 20, 2009 2:07 PM To: java-user@lucene.apache.org Subject: Re: question on custom filter Obender, This is not

RE: question on custom filter

2009-07-20 Thread OBender
ly 20, 2009 2:07 PM To: java-user@lucene.apache.org Subject: Re: question on custom filter Obender, This is not true. the text you pasted is the following in unicode: \N{HEBREW LETTER TET} \N{HEBREW LETTER VAV} \N{HEBREW POINT HOLAM} \N{HEBREW LETTER BET} \N{SPACE} \N{HEBREW LETTER AYIN} \N{HEBREW

RE: question on custom filter

2009-07-20 Thread OBender
ssage- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Monday, July 20, 2009 1:43 PM To: java-user@lucene.apache.org Subject: Re: question on custom filter Obender, I don't think its as difficult as you think. Your filter does not need to be aware of this issue at all. In unicode, rig

question on custom filter

2009-07-20 Thread OBender
Hi All! Let's say I have a filter that produces new tokens based on the original ones. How bad would it be if my filter set the start of each token to 0 and the end to the length of the token? An example (based on the phrase "How are you?"): Original token: [you?] (8,12) New tokens: [you]
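
The alternative to resetting offsets to 0 and the token length is to derive each new token's offsets from the original token's start. The sketch below is plain Java and deliberately does not use Lucene's Token or TokenFilter classes; the Tok class and the splitTrailingPunctuation method are hypothetical names introduced only to show the arithmetic, and the rule of splitting off trailing ? and ! is an assumption about what the filter does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene API: splits trailing '?' / '!' off a
// token while keeping offsets relative to the original text.
final class OffsetPreservingSplit {

    static final class Tok {
        final String text;
        final int start; // offset of the first character in the original text
        final int end;   // offset one past the last character

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + text + "] (" + start + "," + end + ")";
        }
    }

    static List<Tok> splitTrailingPunctuation(Tok original) {
        List<Tok> out = new ArrayList<>();
        String t = original.text;
        int cut = t.length();
        // Find where the run of trailing '?' / '!' begins.
        while (cut > 0 && (t.charAt(cut - 1) == '?' || t.charAt(cut - 1) == '!')) {
            cut--;
        }
        if (cut > 0) {
            out.add(new Tok(t.substring(0, cut), original.start, original.start + cut));
        }
        // Each punctuation mark becomes its own token, shifted by the original start.
        for (int i = cut; i < t.length(); i++) {
            out.add(new Tok(t.substring(i, i + 1), original.start + i, original.start + i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splitTrailingPunctuation(new Tok("you?", 8, 12)));
    }
}
```

For the token [you?] (8,12) this yields [you] (8,11) and [?] (11,12), so the new tokens still point at the right span of "How are you?", which matters for anything that maps tokens back to the source text, such as highlighting.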

RE: Tokenizer question: how can I force ? and ! to be separate tokens?

2009-07-17 Thread OBender
ou > are using an Analyzer that already uses WhiteSpaceTokenizer... but you > likely are) > > OBender wrote: >> Hi All, >> >> >> >> I need to make ? and ! characters to be a separate token e.g. to >> split [how >> are you?] in to 4 to

Why next(Token) in CharTokenizer is final?

2009-07-17 Thread OBender
Hi All, I think this is a question for the Lucene dev team. Why was the next(Token) method of CharTokenizer made final? It is quite inconvenient, and I don't see the reason for it. Thanks.

Tokenizer question: how can I force ? and ! to be separate tokens?

2009-07-17 Thread OBender
Hi All, I need to make the ? and ! characters separate tokens, e.g. to split [how are you?] into 4 tokens: [how], [are], [you] and [?]. What would be the best way to do this? Thanks
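
The desired splitting can be sketched in plain Java, independent of any Lucene class. This is not the answer given in the thread, only an illustration of the target behaviour; PunctuationSplitter and split are hypothetical names, and treating only whitespace, ? and ! as boundaries is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene API: whitespace separates tokens,
// and each '?' or '!' is emitted as a token of its own.
final class PunctuationSplitter {

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                flush(current, tokens);          // whitespace ends the current word
            } else if (c == '?' || c == '!') {
                flush(current, tokens);          // end the word before the mark
                tokens.add(String.valueOf(c));   // the mark is its own token
            } else {
                current.append(c);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("how are you?")); // prints [how, are, you, ?]
    }
}
```

In a Lucene setup the same rule would live in a custom tokenizer or filter; the exact class to extend depends on the Lucene version in use.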

strange issues with IRISH

2009-07-13 Thread OBender
Hi All, I've come across a very strange issue with the Irish language. I have the following set of strings in Irish: ag an gcrosbhealach seo, Lean ar an mórbhealach., Lean an bóthar seo., An bhfuil ... in am imeacht?, An ... sin an t-am ceart? And here is a search string: an Sear

RE: Hindi, diacritics and search results

2009-07-10 Thread OBender Hotmail
java-user@lucene.apache.org Subject: Re: Hindi, diacritics and search results Which analyzer in particular are you using? Its probably not doing what you want for hindi. These "diacritics" are important (vowels, etc). On Fri, Jul 10, 2009 at 3:10 PM, OBender wrote: > Hi All, >

Hindi, diacritics and search results

2009-07-10 Thread OBender
Hi All, I'm using the default setup of Lucene (no custom analyzers configured) and came across the following issue: in Hindi, if a phrase contains a letter with a diacritic, Lucene will find the phrase even if the search string has the letter without the diacritic. Is this

RE: Lucene and multi-lingual Unicode - advice needed

2009-06-16 Thread OBender Hotmail
on, Jun 15, 2009 at 10:30 PM, OBender Hotmail wrote: > That's the thing there is no actual requirement. > I've been presented with all the languages that company theoretically > provides. > My guess is that what I'm going to end up with is all western languages, good >

RE: Lucene and multi-lingual Unicode - advice needed

2009-06-15 Thread OBender Hotmail
h? I think you might have larger problems! On Mon, Jun 15, 2009 at 9:18 PM, OBender Hotmail wrote: > Here is the list of possible languages. Don't laugh :) I know those are > almost all world languages but it is a true requirement. Well, actual number > will be closer to 70 not 100

RE: Lucene and multi-lingual Unicode - advice needed

2009-06-15 Thread OBender Hotmail
return ts; } } can you give a better idea as to what languages you have and what your search requirements are (accent marks, punctuation, etc etc) ? On Mon, Jun 15, 2009 at 5:39 PM, OBender Hotmail wrote: > I've looked over SolR quickly, it is a bit too heavy for my project. > So w

RE: Lucene and multi-lingual Unicode - advice needed

2009-06-15 Thread OBender Hotmail
-user@lucene.apache.org Subject: Re: Lucene and multi-lingual Unicode - advice needed Well just reply back if SolR is inappropriate for your needs. In that case, you will need to build a custom analyzer (its not too bad), so that you can use compass. On Mon, Jun 15, 2009 at 4:19 PM, OBender Ho

RE: Lucene and multi-lingual Unicode - advice needed

2009-06-15 Thread OBender Hotmail
ework, but recently there has been some improvements added to SolR so that the default type 'text' is pretty good for multilingual processing. In fact I hope in the future it will be improved in lucene so that your decision is really based upon other application needs... On Mon, Jun 15, 2009

Lucene and multi-lingual Unicode - advice needed

2009-06-15 Thread OBender Hotmail
Hi All! I'm new to Lucene, so forgive me if this question was asked before. I have a database with records in the same table in many different languages (up to 70). It includes all W-European, Arabic, Eastern, CJK, Cyrillic, etc., you name it. I've looked at what people say about Lucene and it l