Never mind, I think I got it.
-Original Message-
From: OBender [mailto:osya_ben...@hotmail.com]
Sent: Monday, July 20, 2009 4:42 PM
To: java-user@lucene.apache.org
Subject: RE: question on custom filter
No, it's reversed in the e-mail. Funny though, when I insert it into Excel
it
Obender, does the following text appear like the image in the link, or not?
שומר אחי
http://farm1.static.flickr.com/3/10445435_75b4546703.jpg?v=0
On Mon, Jul 20, 2009 at 3:34 PM, OBender wrote:
> I've checked, and it appears to be enabled.
>
> -Original Message-
:) ?
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, July 20, 2009 3:34 PM
To: java-user@lucene.apache.org
Subject: Re: question on custom filter
Obender, I think your input is incorrect. The Hebrew text you pasted
in your example appears incorrect. It's gonna be h
I've checked, and it appears to be enabled.
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, July 20, 2009 3:18 PM
To: java-user@lucene.apache.org
Subject: Re: question on custom filter
Obender, based on your previous comments (that you see text display
Sent: Monday, July 20, 2009 3:03 PM
To: java-user@lucene.apache.org
Subject: Re: question on custom filter
Obender, I ran your code and it did what I expected (but not what you pasted):
First token is: (טוֹב,0,4)
Second token is: (עֶרֶב,5,10)
I also loaded up your SimpleWhitespaceAnalyzer in Luke, with the same results.
To: java-user@lucene.apache.org
Subject: Re: question on custom filter
Obender, I think something in your environment / display environment
might be causing some confusion.
Are you using Microsoft Windows? If so, please verify that support for
right-to-left languages is enabled [Control Panel / Regional and
Language Options].
Subject: Re: question on custom filter
Obender, this is not true.
The text you pasted is the following in Unicode:
\N{HEBREW LETTER TET}
\N{HEBREW LETTER VAV}
\N{HEBREW POINT HOLAM}
\N{HEBREW LETTER BET}
\N{SPACE}
\N{HEBREW LETTER AYIN}
\N{HEBREW POINT SEGOL}
\N{HEBREW LETTER RESH}
\N{HEBREW POINT SEGOL}
\N{HEBREW LETTER BET}
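A quick way to do this kind of check yourself is to dump the code points of the pasted string; they come out in logical (storage) order no matter how the mail client renders the Hebrew. A minimal sketch in plain Java (the class name is made up, and the string literal just encodes the example above as escapes so display order cannot confuse things):

public class DumpCodePoints {
  public static void main(String[] args) {
    // the pasted example from above, in logical order
    String s = "\u05D8\u05D5\u05B9\u05D1 \u05E2\u05B6\u05E8\u05B6\u05D1";
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      // prints one code point per line, in storage order
      System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
      i += Character.charCount(cp);
    }
  }
}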
Hold on a second, the phrase that you included a link to is not in the correct
word order!
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, July 20, 2009 2:07 PM
To: java-user@lucene.apache.org
Subject: Re: question on custom filter
Obender, This is not
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Monday, July 20, 2009 1:43 PM
To: java-user@lucene.apache.org
Subject: Re: question on custom filter
Obender, I don't think it's as difficult as you think. Your filter does
not need to be aware of this issue at all.
In Unicode, right-to-left text is stored in logical order
Hi All!
Let's say I have a filter that produces new tokens based on the original ones.
How bad will it be if my filter sets the start of each token to 0 and the end to
the length of the token?
An example (based on the phrase "How are you?"):
Original token:
[you?] (8,12)
New tokens:
[you]
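For reference, here is a minimal sketch of a filter that derives a new term per original token while leaving the original offsets alone (assuming the pre-2.9 next(Token) API; the class name and the lower-casing transformation are just placeholders). Offsets are what features like highlighting use to map a token back to its position in the source text, which is why resetting them to (0, length) is usually not what you want:

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Hypothetical filter: replaces each term with a derived one but keeps the
// token's startOffset/endOffset pointing at the original characters.
public final class DerivedTermFilter extends TokenFilter {

  public DerivedTermFilter(TokenStream input) {
    super(input);
  }

  public Token next(Token reusableToken) throws IOException {
    Token token = input.next(reusableToken);
    if (token == null) {
      return null;
    }
    // stand-in for whatever transformation produces the new term
    char[] derived = token.term().toLowerCase().toCharArray();
    token.setTermBuffer(derived, 0, derived.length);
    // note: startOffset()/endOffset() are deliberately left untouched
    return token;
  }
}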
> are using an Analyzer that already uses WhiteSpaceTokenizer... but you
> likely are)
>
> OBender wrote:
>> Hi All,
>>
>>
>>
>> I need to make the ? and ! characters separate tokens, e.g. to
>> split [how
>> are you?] into 4 tokens
Hi All,
I think this is a question for the Lucene dev team.
Why was the next(Token) method of CharTokenizer made final?
It is quite inconvenient and I don't see the reason for it.
Thanks.
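For what it's worth, CharTokenizer is meant to be customized through its protected hooks rather than by overriding next(Token) itself; a minimal sketch along those lines (assuming the pre-3.x API where the hooks take a char; the class name is made up):

import java.io.Reader;
import org.apache.lucene.analysis.CharTokenizer;

// Splits on whitespace and lower-cases, purely by overriding the two hooks
// that CharTokenizer exposes instead of its next(Token) method.
public class LowercasingWhitespaceTokenizer extends CharTokenizer {

  public LowercasingWhitespaceTokenizer(Reader in) {
    super(in);
  }

  protected boolean isTokenChar(char c) {
    return !Character.isWhitespace(c);
  }

  protected char normalize(char c) {
    return Character.toLowerCase(c);
  }
}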
Hi All,
I need to make the ? and ! characters separate tokens, e.g. to split [how
are you?] into 4 tokens: [how], [are], [you] and [?]. What would be the best
way to do this?
Thanks
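One way to do this on top of a WhitespaceTokenizer is a small filter that peels a trailing ? or ! off a token and emits it as a token of its own. A minimal sketch, assuming the pre-2.9 next(Token) API (the class name is made up and only a single trailing character is handled):

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Turns [you?] into [you] followed by [?], keeping offsets that still point
// into the original text.
public final class TrailingPunctuationFilter extends TokenFilter {

  private Token pending; // punctuation token waiting to be emitted

  public TrailingPunctuationFilter(TokenStream input) {
    super(input);
  }

  public Token next(Token reusableToken) throws IOException {
    if (pending != null) {
      Token punct = pending;
      pending = null;
      return punct;
    }
    Token token = input.next(reusableToken);
    if (token == null) {
      return null;
    }
    String term = token.term();
    int len = term.length();
    if (len > 1) {
      char last = term.charAt(len - 1);
      if (last == '?' || last == '!') {
        // queue the punctuation as its own token, positioned right after the word
        pending = new Token(token.endOffset() - 1, token.endOffset());
        pending.setTermBuffer(new char[] { last }, 0, 1);

        // shrink the current token to just the word part
        char[] word = term.substring(0, len - 1).toCharArray();
        token.setTermBuffer(word, 0, word.length);
        token.setEndOffset(token.endOffset() - 1);
      }
    }
    return token;
  }
}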
Hi All,
I've come across a very strange issue with the Irish language.
I have the following set of strings in Irish:
ag an gcrosbhealach seo,
Lean ar an mórbhealach.,
Lean an bóthar seo.,
An bhfuil ... in am imeacht?,
An ... sin an t-am ceart?
And here is a search string: an
java-user@lucene.apache.org
Subject: Re: Hindi, diacritics and search results
Which analyzer in particular are you using?
It's probably not doing what you want for Hindi. These "diacritics" are
important (vowels, etc).
On Fri, Jul 10, 2009 at 3:10 PM, OBender wrote:
> Hi All,
>
Hi All,
I'm using the default setup of Lucene (no custom analyzers configured) and
came across the following issue:
In Hindi, if there is a letter with a diacritic in a phrase, Lucene will find
the phrase with this letter even if the search string contains the letter
without the diacritic.
Is this
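One way to see what is happening is to print the tokens the analyzer actually produces for the indexed phrase and for the query string, and compare them; if the two collapse to the same terms, the diacritic is being lost during analysis. A minimal sketch, assuming the pre-2.9 TokenStream API (StandardAnalyzer stands in for whatever the default setup really uses, the field name is arbitrary, and the text is passed on the command line):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Dumps (term, startOffset, endOffset) for every token the analyzer emits.
public class DumpTokens {
  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new StandardAnalyzer();
    TokenStream ts = analyzer.tokenStream("field", new StringReader(args[0]));
    Token token;
    while ((token = ts.next(new Token())) != null) {
      System.out.println("(" + token.term() + "," + token.startOffset()
          + "," + token.endOffset() + ")");
    }
    ts.close();
  }
}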
On Mon, Jun 15, 2009 at 10:30 PM, OBender Hotmail wrote:
> That's the thing, there is no actual requirement.
> I've been presented with all the languages that the company theoretically
> provides.
> My guess is that what I'm going to end up with is all western languages, good
>
I think you might have larger problems!
On Mon, Jun 15, 2009 at 9:18 PM, OBender Hotmail wrote:
> Here is the list of possible languages. Don't laugh :) I know those are
> almost all the world's languages but it is a true requirement. Well, the actual
> number will be closer to 70, not 100
return ts;
}
}
Can you give a better idea as to what languages you have and what your
search requirements are (accent marks, punctuation, etc.)?
On Mon, Jun 15, 2009 at 5:39 PM, OBender Hotmail wrote:
> I've looked over SolR quickly, it is a bit too heavy for my project.
> So w
To: java-user@lucene.apache.org
Subject: Re: Lucene and multi-lingual Unicode - advice needed
Well, just reply back if SolR is inappropriate for your needs.
In that case, you will need to build a custom analyzer (it's not too
bad) so that you can use Compass.
On Mon, Jun 15, 2009 at 4:19 PM, OBender Ho
ework, but recently there have been some improvements added to SolR
so that the default type 'text' is pretty good for multilingual
processing.
In fact, I hope in the future it will be improved in Lucene so that
your decision is really based upon other application needs...
On Mon, Jun 15, 2009
Hi All!
I'm new to Lucene so forgive me if this question was asked before.
I have a database with records in the same table in many different languages
(up to 70); it includes all W-European, Arabic, Eastern, CJK, Cyrillic, etc.,
you name it.
I've looked at what people say about Lucene and it l