Another good reference is this one: http://unicode.org/reports/tr29/
Since the latest Lucene uses this as the basis of its text
segmentation, it's worth getting familiar with it.
On Fri, Mar 30, 2012 at 10:09 AM, Robert Muir wrote:
> On Fri, Mar 30, 2012 at 1:03 PM, Denis Brodeur wrote:
>> Tha
On Fri, Mar 30, 2012 at 1:03 PM, Denis Brodeur wrote:
> Thanks Robert. That makes sense. Do you have a link handy where I can
> find this information? i.e. word boundary/punctuation for any unicode
> character set?
>
Yeah, usually I use
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[\u0
fileformat.info
Thanks Robert. That makes sense. Do you have a link handy where I can
find this information? i.e. word boundary/punctuation for any unicode
character set?
On Fri, Mar 30, 2012 at 12:57 PM, Robert Muir wrote:
> On Fri, Mar 30, 2012 at 12:46 PM, Denis Brodeur
> wrote:
> > Hello, I'm currently w
On Fri, Mar 30, 2012 at 12:46 PM, Denis Brodeur wrote:
> Hello, I'm currently working out some problems when searching for Tibetan
> Characters. More specifically: \u0f10-\u0f19. We are using the
Unicode doesn't consider most of these characters part of a word: most
are punctuation and symbols.
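
One way to check this for yourself is to look up the general category of each
code point in the range. The following is only a minimal sketch using Python's
standard unicodedata module; the helper name describe_range is made up for
illustration, and the categories reported depend on the Unicode version bundled
with your Python build, which may differ from the one Lucene's tokenizer uses.

```python
import unicodedata


def describe_range(first, last):
    """Return (code point label, character name, general category)
    for every code point in the inclusive range [first, last]."""
    rows = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        rows.append((
            f"U+{cp:04X}",
            unicodedata.name(ch, "<unassigned>"),  # fallback if the code point has no name
            unicodedata.category(ch),              # two-letter general category, e.g. Po, So, Mn
        ))
    return rows


if __name__ == "__main__":
    # The Tibetan range from the original question.
    for code, name, category in describe_range(0x0F10, 0x0F19):
        print(code, category, name)
```

Categories starting with P are punctuation, S are symbols, M are combining
marks, and L are letters. None of the code points in this range is a letter,
so a word-based tokenizer has no reason to keep one of them as a token on
its own.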
Hello, I'm currently working out some problems when searching for Tibetan
Characters. More specifically: \u0f10-\u0f19. We are using the
StandardAnalyzer (3.4) and I've narrowed the problem down to
StandardTokenizerImpl throwing away these characters i.e. in
getNextToken(), falls through case1:
I have added the wait but the script still crashes from time to time.
(I noticed that the value of self.jvm.attachCurrentThread() is always 0, i.e.
the script always enters the while loop only once).
thanks
-Original Message-
From: Greg Bowyer [mailto:gbow...@shopzilla.com]
Sent: 29 Mar
Surge 2012, the scalability conference, September 27-28, Baltimore, MD
has opened its CFP. Please visit http://omniti.com/surge/2012/cfp for
details.
--
Katherine Jeschke
Director of Marketing and Creative Services
OmniTI Computer Consulting, Inc.
7070 Samuel Morse Drive, Ste.150
Columbia, MD 210