The CJKAnalyzer is too simple for our needs. But thanks for the suggestion anyway.
Cheers,
Cedric

On Nov 9, 2007 10:43 PM, Open Study <[EMAIL PROTECTED]> wrote:
> Hi Cedric,
>
> You may try the CJKAnalyzer from the Lucene sandbox. It doesn't give
> a perfect solution for Chinese word segmentation, but it will solve the
> problem in your case.
>
> On Nov 9, 2007 10:59 AM, Cedric Ho <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > We are having an issue while indexing Chinese documents in Lucene.
> >
> > Some background first: since CJK languages don't have spaces between
> > words, we first have to determine the word boundaries in each sentence.
> > For example, a sentence containing the characters ABC may be segmented
> > into AB, C or into A, BC.
> >
> > The problem is that sometimes there are ambiguities in how a sentence
> > should be segmented: it is possible that both AB, C and A, BC are
> > valid segmentations.
> >
> > In these cases we would like to index both segmentations:
> >
> > AB offset (0,1) position 0
> > C  offset (2,2) position 1
> > A  offset (0,0) position 0
> > BC offset (1,2) position 1
> >
> > Now the problem is that when someone searches with a PhraseQuery (A C),
> > it will match the text ABC, because it finds A at position 0 and C at
> > position 1.
> >
> > Is there any way to search for an exact match using the offset
> > information instead of the position information?
> >
> > Best Regards,
> > Cedric
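
For reference, the token layout described above can be reproduced with a custom TokenStream roughly like the sketch below. This is only an illustration: the attribute-based API shown comes from Lucene releases newer than this thread, the class name and hard-coded token data are made up for the ABC example, and offsets here follow Lucene's start-inclusive / end-exclusive convention rather than the inclusive pairs quoted above.

import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// Hypothetical stream that hard-codes both segmentations of the
// three-character text "ABC" from the example above.
public final class AmbiguousSegmentationStream extends TokenStream {

    // term, start offset (inclusive), end offset (exclusive), position increment
    private final String[] terms  = { "AB", "A", "BC", "C" };
    private final int[]    starts = { 0,    0,   1,    2   };
    private final int[]    ends   = { 2,    1,   3,    3   };
    private final int[]    incrs  = { 1,    0,   1,    0   };  // 0 = stack on previous position

    private int i = 0;

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    private final PositionIncrementAttribute posIncrAtt =
            addAttribute(PositionIncrementAttribute.class);

    @Override
    public boolean incrementToken() throws IOException {
        if (i >= terms.length) {
            return false;
        }
        clearAttributes();
        termAtt.setEmpty().append(terms[i]);       // AB / A / BC / C
        offsetAtt.setOffset(starts[i], ends[i]);   // character offsets into "ABC"
        posIncrAtt.setPositionIncrement(incrs[i]); // AB and A share position 0, BC and C share position 1
        i++;
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        i = 0;
    }
}

With this layout, A and AB land on position 0 and C and BC on position 1, so a PhraseQuery for "A" followed by "C" matches the document even though those characters are not adjacent in the original text: phrase matching compares positions only and never consults the stored offsets, which is exactly the false positive described in the thread.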