Re: Hypenated word

2005-06-13 Thread Andy Roberts
On Monday 13 Jun 2005 14:52, Markus Wiederkehr wrote:
> On 6/13/05, Andy Roberts <[EMAIL PROTECTED]> wrote:
> > On Monday 13 Jun 2005 13:18, Markus Wiederkehr wrote:
> > > I see, the list of exceptions makes this a lot more complicated than I
> > > thought... Thanks a lot, Erik!
> >
> > I expect yo

Re: Hypenated word

2005-06-13 Thread Erik Hatcher
On Jun 13, 2005, at 10:55 AM, Andy Roberts wrote:
> On Monday 13 Jun 2005 13:18, Markus Wiederkehr wrote:
> > I see, the list of exceptions makes this a lot more complicated than I thought... Thanks a lot, Erik!
> I expect you'll need to do some pre-processing. Read in your text into a buffer

Re: Hypenated word

2005-06-13 Thread Peter A. Friend
On Jun 13, 2005, at 6:18 AM, Markus Wiederkehr wrote:
> I see, the list of exceptions makes this a lot more complicated than I thought... Thanks a lot, Erik!
There is a section about the problems that hyphens create in "Foundations of Statistical Natural Language Processing". Not only are t

Re: Hypenated word

2005-06-13 Thread Markus Wiederkehr
On 6/13/05, Andy Roberts <[EMAIL PROTECTED]> wrote:
> On Monday 13 Jun 2005 13:18, Markus Wiederkehr wrote:
> > I see, the list of exceptions makes this a lot more complicated than I
> > thought... Thanks a lot, Erik!
> >
>
> I expect you'll need to do some pre-processing. Read in your text into a

Re: Hypenated word

2005-06-13 Thread Andy Roberts
On Monday 13 Jun 2005 13:18, Markus Wiederkehr wrote:
> I see, the list of exceptions makes this a lot more complicated than I
> thought... Thanks a lot, Erik!
>
I expect you'll need to do some pre-processing. Read in your text into a buffer, line-by-line. If a given line ends with a hyphen, you
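The message above is cut off in this listing, so the rest of the suggestion is not shown. As a rough sketch of the pre-processing idea only, and assuming the intended step is to glue a line that ends in a hyphen onto the following line, something like the plain-Java helper below could run before the text reaches the indexer. The class name `Dehyphenator` and method `joinLines` are made up for illustration and are not part of Lucene. This naive version merges every line-final hyphen, so it would also collapse compounds that are legitimately hyphenated; it does not handle the list of exceptions mentioned earlier in the thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Lucene class): re-joins words split across line ends.
class Dehyphenator {

    // Joins the lines into one string. A line ending in '-' is glued to the
    // next non-blank line with the hyphen dropped; all other lines are joined
    // with a single space. Assumption: every line-final hyphen is a line-break
    // hyphen, which is not true for real compounds.
    static String joinLines(List<String> lines) {
        StringBuilder out = new StringBuilder();
        boolean glue = false; // true when the previous line ended with a hyphen
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines; a pending glue carries over
            }
            if (out.length() > 0 && !glue) {
                out.append(' ');
            }
            if (line.endsWith("-")) {
                out.append(line, 0, line.length() - 1); // copy without the hyphen
                glue = true;
            } else {
                out.append(line);
                glue = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList(
            "Naturally there occur many words that are hyphen-",
            "ated across lines.");
        System.out.println(joinLines(page));
    }
}
```

The joined text could then be handed to whatever Analyzer is already in use, which keeps the hyphen handling separate from tokenization.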

Re: Hypenated word

2005-06-13 Thread Markus Wiederkehr
I see, the list of exceptions makes this a lot more complicated than I thought... Thanks a lot, Erik!
Markus
On 6/13/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> On Jun 13, 2005, at 7:08 AM, Markus Wiederkehr wrote:
> > I work on an application that has to index OCR texts of scanned books.
>

Re: Hypenated word

2005-06-13 Thread Erik Hatcher
On Jun 13, 2005, at 7:08 AM, Markus Wiederkehr wrote:
> I work on an application that has to index OCR texts of scanned books. Naturally there occur many words that are hyphenated across lines. I wonder if there is already an Analyzer or maybe a TokenFilter that can merge those syllables back int

Hypenated word

2005-06-13 Thread Markus Wiederkehr
Hello, I work on an application that has to index OCR texts of scanned books. Naturally there occur many words that are hyphenated across lines. I wonder if there is already an Analyzer or maybe a TokenFilter that can merge those syllables back into whole words? It looks like Erik Hatcher uses so