Hi Mike. Search lucene dev archives. I did write a decompounder with Daniel
Naber. The quality was not ideal but perhaps better than nothing. Also,
Daniel works on languagetool.org? They should have something in there.
Dawid
On Sep 16, 2017 1:58 AM, "Michael McCandless" wrote:
> Hello,
>
> I ne
+1, some time ago I also used the decompounder mentioned by Dawid and was
satisfied back then.
Regards,
Tommaso
On Sat, Sep 16, 2017 at 09:29, Dawid Weiss wrote:
> Hi Mike. Search lucene dev archives. I did write a decompounder with Daniel
> Naber. The quality was not ideal but
Hi Michael,
I had this issue just yesterday. I did that several times and I built a good
dictionary in the meantime.
I have an example for Solr or Elasticsearch with the same data. It uses the
HyphenationCompoundTokenFilter, but with ZIP file *and* dictionary (it's
important to have both). The
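For reference, a Solr field type along the lines described above might look like the following sketch; the file names, parameter values, and surrounding analyzer chain are assumptions for illustration, not Uwe's exact configuration:

```xml
<!-- Sketch of a German field type using the hyphenation-based decompounder.
     de_DR.xml (from the OFFO hyphenation ZIP) and dictionary-de.txt are
     assumed file names. Both files matter, as stressed above: the hyphenation
     grammar proposes split points, and the dictionary keeps only those
     subwords that are real words. -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HyphenationCompoundWordTokenFilterFactory"
            hyphenator="lang/de_DR.xml"
            dictionary="lang/dictionary-de.txt"
            onlyLongestMatch="true"
            minSubwordSize="4"/>
  </analyzer>
</fieldType>
```

With a setup like this, a compound such as "Fahrradkette" can be indexed alongside its parts ("fahrrad", "kette"), so queries for the parts also match the compound.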
Hi,
I published my work on Github:
https://github.com/uschindler/german-decompounder
Have fun. I am not yet 100% sure about the license of the data file. The
original author (Björn Jacke) did not publish any license, but LibreOffice
publishes his files under LGPL. So to be safe, I applied the
Hello Uwe,
Thanks for getting rid of the compounds. The dictionary can be smaller, it
still has about 1500 duplicates. It is also unsorted.
Regards,
Markus
-Original message-
> From:Uwe Schindler
> Sent: Saturday 16th September 2017 12:16
> To: java-user@lucene.apache.org
> Subject: R
Send a pull request. :)
Uwe
On September 16, 2017, 12:42:30 CEST, Markus Jelsma wrote:
>Hello Uwe,
>
>Thanks for getting rid of the compounds. The dictionary can be smaller,
>it still has about 1500 duplicates. It is also unsorted.
>
>Regards,
>Markus
>
>
>-Original message-
>> From:Uwe
Sorry, I would if I were on GitHub, but I am not.
Thanks again!
Markus
-Original message-
> From:Uwe Schindler
> Sent: Saturday 16th September 2017 12:45
> To: java-user@lucene.apache.org
> Subject: RE: German decompounding/tokenization with Lucene?
>
> Send a pull request. :)
>
> Uwe
OK, sorting and deduping should be easy with a simple command line. The
reason is that it was created from two files of Björn Jacke's data. I thought
that I had deduped it...
Uwe
On September 16, 2017, 12:46:29 CEST, Markus Jelsma wrote:
>Sorry, I would if I were on GitHub, but I am not.
>
>Thanks again
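The "simple command line" pass Uwe mentions can be as short as `sort -u`, which sorts and removes duplicates in one step. A minimal sketch (the dictionary file name and contents are assumptions for illustration):

```shell
# Toy word list standing in for the real dictionary file; the name
# dictionary-de.txt is an assumption for illustration.
printf 'Zeit\nHaus\nHaus\nZeit\nAuto\n' > dictionary-de.txt

# Sort and de-duplicate in one step; LC_ALL=C gives stable byte-order sorting.
LC_ALL=C sort -u dictionary-de.txt > dictionary-de.sorted.txt
mv dictionary-de.sorted.txt dictionary-de.txt

cat dictionary-de.txt   # prints Auto, Haus, Zeit -- each word once, sorted
```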
Hi,
I deduped it. Thanks for the hint!
Uwe
-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Saturday, September 16, 2017 12:51 PM
> To: java-user@lucene.apache.or
Whoa, thank you Uwe! I will have a look; too bad about the licensing, but
I know dictionaries are often licensed with LGPL.
Mike McCandless
http://blog.mikemccandless.com
On Sat, Sep 16, 2017 at 7:03 AM, Uwe Schindler wrote:
> Hi,
>
> I deduped it. Thanks for the hint!
>
> Uwe
>
> -
> Uwe
Hi, in one of our products we are still using Lucene 2.3; is Lucene 2.3
compatible with Java 1.8?
Thanks very much for the help, Lisheng
I doubt anyone has tested it. I'd compile it under Java 8 and see if
all of the tests run.
Best,
Erick
On Sat, Sep 16, 2017 at 7:41 AM, Lisheng Zhang wrote:
> Hi, in one of our products we are still using Lucene 2.3; is Lucene 2.3
> compatible with Java 1.8?
>
> Thanks very much for the help, Lishen