Hi Mike. Search lucene dev archives. I did write a decompounder with Daniel
Naber. The quality was not ideal but perhaps better than nothing. Also,
Daniel works on languagetool.org? They should have something in there.
Dawid
On Sep 16, 2017 1:58 AM, "Michael McCandless" wrote:
> Hello,
>
> I ne
+1, some time ago I also used the decompounder mentioned by Dawid and was
satisfied back then.
Regards,
Tommaso
On Sat, Sep 16, 2017 at 09:29, Dawid Weiss wrote:
> Hi Mike. Search lucene dev archives. I did write a decompounder with Daniel
> Naber. The quality was not ideal but
Hi Michael,
I had this issue just yesterday. I did that several times and I built a good
dictionary in the meantime.
I have an example for Solr or Elasticsearch with the same data. It uses the
HyphenationCompoundTokenFilter, but with ZIP file *and* dictionary (it's
important to have both). The
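For reference, a Solr field type along the lines described above might look like the following sketch; the file names, parameter values, and surrounding analyzer chain are assumptions for illustration, not Uwe's exact configuration:

```xml
<!-- Sketch of a German field type using the hyphenation-based decompounder.
     de_DR.xml (from the OFFO hyphenation ZIP) and dictionary-de.txt are
     assumed file names. Both files matter, as stressed above: the hyphenation
     grammar proposes split points, and the dictionary keeps only those
     subwords that are real words. -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HyphenationCompoundWordTokenFilterFactory"
            hyphenator="lang/de_DR.xml"
            dictionary="lang/dictionary-de.txt"
            onlyLongestMatch="true"
            minSubwordSize="4"/>
  </analyzer>
</fieldType>
```

With a setup like this, a compound such as "Fahrradkette" can be indexed alongside its parts ("fahrrad", "kette"), so queries for the parts also match the compound.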
Hi,
I published my work on Github:
https://github.com/uschindler/german-decompounder
Have fun. I am not yet 100% sure about the license of the data file. The
original author (Björn Jacke) did not publish any license, but LibreOffice
publishes his files under LGPL. So to be safe, I applied the
Hello Uwe,
Thanks for getting rid of the compounds. The dictionary can be smaller, it
still has about 1500 duplicates. It is also unsorted.
Regards,
Markus
-Original message-
> From:Uwe Schindler
> Sent: Saturday 16th September 2017 12:16
> To: java-user@lucene.apache.org
> Subject: R
Send a pull request. :)
Uwe
On September 16, 2017, 12:42:30 CEST, Markus Jelsma wrote:
>Hello Uwe,
>
>Thanks for getting rid of the compounds. The dictionary can be smaller,
>it still has about 1500 duplicates. It is also unsorted.
>
>Regards,
>Markus
>
>
>-Original message-
>> From:Uwe
Sorry, I would if I were on GitHub, but I am not.
Thanks again!
Markus
-Original message-
> From:Uwe Schindler
> Sent: Saturday 16th September 2017 12:45
> To: java-user@lucene.apache.org
> Subject: RE: German decompounding/tokenization with Lucene?
>
> Send a pull request. :)
>
> Uwe
OK, sorting and deduping should be easy with a simple command line. The
reason is that it was created from two files of Björn Jacke's data. I thought
that I had deduped it...
Uwe
On September 16, 2017, 12:46:29 CEST, Markus Jelsma wrote:
>Sorry, I would if I were on GitHub, but I am not.
>
>Thanks again
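The "simple command line" pass Uwe mentions can be as short as `sort -u`, which sorts and removes duplicates in one step. A minimal sketch (the dictionary file name and contents are assumptions for illustration):

```shell
# Toy word list standing in for the real dictionary file; the name
# dictionary-de.txt is an assumption for illustration.
printf 'Zeit\nHaus\nHaus\nZeit\nAuto\n' > dictionary-de.txt

# Sort and de-duplicate in one step; LC_ALL=C gives stable byte-order sorting.
LC_ALL=C sort -u dictionary-de.txt > dictionary-de.sorted.txt
mv dictionary-de.sorted.txt dictionary-de.txt

cat dictionary-de.txt   # prints Auto, Haus, Zeit -- each word once, sorted
```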
Hi,
I deduped it. Thanks for the hint!
Uwe
-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -Original Message-
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Saturday, September 16, 2017 12:51 PM
> To: java-user@lucene.apache.or
Whoa, thank you Uwe! I will have a look; too bad about the licensing, but
I know dictionaries are often licensed with LGPL.
Mike McCandless
http://blog.mikemccandless.com
On Sat, Sep 16, 2017 at 7:03 AM, Uwe Schindler wrote:
> Hi,
>
> I deduped it. Thanks for the hint!
>
> Uwe
>
> -
> Uwe
Hi, in one of our products we are still using Lucene 2.3; is Lucene 2.3
compatible with Java 1.8?
Thanks very much for the help, Lisheng
I doubt anyone has tested it. I'd compile it under Java 8 and see if
all of the tests run.
Best,
Erick
On Sat, Sep 16, 2017 at 7:41 AM, Lisheng Zhang wrote:
> Hi, in one of our products we are still using Lucene 2.3; is Lucene 2.3
> compatible with Java 1.8?
>
> Thanks very much for the help, Lishen