> Anyway, in my personal opinion, Lucene does not need to consider whether
> the system dictionary is in good shape or not.
Please don't get me wrong, but I don't think so.
Creating a customized or re-trained system dictionary still requires deep
knowledge of the language and of machine learning. Even among
Oh, I don't think I explained it well enough. Sorry...
I was referring to the following steps:
=
1. Modify your dictionary file and rebuild.
1-1) Install MeCab
1-2) Install MeCab Dictionary
1-3) Modify your dictionary file
1-4) Package it as a tar.gz
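For step 1-3, each row of an IPADIC CSV file (Noun.proper.csv, for example) has 13 comma-separated fields: surface, left context id, right context id, word cost, four part-of-speech levels, conjugation type, conjugation form, base form, reading and pronunciation. The helper below is a hypothetical sketch (the class and method names are illustrative, not part of MeCab or Lucene) that formats one proper-noun row. The ids and cost are placeholders the caller must choose; picking sensible values is exactly the part that depends on the trained model. Note that the IPADIC source files are EUC-JP encoded, so a row like this would have to be written out in that encoding.

```java
import java.util.List;

// Hypothetical helper: formats one IPADIC-style CSV row for a proper noun.
// leftId, rightId and cost are caller-supplied placeholders, NOT trained values.
final class IpadicEntry {
    private IpadicEntry() {}

    static String properNoun(String surface, int leftId, int rightId, int cost, String reading) {
        List<String> fields = List.of(
            surface,                      // 1: surface form
            Integer.toString(leftId),     // 2: left context id
            Integer.toString(rightId),    // 3: right context id
            Integer.toString(cost),       // 4: word cost
            "名詞", "固有名詞", "一般", "*", // 5-8: part-of-speech hierarchy
            "*", "*",                     // 9-10: nouns do not conjugate
            surface,                      // 11: base form
            reading,                      // 12: reading (katakana)
            reading);                     // 13: pronunciation, assumed equal to the reading
        return String.join(",", fields);
    }
}
```

For instance, properNoun("ルシーン", 1000, 1000, 5000, "ルシーン") yields ルシーン,1000,1000,5000,名詞,固有名詞,一般,*,*,*,ルシーン,ルシーン,ルシーン.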
==
Hi,
The system dictionary is not a mere "word collection"; it includes a
machine-learned language model that has been carefully trained by
researchers. If you want to replace the system dictionary, you have to
start by re-training the model. This requires expert knowledge, so I do
not recommend just mo
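To illustrate why the system dictionary is more than a word list: in MeCab-format dictionaries such as IPADIC (which Kuromoji's built-in dictionary is built from), every entry carries a word cost and left/right context ids, and a connection-cost matrix scores every adjacent pair of entries; the tokenizer picks the cheapest path through the lattice of candidate words. The toy Java sketch below shows that search with made-up costs. ToyViterbi and its types are hypothetical names, and it omits unknown-word handling and everything else a real tokenizer does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy MeCab-style lattice search. Each entry has a word cost plus left/right
// context ids; connCost[rightIdOfPrevious][leftIdOfNext] scores each adjacent
// pair. Id 0 stands for both BOS and EOS. All costs passed in are made up.
final class ToyViterbi {
    record Entry(String surface, int leftId, int rightId, int wordCost) {}

    // One entry placed in the lattice, with the cheapest path cost that reaches it.
    private record Node(Entry entry, long cost, Node prev) {}

    /** Returns the cheapest segmentation, or an empty list if the text cannot be covered. */
    static List<String> segment(String text, Map<String, List<Entry>> dict, int[][] connCost) {
        int n = text.length();
        // endsAt.get(i) holds every node whose surface ends at character offset i.
        List<List<Node>> endsAt = new ArrayList<>();
        for (int i = 0; i <= n; i++) endsAt.add(new ArrayList<>());
        endsAt.get(0).add(new Node(new Entry("", 0, 0, 0), 0, null)); // BOS

        for (int start = 0; start < n; start++) {
            if (endsAt.get(start).isEmpty()) continue; // no path reaches this offset
            for (int end = start + 1; end <= n; end++) {
                for (Entry e : dict.getOrDefault(text.substring(start, end), List.of())) {
                    Node best = null;
                    for (Node left : endsAt.get(start)) {
                        // path cost = cost so far + connection cost + word cost
                        long c = left.cost() + connCost[left.entry().rightId()][e.leftId()] + e.wordCost();
                        if (best == null || c < best.cost()) best = new Node(e, c, left);
                    }
                    endsAt.get(end).add(best);
                }
            }
        }
        // Connect each node ending at the last offset to EOS and keep the cheapest.
        Node bestEnd = null;
        long bestCost = Long.MAX_VALUE;
        for (Node node : endsAt.get(n)) {
            long c = node.cost() + connCost[node.entry().rightId()][0];
            if (c < bestCost) {
                bestCost = c;
                bestEnd = node;
            }
        }
        // Walk back to BOS (the only node without a predecessor) and collect surfaces.
        List<String> tokens = new ArrayList<>();
        for (Node node = bestEnd; node != null && node.prev() != null; node = node.prev()) {
            tokens.add(node.entry().surface());
        }
        Collections.reverse(tokens);
        return tokens;
    }
}
```

For example, with entries a and bc (context id 1, word cost 10) and ab and c (context id 2, word cost 5), all-zero connection costs give [ab, c] (total 10 versus 20), while setting connCost[2][2] to 100 flips the result to [a, bc]. The word list is identical in both runs; only the learned costs change the output, which is why editing the word list alone is not the same as re-training.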
On Sun, 26 May 2019 at 23:49, Namgyu Kim wrote:
> I feel the same way about that approach.
> It's not user-friendly, and it is not good for the user.
> I think it's better for JapaneseTokenizer to accept the dictionary
> as a parameter.
>
> What do you think about this?
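A hypothetical design sketch of that suggestion (SystemDictionary, SketchTokenizer and BUILT_IN are illustrative names, not Lucene's API): the tokenizer receives its system dictionary through a constructor, and the no-argument constructor keeps today's behaviour by falling back to the built-in one. A greedy longest match stands in for the real cost-based lattice search, purely to show which dictionary gets consulted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not Lucene's API): the system dictionary becomes a
// constructor parameter instead of a fixed classpath resource.
interface SystemDictionary {
    /** True if the word is a known entry. */
    boolean contains(String word);
}

final class SketchTokenizer {
    // Stand-in for the built-in dictionary that would normally be loaded from resources.
    static final SystemDictionary BUILT_IN = Set.of("東京", "都")::contains;

    private final SystemDictionary dict;

    SketchTokenizer() { this(BUILT_IN); }                          // current behaviour
    SketchTokenizer(SystemDictionary dict) { this.dict = dict; }   // override for experiments

    /** Greedy longest match: a simplification, the real tokenizer uses a cost lattice. */
    List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = text.length();
            // Shrink the candidate until the dictionary knows it; fall back to one char.
            while (end > pos + 1 && !dict.contains(text.substring(pos, end))) end--;
            out.add(text.substring(pos, end));
            pos = end;
        }
        return out;
    }
}
```

With the stand-in built-in dictionary, "東京都" splits into [東京, 都]; passing a dictionary that knows "東京都" returns it as a single token, with no jar or classpath changes.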
A way to override the system dictionary would be useful.
> I've been able to build a dictionary using DictionaryBuilder (I guess that
> is what the "regenerate" task must be using?)
=>
Yes. That's right.
The "regenerate" task runs the following steps in order:
1) Compile the code (compile-tools)
2) Download the jar file (download-dict)
3) Save Noun.proper.csv d
Thanks, Namgyu. I've been able to build a dictionary using
DictionaryBuilder (I guess that is what the "regenerate" task must be
using?) and I can replace the existing one on the classpath with jar
surgery for now. Not a very user-friendly approach, but it will enable
me to run some experiments and
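A minimal sketch of the "jar surgery" approach using only java.util.zip: it copies every entry of the original jar and substitutes the bytes of rebuilt dictionary files. JarSurgery and replaceEntries are hypothetical names, and the caller supplies the entry paths to replace, since this sketch does not assume the exact resource names inside the Kuromoji jar. ZipInputStream is used rather than JarInputStream so that META-INF/MANIFEST.MF is copied like any other entry; entry timestamps and any signatures are not preserved.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Sketch: rewrite a jar, swapping in rebuilt dictionary files by entry name.
final class JarSurgery {
    private JarSurgery() {}

    static void replaceEntries(Path inJar, Path outJar, Map<String, byte[]> replacements)
            throws IOException {
        Set<String> written = new HashSet<>();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(inJar));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(outJar))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                String name = entry.getName();
                out.putNextEntry(new ZipEntry(name)); // fresh entry, so size/CRC are recomputed
                byte[] replacement = replacements.get(name);
                if (replacement != null) {
                    out.write(replacement);           // rebuilt dictionary file
                } else {
                    in.transferTo(out);               // original bytes of this entry, unchanged
                }
                out.closeEntry();
                written.add(name);
            }
            // Add replacements that did not exist in the original jar.
            for (Map.Entry<String, byte[]> e : replacements.entrySet()) {
                if (!written.contains(e.getKey())) {
                    out.putNextEntry(new ZipEntry(e.getKey()));
                    out.write(e.getValue());
                    out.closeEntry();
                }
            }
        }
    }
}
```

The patched jar can then be put on the classpath in place of the original, which is still the not-very-user-friendly route described above, but it needs no changes to Lucene itself.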
Sorry for the wrong information, Mike.
Tomoko is right.
I made a mistake when checking it.
The user dictionary is independent of the system dictionary. If you supply
user entries, JapaneseTokenizer uses two FSTs, one for the built-in
dictionary and one for the user dictionary, and they are looked up
separately.
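A simplified Java model of that point: the user dictionary and the system dictionary stay separate structures, and each is looked up on its own at every start position. Plain sets stand in for the two FSTs (an assumption for brevity), the names are hypothetical, and how the tokenizer then scores or prefers user matches over system matches is deliberately not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model: two independent dictionaries, two independent lookups.
// Sets stand in for the FSTs; scoring and precedence are not modelled.
final class TwoDictionaryLookup {
    enum Source { USER, SYSTEM }

    record Match(int start, String surface, Source source) {}

    static List<Match> lookup(String text, Set<String> userDict, Set<String> systemDict) {
        List<Match> matches = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int end = start + 1; end <= text.length(); end++) {
                String candidate = text.substring(start, end);
                // Neither dictionary is merged into the other; each is queried separately.
                if (userDict.contains(candidate)) matches.add(new Match(start, candidate, Source.USER));
                if (systemDict.contains(candidate)) matches.add(new Match(start, candidate, Source.SYSTEM));
            }
        }
        return matches;
    }
}
```

For "ab" with user entries {ab} and system entries {a, b}, it reports a and b from the system dictionary and ab from the user dictionary as separate matches; adding user entries never modifies the system dictionary.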