Hi,
I have a question regarding the format of the index created by DocMaker
from EnwikiContentSource.
After creating the index from a dump of all Wikipedia's articles
(https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles-multistream.xml.bz2),
I'm having trouble understanding the structure of the resulting index
(in particular the facets).
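(For context: with the benchmark module, such an index is typically driven by an .alg file roughly like the sketch below. This is an assumption on my part; the property names follow the example .alg files shipped with lucene/benchmark, and the docs.file path just points at the dump above:)

    content.source=org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource
    docs.file=enwiki-latest-pages-articles-multistream.xml.bz2
    content.source.forever=false
    directory=FSDirectory

    # Build the index: one AddDoc per article until the dump is exhausted.
    { "BuildIndex"
        CreateIndex
        { AddDoc } : *
        CloseIndex
    }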
Hi,
I'm not sure I understand your question.
There should be no confusion about declaring a Maven snapshot dependency
in the pom file, since you can specify the version as 8.0-SNAPSHOT
(substituting 8.0 with the version you want).
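As a sketch, the dependency entry in pom.xml would look like this (the groupId/artifactId below are Lucene's actual coordinates; the exact snapshot version is whatever you are targeting):

    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>8.0.0-SNAPSHOT</version>
    </dependency>

Note that snapshot builds are not published to Maven Central, so you would also need a <repository> entry pointing at the snapshot repository that hosts them.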
However, in the case you are looking for a particular version of Lucene,
Hi,
When I create a document with multiple StringFields and add it to
IndexWriter using addDocument(Document), the StringFields within the
Document are not tokenized nor filtered according to the Analyzer's
specifications; however, when I test my Analyzer by looping through the
tokens and explicitly calling incrementToken(), the analysis works as
expected.
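(That is documented behavior rather than a bug: StringField is always indexed as a single un-analyzed token, and only TextField runs through the analyzer chain. A minimal sketch illustrating the difference; the class, field names, and index path are made up, and it assumes lucene-core and lucene-analyzers-common on the classpath:)

    import java.nio.file.Paths;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class StringFieldDemo {
        public static void main(String[] args) throws Exception {
            try (IndexWriter writer = new IndexWriter(
                    FSDirectory.open(Paths.get("demo-index")),
                    new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                // Indexed verbatim as the single token "Foo_Bar baz";
                // the analyzer is bypassed entirely.
                doc.add(new StringField("id", "Foo_Bar baz", Field.Store.YES));
                // Tokenized and filtered by the analyzer as usual.
                doc.add(new TextField("body", "Foo_Bar baz", Field.Store.NO));
                writer.addDocument(doc);
            }
        }
    }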
Hi,
I am looking for a tokenizer where I could specify the delimiters by
which the words are tokenized. For example, if I choose ' ' and '_' as
the delimiters, the following string:
"foo__bar doo"
would be tokenized into:
"foo", "", "bar", "doo"
(The analyzer could further filter out the empty tokens.)
> Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(
>     ch -> Character.isWhitespace(ch) || ch == '_');
>
> Uwe
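(For completeness, a sketch of wiring that predicate into a full Analyzer; the class name is made up, and the import assumes Lucene 7.x, where CharTokenizer lives in org.apache.lucene.analysis.util and these factory methods exist. Note that CharTokenizer never emits empty tokens, so consecutive separators like the "__" in "foo__bar" are simply collapsed, which already takes care of the empty-token filtering mentioned in the question:)

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.util.CharTokenizer;

    public class DelimiterAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            // Treat whitespace and '_' as separators;
            // every other character belongs to a token.
            Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(
                ch -> Character.isWhitespace(ch) || ch == '_');
            return new TokenStreamComponents(tok);
        }
    }

    // "foo__bar doo" analyzes to the tokens: "foo", "bar", "doo"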