Hi.
Thanks for the reply. Of course, each document goes into exactly one shard.
> On Mar 31, 2017, at 15:01, Erick Erickson wrote:
>
> I don't believe addIndexes does much except rewrite the
> segments file (i.e. the file that tells Lucene what
> the current segments are).
>
> That said, if you're desperate you can optimize/force-merge.
I don't believe addIndexes does much except rewrite the
segments file (i.e. the file that tells Lucene what
the current segments are).
That said, if you're desperate you can optimize/force-merge.
Do note, though, that no deduplication is done. So if the
indexes you're merging have docs with the same IDs, you'll end up with duplicates.
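For reference, the addIndexes()-plus-force-merge route described above might be sketched as follows; the paths, analyzer choice, and class name are all hypothetical:

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeShards {
    public static void main(String[] args) throws Exception {
        try (Directory dest = FSDirectory.open(Paths.get("merged-index"));
             Directory shard1 = FSDirectory.open(Paths.get("shard1"));
             Directory shard2 = FSDirectory.open(Paths.get("shard2"));
             IndexWriter writer = new IndexWriter(dest,
                     new IndexWriterConfig(new StandardAnalyzer()))) {
            // addIndexes() copies the source segments over as-is:
            // no re-analysis and, notably, no deduplication.
            writer.addIndexes(shard1, shard2);
            // Optional: force-merge down to a single segment.
            writer.forceMerge(1);
            writer.commit();
        }
    }
}
```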
Yeah, I definitely will look into PreAnalyzedField as you and Mikhail suggest.
Thank you.
> On Mar 30, 2017, at 19:15, Uwe Schindler wrote:
>
> But that's hard to implement. I'd go for Solr instead of doing that on your
> own!
---
Denis Bazhenov
Interesting. When using addIndexes(), does Lucene perform any optimization on
the segments before searching over them, or are those indexes searched "as is"?
> On Mar 30, 2017, at 19:09, Mikhail Khludnev wrote:
>
> I believe you can have more shards for indexing and then merge (and no
Ok, so a flexible interface would be to be able to pass some
TokenFilterFactory that would be called each time a TokenFilter is
needed. Would that be ok?
On 30/03/17 at 03:47, Uwe Schindler wrote:
A TokenFilter object that is already built won't work, because the Analyzer must create
new instances
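A sketch of such a factory-based hook, under the constraint Uwe describes (the Analyzer must build fresh TokenFilter instances for each stream); the class name and tokenizer choice are assumptions:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.util.TokenFilterFactory;

// Hypothetical sketch: the Analyzer holds a TokenFilterFactory and asks
// it for a fresh TokenFilter each time a new TokenStream is created.
public final class FactoryBackedAnalyzer extends Analyzer {
    private final TokenFilterFactory filterFactory;

    public FactoryBackedAnalyzer(TokenFilterFactory filterFactory) {
        this.filterFactory = filterFactory;
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        // A new filter instance is created per stream, never reused.
        TokenStream filtered = filterFactory.create(source);
        return new TokenStreamComponents(source, filtered);
    }
}
```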
Thank you Uwe, that was really helpful.
Leonardo.
From: Uwe Schindler
Sent: 30 March 2017 13:14:17
To: java-user@lucene.apache.org
Subject: RE: CustomAnalyzer.Builder builder()
Empty is not null:
> .addTokenFilter(StopFilterFactory.class,
> "ignoreCase", "true",
> "words", "",
> "format", "wordset")
This will cause the empty-named file to be loaded, which may not work with all
class loaders. Just remove the useless parameters "words" and "format".
I
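Uwe's fix amounts to dropping the empty "words"/"format" parameters so the factory falls back to its built-in default stop word set; tokenizer and filter chain here are assumptions:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

public class DefaultStopwordsExample {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = CustomAnalyzer.builder()
                .withTokenizer("standard")
                .addTokenFilter("lowercase")
                // No "words"/"format" params: no resource file is
                // loaded, and the default stop word set is used.
                .addTokenFilter(StopFilterFactory.class, "ignoreCase", "true")
                .build();
        System.out.println(analyzer);
    }
}
```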
Hi,
I do not know which file name you are referring to; can you please be a little more
specific? Did you mean a stopwords file name?
Currently I am developing an application which does not have a name yet; this is
why you see the class MySearchApp, which is a test source file on which I am
testing
Hi,
I am still a bit confused why you use an empty file name! Is this just
copy-pasted here without the filename for privacy reasons, or is there really no file
name? That would explain why it may not work with the defaults.
Uwe
-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.d
Hi,
Yes, you are right: using the ClasspathResourceLoader class solved the
issue, passing my own class as a parameter:
ClasspathResourceLoader resourceLoader = new
ClasspathResourceLoader(MySearchApp.class);
Analyzer analyzer = CustomAnalyzer.builder(resourceLoader)
.withTokenizer(
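The snippet above is cut off; a complete version might look like the following, where the tokenizer, filter chain, and the "stopwords.txt" file name are placeholders, not the original poster's actual values:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.util.ClasspathResourceLoader;

public class MySearchApp {
    public static void main(String[] args) throws Exception {
        // Resources are resolved relative to MySearchApp's package.
        ClasspathResourceLoader resourceLoader =
                new ClasspathResourceLoader(MySearchApp.class);
        Analyzer analyzer = CustomAnalyzer.builder(resourceLoader)
                .withTokenizer("standard")
                .addTokenFilter("lowercase")
                // "stopwords.txt" is a hypothetical resource name.
                .addTokenFilter("stop",
                        "ignoreCase", "true",
                        "words", "stopwords.txt",
                        "format", "wordset")
                .build();
        System.out.println(analyzer);
    }
}
```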
Hi,
> >>> I should have mentioned that I for compatibility reasons still need to
> >>> be able to read/write indexes created with the old version, i.e., with
> >>> the 5.0 codec.
> >
> > The old codecs are read-only! As said before, you can only specify the
> codec for IndexWriter. That means new s
Hi Uwe,
>>> I should have mentioned that I for compatibility reasons still need to
>>> be able to read/write indexes created with the old version, i.e., with
>>> the 5.0 codec.
>
> The old codecs are read-only! As said before, you can only specify the codec
> for IndexWriter. That means new segments
Hi,
there is no easy way to do this with Lucene. The analysis part is tightly bound
to IndexWriter. There are ways to decouple this, but you have to write your own
Analyzer and some network protocol.
Solr has something like this, it's called PreAnalyzedField: this is a field type
that has some
I believe you can have more shards for indexing and then merge them (not
literally, but via addIndexes() or so) down to a smaller number for
search. Transferring indices is more efficient (scp -C) than transferring separate
tokens and their attributes over the wire.
On Thu, Mar 30, 2017 at 12:02 PM, Denis Ba
We have already done this, many years ago :)
At the moment we have 7 shards. The problem with adding more shards is that
search becomes less cost-effective (in terms of cluster CPU time per request) as
you split the index into more shards. Considering response time is good enough and
the fact search
Hi,
the document does not contain the analyzed tokens. The Lucene Analyzers are
called inside the IndexWriter *during* indexing, so there is no way to do that
somewhere else. The IndexableDocument instances by Lucene are just iterables of
IndexableField that contain the unparsed fulltext as pas
What if totalHits > 1?
TX
Hi.
We have an in-house distributed Lucene setup: 40 dual-socket servers with
approximately 700 cores, divided into 7 partitions. Those machines do index
search only. Indexes are prepared on several isolated machines (so-called
Index Masters) and distributed over the cluster with plain rsync.
Hi,
> > I should have mentioned that I for compatibility reasons still need to
> > be able to read/write indexes created with the old version, i.e., with
> > the 5.0 codec.
The old codecs are read-only! As said before, you can only specify the codec
for IndexWriter. That means new segments to al
Hi,
you have to define your own codec only during indexing, so you can just update
that for the migration. This then affects all new segments written to your
index.
To read indexes, Lucene will automatically load the codec based on the names
written to index files. If you want to open 5.x inde
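The migration setup described here (set a codec only for writing, let reads auto-detect) can be sketched like this; the analyzer and class name are assumptions:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.index.IndexWriterConfig;

public class CodecConfig {
    public static void main(String[] args) {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        // The codec set here only affects newly written segments.
        // Existing segments are read with the codec whose name is
        // recorded in their own index files.
        config.setCodec(Codec.getDefault());
        System.out.println(config.getCodec().getName());
    }
}
```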
Hi Adrien,
> If you move to Lucene 6.1, then this should be Lucene60Codec. More
> generally that would be the same codec that is returned by Codec.getDefault.
I should have mentioned that I for compatibility reasons still need to
be able to read/write indexes created with the old version, i.e., w