Re: Document serializable representation

Mikhail Khludnev Thu, 30 Mar 2017 02:10:08 -0700

I believe you can use more shards for indexing and then merge them (not
a literal merge, just IndexWriter.addIndexes() or similar) into a smaller
number of shards for search. Transferring whole indices (e.g. with scp -C)
is more efficient than sending individual tokens and their attributes over
the wire.
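
A rough sketch of the planning side only, in plain Java with no Lucene
dependency. It assumes documents are routed by hash modulo shard count,
which the thread does not state, and the class and method names are
hypothetical. If the number of indexing shards is a multiple of the number
of search shards, folding indexing shard i into search shard i % N keeps
every document on the search shard that direct routing would have picked.
The merge itself would be done with IndexWriter.addIndexes(Directory...)
on each search node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: plans which indexing shards
// get folded into which search shard.
final class ShardMergePlan {

    // Indexing shard for a document (assumed routing: hash modulo shard count).
    static int indexShardFor(String docId, int indexShards) {
        return Math.floorMod(docId.hashCode(), indexShards);
    }

    // Search shard that receives a given indexing shard after the merge.
    static int searchShardFor(int indexShard, int searchShards) {
        return indexShard % searchShards;
    }

    // Groups indexing shard numbers by their target search shard.
    static Map<Integer, List<Integer>> plan(int indexShards, int searchShards) {
        if (indexShards <= 0 || searchShards <= 0 || indexShards % searchShards != 0) {
            throw new IllegalArgumentException(
                "indexShards must be a positive multiple of searchShards");
        }
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int i = 0; i < indexShards; i++) {
            groups.computeIfAbsent(searchShardFor(i, searchShards),
                                   k -> new ArrayList<>()).add(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Example: 28 indexing shards folded into 7 search shards.
        // Each group is the set of source indices one search node would
        // pass to IndexWriter.addIndexes(Directory...).
        plan(28, 7).forEach((search, sources) ->
            System.out.println("search shard " + search
                               + " <- indexing shards " + sources));
    }
}
```

The figure of 28 indexing shards is only an example; any multiple of the
search shard count gives the same routing property.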


On Thu, Mar 30, 2017 at 12:02 PM, Denis Bazhenov <dot...@gmail.com> wrote:

> We already did this, many years ago :)
>
> At the moment we have 7 shards. The problem with adding more shards is
> that search becomes less cost-effective (in terms of cluster CPU time per
> request) as you split the index into more shards. Considering that response
> time is good enough and that search nodes account for ~90% of the cluster's
> hardware budget, it’s much more cost-effective to separate analysis from
> IndexWriter than to split the index into more shards. More shards would
> simply require us to put disproportionately more hardware into the cluster.
>
> > On Mar 30, 2017, at 18:36, Uwe Schindler <u...@thetaphi.de> wrote:
> >
> > What you should do instead is split your index into multiple shards
> > and have separate IndexWriter instances on different machines. Those
> > can act on their own. This is what Elasticsearch and Solr do: they
> > accept the document, decide which shard it belongs to, and transfer
> > the plain fieldname:value pairs over the network. Each node then
> > creates Lucene IndexableDocuments out of them and passes them to its
> > own IndexWriter.
>
> ---
> Denis Bazhenov <dot...@gmail.com>

-- 
Sincerely yours
Mikhail Khludnev
