I believe you can use more shards for indexing and then merge them into a smaller number of shards for search (not a literal merge, just addIndexes() or similar). Transferring whole indices (e.g. with scp -C) is more efficient than sending separate tokens and their attributes over the wire.
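
As a rough sketch of the idea, assuming 28 indexing shards (a made-up number) folded into the 7 search shards mentioned below: the plain-Java snippet only computes which indexing shards would go to which search shard. The class name, method name and paths are hypothetical. The actual merge is not shown; with Lucene it would be a call to IndexWriter.addIndexes(Directory...) on each search shard's writer.

```java
import java.util.ArrayList;
import java.util.List;

final class ShardMergePlan {

    /**
     * Assigns indexing shard i to search shard (i % searchShards), so every
     * search shard receives a near-equal number of indexing shards.
     */
    static List<List<String>> plan(List<String> indexingShardPaths, int searchShards) {
        if (searchShards <= 0) {
            throw new IllegalArgumentException("searchShards must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int s = 0; s < searchShards; s++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < indexingShardPaths.size(); i++) {
            groups.get(i % searchShards).add(indexingShardPaths.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical layout: 28 indexing shards, 7 search shards.
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 28; i++) {
            paths.add("/data/index-shard-" + i);
        }
        List<List<String>> groups = plan(paths, 7);
        for (int s = 0; s < groups.size(); s++) {
            // Each group would be copied to search node s (e.g. scp -C), opened
            // as Directory instances and passed to IndexWriter.addIndexes(...).
            System.out.println("search shard " + s + " <- " + groups.get(s));
        }
    }
}
```

The modulo grouping is only one possible choice; any assignment works as long as all versions of a given document end up in the same search shard.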

On Thu, Mar 30, 2017 at 12:02 PM, Denis Bazhenov <dot...@gmail.com> wrote:

> We already have done this. Many years ago :)
>
> At the moment we have 7 shards. The problem with getting more shards is
> that search becomes less cost effective (in terms of cluster CPU time per
> request) as you split the index into more shards. Considering that response
> time is good enough and that search nodes are ~90% of the whole hardware
> budget of the cluster, it’s much more cost effective to split analysis from
> IndexWriter than to split the index into more shards. More shards would
> simply require us to put disproportionately more hardware into the cluster.
>
> > On Mar 30, 2017, at 18:36, Uwe Schindler <u...@thetaphi.de> wrote:
> >
> > What you would better do is to just split your index into multiple
> > shards and have separate IndexWriter instances on different machines.
> > Those can act on their own. This is what Elasticsearch or Solr are doing:
> > They accept the document, decide which shard it should be located on and
> > transfer the plain fieldname:value pairs over the network. Each node then
> > creates Lucene IndexableDocuments out of them and passes them to its own
> > IndexWriter.
>
> ---
> Denis Bazhenov <dot...@gmail.com>

--
Sincerely yours
Mikhail Khludnev