Being new to Lucene, we are struggling to dimension our application
with respect to search throughput.
As suggested by Mike McCandless in the following thread, we also ran our
test cases with a restrictive data set.
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201701.mbox/raw/%3CCAL8Pwka4RC3c%2B%3D9wapH2%2B_JKSimnHmgRs3Eq6O_e7HrF6iBXPw%40mail.gmail.com%3E/

Results (with vCPUs: 4,  Threads: 8):
Total data ~3 Million.

Search with Condition #1, which matches the full 3 million data set:
CPU utilization: ~95%
Search throughput: very low (~10/sec)

Search with Condition #2 (restrictive condition), which matches only 25K of the
data set:
CPU utilization: dropped, but still ~85%
Search throughput: 140/sec
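
As a rough sanity check on the two results above, the numbers can be turned into an implied per-query latency and CPU cost. This is only a sketch: it assumes all 8 threads were kept continuously busy for the whole run (a closed-loop test, which the results do not state), and the class and method names are made up for illustration.

```java
public class SearchCapacityEstimate {

    // Little's law for a closed loop: average latency = concurrency / throughput.
    // Assumption: every thread always has a query in flight.
    public static double impliedLatencyMs(int threads, double queriesPerSec) {
        return 1000.0 * threads / queriesPerSec;
    }

    // CPU time burned per query = (vCPUs * utilization) / throughput.
    public static double cpuMsPerQuery(int vcpus, double utilization, double queriesPerSec) {
        return 1000.0 * vcpus * utilization / queriesPerSec;
    }

    public static void main(String[] args) {
        // Figures from the two runs: 4 vCPUs, 8 threads.
        System.out.printf("Condition #1: %.1f ms latency, %.1f ms CPU per query%n",
                impliedLatencyMs(8, 10.0), cpuMsPerQuery(4, 0.95, 10.0));
        System.out.printf("Condition #2: %.1f ms latency, %.1f ms CPU per query%n",
                impliedLatencyMs(8, 140.0), cpuMsPerQuery(4, 0.85, 140.0));
    }
}
```

Under that assumption, Condition #1 works out to roughly 800 ms per query and about 380 ms of CPU time per query, against roughly 57 ms and 24 ms for Condition #2. The CPU cost per query, not the thread count, is what bounds throughput on a 4 vCPU machine.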

My observations:
#1 Keeping the number of search threads in line with the number of CPU cores
reduces CPU utilization.
#2 A restrictive data set increases throughput, but throughput still depends on
the total index size.
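
To compare thread counts in a repeatable way, a small closed-loop harness along the following lines could be used. It is a sketch, not our actual test code: `ThroughputHarness`, `measure`, and the placeholder workload are hypothetical, and in a real run the `Runnable` would wrap a call such as `IndexSearcher.search(query, n)` on a shared searcher.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class ThroughputHarness {

    static volatile double sink; // keeps the placeholder work from being optimized away

    // Runs `query` in a tight loop on `threads` threads for about `durationMillis`
    // and returns the number of completed queries per second.
    public static double measure(Runnable query, int threads, long durationMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        LongAdder completed = new LongAdder();
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                while (System.nanoTime() - deadline < 0) {
                    query.run();
                    completed.increment();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(durationMillis + 60_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        double elapsedSec = (System.nanoTime() - start) / 1e9;
        return completed.sum() / elapsedSec;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Placeholder CPU-bound work standing in for a real search call.
        Runnable placeholderQuery = () -> {
            double acc = 0;
            for (int i = 1; i <= 10_000; i++) acc += Math.sqrt(i);
            sink = acc;
        };
        for (int threads : new int[] {cores, 2 * cores}) {
            System.out.printf("threads=%d -> %.0f queries/sec%n",
                    threads, measure(placeholderQuery, threads, 2_000));
        }
    }
}
```

Running it once with threads equal to the core count and once with twice that number shows whether the extra threads add throughput or only add contention for the same CPUs.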

Can we rely on the above results, i.e. that with a restrictive data set of
only 25K, Lucene can give a maximum throughput of 140/sec? (Average
I have no Apache Lucene figures to benchmark against.



The Apache Lucene benchmark page covers indexing capabilities in detail at the
following link:
http://home.apache.org/~mikemccand/lucenebench/indexing.html

But the page has no comparable detail for search capabilities, which are
the heart of Lucene.
It would be helpful if the Lucene community published similarly detailed
figures as a benchmark.


Regards
Rajnish


On Mon, Jan 9, 2017 at 7:51 AM, Michael Peterson <quu...@gmail.com> wrote:

> I know when I first came to Lucene and ran some tests/benchmarks on
> indexing and searching, I was surprised at how high the CPU usage was - I
> could easily saturate an 8 or 16 CPU system. I expected it to be strongly
> IO-bound and I've heard others express that surprise when they first learn
> about it or see it for themselves.
>
> So it may help the original poster to just set that expectation - that
> Lucene's work load on both the indexing and searching side is CPU intensive
> and by adding more indexing or search threads, CPU usage will increase. So
> users of Lucene need to do capacity planning of their systems with that in
> mind.
>
> -Michael Peterson
>
>
> On Sun, Jan 8, 2017 at 8:16 PM, Denis Bazhenov <dot...@gmail.com> wrote:
>
> > One should really put things in context. If those searches are some kind
> > of background workload (no users are waiting for search results), I’ll
> > agree 100% CPU utilization is a kind of ideal situation. But if we’re
> > speaking of an interactive system where there is a user waiting behind each
> > search query, I’d say 100% CPU utilization is not a good place to be in.
> > It’s almost a guarantee of resource starvation.
> >
> > So, one should distinguish those types of workloads.
> >
> > Also, there is a kind of universal recipe for those situations: run a JVM
> > profiler/sampler (JMC, for example) and look at where the time is spent. It
> > might be GC.
> >
> > > On Jan 4, 2017, at 18:05, Adrien Grand <jpou...@gmail.com> wrote:
> > >
> > > Well, you could but that would not make sense, 100% CPU usage is really the
> > > best you can get. Why would you like to make things worse artificially?
> >
> > ---
> > Denis Bazhenov <dot...@gmail.com>
