[jira] [Comment Edited] (SOLR-10317) Solr Nightly Benchmarks

Ishan Chattopadhyaya (JIRA) Wed, 29 Mar 2017 10:33:59 -0700

    [ https://issues.apache.org/jira/browse/SOLR-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947470#comment-15947470 ]


Ishan Chattopadhyaya edited comment on SOLR-10317 at 3/29/17 5:33 PM:
----------------------------------------------------------------------

Here's a rough list off the top of my head. It would be good for a student to 
add whatever I've missed to this list, for the sake of completeness:
# Indexing benchmarks
## Standalone
## SolrCloud (various simple configurations (0) )
## New replication mode (SOLR-9835) *
# Various types of queries:
## Querying on numeric fields (exact queries, range queries)
## Querying on text fields
## Querying on string fields
## Sorting on numeric fields, string fields (with and without docValues)
## Extended Dismax queries
## Spatial search (using various strategies) *
# Query (all the above) on
## Standalone Solr
## SolrCloud (on some simple configurations (0) )
## Ideally, also on the new replication mode (SOLR-9835) *
# Partial Updates benchmarks (atomic updates, in-place updates)
# Faceting (string fields, numeric fields, enum fields)
## JSON Faceting
## Classic faceting *
# Grouping (string fields, numeric fields, enum fields) *
# Spell check *
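Each benchmark in the list above ultimately comes down to timing an operation repeatedly and recording the results. A minimal sketch of such a harness in Python might look like the following. `benchmark` and its `warmup`/`runs` parameters are hypothetical names, not part of any existing framework, and `operation` is a placeholder for whatever a real suite would send to Solr (an indexing batch, a query, a facet request).

```python
import statistics
import time


def benchmark(operation, warmup=2, runs=5):
    """Time a zero-argument callable and return summary stats in milliseconds.

    Hypothetical helper: `operation` stands in for a real Solr request
    (e.g. indexing a batch or running a query against a running node).
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Warm-up iterations are executed but not measured, so caches and the
    # JVM's JIT on the Solr side can settle before timing starts.
    for _ in range(warmup):
        operation()
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "min_ms": min(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


if __name__ == "__main__":
    # Dummy CPU-bound operation in place of a real Solr call.
    print(benchmark(lambda: sum(range(100_000))))
```

Reporting min/median/max rather than a single mean is one reasonable choice for nightly graphs, since a single slow outlier (GC pause, noisy neighbour on a Jenkins node) would otherwise distort the trend.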

A Wikipedia-based dataset is usually available on all the Jenkins instances 
and could be used for this purpose. Any other suitable dataset is also welcome. 
[~steve_rowe], [~thetaphi], can you please point me to the download link for 
the enwiki.random.lines.txt file? (I have it, but forgot where I got it from.)

If I've missed anything, please comment. [~viveknarang], please feel free to 
ask follow-up questions about anything above that is unclear. I've marked with 
asterisks the items that are nice to have but not strictly necessary (in terms 
of GSoC evaluation criteria); think of them as stretch goals.

(0) - Some simple SolrCloud configurations could be:
# 1 shard, 2-3 replicas
# 2 shards, 1 replica each
# 2 shards, 2 replicas each
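These configurations map directly onto the `numShards` and `replicationFactor` parameters of the Collections API `CREATE` action. A minimal Python sketch that builds the corresponding request URLs is below; it assumes a SolrCloud node on the default port 8983, and the `bench_*` collection names and `create_collection_url` helper are hypothetical. Depending on the Solr version and how many configsets are in ZooKeeper, a `collection.configName` parameter may also be required.

```python
from urllib.parse import urlencode

# Assumption: a local SolrCloud node on the default port; adjust for Jenkins.
SOLR_BASE = "http://localhost:8983/solr"

# (numShards, replicationFactor) pairs for the simple configurations in (0);
# "1 shard, 2-3 replicas" is expanded into two entries.
CONFIGURATIONS = [(1, 2), (1, 3), (2, 1), (2, 2)]


def create_collection_url(num_shards, replication_factor, base=SOLR_BASE):
    """Build a Collections API CREATE URL (collection name is hypothetical)."""
    name = f"bench_{num_shards}s_{replication_factor}r"
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    for shards, replicas in CONFIGURATIONS:
        print(create_collection_url(shards, replicas))
```

A benchmark run could then issue each URL (e.g. with `urllib.request.urlopen`), run the indexing and query suites against the new collection, and delete it before moving on to the next configuration.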


was (Author: ichattopadhyaya):
Here's a rough list off the top of my head. It would be good for a student to 
add whatever I've missed to this list, for the sake of completeness:
# Indexing benchmarks
## Standalone
## SolrCloud (various simple configurations (0) )
## new replication mode
# Various types of queries:
## Querying on numeric fields (exact queries, range queries)
## Querying on text fields
## Querying on string fields
## Sorting on numeric fields, string fields (with and without docValues)
## Extended Dismax queries
## Spatial search (using various strategies)
# Query (all the above) on
## Standalone Solr
## SolrCloud (on some simple configurations (0) )
## Ideally, also on the new replication mode (SOLR-9835)
# Partial Updates benchmarks (atomic updates, in-place updates)
# Faceting (string fields, numeric fields, enum fields)
# Grouping (string fields, numeric fields, enum fields)
# Spell check

A Wikipedia-based dataset is usually available on all the Jenkins instances 
and could be used for this purpose. [~steve_rowe], [~thetaphi], can you please 
point me to the download link for the enwiki.random.lines.txt file? (I have 
it, but forgot where I got it from.)

If I've missed anything, please feel free to comment.

(0) - Some simple SolrCloud configurations could be:
# 1 shard, 2-3 replicas
# 2 shards, 1 replica each
# 2 shards, 2 replicas each

> Solr Nightly Benchmarks
> -----------------------
>
>                 Key: SOLR-10317
>                 URL: https://issues.apache.org/jira/browse/SOLR-10317
>             Project: Solr
>          Issue Type: Task
>            Reporter: Ishan Chattopadhyaya
>              Labels: gsoc2017, mentor
>
> Solr needs nightly benchmark reporting. Similar Lucene benchmarks can be 
> found here: https://home.apache.org/~mikemccand/lucenebench/.
> Preferably, we need:
> # A suite of benchmarks that build Solr from a commit point, start Solr 
> nodes, both in SolrCloud and standalone mode, and record timing information 
> of various operations like indexing, querying, faceting, grouping, 
> replication etc.
> # It should be possible to run them either as an independent suite or as a 
> Jenkins job, and we should be able to report timings as graphs (Jenkins has 
> some charting plugins).
> # The code should eventually be integrated in the Solr codebase, so that it 
> never goes out of date.
> There is some prior work / discussion:
> # https://github.com/shalinmangar/solr-perf-tools (Shalin)
> # https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md 
> (Ishan/Vivek)
> # SOLR-2646 & SOLR-9863 (Mark Miller)
> # https://home.apache.org/~mikemccand/lucenebench/ (Mike McCandless)
> # https://github.com/lucidworks/solr-scale-tk (Tim Potter)
> Some of the frameworks above already support building, starting, 
> indexing/querying and stopping Solr. However, the benchmarks they run are 
> very limited. Any of them can be a starting point, or a new framework can be 
> used instead. The goal is to cover every Solr feature with a corresponding 
> benchmark that is run every night.
> Proposing this as a GSoC 2017 project. I'm willing to mentor, and I'm sure 
> [~shalinmangar] and [[email protected]] would help here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

