[jira] [Commented] (SOLR-11299) Time partitioned collections (umbrella issue)

David Smiley (JIRA) Fri, 13 Oct 2017 13:57:40 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-11299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204175#comment-16204175
 ]


David Smiley commented on SOLR-11299:
-------------------------------------

Nice article on some of these ideas by [[email protected]]
https://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-really-big-data/
   4 years old but still relevant.

> Time partitioned collections (umbrella issue)
> ---------------------------------------------
>
>                 Key: SOLR-11299
>                 URL: https://issues.apache.org/jira/browse/SOLR-11299
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: David Smiley
>            Assignee: David Smiley
>
> Solr ought to have the ability to manage large-scale time-series data (think 
> logs or sensor data / IOT) itself without a lot of manual/external work.  The 
> most naive and painless approach today is to create a collection with a high 
> numShards with hash routing but this isn't as good as partitioning the 
> underlying indexes by time for these reasons:
> * Easy to scale up/down horizontally as data/requirements change.  (No need 
> to over-provision, use shard splitting, or re-index with different config)
> * Faster queries: 
>     ** can search fewer shards, reducing overall load
>     ** realtime search is more tractable (since most shards are stable -- 
> good caches)
>     ** "recent" shards (that might be queried more) can be allocated to 
> faster hardware
>     ** aged out data is simply removed, not marked as deleted.  Deleted docs 
> still have search overhead.
> * Outages of a shard result in a degraded but sometimes a useful system 
> nonetheless (compare to random subset missing)
> Ideally you could set this up once and then simply work with a collection 
> (potentially actually an alias) in a normal way (search or update), letting 
> Solr handle the addition of new partitions, removing of old ones, and 
> appropriate routing of requests depending on their nature.
> This issue is an umbrella issue for the particular tasks that will make it 
> all happen -- either subtasks or issue linking.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SOLR-11299) Time partitioned collections (umbrella issue)

Reply via email to