[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

Yonik Seeley (JIRA) Mon, 28 Jan 2019 06:28:56 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754041#comment-16754041
 ]


Yonik Seeley commented on SOLR-13101:
-------------------------------------

Thinking about how to kick this off... 
At the most basic level, looking at the HDFS layout scheme we see this ("test" 
is the name of the collection):
{code}
local_file_system://.../node1/test_shard1_replica_n1/core.properties
hdfs://.../data/test/core_node2/data/
{code}
And core.properties looks like:
{code}
numShards=1
collection.configName=conf1
name=test_shard1_replica_n1
replicaType=NRT
shard=shard1
collection=test
coreNodeName=core_node2
{code}

It seems like the most basic desirable change would be to the naming scheme for 
collections with shared storage.
Instead of .../<collection_name>/<core_node_name>/data
it should be .../<collection_name>/<shard_name>/data
since there is only one canonical index per shard.
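
To make the difference concrete, here is a minimal sketch that derives both data directories from the core.properties shown above. The helper names (parse_core_properties, current_data_dir, proposed_data_dir) and the root argument are hypothetical and only for illustration; this is not Solr code, and the elided part of the HDFS path is left as-is.

```python
from posixpath import join

# core.properties content copied from the example above.
CORE_PROPERTIES = """\
numShards=1
collection.configName=conf1
name=test_shard1_replica_n1
replicaType=NRT
shard=shard1
collection=test
coreNodeName=core_node2
"""


def parse_core_properties(text):
    """Parse simple key=value lines into a dict (hypothetical helper)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def current_data_dir(root, props):
    """Current layout: <root>/<collection_name>/<core_node_name>/data"""
    return join(root, props["collection"], props["coreNodeName"], "data")


def proposed_data_dir(root, props):
    """Proposed layout: <root>/<collection_name>/<shard_name>/data"""
    return join(root, props["collection"], props["shard"], "data")


PROPS = parse_core_properties(CORE_PROPERTIES)
ROOT = "hdfs://.../data"  # elided exactly as in the listing above

print(current_data_dir(ROOT, PROPS))
print(proposed_data_dir(ROOT, PROPS))
```

With the shard name in the path, every replica of shard1 would resolve to the same directory, whereas the core node name gives each replica its own.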



> Shared storage support in SolrCloud
> -----------------------------------
>
>                 Key: SOLR-13101
>                 URL: https://issues.apache.org/jira/browse/SOLR-13101
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: Yonik Seeley
>            Priority: Major
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, Google Cloud Storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>    - durability not linked to number of replicas; a single replica could be 
> common for write workloads
>    - could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>    - don't pay for what you don't need
>    - a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
