[
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754041#comment-16754041
]
Yonik Seeley commented on SOLR-13101:
-------------------------------------
Thinking about how to kick this off...
At the most basic level, looking at the HDFS layout scheme we see this ("test"
is the name of the collection):
{code}
local_file_system://.../node1/test_shard1_replica_n1/core.properties
hdfs://.../data/test/core_node2/data/
{code}
And core.properties looks like:
{code}
numShards=1
collection.configName=conf1
name=test_shard1_replica_n1
replicaType=NRT
shard=shard1
collection=test
coreNodeName=core_node2
{code}
It seems like the most basic desirable change would be to the naming scheme for
collections with shared storage.
Instead of .../<collection_name>/<core_node_name>/data
it should be .../<collection_name>/<shard_name>/data
since there is only one canonical index per shard.
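
To make the difference concrete, here is a minimal sketch (not Solr code) that derives both directory names from the core.properties shown above. The helper names and the "/shared-root" prefix are hypothetical placeholders; the real root is whatever the shared filesystem is configured with.

```python
# Illustrative only: these helpers are hypothetical and not part of Solr.
CORE_PROPERTIES = """\
numShards=1
collection.configName=conf1
name=test_shard1_replica_n1
replicaType=NRT
shard=shard1
collection=test
coreNodeName=core_node2
"""


def parse_core_properties(text):
    """Parse simple key=value lines into a dict, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def current_data_dir(root, props):
    # Existing layout: <root>/<collection_name>/<core_node_name>/data
    return "/".join([root, props["collection"], props["coreNodeName"], "data"])


def proposed_data_dir(root, props):
    # Proposed layout: <root>/<collection_name>/<shard_name>/data
    return "/".join([root, props["collection"], props["shard"], "data"])


props = parse_core_properties(CORE_PROPERTIES)
# "/shared-root" is an assumed placeholder for the shared storage root.
print(current_data_dir("/shared-root", props))
print(proposed_data_dir("/shared-root", props))
```

With the proposed scheme, every replica of shard1 would resolve to the same directory, which matches the idea of one canonical index per shard.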
> Shared storage support in SolrCloud
> -----------------------------------
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Yonik Seeley
> Priority: Major
>
> Solr should have first-class support for shared storage (blob/object stores
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS,
> etc).
> The key component will likely be a new replica type for shared storage. It
> would have many of the benefits of the current "pull" replicas (not indexing
> on all replicas, all shards identical with no shards getting out-of-sync,
> etc), but would have additional benefits:
> - Any shard could become leader (the blob store always has the index)
> - Better elasticity scaling down
> - durability not linked to number of replicas; a single replica could be
> common for write workloads
> - could drop to 0 replicas for a shard when not needed (blob store always
> has index)
> - Allow for higher performance write workloads by skipping the transaction
> log
> - don't pay for what you don't need
> - a commit will be necessary to flush to stable storage (blob store)
> - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with
> blob stores. We probably want one that treats local disk as a cache since
> the latency to remote storage is so large. I think there are still some
> "locking" issues to be solved here (ensuring that more than one writer to the
> same index won't corrupt it). This should probably be pulled out into a
> different JIRA issue.
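
The "local disk as a cache" idea in the description can be sketched roughly as a read-through cache. This is a toy illustration, not a Lucene Directory: the class name, the `remote_fetch` callable, and the in-memory dict standing in for the blob store are all assumptions, and it does nothing about the multi-writer locking problem mentioned above.

```python
import os
import tempfile


class CachingDirectory:
    """Toy read-through cache: serve files from local disk, fetching from
    the remote store only on a cache miss. Hypothetical, not a Solr class."""

    def __init__(self, remote_fetch, cache_dir):
        self.remote_fetch = remote_fetch  # assumed callable: file name -> bytes
        self.cache_dir = cache_dir
        self.remote_reads = 0  # counts round trips to the remote store

    def open_input(self, name):
        local_path = os.path.join(self.cache_dir, name)
        if not os.path.exists(local_path):
            data = self.remote_fetch(name)
            self.remote_reads += 1
            # Write to a temp file and rename so readers never see a partial file.
            tmp_path = local_path + ".tmp"
            with open(tmp_path, "wb") as f:
                f.write(data)
            os.replace(tmp_path, local_path)
        with open(local_path, "rb") as f:
            return f.read()


# Usage: a dict stands in for the blob store (an assumption for the demo).
blob_store = {"_0.cfs": b"segment-bytes"}
with tempfile.TemporaryDirectory() as cache_dir:
    directory = CachingDirectory(blob_store.__getitem__, cache_dir)
    directory.open_input("_0.cfs")  # miss: fetched from the "remote" store
    directory.open_input("_0.cfs")  # hit: served from local disk
    print(directory.remote_reads)
```

Caching without invalidation is only plausible here because Lucene index files are written once and not modified afterwards; a real implementation would still need eviction and a safe way to publish new commits.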