>> 1) "So if your node tokens are set as "vertexid_" all keys with the same prefix will be in the same range."

Adding to Aaron's comment: this is only the case if you use OrderPreservingPartitioner. RandomPartitioner (the default) hashes each row key with MD5 to get its token, so keys that share a prefix end up scattered across nodes rather than in one range.
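To make the difference concrete, here is a rough Python sketch. It is not Cassandra code: the node tokens, the key names and the helper functions are all made up for illustration, and the MD5 token is only an approximation of what RandomPartitioner computes. It just looks up which node's range a key falls into, once with the key itself as the token (OPP style) and once with a hash of the key.

```python
import bisect
import hashlib


def owner_index(node_tokens, token):
    """Index of the node owning `token` on a sorted ring.

    Each node owns the range (previous node's token, its own token];
    tokens above the last node's token wrap around to the first node.
    """
    return bisect.bisect_left(node_tokens, token) % len(node_tokens)


def opp_token(key):
    # Order-preserving style: the key itself is the token.
    return key


def random_token(key):
    # Approximation of RandomPartitioner: an MD5-derived integer token.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % (2 ** 127)


# Hypothetical 4-node rings, one per partitioner.
opp_nodes = ["v1_", "v2_", "v3_", "v4_"]
rp_nodes = [i * (2 ** 127) // 4 for i in range(4)]

# 100 hypothetical row keys that all share the prefix "v2_".
keys = ["v2_prop%03d" % i for i in range(100)]

opp_owners = {owner_index(opp_nodes, opp_token(k)) for k in keys}
rp_owners = {owner_index(rp_nodes, random_token(k)) for k in keys}

print("OPP owners:", sorted(opp_owners))                # a single node
print("RandomPartitioner owners:", sorted(rp_owners))   # spread over several nodes
```

Note that in this sketch the "v2_" keys land on the node whose token is "v3_", since a node owns the keys between the previous node's token and its own. Either way this only tells you the first replica; the row is still stored on RF nodes.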
On Mon, Nov 15, 2010 at 2:47 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> Rows are distributed around the cluster according to the ordering from the
> Partitioner used, and the Replication Strategy. All data for the same key
> will be stored together, and then replicated RF times.
>
> To answer your questions...
> 1) Each node is responsible for the keys between the previous node's token
> and its own. So if your node tokens are set as "vertexid_" all keys with
> the same prefix will be in the same range. Note that the row data will be
> stored on RF replicas, and not just on the node with the appropriate token.
>
> 2) I *think* you want to look at
> o.a.c.s.StorageService.getNaturalEndpoints(), this is not exposed to the
> outside world though. However *every* read or write request is sent to all
> replicas, even those at CL ONE. There is no concept of one node being the
> only place that a row is stored.
>
> FWIW it sounds like you want to disable some of the fine work cassandra
> does to ensure your data is replicated and available. By deciding that one
> machine will be responsible for a portion of the data you are introducing a
> single point of failure. Try writing your app against a cluster and let
> cassandra take care of things, then dive into the details. For example I
> cannot remember anyone on the list having serious issues with network
> overhead.
>
> You may also want to consider FlockDB from twitter, it sits on top of a
> sharded MySQL db https://github.com/twitter/flockdb
>
> Hope that helps.
> Aaron
>
>
> On 16 Nov, 2010, at 03:53 AM, Claudio Martella <claudio.marte...@tis.bz.it>
> wrote:
>
> Hello list,
>
> I'm in the process of writing an application which uses cassandra as a
> "storage" backend. The application is a graph database and it's supposed
> to be a baseline application for further development in the field.
>
> The idea is to implement a property graph: a multigraph (multiple edges
> connecting two vertices are possible) with properties in the form of
> name/value for edges and vertices. The idea is to traverse the graph
> with queries like "give me all the women that are liked by men i know",
> something like:
>
> Vertex[name=claudio]=>outgoingEdge[type=knows]=>Vertex[gender=male]=>outgoingEdge[type=likes]=>Vertex[gender=female].
> This is basically a step by step expansion/filtering based on properties.
>
> In my architecture my application-logic node is coupled with the
> cassandra node storing its data. I'd like to have some kind of "atomic
> set" of data that is "granted" to be stored on the same cassandra node
> (in my case the vertex, its adj list, its properties, its edges and
> their properties), so that i can issue the required filtering and
> expansion to a particular node which will issue the logic behind it (and
> i can route such request with the same logic cassandra routes its
> requests).
> This is in an effort to (a) minimize network i/o (i'd be able to send
> the query token to the application node which would issue a local get to
> its local cassandra) and (b) distribute computation (i'd be able to
> distribute filtering between all the nodes storing for example the
> node's neighborhood). This is still not optimal, but it would be a good
> start.
>
> For this reason i thought about a datamodel that has composite keys:
>
> vertexid and edgeid are uuids while propertyname is a string.
>
> CF vertices {
>
>     vertexid_propertyname {
>
>         propertyvalue: null
>     }
> }
>
>
> CF edges {
>
>     vertexid_[in|out]_propertyname_edgeid {
>
>         propertyvalue: othervertexid
>     }
> }
>
> With this datamodel i could easily and efficiently issue slices and
> ranges to cassandra with the equality predicates on properties i need.
> What i need now is to partition my data on the prefix "vertexid_".
> Such a datamodel does have a concept of "ascending ordering", so i thought
> about OPP, but to my understanding OPP does not guarantee that all the data
> starting with the same prefix will end up in the same cassandra node,
> but only some of it. My set of data about a vertex could still be split
> between two cassandra nodes in case the token ends up being a key in the
> middle of the set, right?
>
> What i require exactly is:
>
> (1) to have all the rows belonging to the same vertexid (which is a
> uuid) on the same cassandra node. Can i achieve this?
> (2) given this partitioning, know the IP of the cassandra node storing
> that vertex data, from outside of cassandra. This is the logic cassandra
> uses to route requests for keys and i have to access it from outside.
>
> Can anybody comment on these?
>
>
> Thanks
>
>
> Claudio
>
>
> Unit Research & Development - Analyst
>
> TIS innovation park
> Via Siemens 19 | Siemensstr. 19
> 39100 Bolzano | 39100 Bozen
> Tel. +39 0471 068 123
> Fax +39 0471 068 129
> claudio.marte...@tis.bz.it http://www.tis.bz.it