I assume that Reports is the super column family, that the first `1:` is the report id (the row key in Cassandra terms), that the second `1:` is the report line (the super column), and that "value1" is the column name. If this is not the case, please explain the data model in more detail.
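To make the assumed mapping concrete, here is a minimal Java sketch (Java because Hector is a Java client) that models the Reports super column family as nested sorted maps: row key (report id), then super column (report line), then column name and value. The class and method names are hypothetical. This is only a mental model of the nesting, not how Cassandra actually lays data out on disk.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the assumed layout:
// row key (report id) -> super column (report line) -> column name -> value.
// It only illustrates the nesting; Cassandra's on-disk format is different.
public class ReportsModel {
    // Outer map is keyed by the row key of the Reports super column family.
    static final SortedMap<Integer, SortedMap<Integer, SortedMap<String, String>>> reports =
            new TreeMap<>();

    // Store one column value for a given report (row) and report line (super column).
    static void put(int reportId, int line, String column, String value) {
        reports.computeIfAbsent(reportId, k -> new TreeMap<>())
               .computeIfAbsent(line, k -> new TreeMap<>())
               .put(column, value);
    }

    public static void main(String[] args) {
        put(1, 1, "value1", "some val");
        put(1, 1, "value2", "some val");
        put(1, 2, "value1", "some val");
        put(2, 1, "value1", "some val");
        // Every line of report 1 hangs off the single row key 1.
        System.out.println(reports.get(1));
    }
}
```

Reading a whole report is then a lookup of one row key followed by a scan over its super columns.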
> Can I get guarantees that all report lines of one report will be
> located on the same node in such configuration?

Yes. If I have understood the schema correctly, each report is a single row, and each replica of that row is stored in full on a single node (with a replication factor of 2, each of the two replicas holds the complete report). It will even be stored in only a few places on disk (SSTables) if you do not update the reports much.

On Wed, Nov 9, 2011 at 04:47, Denis Gabaydulin <gaba...@gmail.com> wrote:
> Hi, first of all, let me say thank you for the amazing product :-)
> So, I have a couple of questions about internal physical data layout.
>
> Suppose I have the following data schema:
>
> Reports:{
>   1:{
>     1:{"value1":"some val", "value2":"some val"},
>     2:{"value1":"some val", "value2":"some val"}
>     ...
>   },
>   2:{
>     1:{"value1":"some val", "value2":"some val"},
>     2:{"value1":"some val", "value2":"some val"}
>     ...
>   }
>   ...
> }
>
> Each report is represented by a set of report records.
>
> Most of the data queries select a report by id and all its report lines.
> I'm going to use the multiget super slice query with ranges (in terms of
> the Hector client) for it. Will it be efficient?
>
> Another question related to the physical layout of the data. I'm going
> to apply SimpleStrategy with the random partitioner.
> The replication factor is 1 or 2 (it depends on the number of nodes in the
> production environment).
> Can I get guarantees that all report lines of one report will be
> located on the same node in such configuration?
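For reference, a rough Java sketch of why the answer is yes. It assumes that RandomPartitioner's token is the absolute value of the MD5 hash of the row key, and that SimpleStrategy puts the first replica on the first node whose token is at or after the key's token (wrapping around the ring) and the remaining replicas on the next nodes clockwise. The class name, node names and tokens are hypothetical, and the row key is assumed to be a UTF-8 string; with Hector the actual key bytes depend on the key serializer you use.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Rough model of RandomPartitioner + SimpleStrategy replica placement.
public class ReplicaPlacement {
    // RandomPartitioner-style token: absolute value of the MD5 hash of the row key.
    static BigInteger token(String rowKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // SimpleStrategy-style placement: first node whose token is >= the key's token
    // (wrapping around the ring), then the next rf - 1 nodes clockwise.
    static List<String> replicas(TreeMap<BigInteger, String> ring, String rowKey, int rf) {
        List<String> result = new ArrayList<>();
        BigInteger current = ring.ceilingKey(token(rowKey));
        if (current == null) {
            current = ring.firstKey(); // past the last token: wrap around
        }
        while (result.size() < Math.min(rf, ring.size())) {
            result.add(ring.get(current));
            BigInteger next = ring.higherKey(current);
            current = (next == null) ? ring.firstKey() : next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical three-node ring with evenly spaced tokens in [0, 2^127).
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        BigInteger step = BigInteger.valueOf(2).pow(127).divide(BigInteger.valueOf(3));
        ring.put(BigInteger.ZERO, "node-a");
        ring.put(step, "node-b");
        ring.put(step.multiply(BigInteger.valueOf(2)), "node-c");

        // Only the row key (report id) is hashed, so every line of a report
        // maps to the same replica list.
        System.out.println("report 1 -> " + replicas(ring, "1", 2));
        System.out.println("report 2 -> " + replicas(ring, "2", 2));
    }
}
```

Note that `replicas` never sees the report line or the column names: only the row key is hashed, which is why all lines of one report end up on the same node(s).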