I assume that Reports is the super column family, that the first 1: is
the report id and, in the Cassandra topology, the row key, that the
second 1: is the report line and, in the Cassandra topology, the super
column, and that "value1" is the column name. If this is not the case,
please explain the topology in more detail.

> Can I get guarantees that all report lines of one report will be
> located on the same node in such a configuration?

Yes. If I understood the topology correctly, each replica of a report
will be stored together on a single node, because a row is placed by
its row key (and it will even be stored in only a few locations on
disk if you do not update the reports much).
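
To illustrate why the row key alone decides placement, here is a
minimal sketch of a simplified random-partitioner model: the token is
derived from an MD5 hash of the row key, and the row goes to the first
node whose token is greater than or equal to it, wrapping around the
ring. The four-node ring, the node names, the helper names token_for
and primary_node, and the encoding of report id 1 as b"1" are all
assumptions made for the example, not details taken from your cluster.

```python
import hashlib
from bisect import bisect_left


def token_for(row_key):
    """Map a row key (bytes) to a token: absolute value of the MD5 digest
    read as a signed 128-bit integer (simplified model, see lead-in)."""
    digest = hashlib.md5(row_key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def primary_node(row_key, ring):
    """Return the node owning row_key: the first node whose token is
    >= the key's token, wrapping to the first node past the last token."""
    tokens = [token for token, _ in ring]
    index = bisect_left(tokens, token_for(row_key)) % len(ring)
    return ring[index][1]


# Hypothetical ring: four nodes with evenly spaced tokens in [0, 2**127).
RING = sorted((i * (2**127 // 4), "node%d" % i) for i in range(4))

# Only the row key (the report id) is passed in. The super columns
# (report lines 1, 2, ...) never take part in the placement decision,
# so every line of report 1 lands on the same node.
node_for_report_1 = primary_node(b"1", RING)
node_for_report_2 = primary_node(b"2", RING)
print(node_for_report_1, node_for_report_2)
```

Different reports may of course land on different nodes; the point is
only that nothing below the row key level influences where data goes.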

On Wed, Nov 9, 2011 at 04:47, Denis Gabaydulin <gaba...@gmail.com> wrote:
> Hi, first of all, let me say thank you for the amazing product :-)
> So, I have a couple of questions about internal physical data layout.
>
> Suppose, I have the following data schema:
>
> Reports:{
>    1:{
>        1:{"value1":"some val", "value2":"some val"},
>        2:{"value1":"some val", "value2":"some val"}
>        ...
>    },
>    2:{
>        1:{"value1":"some val", "value2":"some val"},
>        2:{"value1":"some val", "value2":"some val"}
>        ...
>    }
>    ...
> }
>
> Each report is represented by a set of report records.
>
> Most of the data queries select a report by id and all of its report
> lines. I'm going to use the multiget super slice query with ranges (in
> terms of the Hector client) for it. Will it be efficient?
>
> Another question is related to the physical layout of the data. I'm
> going to apply SimpleStrategy with the random partitioner.
> The replication factor is 1 or 2 (it depends on the number of nodes in
> the production environment).
> Can I get guarantees that all report lines of one report will be
> located on the same node in such a configuration?
>
