I would also recommend two column families. Storing the key as NxN would
require you to hit multiple machines to query for an entire row or column
with RandomPartitioner. Even with OPP you would need to pick either rows or
columns to order by, and the other would still require hitting multiple
machines. Two column families avoid this, and also avoid any problems that
come with choosing OPP.
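
To make the layout concrete, here is a minimal sketch of the two column
family idea in Python. It is only an in-memory model: plain dicts stand in
for the column families, and the names (SparseMatrix, rows_cf, cols_cf, put,
get_row, get_col, get_cell) are made up for illustration and are not part of
any Cassandra client API.

```python
from collections import defaultdict


class SparseMatrix:
    """In-memory stand-in for the two column families (hypothetical names)."""

    def __init__(self):
        # "Rows" CF: key = matrix row id, column name = matrix column id.
        self.rows_cf = defaultdict(dict)
        # "Columns" CF: key = matrix column id, column name = matrix row id.
        self.cols_cf = defaultdict(dict)

    def put(self, row, col, value):
        # Every write goes to both column families.
        self.rows_cf[row][col] = value
        self.cols_cf[col][row] = value

    def get_row(self, row):
        # A whole matrix row is a single key lookup in the Rows CF.
        return dict(self.rows_cf.get(row, {}))

    def get_col(self, col):
        # A whole matrix column is a single key lookup in the Columns CF.
        return dict(self.cols_cf.get(col, {}))

    def get_cell(self, row, col):
        # An intersection: one key, then one named column under it.
        return self.rows_cf.get(row, {}).get(col)


m = SparseMatrix()
m.put(1, 2, "a")
m.put(1, 5, "b")
m.put(3, 2, "c")
row_1 = m.get_row(1)   # all relations for matrix row 1
col_2 = m.get_col(2)   # all relations for matrix column 2
```

Because each whole row and each whole column lives under a single key, either
read touches one key regardless of the partitioner. Empty elements are simply
never written, which suits a mostly empty matrix. The cost is that every
write is done twice.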

On Thu, Dec 9, 2010 at 2:26 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> Am assuming you have one matrix and you know the dimensions. Also as you
> say the most important queries are to get an entire column or an entire row.
>
> I would consider using a standard CF for the Columns and one for the Rows.
>  The key for each would be the col / row number, each cassandra column name
> would be the id of the other dimension and the value whatever you want.
>
> - when storing the data update both the Column and Row CF
> - reading a whole row/col would be simply reading from the appropriate CF.
> - reading an intersection is a get_slice to either col or row CF using the
> column_names field to identify the other dimension.
>
> You would not need secondary indexes to serve these queries.
>
> Hope that helps.
> Aaron
>
> On 10 Dec, 2010, at 07:02 AM, Sébastien Druon <sdr...@spotuse.com> wrote:
>
> I mean if I have secondary indexes. Apparently they are calculated in the
> background...
>
> On 9 December 2010 18:33, David Boxenhorn <da...@lookin2.com> wrote:
>
>> What do you mean by indexing?
>>
>>
>> On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon <sdr...@spotuse.com> wrote:
>>
>>> Thanks a lot for the answer
>>>
>>> What about the indexing when adding a new element? Is it incremental?
>>>
>>> Thanks again
>>>
>>>
>>>
>>> On 9 December 2010 14:38, David Boxenhorn <da...@lookin2.com> wrote:
>>>
>>>> How about a regular CF where keys are n...@n ?
>>>>
>>>> Then, getting a matrix row would be the same cost as getting a matrix
>>>> column (N gets), and it would be very easy to add element N+1.
>>>>
>>>>
>>>>
>>>> On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon <sdr...@spotuse.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> For a specific case, we are thinking about representing a N to N
>>>>> relationship with a NxN Matrix in Cassandra.
>>>>> The relations will be only between a subset of elements, so the Matrix
>>>>> will mostly contain empty elements.
>>>>>
>>>>> We have a set of questions concerning this:
>>>>> - what is the best way to represent this matrix? what would have the
>>>>> best performance in reading? in writing?
>>>>>   . a super column family with n column families, with n columns each
>>>>>   . a column family with n columns and n lines
>>>>>
>>>>> In the second case, we would need to extract 2 kinds of information:
>>>>> - all the relations for a line: this should be no specific problem;
>>>>> - all the relations for a column: in that case we would need an index
>>>>> for the columns, right? and then get all the lines where the value of the
>>>>> column in question is not null... is it the correct way to do?
>>>>> When using indexes, say we want to add another element N+1. What impact
>>>>> in terms of time would it have on the indexation job?
>>>>>
>>>>> Thanks a lot for the answers,
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Sébastien Druon
>>>>>
>>>>
>>>>
>>>
>>
>
