Dear cassandra users,

We have the following structure in MySql:

documents->[doc_id(primary key), title, description]
elements->[element_id(primary key), doc_id(index), title, description]

Notation: table name->[column1(key or index), column2, …]

We want to transfer the data to Cassandra.

Each document can contain a large number of elements (between 1 and 100k+)

We have two requirements:
a) Load all elements for a given doc_id quickly
b) Update the value of one individual element quickly


We were thinking on the following cassandra configurations:

Option A

documents->[doc_id(primary key), title, description, elements] (elements
could be a SET or a TEXT, each time new elements are added (they are never
removed) we would append it to this column)
elements->[element_id(primary key), title, description]

Loading a document:

 a) Load document with given <doc_id> and get all element ids
    SELECT * from documents where doc_id=‘id’

 b) Load all elements with the given ids
    SELECT * FROM elements where element_id IN (ids loaded from query a)


Option B

documents->[doc_id(primary key), title, description]
elements->[element_id(primary key), doc_id(secondary index), title,
description]

Loading a document:
 a) SELECT * from elements where doc_id=‘id’


Neither solutions doesn’t seem to be good, in Option A, even if we are
querying by Primary keys, the second query will have 100k+ primary key id’s
in the WHERE clause, and the second solution looks like an anti pattern in
cassandra.

Could anyone give any advice how would we create a model for our use case?

Thank you in advance,
Zoltan.

Reply via email to