Querying by (doc_id and element_id ) OR just by (element_id) is fine, but the real question is, will it be efficient to query 100k+ primary keys in the elements table? e.g.
SELECT * FROM elements WHERE element_id IN (element_id1, element_id2, element_id3, .... element_id100K+) ? The elements_id is a primary key. Thank you? On Sun, Mar 26, 2017 at 11:35 PM, Matija Gobec <matija0...@gmail.com> wrote: > Have one table hold document metadata (doc_id, title, description, ...) > and have another table elements where partition key is doc_id and > clustering key is element_id. > Only problem here is if you need to query and/or update element just by > element_id but I don't know your queries up front. > > On Sun, Mar 26, 2017 at 10:16 PM, Zoltan Lorincz <zol...@gmail.com> wrote: > >> Dear cassandra users, >> >> We have the following structure in MySql: >> >> documents->[doc_id(primary key), title, description] >> elements->[element_id(primary key), doc_id(index), title, description] >> >> Notation: table name->[column1(key or index), column2, …] >> >> We want to transfer the data to Cassandra. >> >> Each document can contain a large number of elements (between 1 and >> 100k+) >> >> We have two requirements: >> a) Load all elements for a given doc_id quickly >> b) Update the value of one individual element quickly >> >> >> We were thinking on the following cassandra configurations: >> >> Option A >> >> documents->[doc_id(primary key), title, description, elements] (elements >> could be a SET or a TEXT, each time new elements are added (they are never >> removed) we would append it to this column) >> elements->[element_id(primary key), title, description] >> >> Loading a document: >> >> a) Load document with given <doc_id> and get all element ids >> SELECT * from documents where doc_id=‘id’ >> >> b) Load all elements with the given ids >> SELECT * FROM elements where element_id IN (ids loaded from query a) >> >> >> Option B >> >> documents->[doc_id(primary key), title, description] >> elements->[element_id(primary key), doc_id(secondary index), title, >> description] >> >> Loading a document: >> a) SELECT * from elements where doc_id=‘id’ >> >> >> Neither solutions doesn’t seem to be good, in Option A, even if we are >> querying by Primary keys, the second query will have 100k+ primary key id’s >> in the WHERE clause, and the second solution looks like an anti pattern in >> cassandra. >> >> Could anyone give any advice how would we create a model for our use case? >> >> Thank you in advance, >> Zoltan. >> >> >