The normal approach is denormalization to a materialized view (the traditional thinking of it not the new 3.0 feature coming out), which is also true of using an RBMS at scale (joins across all data sets get expensive once you start having to shard across different servers).
The simplistic idea is to build your tables to map your queries (or API calls if you want to be more advanced in your thinking) 1 for 1 with the following sets of constraints: 1. All data needed to satisfy that query exists in this table even if it also exists somewhere else. 2. The partition key (first part of the primary key) matching the where clause you'd like to use on this table for that query. 3. The clustering column defines order inside a given partition key. 4. Partitions should not be "too fat". This is a more advanced topic. Imagine the case of a user profile, I may choose to store all changes of that user profile in a "profile history" table. It would probably look like CREATE TABLE user_profile_history ( user_id uuid, ts timestamp, change text, PRIMARY KEY( user_id, ts)) In this case I'll get a partition key of user_id, a timestamp of when the change occurred as the clustering column giving me an implied order of ascending, and the change in a text field. Practical considerations for updating and keeping these tables in sync are myriad. Starting out it's probably easiest to have one table be the "source of truth" and all other views derived off that single source of truth, either at write time, or with batches running throughout the day, or if latency is a concern streaming (you can do streaming and batching for a particularly potent combination). This pattern I see frequently and have pushed it's use to good effect many times. The following blog provides some great introductory ideas on data modeling http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling On Fri, Sep 4, 2015 at 4:15 PM, srinivas s <ssaitalat...@gmail.com> wrote: > > 1. Hi All, > > > > 1. I am trying to model RDBMS joins into cassandra. As I am new to > cassandra, I need your help/suggestion on this. Below is the information > regarding the query: > 2. > 3. I have a query in RDBMS as follows: > 4. > 5. select t3.name from Table1 t1, Table2 t2, Table3 t3, Table4 t4 > where > 6. t2.cust_id = 3 and t4.sid = t1.sid and t1.colid = t2.colid and > t4.cid = t3.cid > 7. > 8. > 9. Now, trying to make a shimilar query in cassandra: > 10. > 11. As per my learning experience in Cassandra, I got the below 2 > solutions: (as cassandra does not support joins) > 12. > 13. ****Solution 1:***** > 14. > 15. 1) Fetch all the records with t2.cust_id = 3 > 16. 2) Now again run another query that will do the condition t3.sid = > t1.sid on the results returned from point 1. > 17. 3) continue the same for all the conditions. > 18. > 19. Drawbacks with this approach: > 20. > 21. For each join, I have to do a network call to fetch the details. > Also, it will take more time..as I am running multiple conditions > 22. > 23. > 24. ****Solution 2: ***** > 25. > 26. 1) Create a map table for every possible join. > 27. > 28. Drawbacks with this aproach: > 29. > 30. I think, this is not a right approach. So join to table (map > table) mapping idea is not right. > 31. > 32. pastebin link for the same: http://pastebin.com/FRAyihPT > 33. Please suggest me on this. > > > > -- Thanks, Ryan Svihla