Re: Re: Data model storage optimization

2018-07-30 Thread James Shaw
Things to consider: whether the row size is large; whether it is updated a lot (an update is actually an insert); whether it is read-heavy; overall read performance. If the row size is large, you may consider a separate table, user_detail, and add an id column to all tables. On the application side, merge/join by id. But you pay a read price: a 2nd query to user_de
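A minimal Python sketch of the application-side join by id, with dicts standing in for a narrow user table and a wide user_detail table (both hypothetical). The second lookup is the extra read price mentioned above.

```python
# Hypothetical in-memory stand-ins for two tables that share the same id.
user = {1: {"name": "alice"}, 2: {"name": "bob"}}   # narrow, frequently read
user_detail = {1: {"bio": "x" * 1000}}              # large, rarely needed columns


def load_user(user_id, with_detail=False):
    """Read the narrow row and, only if asked, merge in the detail row by id."""
    row = dict(user[user_id])                       # 1st query
    if with_detail:
        row.update(user_detail.get(user_id, {}))    # 2nd query: the extra read price
    return row


print(load_user(1))  # {'name': 'alice'}
```

Callers that only need the narrow columns never pay for the second query.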

Fwd: Re: Data model storage optimization

2018-07-29 Thread onmstester onmstester
How many rows on average per partition? Around 10K. Let me get this straight: you are bifurcating your partitions on either email or username, essentially potentially doubling the data, because you don’t have a way to manage a central system of record of users? We are just analyzing output log

Re: Data model storage optimization

2018-07-29 Thread Rahul Singh
How many rows on average per partition? Let me get this straight: you are bifurcating your partitions on either email or username, essentially potentially doubling the data, because you don’t have a way to manage a central system of record of users? I would do this (my opinion): Migrate to a

Re: Data Model Suggestion Required

2017-07-11 Thread Siddharth Prakash Singh
Thanks, Jeff, for the suggestions. On Mon, Jul 10, 2017 at 9:50 PM Jeff Jirsa wrote: > On 2017-07-10 07:13 (-0700), Siddharth Prakash Singh wrote: > > I am planning to build a user activity timeline. Users on our system > > generate different kinds of activity. For example: Search some product

Re: Data Model Suggestion Required

2017-07-10 Thread Jeff Jirsa
On 2017-07-10 07:13 (-0700), Siddharth Prakash Singh wrote: > I am planning to build a user activity timeline. Users on our system > generate different kinds of activity. For example: Search some product, > Calling our sales team, Marking favourite, etc. > Now I would like to generate timeline

Re: Data model suggestions

2015-04-27 Thread Laing, Michael
ndra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__auto_snapshot >> From: Ali Akhtar [mailto:ali.rac...@gmail.com] >> Sent: Sunday, April 26, 2015 10:31 PM >> To: user@cassandra.apache.org >> Subjec

Re: Data model suggestions

2015-04-27 Thread Ali Akhtar
...@gmail.com] > Sent: Sunday, April 26, 2015 10:31 PM > To: user@cassandra.apache.org > Subject: Re: Data model suggestions > Thanks Peer. I like the approach you're suggesting. > Why do you recommend truncating the last active table rat

RE: Data model suggestions

2015-04-26 Thread Peer, Oded
/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__auto_snapshot From: Ali Akhtar [mailto:ali.rac...@gmail.com] Sent: Sunday, April 26, 2015 10:31 PM To: user@cassandra.apache.org Subject: Re: Data model suggestions Thanks Peer. I like the approach you're sugge

Re: Data model suggestions

2015-04-26 Thread Ali Akhtar
Sharma [mailto:narendra.sha...@gmail.com] > Sent: Friday, April 24, 2015 6:53 AM > To: user@cassandra.apache.org > Subject: Re: Data model suggestions > I think one table say record should be good. The primary key is record id. > This will ensure good distribution. > Just update

Re: Data model suggestions

2015-04-26 Thread Shahab Yunus
ting > doesn’t create automatic snapshots. > From: Narendra Sharma [mailto:narendra.sha...@gmail.com] > Sent: Friday, April 24, 2015 6:53 AM > To: user@cassandra.apache.org > Subject: Re: Data model suggestions > I think one table say record

RE: Data model suggestions

2015-04-26 Thread Peer, Oded
snapshots. From: Narendra Sharma [mailto:narendra.sha...@gmail.com] Sent: Friday, April 24, 2015 6:53 AM To: user@cassandra.apache.org Subject: Re: Data model suggestions I think one table say record should be good. The primary key is record id. This will ensure good distribution. Just update

Re: Data model suggestions

2015-04-23 Thread Narendra Sharma
I think one table, say record, should be good. The primary key is the record id. This will ensure good distribution. Just update the active attribute to true or false. For range queries on active vs. archived records, maintain 2 indexes or try a secondary index. On Apr 23, 2015 1:32 PM, "Ali Akhtar" wrote: >
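A minimal sketch of that single-table design, with a dict and two sets as hypothetical stand-ins for the record table and the "2 indexes":

```python
# Hypothetical in-memory model: one "record" table keyed by record id,
# plus two lookup tables acting as the active/archive indexes.
records = {}             # record_id -> columns
active_index = set()     # ids of active records
archive_index = set()    # ids of archived records


def set_active(record_id, active):
    """Flip the active attribute and keep both index tables in sync."""
    records.setdefault(record_id, {})["active"] = active
    if active:
        active_index.add(record_id)
        archive_index.discard(record_id)
    else:
        archive_index.add(record_id)
        active_index.discard(record_id)
```

In Cassandra these would be three separate writes; a logged batch is one way to apply them together.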

Re: Data model suggestions

2015-04-23 Thread Ali Akhtar
Good point about the range selects. I think they can be made to work with limits, though. Or, since the active records will usually never be > 500k, the ids may just be cached in memory. Most of the time, during reads, the queries will just consist of select * where primaryKey = someValue. One ro

Re: Data model suggestions

2015-04-23 Thread Manoj Khangaonkar
Hi, If your external API returns active records, that means, I am guessing, you need to do a select * on the active table to figure out which records in the table are no longer active. You might be aware that range selects based on partition key will time out in Cassandra. They can, however, be made t

Re: Data model suggestions

2015-04-23 Thread Ali Akhtar
That's returned by the external API we're querying. We query them for active records; if a previously active record isn't included in the results, that means it's time to archive that record. On Thu, Apr 23, 2015 at 9:20 PM, Manoj Khangaonkar wrote: > Hi, > > How do you determine if the record is n
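The archiving rule described above amounts to a set difference; a minimal sketch with hypothetical names:

```python
def records_to_archive(previously_active, api_active):
    """Ids that were active last time but are missing from the latest API response."""
    return set(previously_active) - set(api_active)


print(sorted(records_to_archive({"r1", "r2", "r3"}, ["r2", "r3", "r4"])))  # ['r1']
```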

Re: Data model suggestions

2015-04-23 Thread Manoj Khangaonkar
Hi, How do you determine if the record is no longer active? Is it a periodic process that goes through every record and checks when the last update happened? regards On Thu, Apr 23, 2015 at 8:09 AM, Ali Akhtar wrote: > Hey all, > > We are working on moving a mysql based application to Cassa

Re: Data model for streaming a large table in real time.

2014-06-08 Thread Kevin Burton
load balancing... Sequential writes can cause hot spots... > Uneven load balancing for multiple tables” > -- Jack Krupansky > From: Kevin Burton > Sent: Saturday, June 7, 2014 1:27 PM > To: user@cassandra.apache.org > Subject: Re: Data model for streaming a

Re: Data model for streaming a large table in real time.

2014-06-08 Thread Jack Krupansky
balancing for multiple tables” -- Jack Krupansky From: Kevin Burton Sent: Saturday, June 7, 2014 1:27 PM To: user@cassandra.apache.org Subject: Re: Data model for streaming a large table in real time. I just checked the source and in 2.1.0 it's not deprecated. So it *might* be *

Re: Data model for streaming a large table in real time.

2014-06-08 Thread Robert Stupp
You do not need RAID0 for data. Let C* do striping over the data disks. And maybe CL ANY/ONE would be sufficient for your writes. > On 08.06.2014 at 06:15, Kevin Burton wrote: > > we're using containers for other reasons, not just cassandra. > > Tightly constraining resources means we don't hav

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Kevin Burton
We're using containers for other reasons, not just Cassandra. Tightly constraining resources means we don't have to worry about Cassandra, the JVM, or Linux doing something silly and using too many resources and taking down the whole box. On Sat, Jun 7, 2014 at 8:25 PM, Colin Clark wrote: >

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin Clark
You won't need containers: running one instance of Cassandra in that configuration will hum along quite nicely and will make use of the cores and memory. I'd forget the RAID anyway and just mount the disks separately (JBOD). -- Colin 320-221-9531 On Jun 7, 2014, at 10:02 PM, Kevin Burton wrote

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin Clark
Write Consistency Level + Read Consistency Level > Replication Factor ensures your reads will be consistent, and having 3 nodes lets you achieve redundancy in the event of node failure. So writing with a CL of local quorum and reading with a CL of local quorum (2+2>3) with a replication factor of 3 ensure
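The rule above as a small Python check, assuming a single data center so that local quorum with a replication factor of 3 means 2 replicas:

```python
def quorum(replication_factor):
    """Number of replicas in a quorum: a strict majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(write_replicas, read_replicas, replication_factor):
    """W + R > RF means the read set and write set share at least one replica."""
    return write_replicas + read_replicas > replication_factor


rf = 3
print(quorum(rf))                                          # 2
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))  # True  (2 + 2 > 3)
print(reads_see_latest_write(1, 1, rf))                    # False (ONE/ONE)
```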

Re: Data model for streaming a large table in real time.

2014-06-07 Thread James Campbell
This is a basic question, but having heard that advice before, I'm curious about why the minimum recommended replication factor is three? Certainly additional redundancy, and, I believe, a minimum threshold for paxos. Are there other reasons? On Jun 7, 2014 10:52 PM, Colin wrote: To have any r

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Kevin Burton
Right now I'm just putting everything together as a proof of concept… so just two cheap replicas for now. And it's at 1/1th of the load. If we lose data it's ok :) I think our config will be 2-3x 400GB SSDs in RAID0, 3 replicas, 16 cores, probably 48-64GB of RAM each box. Just one datacent

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin
To have any redundancy in the system, start with at least 3 nodes and a replication factor of 3. Try to have at least 8 cores, 32 GB of RAM, and separate disks for log and data. Will you be replicating data across data centers? -- Colin 320-221-9531 > On Jun 7, 2014, at 9:40 PM, Kevin Burton w

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Kevin Burton
Oh. To start with, we're going to use 2-10 nodes. I think we're going to take the original strategy and just use 100 buckets (0-99), then the timestamp under that. I think it should be fine and won't require an ordered partitioner. :) Thanks! On Sat, Jun 7, 2014 at 7:38 PM, Colin Cl
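A minimal Python sketch of the 100-bucket layout, with in-memory lists standing in for partitions. The crc32 bucket function is an assumption (any stable hash of the id would do). It shows that writes spread across buckets while a reader has to query all 100 buckets and merge them by timestamp.

```python
import bisect
import heapq
import zlib
from collections import defaultdict

NUM_BUCKETS = 100  # buckets 0-99

# bucket -> rows sorted by (timestamp, item_id), mimicking a clustering column
table = defaultdict(list)


def bucket_for(item_id):
    """Stable hash of the id, so writes spread over the 100 partitions."""
    return zlib.crc32(item_id.encode("utf-8")) % NUM_BUCKETS


def write(item_id, ts):
    bisect.insort(table[bucket_for(item_id)], (ts, item_id))


def read_since(ts):
    """A reader fans out over every bucket and merges them back into time order."""
    per_bucket = [[row for row in table[b] if row[0] >= ts] for b in range(NUM_BUCKETS)]
    return list(heapq.merge(*per_bucket))
```

In Cassandra each bucket read would be a separate query restricted on the timestamp clustering column, with the merge done client-side.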

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin Clark
With 100 nodes, that ingestion rate is actually quite low and I don't think you'd need another column in the partition key. You seem to be set in your current direction. Let us know how it works out. -- Colin 320-221-9531 On Jun 7, 2014, at 9:18 PM, Kevin Burton wrote: What's 'source' ? You

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Kevin Burton
What's 'source'? You mean like the URL? If source is too random, it's going to yield too many buckets. Ingestion rates are fairly high but not insane: about 4M inserts per hour, from 5-10GB… On Sat, Jun 7, 2014 at 7:13 PM, Colin Clark wrote: > Not if you add another column to the partition key
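Back-of-envelope arithmetic for the quoted rate, assuming "5-10GB" means per hour and decimal gigabytes (neither is stated above):

```python
inserts_per_hour = 4_000_000
bytes_per_hour = (5e9, 10e9)  # assumption: 5-10 GB per hour, decimal GB

inserts_per_second = inserts_per_hour / 3600
mb_per_second = [b / 3600 / 1e6 for b in bytes_per_hour]
avg_row_bytes = [b / inserts_per_hour for b in bytes_per_hour]

print(round(inserts_per_second))             # 1111
print([round(x, 2) for x in mb_per_second])  # [1.39, 2.78]
print(avg_row_bytes)                         # [1250.0, 2500.0]
```

Roughly 1,100 writes per second of 1-2.5 KB rows, under these assumptions.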

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin Clark
Not if you add another column to the partition key; source, for example. I would really try to stay away from the ordered partitioner if at all possible. What ingestion rates are you expecting, in size and speed? -- Colin 320-221-9531 On Jun 7, 2014, at 9:05 PM, Kevin Burton wrote: Thanks fo

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Kevin Burton
Thanks for the feedback on this, btw; it's helpful. My notes below. On Sat, Jun 7, 2014 at 5:14 PM, Colin Clark wrote: > No, you're not; the partition key will get distributed across the cluster > if you're using random or murmur. > Yes… I'm aware. But in practice this is how it will work… I

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin Clark
No, you're not; the partition key will get distributed across the cluster if you're using random or murmur. You could also ensure that by adding another column, like source, to ensure distribution. (Add the seconds to the partition key, not the clustering columns.) I can almost guarantee that if you

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Kevin Burton
Well, you could add milliseconds; at best you're still bottlenecking most of your writes on one box, maybe 2-3 if there are ones that are lagging. Anyway, I think using 100 buckets is probably fine. Kevin On Sat, Jun 7, 2014 at 2:45 PM, Colin wrote: > Then add seconds to the bucket. Also,

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin
Then add seconds to the bucket. Also, the data will get cached; it's not going to hit disk on every read. Look at the key cache settings on the table. Also, in 2.1 you have even more control over caching. -- Colin 320-221-9531 > On Jun 7, 2014, at 4:30 PM, Kevin Burton wrote: > > >> On Sat

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Kevin Burton
On Sat, Jun 7, 2014 at 1:34 PM, Colin wrote: > Maybe it makes sense to describe what you're trying to accomplish in more > detail. Essentially, I'm appending writes of recent data by our crawler and sending that data to our customers. They need to sync to up-to-date writes… we need to get th

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin
Maybe it makes sense to describe what you're trying to accomplish in more detail. A common bucketing approach is along the lines of year, month, day, hour, minute, etc., and then use a timeuuid as a clustering column. Depending upon the semantics of the transport protocol you plan on utilizing, e
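A small Python sketch of minute-level time bucketing. It assumes a "YYYYMMDDHHMM" string as the partition key (an illustrative format, not from the thread) and a version-1 UUID as the clustering value:

```python
import uuid
from datetime import datetime, timedelta


def minute_bucket(dt):
    """Partition key at year/month/day/hour/minute granularity, e.g. '201406071327'."""
    return dt.strftime("%Y%m%d%H%M")


def buckets_between(start, end):
    """Every minute bucket a reader must query to cover [start, end]."""
    t = start.replace(second=0, microsecond=0)
    out = []
    while t <= end:
        out.append(minute_bucket(t))
        t += timedelta(minutes=1)
    return out


# uuid1() is a time-based (version 1) UUID, the kind of value a timeuuid holds;
# it uses the current clock, not a datetime passed in.
event_id = uuid.uuid1()
print(buckets_between(datetime(2014, 6, 7, 13, 58, 30), datetime(2014, 6, 7, 14, 0, 10)))
# ['201406071358', '201406071359', '201406071400']
```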

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Kevin Burton
Another way around this is to have a separate table storing the number of buckets. This way if you have too few buckets, you can just increase them in the future. Of course, the older data will still have too few buckets :-( On Sat, Jun 7, 2014 at 11:09 AM, Kevin Burton wrote: > > On Sat, Jun
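A sketch of the separate bucket-count table, assuming one count per day and a hypothetical default of 10 buckets. Dicts stand in for the tables.

```python
import zlib

DEFAULT_BUCKETS = 10  # hypothetical default for periods with no explicit setting

# Hypothetical "bucket count" table: period (a day here) -> number of buckets used then.
bucket_counts = {"2014-06-07": 16}


def buckets_for(period):
    return bucket_counts.get(period, DEFAULT_BUCKETS)


def write_partition(period, item_id):
    """Writers hash into however many buckets are configured for the period."""
    return (period, zlib.crc32(item_id.encode("utf-8")) % buckets_for(period))


def read_partitions(period):
    """Readers look up the count first, then query every (period, bucket) partition."""
    return [(period, b) for b in range(buckets_for(period))]
```

Periods written before a count was raised keep their old count, which is the caveat about older data.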

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Kevin Burton
On Sat, Jun 7, 2014 at 10:41 AM, Colin Clark wrote: > It's an anti-pattern and there are better ways to do this. > > Entirely possible :) It would be nice to have a document with a bunch of common cassandra design patterns. I've been trying to track down a pattern for this and a lot of this is

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin Clark
It's an anti-pattern and there are better ways to do this. I have implemented the paging algorithm you've described using wide rows and bucketing. This approach is a more efficient utilization of Cassandra's built in wholesome goodness. Also, I wouldn't let any number of clients (huge) connect d

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Kevin Burton
I just checked the source and in 2.1.0 it's not deprecated. So it *might* be *being* deprecated but I haven't seen anything stating that. On Sat, Jun 7, 2014 at 8:03 AM, Colin wrote: > I believe Byteorderedpartitioner is being deprecated and for good reason. > I would look at what you could a

Re: Data model for streaming a large table in real time.

2014-06-07 Thread DuyHai Doan
"One node would take all the load, followed by the next node" --> with this design, you are not exploiting all the power of the cluster. If only one node takes all the load at a time, what is the point having 20 or 10 nodes ? You'd better off using limited wide row with bucketing to achieve this

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin
I believe ByteOrderedPartitioner is being deprecated, and for good reason. I would look at what you could achieve by using wide rows and Murmur3Partitioner. -- Colin 320-221-9531 > On Jun 6, 2014, at 5:27 PM, Kevin Burton wrote: > > We have the requirement to have clients read from our tabl

Re: Data model for boolean attributes

2014-03-22 Thread James Rothering
Hi Duy: The compound partition key seems perfect, but you say that pagination isn't possible with it: why is that? Regards, James On Sat, Mar 22, 2014 at 10:40 AM, DuyHai Doan wrote: > Ben > > > > When you say beware of the cardinality, do you think that the > cardinality is too low in this

Re: Data model for boolean attributes

2014-03-22 Thread DuyHai Doan
Ben > When you say beware of the cardinality, do you think that the cardinality is too low in this instance? Secondary indexes in C* are distributed across all the nodes containing actual data so somehow it helps avoiding hot spots. However, since there are only 2 values for your boolean flag, e

Re: Data model for boolean attributes

2014-03-21 Thread Ben Hood
Hey Duy Hai, On Fri, Mar 21, 2014 at 7:34 PM, DuyHai Doan wrote: > Your previous "select * from x where flag = true;" translate into: > > SELECT * FROM x WHERE id=... AND flag = true > > Of course, you'll need to provide the id in any case. This is an interesting option, though this app needs

Re: Data model for boolean attributes

2014-03-21 Thread Ben Hood
On Sat, Mar 22, 2014 at 3:32 AM, Ben Hood <0x6e6...@gmail.com> wrote: > Also a very good point. The main query paths the app needs to support are: > > select * from x where flag=true and id = ? and timestamp >= ? and timestamp > <= ? > select * from x where flag=false and id = ? and timestamp >= ?

Re: Data model for boolean attributes

2014-03-21 Thread Ben Hood
On Sat, Mar 22, 2014 at 1:31 AM, Laing, Michael wrote: > Whoops now there are only 2 partition keys! Not good if you have any > reasonable number of rows... Yes, this column family will have a large number of rows. > I monitor partition sizes and shard enough to keep them reasonable in this > so

Re: Data model for boolean attributes

2014-03-21 Thread Laing, Michael
Of course what you really want is this: create table x( id text, timestamp timeuuid, flag boolean, // other fields primary key (flag, id, timestamp) ) Whoops now there are only 2 partition keys! Not good if you have any reasonable number of rows... Faced with a situation like this (alt

Re: Data model for boolean attributes

2014-03-21 Thread DuyHai Doan
Hello Ben Try the following alternative with a composite partition key to encode the dual states of the boolean: create table x( id text, flag boolean, timestamp timeuuid, // other fields primary key ((id, flag), timestamp) ) Your previous "select * from x where flag = true;" transla
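To see why the composite partition key matters, this sketch counts distinct partitions for 10,000 hypothetical rows under the `primary key (flag, id, timestamp)` variant from the same thread and under `primary key ((id, flag), timestamp)`:

```python
# 10,000 hypothetical rows: 1,000 ids, each with 5 rows per flag value.
rows = [(f"id{n}", flag) for n in range(1000) for flag in (True, False) for _ in range(5)]

# primary key (flag, id, timestamp): the partition key is just the flag.
flag_partitions = {flag for _id, flag in rows}

# primary key ((id, flag), timestamp): composite partition key.
id_flag_partitions = {(row_id, flag) for row_id, flag in rows}

print(len(rows), len(flag_partitions), len(id_flag_partitions))  # 10000 2 2000
```

With the flag alone, every row lands in one of two partitions no matter how large the table grows.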

Re: data model question : finding out the n most recent changes items

2013-07-11 Thread Eric Stevens
I think there is not an extremely simple solution to your problem. You will probably need to use multiple tables to get the view you need. One keyed just by file UUID, which tracks some basic metadata about the file including the last modified time. Another as a materialized view of the most rece
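A minimal in-memory sketch of the two-table idea, assuming each file belongs to exactly one user. One table keyed by file id holds the last modified time, and a per-user sorted view is updated by removing the file's stale entry before inserting the new one. All names are hypothetical.

```python
import bisect
from collections import defaultdict

last_modified = {}          # table keyed by file id: file_id -> last modified time
recent = defaultdict(list)  # per-user view: sorted (modified, file_id) pairs


def touch(user_id, file_id, modified):
    """Record a change, replacing the file's previous entry in the user's view."""
    old = last_modified.get(file_id)
    if old is not None:
        recent[user_id].remove((old, file_id))  # assumes a file belongs to one user
    last_modified[file_id] = modified
    bisect.insort(recent[user_id], (modified, file_id))


def most_recent(user_id, n):
    """The n most recently changed files, newest first."""
    return [file_id for _, file_id in reversed(recent[user_id][-n:])]
```

Against Cassandra this means reading the file's previous modified time before deleting and rewriting its entry in the per-user table, since the clustering value itself changes.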

Re: data model question : finding out the n most recent changes items

2013-07-11 Thread Jimmy Lin
> -Original Message- > From: y2k...@gmail.com on behalf of Jimmy Lin > Sent: Thu 11-Jul-13 13:09 > To: user@cassandra.apache.org > Subject: Re: data model question : finding out the n most recent changes > items > what I mean is, I really just w

RE: data model question : finding out the n most recent changes items

2013-07-11 Thread Lohith Samaga M
-Original Message- From: y2k...@gmail.com on behalf of Jimmy Lin Sent: Thu 11-Jul-13 13:09 To: user@cassandra.apache.org Subject: Re: data model question : finding out the n most recent changes items what I mean is, I really just want the last modified date instead of series of timestamp and still

Re: data model question : finding out the n most recent changes items

2013-07-11 Thread Jimmy Lin
What I mean is, I really just want the last modified date instead of a series of timestamps, and still be able to sort or order by it. (Maybe I should rephrase my question as how to sort or order by the last modified column in a row.) CREATE TABLE user_file ( user_id uuid, modified_date timest

Re: data model question : finding out the n most recent changes items

2013-07-11 Thread aaron morton
What you described sounds like the most appropriate: CREATE TABLE user_file ( user_id uuid, modified_date timestamp, file_id timeuuid, PRIMARY KEY(user_id, modified_date) ); If you normally need more information about the file then either store that as addit

Re: Data model for financial time series

2013-06-29 Thread Oleksandr Petrov
You can refer to the Data Modelling guide here: http://clojurecassandra.info/articles/data_modelling.html It includes several things you've mentioned (namely, range queries and dynamic tables). Also, it seems that it'd be useful for you to use indexes, and performing filtering (for things related

Re: Data model for financial time series

2013-06-07 Thread Jake Luciani
We have built a similar system; you can read about our data model in CQL3 here: http://www.slideshare.net/carlyeks/nyc-big-tech-day-2013 We are going to be presenting a similar talk next week at the Cassandra Summit. On Fri, Jun 7, 2013 at 12:34 PM, Davide Anastasia < davide.anasta...@qualityc

Re: Data Model and Query

2013-04-05 Thread Hiller, Dean
for you though. Dean From: aaron morton <aa...@thelastpickle.com> Reply-To: user@cassandra.apache.org Date: Friday, April 5, 2013 10:59 AM To: user@cassandra.apache.org

Re: Data Model and Query

2013-04-05 Thread aaron morton
> What's the recommendation on querying a data model like StartDate > “X” and counter > “Y”? It's not possible. If you are using secondary indexes you have to have an equals clause in the statement. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @a

Re: data model to store large volume syslog

2013-03-13 Thread Aaron Turner
On Wed, Mar 13, 2013 at 4:23 AM, Mohan L wrote: > > > On Fri, Mar 8, 2013 at 9:42 PM, aaron morton > wrote: >> >> > 1). create a column family 'cfrawlog' which stores raw log as received. >> > row key could be 'ddmmhh'(new row is added for each hour or less), each >> > 'column name' is uuid w

Re: data model to store large volume syslog

2013-03-13 Thread Mohan L
On Fri, Mar 8, 2013 at 9:42 PM, aaron morton wrote: > > 1). create a column family 'cfrawlog' which stores raw log as received. > row key could be 'ddmmhh'(new row is added for each hour or less), each > 'column name' is uuid with 'value' is raw log data. Since we are also going > to use this

Re: data model to store large volume syslog

2013-03-08 Thread aaron morton
> 1). create a column family 'cfrawlog' which stores raw log as received. row > key could be 'ddmmhh'(new row is added for each hour or less), each > 'column name' is uuid with 'value' is raw log data. Since we are also going > to use this log for forensics purpose, so it will help us to hav

RE: data model to store large volume syslog

2013-03-07 Thread moshe.kranc
Row key based on hour will create hot spots for write - for an entire hour, all the writes will be going to the same node, i.e., the node where the row resides. You need to come up with a row key that distributes writes evenly across all your C* nodes, e.g., time concatenated with a sequence cou
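A sketch of the "time concatenated with a sequence counter" key, assuming an 8-way split and a "yyyymmddhh" hour string (both hypothetical):

```python
import itertools

NUM_SHARDS = 8                 # hypothetical number of rows per hour
_sequence = itertools.count()  # stand-in for a per-writer sequence counter


def row_key(hour):
    """Hour plus a rotating sequence number, e.g. '2013030714:5'."""
    return f"{hour}:{next(_sequence) % NUM_SHARDS}"


def row_keys_for_hour(hour):
    """A reader of one hour must now fetch every shard's row."""
    return [f"{hour}:{i}" for i in range(NUM_SHARDS)]


keys = {row_key("2013030714") for _ in range(100)}
print(len(keys))  # 8
```

Each hour's writes land on 8 row keys that the partitioner hashes independently, at the cost of readers fetching all 8 rows for that hour.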

Re: data model advice needed

2013-02-28 Thread Michal Michalski
13 19:12 To: user@cassandra.apache.org Subject: Re: data model advice needed One possibility would be to use dynamic columns, with each column name being a composite made from a timestamp, and the value of each containing serialized json of the details. The host could be the key. Then you could

RE: data model advice needed

2013-02-27 Thread Sloot, Hans-Peter
19:12 To: user@cassandra.apache.org Subject: Re: data model advice needed One possibility would be to use dynamic columns, with each column name being a composite made from a timestamp, and the value of each containing serialized json of the details. The host could be the key. Then you could slice

Re: data model advice needed

2013-02-27 Thread Hiller, Dean
There are many different patterns in noSQL with 90% being different than an RDBMS. Check out this page for some things to get you thinking http://buffalosw.com/wiki/Patterns-Page/ If you ever consider playorm and you can figure out how to partition your data(perhaps by month), you can do querie

Re: data model advice needed

2013-02-27 Thread kadey
One possibility would be to use dynamic columns, with each column name being a composite made from a timestamp, and the value of each containing serialized json of the details. The host could be the key. Then you could slice the data by column name. Ken - Original Message - Fro
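A sketch of that layout with in-memory structures: the host is the row key, columns are kept sorted by timestamp, values are JSON strings, and a slice is a range over column names. For brevity the column name here is just the timestamp; a real composite name would likely carry a second component to keep names unique.

```python
import bisect
import json
from collections import defaultdict

# host -> columns sorted by name (timestamp), each value serialized JSON details
rows = defaultdict(list)


def record(host, ts, details):
    bisect.insort(rows[host], (ts, json.dumps(details)))


def slice_columns(host, start_ts, end_ts):
    """Column slice on one row: every entry with start_ts <= ts <= end_ts."""
    cols = rows[host]
    names = [ts for ts, _ in cols]
    lo = bisect.bisect_left(names, start_ts)
    hi = bisect.bisect_right(names, end_ts)
    return [json.loads(value) for _, value in cols[lo:hi]]
```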

Re: Data Model - Additional Column Families or one CF?

2013-02-26 Thread Edward Capriolo
February 26, 2013 12:27 AM >> To: user@cassandra.apache.org >> Subject: Re: Data Model - Additional Column Families or one CF? >> Aaron, >> Would 50 CF

Re: Data Model - Additional Column Families or one CF?

2013-02-26 Thread Javier Sotelo
user@cassandra.apache.org > Date: Tuesday, February 26, 2013 12:27 AM > To: user@cassandra.apache.org

Re: Data Model - Additional Column Families or one CF?

2013-02-26 Thread Hiller, Dean
user@cassandra.apache.org Subject: Re: Data Model - Additional Column Families or one CF? Aaron, Would 50 CFs be pushing it? According to http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-an

Re: Data Model - Additional Column Families or one CF?

2013-02-26 Thread Hiller, Dean
ruary 26, 2013 12:27 AM To: user@cassandra.apache.org Subject: Re: Data Model - Additional Column Families or one CF? Aaron, Would 50 CFs be pushing it? According to http://www.datastax.com/dev/blog/wh

Re: Data Model - Additional Column Families or one CF?

2013-02-26 Thread Raman
Greetings! Thank you very much for sharing your insight and experience. I am trying to migrate a normalized schema (a 1 TB database). The data is hierarchical... child entities carry foreign keys to the parent entities. There are several instances like ShapeTable, Circle, Square, Rectangle etc...

Re: Data Model - Additional Column Families or one CF?

2013-02-25 Thread Javier Sotelo
Aaron, Would 50 CFs be pushing it? According to http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management, "This has been tested to work across hundreds or even thousands of ColumnFamilies." What is the bottleneck, IO? Thanks, Javier On Sun, Feb 24,

Re: Data Model - Additional Column Families or one CF?

2013-02-24 Thread Adam Venturella
Thanks Aaron, this was a big help! On Thu, Feb 21, 2013 at 9:27 AM, aaron morton wrote: > If you have a limited / known number (say < 30) of types, I would create a > CF for each of them. > If the number of types is unknown or very large I would have one CF with

Re: Data Model - Additional Column Families or one CF?

2013-02-21 Thread aaron morton
If you have a limited / known number (say < 30) of types, I would create a CF for each of them. If the number of types is unknown or very large I would have one CF with the row key you described. Generally I avoid data models that require new CFs as the data grows. Additionally having diffe

Re: Data Model Review

2012-12-20 Thread Adam Venturella
In the case without CQL3, where I would use composite columns, I see how this sort of lines up with what CQL3 is doing. I don't have the ability to use CQL3 as I am using pycassa for my client, so that leaves me with CompositeColumns. Under composite columns, I would have 1 row, which would be sto

Re: Data Model Review

2012-12-18 Thread aaron morton
> I have heard it best to try and avoid the use of super columns for now. Yup. Your model makes sense. If you are creating the CF using the cassandra-cli you will probably want to reverse-order the column names; see http://thelastpickle.com/2011/10/03/Reverse-Comparators/ If you want to use CQ

Re: Data Model - Consistency question

2012-09-19 Thread Hiller, Dean
Thinking a little more on your issue, you can also do that in playorm, as OneToMany is represented with a few columns in the owning table/entity, unlike JPA and RDBMS. I.e. Student.java { List - These course primary keys are saved one per column in the student's row } Course.java { List - These

Re: Data Model - Consistency question

2012-09-19 Thread Hiller, Dean
Yes, this scenario can occur (even with quorum writes/reads, as you are dealing with different rows) as one write may be complete and the other not while someone else is reading from the cluster. Generally though, you can do read repair when you read it in ;). I.e. See if things are inconsistent

Re: Data Model

2012-09-14 Thread Hiller, Dean
Date: Friday, September 14, 2012 3:00 AM To: user@cassandra.apache.org Subject: Re: Data Model Consider a course_students col fami

Re: Data Model

2012-09-14 Thread aaron morton
> Consider a course_students col family which gives a list of students for a > course I would use two CF's: Course CF: * Each row is one course * Columns are the properties and values of the course CourseEnrolements CF * Each row is one course * Column name is th

Re: Data Model

2012-09-13 Thread Michael Morris
I'm fairly new to Cassandra myself, but had to solve a similar problem. If ordering of the student number values is not important to you, you can store them as UTF8 values (Ascii would work too, may be a better choice?), and the resulting columns would be sorted by the lexical ordering of the nume

Re: Data Model

2012-09-13 Thread Soumya Acharya
I just started learning Cassandra; any suggestions on where to start? Thanks Soumya On Thu, Sep 13, 2012 at 10:54 AM, Roshni Rajagopal < roshni_rajago...@hotmail.com> wrote: > I want to learn how we can model a mix of static and dynamic columns in > a family. > > Consider a course_students c

Re: Data model question, storing Queue Message

2012-04-30 Thread aaron morton
> Isn't Kafka too young for production use? The best way to advance the project is to use it and contribute your experience and time. btw, checking out Kafka is a great idea. There are people around having Fun Times with Kafka in production Cheers - Aaron Morton Fre

Re: Data model question, storing Queue Message

2012-04-30 Thread Morgan Segalis
Isn't Kafka too young for production use? Clearly that would fit my needs much better, but I can't afford an early-stage project not ready for production. Is it? On 30 Apr 2012 at 14:28, samal wrote: > > > On Mon, Apr 30, 2012 at 5:52 PM, Morgan Segalis wrote: > Hi Samal, > > Th

Re: Data model question, storing Queue Message

2012-04-30 Thread samal
On Mon, Apr 30, 2012 at 5:52 PM, Morgan Segalis wrote: > Hi Samal, > > Thanks for the TTL feature, I wasn't aware of its existence. > > Partitioning by day will be less wide than partitioning by month (about 30 > times less, give or take ;-) ) > Per day it should have something like 100 000 message

Re: Data model question, storing Queue Message

2012-04-30 Thread Morgan Segalis
Hi Samal, Thanks for the TTL feature, I wasn't aware of its existence. Partitioning by day will be less wide than partitioning by month (about 30 times less, give or take ;-) ). Per day it should have something like 100 000 messages stored; most of them would be retrieved, and so deleted, before the TTL f
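A sketch of per-day partitioning with TTLs, under stated assumptions: the partition key is the day string, a one-day TTL is a hypothetical default, and an `expires_at` field simulates what a Cassandra TTL does (expired entries stop being returned).

```python
from datetime import datetime, timedelta

queue = {}  # partition "YYYY-MM-DD" -> {message_id: (expires_at, body)}


def push(msg_id, body, now, ttl=timedelta(days=1)):
    """Write into today's partition; expires_at plays the role of a column TTL."""
    day = now.strftime("%Y-%m-%d")
    queue.setdefault(day, {})[msg_id] = (now + ttl, body)


def fetch_and_delete(day, now):
    """Return the unexpired messages of one day's partition, then drop the partition."""
    row = queue.pop(day, {})
    return {m: body for m, (expires_at, body) in row.items() if expires_at > now}
```

Dropping a whole day's partition is one partition-level delete rather than one delete per message.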

Re: Data model question, storing Queue Message

2012-04-30 Thread samal
On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis wrote: > Hi Aaron, > > Thank you for your answer, I was beginning to think that my question would > never be answered ;-) > > Actually, this is what I was going for, except one thing: instead of > partitioning rows per month, I thought about partition

Re: Data model question, storing Queue Message

2012-04-30 Thread Morgan Segalis
Hi Aaron, Thank you for your answer, I was beginning to think that my question would never be answered ;-) Actually, this is what I was going for, except one thing: instead of partitioning rows per month, I thought about partitioning per day, so that every day I launch the cleaning tool, and it

Re: Data model question, storing Queue Message

2012-04-29 Thread aaron morton
A message queue is often not a great use case for Cassandra. For information on how to handle high delete workloads see http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra It's hard to create a model without some idea of the data load, but I would suggest you start with: CF: Us

Re: data model question

2012-03-12 Thread Tamar Fraenkel
Thanks! Better than mine, as it accounts for adding more services later! I will update my code. Thanks, *Tamar Fraenkel* Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Mon, Mar 12, 2012 at 11:

Re: data model question

2012-03-12 Thread Sasha Dolgy
An alternative would be to add another row to your user CF specifically for Facebook ids. The column ID would be the Facebook identifier and the value would be your internal uuid. Consider what happens when you want to add another service like Twitter. Will you then add another CF per service, or just another row specific now
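
A rough in-memory sketch of this suggestion, assuming a dict stands in for the user CF (row key to columns) and that the extra rows are named `facebook_ids`, `twitter_ids` and so on; those row names and the helper functions are hypothetical. Each service row maps the external id (column name) to the internal uuid (column value), so adding a service means adding a row, not a CF. One trade-off worth noting: every Facebook id lands in the same row, so that row lives on a single set of replicas and grows with the user base.

```python
import uuid
from typing import Dict, Optional

# Hypothetical in-memory stand-in for the user CF: row key -> {column -> value}
UserCF = Dict[str, Dict[str, str]]


def create_user(cf: UserCF, name: str) -> str:
    internal_id = str(uuid.uuid4())
    cf[internal_id] = {"name": name}
    return internal_id


def link_service(cf: UserCF, service: str, external_id: str, internal_id: str) -> None:
    # One extra row per service ("facebook_ids", "twitter_ids", ...):
    # column name = external id, column value = internal uuid.
    cf.setdefault(f"{service}_ids", {})[external_id] = internal_id


def find_user(cf: UserCF, service: str, external_id: str) -> Optional[Dict[str, str]]:
    internal_id = cf.get(f"{service}_ids", {}).get(external_id)
    return cf.get(internal_id) if internal_id is not None else None


users: UserCF = {}
uid = create_user(users, "alice")
link_service(users, "facebook", "12345", uid)
link_service(users, "twitter", "67890", uid)  # a new service is just another row
print(find_user(users, "twitter", "67890"))   # {'name': 'alice'}
```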

Re: data model question

2012-03-12 Thread aaron morton
In this case, where you know the query upfront, I add a custom secondary index using another CF to support the query. It's a little easier here because the data won't change. UserLookupCF (using composite types for the key value): row_key: e.g. "facebook:12345" or "twitter:12345"; col_name: e.g
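
The lookup CF above can be sketched with a dict keyed by strings in the quoted `"service:id"` form. This is an assumption-laden illustration: it uses plain strings rather than a real CompositeType, and because the column layout is cut off in this excerpt, it stores the internal uuid under a hypothetical `user_id` column. The helper names are also hypothetical. Each external id gets its own row, so ids from different services never collide.

```python
from typing import Dict, Optional

# Hypothetical stand-in for UserLookupCF: row key -> {column -> value}
LookupCF = Dict[str, Dict[str, str]]


def lookup_key(service: str, external_id: str) -> str:
    # Row key in the "service:external_id" shape, e.g. "facebook:12345".
    return f"{service}:{external_id}"


def index_user(cf: LookupCF, service: str, external_id: str, internal_uuid: str) -> None:
    # The mapping never changes, so the index row is written once.
    cf[lookup_key(service, external_id)] = {"user_id": internal_uuid}


def resolve(cf: LookupCF, service: str, external_id: str) -> Optional[str]:
    row = cf.get(lookup_key(service, external_id))
    return row["user_id"] if row else None


lookup: LookupCF = {}
index_user(lookup, "facebook", "12345", "uuid-a")
index_user(lookup, "twitter", "12345", "uuid-b")  # same id, different service
print(resolve(lookup, "facebook", "12345"))        # uuid-a
```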

Re: data model question

2012-03-11 Thread Tamar Fraenkel
Hi! Thanks for the response. From what I read, secondary indices are only good for columns with few possible values. Is this a good fit for my case? I have a unique Facebook id for every user. Thanks, *Tamar Fraenkel*

Re: data model question

2012-03-11 Thread Marcel Steinbach
Either do that, or consider using a secondary index on the Facebook user name in your primary CF. See http://www.datastax.com/docs/1.0/ddl/indexes Cheers On 11.03.2012 at 09:51, Tamar Fraenkel wrote: Hi! I need some advice: I have a user CF, which has a UUID key which is my internal u

Re: data model advice

2012-02-24 Thread Brandon Williams
On Fri, Feb 24, 2012 at 10:46 AM, David Leimbach wrote: > > > On Thu, Feb 23, 2012 at 7:54 PM, Martin Arrowsmith > wrote: >> >> Hi Franc, >> >> Or, you can consider using composite columns. It is not recommended to use >> Super Columns anymore. > > > Yes, but why?  Is it because composite columns

Re: data model advice

2012-02-24 Thread David Leimbach
On Thu, Feb 23, 2012 at 7:54 PM, Martin Arrowsmith < arrowsmith.mar...@gmail.com> wrote: > Hi Franc, > > Or, you can consider using composite columns. It is not recommended to use > Super Columns anymore. > Yes, but why? Is it because composite columns effectively replace and simplify similar mo

Re: data model advice

2012-02-23 Thread Franc Carter
On Fri, Feb 24, 2012 at 2:54 PM, Martin Arrowsmith < arrowsmith.mar...@gmail.com> wrote: > Hi Franc, > > Or, you can consider using composite columns. It is not recommended to use > Super Columns anymore. > > Best wishes, > On first read it would seem that there is a fair bit of overhead with compo

Re: data model advice

2012-02-23 Thread Franc Carter
On Fri, Feb 24, 2012 at 2:54 PM, Martin Arrowsmith < arrowsmith.mar...@gmail.com> wrote: > Hi Franc, > > Or, you can consider using composite columns. It is not recommended to use > Super Columns anymore. > Thanks, I'll look into composite columns, cheers > > Best wishes, > > Martin > > > On

Re: data model advice

2012-02-23 Thread Martin Arrowsmith
Hi Franc, Or, you can consider using composite columns. It is not recommended to use Super Columns anymore. Best wishes, Martin On Thu, Feb 23, 2012 at 7:51 PM, Indranath Ghosh wrote: > How about using a composite row key like the following: > > Entity.Day1.TypeA: {col1:val1, col2:val2, . . .
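
The thread does not spell out why super columns are discouraged; the sketch below only shows how the same two-level nesting can be expressed as flat composite column names. It assumes Python tuples as a stand-in for composite names (tuples compare component by component, as a composite comparator does), and the `day1`/`typeA` data and `slice_first_component` helper are hypothetical. Because names sharing a first component sort next to each other, the old "read one super column" becomes a contiguous slice.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# A column is ((component1, component2), value).
Column = Tuple[Tuple[str, str], str]

# Super-column style: super column name -> {sub column name -> value}
super_row: Dict[str, Dict[str, str]] = {
    "day1": {"typeA": "1", "typeB": "2"},
    "day2": {"typeA": "3"},
}

# Composite style: the same data as one flat row of sorted composite names.
composite_row: List[Column] = sorted(
    ((sup, sub), value)
    for sup, cols in super_row.items()
    for sub, value in cols.items()
)


def slice_first_component(row: List[Column], first: str) -> List[Column]:
    """Columns whose first component equals `first`; they form one contiguous run."""
    names = [name for name, _ in row]
    # (first,) sorts before every (first, x), so this finds the start of the run.
    start = bisect_left(names, (first,))
    result = []
    for name, value in row[start:]:
        if name[0] != first:
            break
        result.append((name, value))
    return result


print(slice_first_component(composite_row, "day1"))
# [(('day1', 'typeA'), '1'), (('day1', 'typeB'), '2')]
```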

Re: data model advice

2012-02-23 Thread Indranath Ghosh
How about using a composite row key like the following: Entity.Day1.TypeA: {col1:val1, col2:val2, . . . } Entity.Day1.TypeB: {col1:val1, col2:val2, . . . } . . Entity.DayN.TypeA: {col1:val1, col2:val2, . . . } Entity.DayN.TypeB: {col1:val1, col2:val2, . . . } It is better to avoid super columns..
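
A small sketch of building row keys in the `Entity.DayN.TypeX` shape suggested above, assuming ISO dates for the day component; `row_key`, `keys_for_range` and the `sensor42` entity are hypothetical. With a random partitioner, rows are not stored in key order, so reading a span of days means enumerating the keys explicitly and fetching them (for example with a multiget) rather than scanning a key range.

```python
from datetime import date, timedelta
from typing import List


def row_key(entity: str, day: date, kind: str) -> str:
    # Dotted composite key, e.g. "sensor42.2012-02-23.TypeA".
    return f"{entity}.{day.isoformat()}.{kind}"


def keys_for_range(entity: str, start: date, end: date, kinds: List[str]) -> List[str]:
    """All row keys to fetch for days start..end inclusive, one per type per day."""
    keys = []
    day = start
    while day <= end:
        for kind in kinds:
            keys.append(row_key(entity, day, kind))
        day += timedelta(days=1)
    return keys


print(keys_for_range("sensor42", date(2012, 2, 22), date(2012, 2, 23), ["TypeA", "TypeB"]))
```

Note that a dot inside an entity or type name would make such keys ambiguous to split apart again, so the separator has to be one that never appears in the components.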

Re: data model with composite columns

2012-02-02 Thread Deno Vichas
This is what I thought. Thanks for clarifying. On 2/2/2012 10:44 PM, aaron morton wrote: Short answer is no. The slightly longer answer is nope. All column names in a CF are compared using the same comparator. You will need to create a new CF. Cheers. - Aaron Morton Freelan
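
To illustrate Aaron's point that one comparator governs every column name in a CF, here is a toy model in which a Python type stands in for the comparator; the `ColumnFamily` class is hypothetical and is not how Cassandra implements comparators. Names are validated against, and sorted by, that single CF-wide rule, so storing names of another kind means creating a second CF with its own comparator.

```python
from typing import Dict, List


class ColumnFamily:
    """Hypothetical model: every column name is validated and ordered by the
    single comparator (here, a Python type) chosen when the CF is created."""

    def __init__(self, comparator: type) -> None:
        self.comparator = comparator
        self.rows: Dict[str, Dict[object, str]] = {}

    def insert(self, key: str, name: object, value: str) -> None:
        if not isinstance(name, self.comparator):
            raise TypeError(
                f"column name {name!r} does not match comparator {self.comparator.__name__}"
            )
        self.rows.setdefault(key, {})[name] = value

    def columns(self, key: str) -> List[object]:
        # Names come back sorted according to the CF-wide comparator.
        return sorted(self.rows.get(key, {}))


events = ColumnFamily(int)          # every column name in this CF must be an int
events.insert("row1", 10, "a")
events.insert("row1", 2, "b")
print(events.columns("row1"))       # [2, 10]
try:
    events.insert("row1", "name", "c")
except TypeError as exc:
    print(exc)

labels = ColumnFamily(str)          # string names need a separate CF
labels.insert("row1", "name", "c")
```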
