The reason to break it up is that the information will then be on different servers: server 1 can spend its time retrieving row 1 while server 2 retrieves row 2 and server 3 retrieves row 3. So instead of getting 3000 columns from one server, you get 1000 from each of 3 servers, in parallel.
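To make the parallelism concrete, here is a rough sketch of fanning a read out over several bucket rows with a thread pool. The fetchColumns() helper is a placeholder for whatever client call you actually use (Thrift get_slice, Hector, etc.); none of these names come from Cassandra itself.

    import java.util.*;
    import java.util.concurrent.*;

    public class ParallelBucketRead {

        // Placeholder: wrap your real client call here (Thrift get_slice, Hector, ...).
        static List<byte[]> fetchColumns(String rowKey) {
            throw new UnsupportedOperationException("wire up your client here");
        }

        // Read several bucket rows concurrently instead of one huge row serially.
        static List<byte[]> readBuckets(List<String> rowKeys) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(rowKeys.size());
            List<Future<List<byte[]>>> futures = new ArrayList<Future<List<byte[]>>>();
            for (final String key : rowKeys) {
                futures.add(pool.submit(new Callable<List<byte[]>>() {
                    public List<byte[]> call() {
                        // each bucket row can be served by a different replica set
                        return fetchColumns(key);
                    }
                }));
            }
            List<byte[]> merged = new ArrayList<byte[]>();
            for (Future<List<byte[]>> f : futures) {
                merged.addAll(f.get());
            }
            pool.shutdown();
            return merged;
        }
    }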
________________________________
From: Yang [mailto:teddyyyy...@gmail.com]
Sent: Wednesday, June 29, 2011 12:07 AM
To: user@cassandra.apache.org
Subject: Re: custom reconciling columns?

ok, here is the profiling result. I think this is consistent (I have been trying to re-learn how to use YourKit effectively...), see attached picture.

Since I do not use the thrift interface, but directly use thrift.CassandraServer and run my code in the same JVM as Cassandra, and the whole thing was running on a single box, there is no message serialization/deserialization cost. But more columns did add more time: the time was spent in the ConcurrentSkipListMap operations that implement the memtable.

Regarding breaking up the row, I'm not sure it would reduce my run time, since our requirement is to read the entire rolling-window history (we already have TTL enabled, so the history is limited to a certain length, but it is quite long: over 1000 columns, in some cases 5000 or more). I think accessing roughly 1000 items is not an uncommon requirement for many applications. In our case, each column has about 30 bytes of data, besides metadata such as TTL and timestamp. At a history length of 3000, the read takes about 12ms (remember this is completely in-memory, no disk access).

I just took a look at the expiring-column logic; it looks like expiration does not come into play until CassandraServer.internal_get() ===> thriftifyColumns() gets called, so the memtable access time above is still spent. So yes, breaking up the row would be helpful, but only to the degree of avoiding access to expired columns (btw, it would be nicer if this were built into the Cassandra code, so that instead of spending multiple key lookups I locate the row once, and within the row there are different "generation" buckets, so old generation buckets that are beyond expiration are not read); currently just accessing the 3000 live columns is already quite slow.

I'm trying to see whether there are any easy magic bullets for a drop-in replacement for ConcurrentSkipListMap...

Yang

On Tue, Jun 28, 2011 at 4:18 PM, Nate McCall <n...@datastax.com> wrote:

I agree with Aaron's suggestion on data model and query here. Since there is a time component, you can split the row on a fixed duration for a given user, so the row key would become userId_[timestamp rounded to day]. This provides an easy way to roll up the information for the date ranges you need, since the key suffix can be created without a read. This also benefits from spreading the read load over the cluster instead of just the replicas, since you have 30 rows in this case instead of one.
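For illustration, a minimal sketch of building those keys purely from the user id and the clock, with no read. The yyyyMMdd suffix format and the 30-day window are assumptions; any fixed rounding works as long as readers and writers agree on it.

    import java.text.SimpleDateFormat;
    import java.util.*;

    public class BucketKeys {

        // Build "userId_yyyyMMdd" keys for every day in the window, oldest first.
        // No read is required: the suffixes are derived from the clock alone.
        static List<String> keysForWindow(String userId, int days) {
            SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
            fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
            Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
            cal.add(Calendar.DAY_OF_MONTH, -(days - 1));
            List<String> keys = new ArrayList<String>();
            for (int i = 0; i < days; i++) {
                keys.add(userId + "_" + fmt.format(cal.getTime()));
                cal.add(Calendar.DAY_OF_MONTH, 1);
            }
            return keys;
        }

        public static void main(String[] args) {
            // e.g. the 30 row keys covering a month of history for one user
            System.out.println(keysForWindow("user42", 30));
        }
    }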
On Tue, Jun 28, 2011 at 5:55 PM, aaron morton <aa...@thelastpickle.com> wrote:
> Can you provide some more info:
> - how big are the rows, e.g. number of columns and column size?
> - how much data are you asking for?
> - what sort of read query are you using?
> - what sort of numbers are you seeing?
> - are you deleting columns or using TTL?
>
> I would consider issues with the data churn, data model and query before
> looking at serialisation.
>
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29 Jun 2011, at 10:37, Yang wrote:
>
> I can see that as my user history grows, the read time grows proportionally
> (or faster than linearly). If my business requirements ask me to keep a
> month's history for each user, it could become too slow. I was suspecting
> that it's actually the serializing and deserializing that's taking the time
> (I can definitely see it's CPU bound).
>
> On Tue, Jun 28, 2011 at 3:04 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>> There is no facility to do custom reconciliation for a column. An append-style
>> operation would run into many of the same problems as the Counter type,
>> e.g. not every node may get an append, and there is a chance of lost
>> appends unless you go to all the trouble Counters do.
>>
>> I would go with using a row for the user and columns for each item. Then
>> you can have fast no-look writes.
>>
>> What problems are you seeing with the reads?
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 29 Jun 2011, at 04:20, Yang wrote:
>>
>> > For example, if I have an application that needs to read off a user's
>> > browsing history, I model the user ID as the key and the history data
>> > within the row. With the current approach, I could model each visit as
>> > a column. The possible issue is that *possibly* (I'm still doing a lot
>> > of profiling on this to verify) a lot of time is spent on serialization
>> > into and out of the message, plus I do not need the full features
>> > provided by the column: for example, I do not need a timestamp on each
>> > visit, etc. So it might be faster to put the entire history in a blob,
>> > where each visit only takes up a few bytes, and have my code manipulate
>> > the blob.
>> >
>> > The problem is, I still need to avoid the read-before-write, so I would
>> > send only the latest visit and let Cassandra do the reconcile, which
>> > appends the visit to the blob, so this needs custom reconcile behavior.
>> >
>> > Is there a way to incorporate such custom reconcile under the current
>> > code framework? (I see custom sorting, but no custom reconcile.)
>> >
>> > thanks
>> > yang
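For reference, a rough sketch of the "no look" write described above: one column per visit, written blind with a TTL, so there is no read-before-write and no custom reconcile is needed. This assumes the 0.8-era Thrift interface; the keyspace, column family, and row key names are made up for the example.

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class VisitWriter {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
            transport.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            client.set_keyspace("History");                     // example keyspace name

            long now = System.currentTimeMillis();
            Column col = new Column();
            col.setName(ByteBuffer.wrap(String.valueOf(now).getBytes("UTF-8")));   // column name = visit time
            col.setValue(ByteBuffer.wrap("~30 bytes of visit data".getBytes("UTF-8")));
            col.setTimestamp(now * 1000);                       // microseconds
            col.setTtl(30 * 24 * 3600);                         // expire after the 30-day window

            // Blind write: no read of the existing row is needed.
            client.insert(ByteBuffer.wrap("user42_20110629".getBytes("UTF-8")),    // example bucketed row key
                          new ColumnParent("UserHistory"),                          // example column family
                          col, ConsistencyLevel.QUORUM);
            transport.close();
        }
    }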