> 2)
> I'm doing batch writes to the database (pulling data from multiple sources
> and putting it together). I would like to know if there are better methods
> to improve write efficiency, since it's about the same speed as MySQL when
> writing sequentially. It seems the commitlog requires more disk IO than my
> test machine can afford.

Have a look at http://www.datastax.com/dev/blog/bulk-loading
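The approach in that post streams SSTables built offline directly into the
cluster, so the writes skip the commitlog entirely. A minimal sketch, assuming
a keyspace called MyKeyspace (a placeholder) and SSTables already generated
into a local directory of the same name (e.g. with SSTableSimpleUnsortedWriter,
as described in the post):

    # Stream pre-built SSTables into the running cluster;
    # the directory name must match the keyspace name.
    bin/sstableloader /path/to/MyKeyspace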
> 3)
> In my case, each row is read randomly with equal probability. I have around
> 0.5M rows in total. Can you provide some practical advice on optimizing
> the row cache and key cache? I can use up to 8 GB of memory on the test
> machines.

Is your data set small enough to fit in memory? If so, you may be able to keep
most or all of it in the row cache.

You may also be interested in the row_cache_provider setting for column
families; see the CLI help for create column family and the IRowCacheProvider
interface. You can replace the caching strategy if you want to (there is a
rough CLI sketch at the end of this mail).

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 16/08/2011, at 12:44 PM, Yi Yang wrote:

> Dear all,
>
> I want to report my use case and have a discussion with you.
>
> I'm currently working on my second Cassandra project, and I have ended up
> with a somewhat unique use case: storing a traditional, relational data set
> in the Cassandra datastore. It's a dataset of int and float numbers only,
> with no strings or other data, and the column names are much longer than
> the values themselves. In addition, the row key is a version-3 (MD5) UUID
> of some other data.
>
> 1)
> I did some work to save disk space; however, it still takes approximately
> 12-15x more disk space than MySQL. I looked into the Cassandra SSTable
> internals, did some optimization by selecting a better data serializer, and
> also hashed each column name down to one byte. That leaves the current
> database with ~6x disk-space overhead compared with MySQL, which I think
> might be acceptable.
>
> I'm currently interested in CASSANDRA-674 and will also test CASSANDRA-47
> in the coming days. I'll keep you updated on my testing, but I'd like to
> hear your ideas on saving disk space.
>
> 2)
> I'm doing batch writes to the database (pulling data from multiple sources
> and putting it together). I would like to know if there are better methods
> to improve write efficiency, since it's about the same speed as MySQL when
> writing sequentially. It seems the commitlog requires more disk IO than my
> test machine can afford.
>
> 3)
> In my case, each row is read randomly with equal probability. I have around
> 0.5M rows in total. Can you provide some practical advice on optimizing
> the row cache and key cache? I can use up to 8 GB of memory on the test
> machines.
>
> Thanks for your help.
>
>
> Best,
>
> Steve
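PS: for question 3, a rough CLI sketch of the cache settings mentioned above.
The column family name MyCF and the numbers are placeholders; with ~0.5M rows
you could start by caching all keys and sizing the row cache to whatever fits
in your 8 GB:

    update column family MyCF
        with keys_cached = 500000
        and rows_cached = 100000
        and row_cache_provider = 'SerializingCacheProvider';

SerializingCacheProvider keeps the cached rows off-heap in serialized form,
which can help when the rows are small; the default provider keeps them on
the JVM heap.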