> 2)
> I'm doing batch writes to the database (pulling data from multiple sources
> and putting it together). I would like to know if there are better methods
> to improve write efficiency, since it's about the same speed as MySQL when
> writing sequentially. It seems the commitlog requires more disk IO than my
> test machine can afford.

Have a look at http://www.datastax.com/dev/blog/bulk-loading
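The approach in that post streams SSTables built offline directly into the
cluster, so the writes skip the commitlog entirely. A minimal sketch, assuming
a keyspace called MyKeyspace (a placeholder) and SSTables already generated
into a local directory of the same name (e.g. with SSTableSimpleUnsortedWriter,
as described in the post):

    # Stream pre-built SSTables into the running cluster;
    # the directory name must match the keyspace name.
    bin/sstableloader /path/to/MyKeyspace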
> 3)
> In my case, each row is read randomly with equal probability. I have around
> 0.5M rows in total. Can you provide some practical advice on optimizing
> the row cache and key cache? I can use up to 8 GB of memory on the test
> machines.

Is your data set small enough to fit in memory? If so, you may be able to keep
most or all of it in the row cache.

You may also be interested in the row_cache_provider setting for column
families; see the CLI help for create column family and the IRowCacheProvider
interface. You can replace the caching strategy if you want to (there is a
rough CLI sketch at the end of this mail).

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 16/08/2011, at 12:44 PM, Yi Yang wrote:

> Dear all,
>
> I want to report my use case and have a discussion with you.
>
> I'm currently working on my second Cassandra project, and I have ended up
> with a somewhat unique use case: storing a traditional, relational data set
> in the Cassandra datastore. It's a dataset of int and float numbers only,
> with no strings or other data, and the column names are much longer than
> the values themselves. In addition, the row key is a version-3 (MD5) UUID
> of some other data.
>
> 1)
> I did some work to save disk space; however, it still takes approximately
> 12-15x more disk space than MySQL. I looked into the Cassandra SSTable
> internals, did some optimization by selecting a better data serializer, and
> also hashed each column name down to one byte. That leaves the current
> database with ~6x disk-space overhead compared with MySQL, which I think
> might be acceptable.
>
> I'm currently interested in CASSANDRA-674 and will also test CASSANDRA-47
> in the coming days. I'll keep you updated on my testing, but I'd like to
> hear your ideas on saving disk space.
>
> 2)
> I'm doing batch writes to the database (pulling data from multiple sources
> and putting it together). I would like to know if there are better methods
> to improve write efficiency, since it's about the same speed as MySQL when
> writing sequentially. It seems the commitlog requires more disk IO than my
> test machine can afford.
>
> 3)
> In my case, each row is read randomly with equal probability. I have around
> 0.5M rows in total. Can you provide some practical advice on optimizing
> the row cache and key cache? I can use up to 8 GB of memory on the test
> machines.
>
> Thanks for your help.
>
>
> Best,
>
> Steve
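PS: for question 3, a rough CLI sketch of the cache settings mentioned above.
The column family name MyCF and the numbers are placeholders; with ~0.5M rows
you could start by caching all keys and sizing the row cache to whatever fits
in your 8 GB:

    update column family MyCF
        with keys_cached = 500000
        and rows_cached = 100000
        and row_cache_provider = 'SerializingCacheProvider';

SerializingCacheProvider keeps the cached rows off-heap in serialized form,
which can help when the rows are small; the default provider keeps them on
the JVM heap.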