Yeah, we're processing item similarities, so we are writing single columns at a time, although we do batch these into groups of 400 mutations before sending them to Cassandra. We currently perform almost 2 billion calculations, which then write almost 4 billion columns.
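To make the batching step concrete, here is a minimal Python sketch of grouping single-column writes into batches of 400 before each send. It is only an illustration under stated assumptions: `batched`, `write_all` and `send_batch` are hypothetical names, and `send_batch` stands in for whatever client call actually performs the batch write; it is not a real Cassandra API.

```python
from itertools import islice

BATCH_SIZE = 400  # mutations per batch, the figure mentioned above


def batched(mutations, size=BATCH_SIZE):
    """Yield lists of at most `size` mutations from any iterable."""
    it = iter(mutations)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_all(mutations, send_batch, size=BATCH_SIZE):
    """Send mutations in batches and return how many were sent.

    `send_batch` is a hypothetical callable (an assumption, not a real
    client API) that takes one list of mutations and writes it.
    """
    sent = 0
    for chunk in batched(mutations, size):
        send_batch(chunk)
        sent += len(chunk)
    return sent
```

Because `batched` consumes a plain iterator, the similarity results can be streamed straight out of the calculation loop without holding billions of them in memory; only one batch of 400 exists at a time.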
Once all similarities are calculated, we just grab a slice per item and create a denormalised vector of similar items (trimmed down to the topN and only those above a certain threshold). This makes lookup super fast, as we only get one column from Cassandra.

So we just want to optimise the crunching and storing phase, as that's an O(n^2) complexity problem. The quicker we can make that, the quicker the whole process works.

I'm going to try disabling minor compactions as a start.

> is the loading disk or cpu or network bound?

CPU is at 40% free. There is only one Cassandra node, on the same box as the processor for now, so there is no network traffic. So I think it's disk access. Will find out for sure tomorrow after the current test runs.

Thanks,

Paul.

On Thu, Aug 18, 2011 at 2:23 PM, Jake Luciani <jak...@gmail.com> wrote:

> Are you writing lots of tiny rows or a few very large rows, are you
> batching mutations? is the loading disk or cpu or network bound?
>
> -Jake
>
> On Thu, Aug 18, 2011 at 7:08 AM, Paul Loy <ketera...@gmail.com> wrote:
>
>> Hi All,
>>
>> I have a program that crunches through around 3 billion calculations. We
>> store the result of each of these in cassandra to later query once in order
>> to create some vectors. Our processing is limited by Cassandra now, rather
>> than the calculations themselves.
>>
>> I was wondering what settings I can change to increase the write
>> throughput. Perhaps disabling all caching, etc, as I won't be able to keep
>> it all in memory anyway and only want to query the results once.
>>
>> Any thoughts would be appreciated,
>>
>> Paul.
>>
>> --
>> ---------------------------------------------
>> Paul Loy
>> p...@keteracel.com
>> http://uk.linkedin.com/in/paulloy
>>
>
> --
> http://twitter.com/tjake
>

--
---------------------------------------------
Paul Loy
p...@keteracel.com
http://uk.linkedin.com/in/paulloy