Using multiple client threads (with pooled Thrift connections) will be even better than mutating really large chunks at a time.
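
A rough sketch of what that could look like on the client side, in plain Java. Everything named here is made up for illustration: ChunkSink is a hypothetical stand-in for whatever performs one batch mutate on a connection borrowed from your pool (it is not a Hector or Thrift API), and the chunk size and thread count are arbitrary starting points, not recommendations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelBatchWriter {

    /** Hypothetical stand-in for one batch mutate issued on a pooled connection. */
    public interface ChunkSink<T> {
        void write(List<T> chunk) throws Exception;
    }

    /**
     * Splits rows into chunks of at most chunkSize and writes them from a
     * fixed pool of worker threads. Returns the number of rows written.
     */
    public static <T> int writeAll(List<T> rows, int chunkSize, int threads,
                                   ChunkSink<T> sink) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += chunkSize) {
                // Copy the slice so each task owns its own list.
                List<T> chunk = new ArrayList<>(
                        rows.subList(start, Math.min(start + chunkSize, rows.size())));
                pending.add(pool.submit(() -> {
                    sink.write(chunk);
                    return chunk.size();
                }));
            }
            int written = 0;
            for (Future<Integer> f : pending) {
                // get() rethrows a failed write as an ExecutionException.
                written += f.get();
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder data standing in for the rows read from Oracle.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 80_000; i++) {
            rows.add(i);
        }
        AtomicInteger calls = new AtomicInteger();
        // The sink here only counts calls; a real one would do the mutate.
        int written = writeAll(rows, 1000, 8, chunk -> calls.incrementAndGet());
        System.out.println(written + " rows in " + calls.get() + " batches");
    }
}
```

In real use the sink would borrow a connection from the pool, do the mutate, and return the connection. Note that this version queues every chunk up front, so for a much larger import you would want to read and submit in a bounded fashion rather than hold all rows in memory.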

On Tue, May 11, 2010 at 4:16 AM, David Boxenhorn <da...@lookin2.com> wrote:
> Turns out the problem is with batch mutate. I mutate chunks 100 times
> bigger, it goes 100 times faster.
>
> Now I have a problem with running out of memory sometimes....
>
> On Mon, May 10, 2010 at 8:17 PM, B. Todd Burruss <bburr...@real.com> wrote:
>>
>> have you put your commit log on a disk by itself? not a logical partition
>> shared by oracle or cassandra "data". this will make a difference, as you
>> don't want the cassandra commit logs competing with other OS and oracle
>> I/O. look in storage-conf.xml and see if you can move this.
>>
>> also check "MemtableThroughputInMB". if you are doing a _lot_ of writes
>> you probably want to jack this up a bunch to get through the migration, then
>> put it back down for normal operation. the default out of the box is too
>> low i believe.
>>
>> On 05/10/2010 02:05 AM, David Boxenhorn wrote:
>>
>> I read something like 80,000 rows from Oracle and write them to Cassandra
>> in chunks of 1000 rows - so I'm supposedly working to Cassandra's strength
>> and Oracle's weakness.
>>
>> Reading 1000 rows from Oracle is "instantaneous", writing them takes maybe
>> 30 seconds. Not too much data per row, maybe 1K.
>>
>>
>>
>> On Mon, May 10, 2010 at 11:48 AM, Ran Tavory <ran...@gmail.com> wrote:
>>>
>>> Hector uses tsocket. not sure what you mean by "buffered" - is that
>>> framed? Hector by default does not use framed.
>>> The code is here if you'd like to have a
>>> look http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/CassandraClientFactory.java#L77
>>> However, I find it hard to believe that the actual connection is the
>>> slowing factor.
>>> Roughly speaking, cassandra is fast on writes and slow on reads. Exact
>>> numbers are per-scenario so it's hard to say, but if you only write and
>>> objects are small then from my experience you should expect a few k writes
>>> per second on a single host.
>>> How much do you see?
>>> There are many configuration factors and they all depend on expected
>>> usage and available h/w.
>>>
>>> On Mon, May 10, 2010 at 11:27 AM, vd <vineetdan...@gmail.com> wrote:
>>>>
>>>> What is the complete code string you are using to connect with cassandra
>>>> from Java code
>>>>
>>>>
>>>>
>>>> On Mon, May 10, 2010 at 1:49 PM, David Boxenhorn <da...@lookin2.com>
>>>> wrote:
>>>>>
>>>>> I don't know what "TSocket or the buffered one" means. Maybe I should
>>>>> know?
>>>>>
>>>>> I'm using Hector. Does that explain anything?
>>>>>
>>>>> On Mon, May 10, 2010 at 11:15 AM, vd <vineetdan...@gmail.com> wrote:
>>>>>>
>>>>>> Hi
>>>>>>
>>>>>> what is it that you are using to connect with cassnadra TSocket or the
>>>>>> buffered one ?
>>>>>>
>>>>>>
>>>>>> ____________________________________
>>>>>>
>>>>>> _______________________________________
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, May 10, 2010 at 1:29 PM, David Boxenhorn <da...@lookin2.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I'm running Java on the client, jdbc queries on Oracle, Hector on
>>>>>>> Cassandra.
>>>>>>>
>>>>>>> The Cassandra and Oracle database designs are radically different, as
>>>>>>> you might guess.
>>>>>>>
>>>>>>> I have no doubt that Cassandra can be tuned, in a multiple-server
>>>>>>> cluster, to have superior throughput (that's why I'm doing it!). But for
>>>>>>> now, it's really frustrating my development effort that Cassandra is so
>>>>>>> slow. Can't I get it up to twice as slow as Oracle in my configuration?
>>>>>>>
>>>>>>> On Mon, May 10, 2010 at 10:47 AM, vd <vineetdan...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi David
>>>>>>>>
>>>>>>>> If I may ask...how do you plan to import data from oracle to
>>>>>>>> cassandra ?
>>>>>>>> As answer AFAIK cassandra's true ability comes into play when
>>>>>>>> running on more than one machine...and please share how you are making
>>>>>>>> comparisons like on writes or reads from cassandra.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________
>>>>>>>> _______________________________________
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, May 10, 2010 at 1:04 PM, David Boxenhorn <da...@lookin2.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I'm running Oracle and Cassandra on my machine, trying to import my
>>>>>>>>> data to Cassandra from Oracle.
>>>>>>>>>
>>>>>>>>> In my configuration Oracle is about ten times faster than
>>>>>>>>> Cassandra. Cassandra has out-of-the-box tuning.
>>>>>>>>>
>>>>>>>>> I am new to Cassandra. How do I begin trying to tune it?
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com