Re: ColumnFamilyOutputFormat problem

2011-08-17 Thread Jian Fang
Thanks. I turned on the log for Cassandra and the batch mutation was not called at all. It seems I have to dig into the API code myself. On Tue, Aug 16, 2011 at 7:25 PM, aaron morton wrote: > I suggested turning up the logging to see if the server processed a > batch_mutate call. This is done from the

Re: ColumnFamilyOutputFormat problem

2011-08-16 Thread aaron morton
I suggested turning up the logging to see if the server processed a batch_mutate call. This is done from the CassandraServer class (https://github.com/apache/cassandra/blob/cassandra-0.8.4/src/java/org/apache/cassandra/thrift/CassandraServer.java#L531), not the CFOF. The first step will be to

Re: ColumnFamilyOutputFormat problem

2011-08-16 Thread Jian Fang
If you look at the source code, you will find there are no log messages in the ColumnFamilyOutputFormat class and the related classes. How can I trace the problem then? Has no one actually got this working? On Thu, Aug 11, 2011 at 6:10 PM, aaron morton wrote: > Turn the logging up in cassandra or your

Re: ColumnFamilyOutputFormat problem

2011-08-11 Thread aaron morton
Turn the logging up in cassandra or your MR job and make sure the batch_mutation is sent. Sounds like it's not. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 12 Aug 2011, at 07:22, Jian Fang wrote: > 53 seconds included the ma

Re: ColumnFamilyOutputFormat problem

2011-08-11 Thread Jian Fang
53 seconds included the map phase to read and process the input file. The records were updated at the end of the reduce phase. I checked the sales ranks in the update file against the sales ranks in Cassandra; they are different, and thus the records were not actually updated. I remember I run th

Re: ColumnFamilyOutputFormat problem

2011-08-10 Thread aaron morton
I'm a simple guy. My first step would be to see if the expected data is in the database, and if not, what's missing. 2.5M updates / 3 nodes = 833,333 per node; 833,333 / 53 seconds = 15,723 per second; 1 / 15,723 = 0.00006 seconds = 0.06 milliseconds per mutation, which sounds reasonable to me. check th
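
The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch using the figures quoted in the thread; it assumes the updates are spread evenly over the three nodes and that the full 53 seconds were spent writing, and it ignores replication. The function and variable names are made up for illustration.

```python
# Rough throughput check using the numbers quoted in the thread.
# Assumptions (not confirmed by the thread): even distribution across
# nodes, the whole elapsed time spent on writes, replication ignored.
TOTAL_UPDATES = 2_500_000
NODES = 3
ELAPSED_SECONDS = 53


def mutations_per_node_per_second(total_updates, nodes, elapsed_seconds):
    """Average write rate each node would have to sustain."""
    return (total_updates / nodes) / elapsed_seconds


def per_mutation_ms(rate_per_second):
    """Average time budget per mutation, in milliseconds."""
    return 1000.0 / rate_per_second


rate = mutations_per_node_per_second(TOTAL_UPDATES, NODES, ELAPSED_SECONDS)
latency_ms = per_mutation_ms(rate)

print(f"per node per second: {rate:,.0f}")
print(f"per mutation: {latency_ms:.3f} ms")
```

Since a later message says the 53 seconds also covered the map phase, the real per-mutation budget would be smaller than this estimate.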

Re: ColumnFamilyOutputFormat problem

2011-08-10 Thread Jian Fang
There is data, and each Cassandra cluster node holds about 100G. From the application point of view, if I run the job twice with the same input file, i.e., the sales rank update file, then I should see a much smaller number of products, whose rank change exceeds the threshold, in the output file fo

Re: ColumnFamilyOutputFormat problem

2011-08-10 Thread aaron morton
> Seems the data are not actually written to Cassandra. Before jumping into the Hadoop side of things, are you saying there is no data in Cassandra? Can you retrieve any using the CLI? Take a look at cfstats on each node to see the estimated record count. Cheers - Aaron Mor

ColumnFamilyOutputFormat problem

2011-08-10 Thread Jian Fang
Hi, I am using Cassandra 0.8.2 with Hadoop 0.20.2. My application reads a file and then writes about 2.5 million records to Cassandra. I used ColumnFamilyOutputFormat to write to Cassandra. My Cassandra cluster has three nodes with one Hadoop task tracker on each node. The weird problem is that I on