Thanks. I turned on logging in Cassandra and batch_mutate was not
called at all. It seems I have to dig into the API code myself.
On Tue, Aug 16, 2011 at 7:25 PM, aaron morton wrote:
I suggested turning up the logging to see if the server processed a
batch_mutate call. That call is handled by the CassandraServer class
(https://github.com/apache/cassandra/blob/cassandra-0.8.4/src/java/org/apache/cassandra/thrift/CassandraServer.java#L531),
not the CFOF (ColumnFamilyOutputFormat).
The first step will be to
If you look at the source code, you will find there is no log message in
the ColumnFamilyOutputFormat class or the related classes.
How can I trace the problem, then? Has no one actually got this working?
On Thu, Aug 11, 2011 at 6:10 PM, aaron morton wrote:
Turn the logging up in Cassandra or your MR job and make sure the
batch_mutate call is sent. It sounds like it's not.
Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 12 Aug 2011, at 07:22, Jian Fang wrote:
The 53 seconds include the map phase, which reads and processes the input
file. The records were updated at the end of the reduce phase.
I compared the sales ranks in the update file with the sales ranks in
Cassandra. They are different, so the records
were not actually updated.
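The comparison described above (update file versus what Cassandra returns) can be sketched as a small diff. This is a minimal sketch, assuming both sides have already been loaded into in-memory maps from a product ID to its sales rank; the class name RankDiff, the method findUnapplied and the sample IDs are hypothetical, and how the stored ranks are read back (CLI, Thrift client) is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: compares the ranks in the update file with the ranks
// read back from Cassandra and reports every product that was not applied.
final class RankDiff {
    // Returns product ID -> description of the problem, sorted by product ID.
    static Map<String, String> findUnapplied(Map<String, Integer> updateFile,
                                             Map<String, Integer> stored) {
        Map<String, String> problems = new TreeMap<>();
        for (Map.Entry<String, Integer> e : updateFile.entrySet()) {
            Integer actual = stored.get(e.getKey());
            if (actual == null) {
                // The row (or column) is absent in the stored data.
                problems.put(e.getKey(), "missing");
            } else if (!actual.equals(e.getValue())) {
                // The row exists but still holds a different rank.
                problems.put(e.getKey(),
                        "expected " + e.getValue() + " but found " + actual);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Integer> updates = new HashMap<>();
        updates.put("p1", 10);
        updates.put("p2", 20);
        updates.put("p3", 30);

        Map<String, Integer> stored = new HashMap<>();
        stored.put("p1", 10);
        stored.put("p2", 25);

        System.out.println(findUnapplied(updates, stored));
    }
}
```

Running it prints `{p2=expected 20 but found 25, p3=missing}`. An empty result after the job would mean every update landed; a result that lists most of the file points at the write path rather than at individual records.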
I remember I run th
I'm a simple guy. My first step would be to see if the expected data is in the
database and, if not, what's missing.
2.5M updates / 3 nodes = 833,333 per node
833,333 / 53 seconds = 15,723 per second
1 / 15,723 = 0.0000636 seconds, or about 0.06 milliseconds per mutation
which sounds reasonable to me.
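The back-of-the-envelope estimate above can be reproduced in a few lines, which makes it easy to plug in other record counts, node counts or job times. This is a sketch under the same assumption as the estimate: each update is counted once and spread evenly over the nodes (replication is ignored). The class and method names are illustrative only.

```java
// Illustrative re-computation of the per-node mutation rate estimate.
final class MutationRate {
    // Spread the total updates evenly over the nodes, then divide by wall-clock time.
    static double perNodePerSecond(long totalUpdates, int nodes, double seconds) {
        return (double) totalUpdates / nodes / seconds;
    }

    // Average time available per mutation on one node, in milliseconds.
    static double millisPerMutation(double perSecond) {
        return 1000.0 / perSecond;
    }

    public static void main(String[] args) {
        double rate = perNodePerSecond(2_500_000L, 3, 53.0);
        System.out.printf("%.0f mutations/s per node, %.3f ms each%n",
                rate, millisPerMutation(rate));
    }
}
```

With the numbers from this thread it prints `15723 mutations/s per node, 0.064 ms each`. With a replication factor above 1, each node would receive proportionally more writes, so the per-mutation budget shrinks accordingly.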
check th
There is data, and each Cassandra node holds about 100 GB. From the
application's point of view, if I run the job twice with the same input file,
i.e., the sales rank update file, then I should see a much smaller number of
products whose rank change exceeds the threshold in the output file fo
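The reasoning above (a second run over the same update file should report far fewer products) can be illustrated with a small simulation. This is a minimal sketch, assuming ranks are kept in an in-memory map and that a product with no stored rank counts as "changed"; the class RankChangeCheck, its method and the sample data are hypothetical and not part of the original job.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of why a repeated run should report fewer changes.
final class RankChangeCheck {
    // Counts products whose rank moved by more than `threshold`.
    // Assumption: products with no stored rank yet are counted as changed.
    static int countAboveThreshold(Map<String, Integer> stored,
                                   Map<String, Integer> updates, int threshold) {
        int count = 0;
        for (Map.Entry<String, Integer> e : updates.entrySet()) {
            Integer old = stored.get(e.getKey());
            if (old == null || Math.abs(e.getValue() - old) > threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Integer> stored = new HashMap<>(Map.of("p1", 100, "p2", 200, "p3", 300));
        Map<String, Integer> updates = Map.of("p1", 150, "p2", 205, "p3", 900);
        int threshold = 10;

        int firstRun = countAboveThreshold(stored, updates, threshold);
        stored.putAll(updates); // what a successful write phase should leave behind
        int secondRun = countAboveThreshold(stored, updates, threshold);

        System.out.println("first run: " + firstRun + ", second run: " + secondRun);
    }
}
```

It prints `first run: 2, second run: 0`. If the real second run still reports roughly as many products as the first, the writes from the first run did not persist.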
> Seems the data are not actually written to Cassandra.
Before jumping into the Hadoop side of things, are you saying there is no data
in Cassandra? Can you retrieve any using the CLI? Take a look at cfstats on
each node to see the estimated record count.
Cheers
-
Aaron Morton
Hi,
I am using Cassandra 0.8.2 with Hadoop 0.20.2. My application reads a file
and then writes about 2.5 million records
to Cassandra. I used ColumnFamilyOutputFormat to write to Cassandra. My
Cassandra cluster has three nodes, with
one Hadoop task tracker on each node. The weird problem is that I on