I did not set the consistency level because I couldn't find that option in
the ConfigHelper class. I assume it defaults to ConsistencyLevel.ONE.
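
If a consistency-level setter ever shows up in ConfigHelper, I would expect
it to look something like the line below. This is purely a hypothetical
sketch on my end, since I could not find any such method in 0.8.2:

        // Hypothetical call, not present in the 0.8.2 ConfigHelper I am using.
        // Without it, the Hadoop reads appear to run at ConsistencyLevel.ONE.
        ConfigHelper.setReadConsistencyLevel(job.getConfiguration(), "ONE");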

Actually, I only tweaked the word count example a bit. Here is the code
snippet:

        getConf().set(CONF_COLUMN_NAME, columnName);

        Job job = new Job(getConf(), KEYSPACE);
        job.setJarByClass(WorkIdFinder.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(ReducerToFilesystem.class);
        job.setReducerClass(ReducerToFilesystem.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH_PREFIX + columnFamily));

        job.setInputFormatClass(ColumnFamilyInputFormat.class);

        // Thrift RPC port and a seed node for the initial connection
        ConfigHelper.setRpcPort(job.getConfiguration(), "9260");
        ConfigHelper.setInitialAddress(job.getConfiguration(), "dnjsrcha01");
        ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
        ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, columnFamily);
        // Rows fetched per get_range_slices call
        ConfigHelper.setRangeBatchSize(job.getConfiguration(), batchSize);
        // Restrict each slice to the single column we care about
        SlicePredicate predicate = new SlicePredicate().setColumn_names(Arrays.asList(ByteBufferUtil.bytes(columnName)));
        ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);
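
One note on the batch size: as far as I can tell, setRangeBatchSize just
writes the cassandra.range.batch.size property mentioned further down this
thread, so the same thing can be set directly on the configuration:

        // Equivalent (as far as I can tell) to the setRangeBatchSize call above.
        // Each map task pages rows from Cassandra in chunks of this size per
        // get_range_slices request, so a smaller value means smaller Thrift
        // calls that are less likely to run past rpc_timeout_in_ms.
        job.getConfiguration().setInt("cassandra.range.batch.size", 1024);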

Yes, I have one task tracker running on each Cassandra node.

Thanks,

John


On Thu, Jul 28, 2011 at 3:51 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:

> Just wondering - what consistency level are you using for hadoop reads?
>  Also, do you have task trackers running on the cassandra nodes so that
> reads will be local?
>
> On Jul 28, 2011, at 2:46 PM, Jian Fang wrote:
>
> > I changed rpc_timeout_in_ms to 30000 and then to 40000, and changed
> > cassandra.range.batch.size from 4096 to 1024, but still about 40% of
> > the tasks got timeout exceptions.
> >
> > Not sure if this is caused by Cassandra read performance (an 8G heap
> > for about 100G of data) or by the way the Cassandra-Hadoop integration
> > is implemented. I rarely saw any timeout exceptions when I used Hector
> > to fetch data.
> >
> > Thanks,
> >
> > John
> >
> > On Thu, Jul 28, 2011 at 12:45 PM, Jian Fang <jian.fang.subscr...@gmail.com> wrote:
> >
> > My current setting is 10000. I will try 30000.
> >
> > Thanks,
> >
> > John
> >
> > On Thu, Jul 28, 2011 at 12:39 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
> > See http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting - I
> > would probably start with setting your rpc_timeout_in_ms to something
> > like 30000.
> >
> > On Jul 28, 2011, at 11:09 AM, Jian Fang wrote:
> >
> > > Hi,
> > >
> > > I run Cassandra 0.8.2 and Hadoop 0.20.2 on three nodes; each node
> > > hosts a Cassandra instance and a Hadoop data node. I created a simple
> > > Hadoop job that scans a Cassandra column value in a column family and
> > > writes it to the file system if it meets certain conditions. I keep
> > > getting the timeout exceptions below. Is this caused by my settings
> > > in Cassandra? Or how could I change the timeout value on the
> > > Cassandra Hadoop API to get around this problem?
> > >
> > >
> > > 11/07/28 12:02:47 INFO mapred.JobClient: Task Id : attempt_201107281151_0001_m_000052_0, Status : FAILED
> > > java.lang.RuntimeException: TimedOutException()
> > >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:265)
> > >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:279)
> > >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:177)
> > >     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
> > >     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
> > >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:136)
> > >     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
> > >     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> > >     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> > >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> > >     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > > Caused by: TimedOutException()
> > >     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12590)
> > >     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:762)
> > >     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:734)
> > >     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:243)
> > >     ... 11 more
> > >
> > > Thanks in advance,
> > >
> > > John
> >
> >
> >
>
>
