This line always returns "0" because the key ByteBuffer has already been read from.
startToken = partitioner.getTokenFactory().toString(partitioner.getToken(Iterables.getLast(rows).key));

I was able to get it to work by using .mark() and .reset() on the buffer. I'll log a bug, but I'm confused as to why no one else is running into this.

-Ben

On Wed, Aug 29, 2012 at 12:32 PM, Ben Frank <b...@airlust.com> wrote:

> Hey all,
> I'm having an issue using ColumnFamilyInputFormat in a Hadoop job.
> The mappers spin out of control and just keep reading records over and
> over, never getting to the end. I have a CF with wide rows (although none is
> past about 5 columns at the moment), and I've tried setting wide rows to
> both true and false. If I turn on debugging, I get what seem like strange
> input splits being created (see the -1):
>
> hadoop.ColumnFamilyInputFormat: partitioner is org.apache.cassandra.dht.RandomPartitioner@203727c5
> hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((127605887595351923798765477786913079296, '-1] @[cass1, cass2, cass3])
> hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((-1, '0] @[cass1, cass2, cass3])
> hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((0, '42535295865117307932921825928971026432] @[cass2, cass3, cass4])
> hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((42535295865117307932921825928971026432, '85070591730234615865843651857942052864] @[cass3, cass4, cass1])
> hadoop.ColumnFamilyInputFormat: adding ColumnFamilySplit((85070591730234615865843651857942052864, '127605887595351923798765477786913079296] @[cass4, cass1, cass2])
>
> If I debug in Eclipse (with widerows=false), I see that this call in
> ColumnFamilyRecordReader.StaticRowIterator.maybeInit() is setting
> startToken to -1:
>
> startToken = partitioner.getTokenFactory().toString(partitioner.getToken(Iterables.getLast(rows).key));
>
> I'm using Cassandra 1.1.2 with a 4-node cluster, a replication factor of 3,
> and Hadoop 0.20.1. Here's the output of nodetool ring:
>
> Address        DC           Rack   Status  State   Load      Effective-Ownership  Token
>                                                                                    127605887595351923798765477786913079296
> 129.19.63.126  datacenter1  rack1  Up      Normal  46.91 GB  75.00%               0
> 129.19.63.127  datacenter1  rack1  Up      Normal  49.45 GB  75.00%               42535295865117307932921825928971026432
> 129.19.63.128  datacenter1  rack1  Up      Normal  43.19 GB  75.00%               85070591730234615865843651857942052864
> 129.19.63.129  datacenter1  rack1  Up      Normal  46.9 GB   75.00%               127605887595351923798765477786913079296
>
> Does anyone have any idea what's going on here? I'm assuming the splits are
> wrong, so I'm going to focus on seeing what's up with that. Is there anything
> else I should look at?
>
> -Ben
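For anyone hitting the same thing, here is a minimal standalone sketch of the java.nio.ByteBuffer behavior behind the bug described at the top of this reply. It does not use any Cassandra classes: `consume` is a hypothetical stand-in for whatever code reads the row key before getToken() is called, and it assumes that code uses a relative read, which advances the buffer's position so the next reader sees no remaining bytes. It shows the .mark()/.reset() workaround plus duplicate(), an alternative I'm swapping in here that is not what I actually used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteBufferPositionDemo {

    // Hypothetical consumer: a relative bulk get() moves position up to limit,
    // so afterwards the buffer reports remaining() == 0.
    static String consume(ByteBuffer buf) {
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Workaround 1: bracket the read with mark()/reset() so the position is
    // restored even if consume() throws. Note that mark() replaces any mark
    // the caller had already set on this buffer.
    static String consumeWithMarkReset(ByteBuffer buf) {
        buf.mark();
        try {
            return consume(buf);
        } finally {
            buf.reset();
        }
    }

    // Workaround 2: read from a duplicate(), which shares the same bytes but
    // has its own independent position, limit and mark.
    static String consumeDuplicate(ByteBuffer buf) {
        return consume(buf.duplicate());
    }

    public static void main(String[] args) {
        ByteBuffer key = ByteBuffer.wrap("row-key".getBytes(StandardCharsets.UTF_8));

        System.out.println(consume(key));              // row-key
        System.out.println(key.remaining());           // 0: later readers see an empty key
        key.rewind();                                  // back to position 0 for the next demos

        System.out.println(consumeWithMarkReset(key)); // row-key
        System.out.println(key.remaining());           // 7: position restored by reset()
        System.out.println(consumeDuplicate(key));     // row-key
        System.out.println(key.remaining());           // 7: original buffer untouched
    }
}
```

Either approach leaves the key readable for the next caller; duplicate() has the small advantage of not clobbering a mark the caller may be relying on.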