----------------------------------------
> From: johnlu...@hotmail.com
> To: user@cassandra.apache.org
> Subject: RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
> Date: Wed, 9 Oct 2013 09:40:06 -0400
>
> software versions : apache-cassandra-2.0.1, hadoop-2.1.0-beta
>
> I have been experimenting with using hadoop for a map/reduce operation on cassandra,
> outputting to CqlOutputFormat.class. I based my first program fairly closely on the
> famous WordCount example in examples/hadoop_cql3_word_count, except that I set my
> output colfamily to have a bigint primary key:
>
>     CREATE TABLE archive_recordids ( recordid bigint, count_num bigint, PRIMARY KEY (recordid) );
>
> and simply tried setting this key as one of the keys in the output map:
>
>     keys.put("recordid", ByteBufferUtil.bytes(recordid.longValue()));
>
> but it always failed with a strange error:
>
>     java.io.IOException: InvalidRequestException(why:Key may not be empty)

I have managed to get a little bit further: my M/R program now runs to completion, writes to the colfamily with the bigint primary key, and actually does manage to UPDATE a row.
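Before pasting the code, here is my working assumption about the target encoding (this is exactly the part I am unsure of, so please correct me if it is wrong): a CQL3 bigint key should be the raw 8 bytes of the long, most significant byte first, which as far as I can tell is also what ByteBufferUtil.bytes(long) produces. A minimal standalone sketch of that assumed encoding:

    import java.nio.ByteBuffer;

    // Sketch of what I *believe* the bigint key encoding is
    // (my assumption, not verified against the thrift path).
    public class BigintKeySketch
    {
        static ByteBuffer encodeBigintKey(long recordid)
        {
            ByteBuffer buf = ByteBuffer.allocate(8); // ByteBuffers default to big-endian
            buf.putLong(recordid);                   // writes the most significant byte first
            buf.flip();                              // position=0, limit=8, ready for reading
            return buf;
        }

        public static void main(String[] args)
        {
            ByteBuffer key = encodeBigintKey(0x4700000000407826L);
            StringBuilder hex = new StringBuilder("0x");
            for (int i = key.position(); i < key.limit(); i++)
                hex.append(String.format("%02x", key.get(i)));
            System.out.println(hex); // prints 0x4700000000407826
        }
    }

What I actually wrote in the job is below.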
query:

    String query = "UPDATE " + keyspace + "." + OUTPUT_COLUMN_FAMILY + " SET count_num = ? ";

reduce method:

    public void reduce(LongWritable writableRecid, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException
    {
        Long sum = 0L;
        Long recordid = writableRecid.get();
        List<ByteBuffer> vbles = null;
        byte[] longByterray = new byte[8];
        for (int i = 0; i < 8; i++)
        {
            longByterray[i] = (byte) (recordid >> (i * 8));
        }
        ByteBuffer recordIdByteBuf = ByteBuffer.allocate(8);
        recordIdByteBuf.wrap(longByterray);
        keys.put("recordid", recordIdByteBuf);
        ...
        context.write(keys, vbles);
    }

My logger output does show it writing maps containing what appear to be valid keys, e.g.

    writing key : 0x4700000000407826 , hasarray ? : Y

and there are about 74 mappings in the final reducer output, each with a different numeric record key. But after the program completes, there is just one single row in the columnfamily, with a row key of 0 (zero):

    SELECT * FROM archive_recordids LIMIT 999999999;

     recordid | count_num
    ----------+-----------
            0 |         2

    (1 rows)

I guess it is something related to the way my code is wrapping a long value into the ByteBuffer, or maybe the way the ByteBuffer is being allocated. As far as I can tell, the ByteBuffer needs to be populated in exactly the same way a thrift application would populate a ByteBuffer for a bigint key -- does anyone know how to do that, or can anyone point me to an example that works?

Thanks,
John
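P.S. Writing this up made me suspicious of two things in my own reducer, and I would appreciate confirmation. First, java.nio.ByteBuffer.wrap() is a static factory, so my call recordIdByteBuf.wrap(longByterray) creates a new buffer and discards it, leaving recordIdByteBuf as the all-zero buffer from allocate(8) -- which would neatly explain the single row with key 0 (unless my logging was looking at a different buffer than the one handed to context.write). Second, my loop packs the long least-significant byte first, the reverse of what putLong() does. A small standalone test of both suspicions (the expected output in the comments is my assumption):

    import java.nio.ByteBuffer;

    public class WrapSuspicion
    {
        public static void main(String[] args)
        {
            long recordid = 0x4700000000407826L;

            // my original loop: least significant byte first
            byte[] longByterray = new byte[8];
            for (int i = 0; i < 8; i++)
                longByterray[i] = (byte) (recordid >> (i * 8));

            ByteBuffer recordIdByteBuf = ByteBuffer.allocate(8);
            recordIdByteBuf.wrap(longByterray);  // static method called on an instance: result discarded
            System.out.println(hex(recordIdByteBuf));            // 0x0000000000000000 -- still all zeros

            System.out.println(hex(ByteBuffer.wrap(longByterray))); // 0x2678400000000047 -- byte-reversed

            ByteBuffer bigEndian = ByteBuffer.allocate(8);
            bigEndian.putLong(recordid);
            bigEndian.flip();
            System.out.println(hex(bigEndian));                  // 0x4700000000407826 -- what I meant to write
        }

        static String hex(ByteBuffer buf)
        {
            StringBuilder sb = new StringBuilder("0x");
            for (int i = buf.position(); i < buf.limit(); i++)
                sb.append(String.format("%02x", buf.get(i)));
            return sb.toString();
        }
    }

Does that analysis sound right, or am I off in the weeds?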