----------------------------------------
> From: johnlu...@hotmail.com
> To: user@cassandra.apache.org
> Subject: RE: cassandra hadoop reducer writing to CQL3 - primary key - must it be text type?
> Date: Wed, 9 Oct 2013 09:40:06 -0400
>
>     software versions : apache-cassandra-2.0.1    hadoop-2.1.0-beta
>
> I have been experimenting with a hadoop map/reduce operation over cassandra,
> writing the output through CqlOutputFormat.
> I based my first program fairly closely on the well-known WordCount example in
> examples/hadoop_cql3_word_count,
> except that I set my output column family to have a bigint primary key:
>
> CREATE TABLE archive_recordids (recordid bigint, count_num bigint, PRIMARY KEY (recordid))
>
> and simply tried setting this key as one of the keys in the output map :
>
>          keys.put("recordid", ByteBufferUtil.bytes(recordid.longValue()));
>
> but it always failed with a strange error :
>
> java.io.IOException: InvalidRequestException(why:Key may not be empty)
>
I managed to get a little further: my M/R program now runs to completion,
writes to the column family with the bigint primary key, and actually does
manage to UPDATE a row.

query:

     String query = "UPDATE " + keyspace + "." + OUTPUT_COLUMN_FAMILY + " SET count_num = ? ";
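
For context, the query is handed to CqlOutputFormat in the job setup essentially
as in the WordCount example (inside the Tool's run() method). A rough sketch of
that wiring; the mapper/reducer class names, KEYSPACE, address and partitioner
below are placeholders rather than my actual code:

        // Job setup, closely following examples/hadoop_cql3_word_count.
        // KEYSPACE / OUTPUT_COLUMN_FAMILY are my own constants; "localhost" and
        // the partitioner are placeholders for the real cluster settings.
        Job job = new Job(getConf(), "archiverecordidcount");
        job.setMapperClass(RecordIdMapper.class);      // placeholder mapper
        job.setReducerClass(RecordIdReducer.class);    // placeholder reducer
        job.setOutputKeyClass(Map.class);
        job.setOutputValueClass(List.class);
        job.setOutputFormatClass(CqlOutputFormat.class);

        ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KEYSPACE, OUTPUT_COLUMN_FAMILY);
        ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "localhost");
        ConfigHelper.setOutputPartitioner(job.getConfiguration(), "Murmur3Partitioner");

        // The UPDATE string above is the "output CQL"; as I understand it,
        // CqlOutputFormat appends the WHERE clause for the primary key columns
        // and binds them from the keys map written by reduce().
        CqlConfigHelper.setOutputCql(job.getConfiguration(), query);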

reduce method :

        public void reduce(LongWritable writableRecid, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException
        {
            Long sum = 0L;
            Long recordid = writableRecid.get();
            List<ByteBuffer> vbles = null;
            byte[] longByteArray = new byte[8];
            for (int i = 0; i < 8; i++) {
                longByteArray[i] = (byte) (recordid >> (i * 8));
            }
            ByteBuffer recordIdByteBuf = ByteBuffer.allocate(8);
            recordIdByteBuf.wrap(longByteArray);
            keys.put("recordid", recordIdByteBuf);
                      ...
            context.write(keys, vbles);
        }

and my logger output does show it emitting maps containing
what appear to be valid keys, e.g.

writing key : 0x4700000000407826 , hasarray ? : Y

there are about 74 mappings in the final reducer output,
each with a different numeric record key.

but after the program completes, there is just a single row in the column family,
with a row key of 0 (zero).

SELECT * FROM archive_recordids LIMIT 999999999;

 recordid | count_num
----------+-----------
        0 |         2

(1 rows)


I guess it is something related to the way my code is wrapping the long value
into the ByteBuffer, or maybe the way the ByteBuffer is being allocated.
As far as I can tell, the ByteBuffer needs to be populated in exactly the same
way as a thrift application would populate a ByteBuffer for a bigint key.
Does anyone know how to do that, or can you point me to an example that works?
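
For concreteness, here is a minimal sketch of the encodings I believe produce
the 8-byte big-endian form a bigint key should have (ByteBufferUtil and LongType
are org.apache.cassandra classes; I have not verified which of these
CqlOutputFormat actually expects):

        // org.apache.cassandra.utils.ByteBufferUtil, org.apache.cassandra.db.marshal.LongType
        long recordid = 12345L;   // example value

        // 1. Cassandra's utility, as the WordCount example uses for its bound values.
        ByteBuffer viaUtil = ByteBufferUtil.bytes(recordid);

        // 2. The LongType marshaller, which the server itself uses for bigint columns.
        ByteBuffer viaType = LongType.instance.decompose(recordid);

        // 3. Plain NIO: ByteBuffer is big-endian by default. Note that wrap() is a
        //    static factory method, so the buffer has to be the one returned by
        //    wrap() (or by allocate().putLong()), not a separately allocated one.
        ByteBuffer viaNio = (ByteBuffer) ByteBuffer.allocate(8).putLong(recordid).flip();

        keys.put("recordid", viaUtil);   // any of the three should hold the same 8 bytes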

Thanks   John


>
> Cheers,   John                                          
