Re: MultiInput/MultiGet CF in MapReduce

2013-03-31 Thread aaron morton
> If I would use client.get_slice ( key). My rowkey is '20130314' from Index
> Table.
> Q1) How to know for rowkey '20130314' is in which Token Range & EndPoint.
Calculate the MD5 hash of the key and find the token range that contains it. This is what is used internally https://github.com/apac
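The lookup described in this reply can be sketched roughly as follows. This is a minimal illustration, not Cassandra's own code: it assumes RandomPartitioner, whose token is understood to be the absolute value of the MD5 digest read as a signed big-endian integer, and it assumes the ring has already been fetched with describe_ring and flattened into (start_token, end_token, endpoints) tuples. The function names, the four-node ring and the IP addresses are hypothetical.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Token for a row key, assuming RandomPartitioner: |MD5 as signed 128-bit int|."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def range_contains(start: int, end: int, token: int) -> bool:
    """Token ranges are (start, end]; a range with start >= end wraps around the ring."""
    if start < end:
        return start < token <= end
    return token > start or token <= end


def endpoints_for_key(ring, key: bytes):
    """Return (token, endpoints) for the range that owns the key."""
    token = random_partitioner_token(key)
    for start, end, endpoints in ring:
        if range_contains(start, end, token):
            return token, endpoints
    raise LookupError("no token range contains the key")


# Hypothetical four-node ring, shaped like flattened describe_ring output.
STEP = 2 ** 127 // 4
RING = [
    (3 * STEP, 0, ["10.0.0.1"]),          # wrapping range
    (0, STEP, ["10.0.0.2"]),
    (STEP, 2 * STEP, ["10.0.0.3"]),
    (2 * STEP, 3 * STEP, ["10.0.0.4"]),
]

if __name__ == "__main__":
    token, endpoints = endpoints_for_key(RING, b"20130314")
    print(token, endpoints)
```

With a replication factor above one, each range from describe_ring would list several endpoints, and any of them could serve the row.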

Re: MultiInput/MultiGet CF in MapReduce

2013-03-29 Thread Alicia Leong
This is the current flow for ColumnFamilyInputFormat. Please correct me if I'm wrong.
1) In ColumnFamilyInputFormat, get all nodes' token ranges using client.describe_ring
2) Get CfSplit using client.describe_splits_ex with the token range
3) new ColumnFamilySplit with start range, end range a

Re: MultiInput/MultiGet CF in MapReduce

2013-03-29 Thread Edward Capriolo
You can use the output of describe_ring along with partitioner information to determine which nodes data lives on.
On Fri, Mar 29, 2013 at 12:33 PM, Alicia Leong wrote:
> Hi All
>
> I’m thinking of doing it this way.
>
> 1) get_slice ( MMDDHH ) from Index Table.
>
> 2) With th
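For the index-then-multiget plan quoted above, one way to use the ring information is to bucket the row keys by the node that owns them, so that each multiget_slice call can be sent to a node holding its keys. The sketch below is only an illustration under stated assumptions: RandomPartitioner tokens (absolute value of the signed MD5 digest), a ring already flattened from describe_ring into (start_token, end_token, endpoints) tuples, and a hypothetical two-node ring with made-up addresses. It picks the first listed endpoint per range and does not perform any Thrift calls.

```python
import hashlib
from collections import defaultdict


def token_of(key: bytes) -> int:
    """Assumed RandomPartitioner token: |MD5 digest as signed big-endian int|."""
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))


def owner_of(ring, token: int) -> str:
    """First endpoint of the (start, end] range containing the token."""
    for start, end, endpoints in ring:
        if start < end:
            inside = start < token <= end
        else:  # range wraps around the end of the ring
            inside = token > start or token <= end
        if inside:
            return endpoints[0]
    raise LookupError("no token range contains the token")


def group_keys_by_endpoint(ring, keys):
    """Map endpoint -> list of row keys whose primary range it owns."""
    groups = defaultdict(list)
    for key in keys:
        groups[owner_of(ring, token_of(key))].append(key)
    return dict(groups)


# Hypothetical two-node ring and the four data-table row keys from the thread.
HALF = 2 ** 126
TWO_NODE_RING = [
    (HALF, 0, ["10.0.0.1"]),   # wrapping range
    (0, HALF, ["10.0.0.2"]),
]
ROW_KEYS = [b"1", b"2", b"3", b"4"]

if __name__ == "__main__":
    for endpoint, keys in group_keys_by_endpoint(TWO_NODE_RING, ROW_KEYS).items():
        print(endpoint, keys)
```

Each group could then be passed to multiget_slice on a connection opened to that endpoint. Whether this beats a single coordinator-routed call depends on how many keys are involved.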

Re: MultiInput/MultiGet CF in MapReduce

2013-03-29 Thread Alicia Leong
Hi All
I’m thinking of doing it this way.
1) get_slice ( MMDDHH ) from Index Table.
2) With the returned list of ROWKEYs
3) Pass it to multiget_slice ( keys …)
But my question is how to ensure ‘Data Locality’?
On Tue, Mar 19, 2013 at 3:33 PM, aaron morton wrot

Re: MultiInput/MultiGet CF in MapReduce

2013-03-19 Thread aaron morton
I would be looking at Hive or Pig, rather than writing the MapReduce. There is an example in the Cassandra source distribution, or you can look at DataStax Enterprise to start playing with Hive. Typically with Hadoop queries you want to query a lot of data; if you are only querying a few row

MultiInput/MultiGet CF in MapReduce

2013-03-17 Thread Alicia Leong
Hi All
I have 2 tables
Data Table -
RowKey: 1 => (column=name, value=apple)
RowKey: 2 => (column=name, value=orange)
RowKey: 3 => (column=name, value=banana)
RowKey: 4 => (column=name, value=mango)
Index Table (MMDDHH)
RowKey: