This is the current flow for ColumnFamilyInputFormat. Please correct me if I'm wrong.

1) In ColumnFamilyInputFormat, get the token ranges of all nodes using *client.describe_ring*.
2) Get the CfSplits for each token range using *client.describe_splits_ex*.
3) Create a new ColumnFamilySplit with the start and end of the range and the endpoint.
4) In ColumnFamilyRecordReader, query *client.get_range_slices* with the start and end of the ColumnFamilySplit range at the endpoint (datanode).

Suppose I want to use *client.get_slice*(key) instead. My row key is '20130314' from the Index Table.

Q1) How do I find out which token range and endpoint the row key '20130314' belongs to?

Q2) Even if I manage to find out the token range and endpoint, is there a Thrift API to which I can pass (ByteBuffer key, KeyRange range), something like a merge of client.get_slice and client.get_range_slices?

Thanks

On Sat, Mar 30, 2013 at 7:53 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> You can use the output of describe_ring along with partitioner information
> to determine which nodes data lives on.
>
> On Fri, Mar 29, 2013 at 12:33 PM, Alicia Leong <lccali...@gmail.com> wrote:
>
>> Hi All
>>
>> I’m thinking of doing it this way:
>>
>> 1) get_slice ( YYYYMMDDHH ) from the Index Table.
>> 2) Take the returned list of ROWKEYs.
>> 3) Pass it to multiget_slice ( keys … )
>>
>> But my question is how to ensure ‘Data Locality’?
>>
>> On Tue, Mar 19, 2013 at 3:33 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>>> I would be looking at Hive or Pig, rather than writing the MapReduce.
>>>
>>> There is an example in the source cassandra distribution, or you can
>>> look at Data Stax Enterprise to start playing with Hive.
>>>
>>> Typically with hadoop queries you want to query a lot of data, if you
>>> are only querying a few rows consider writing the code in your favourite
>>> language.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 18/03/2013, at 1:29 PM, Alicia Leong <lccali...@gmail.com> wrote:
>>>
>>> Hi All
>>>
>>> I have 2 tables
>>>
>>> Data Table
>>> -----------------
>>> RowKey: 1
>>> => (column=name, value=apple)
>>> RowKey: 2
>>> => (column=name, value=orange)
>>> RowKey: 3
>>> => (column=name, value=banana)
>>> RowKey: 4
>>> => (column=name, value=mango)
>>>
>>> Index Table (YYYYMMDDHH)
>>> ------------------------------------------------
>>> RowKey: 2013030114
>>> => (column=1, value=)
>>> => (column=2, value=)
>>> => (column=3, value=)
>>> RowKey: 2013030115
>>> => (column=4, value=)
>>>
>>> I would like to know, how to implement below in MapReduce
>>> 1) first query the Index Table by RowKey: 2013030114
>>> 2) then pass the Index Table column names (1,2,3) to query the Data Table
>>>
>>> Thanks in advance.
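
A rough sketch of the suggestion in this thread (describe_ring output plus partitioner information), written in Python only to keep it short. It assumes the cluster uses RandomPartitioner, whose token is the absolute value of the key's MD5 digest read as a signed big-endian integer; other partitioners compute tokens differently. It also assumes the row key is stored as the UTF-8 bytes of the string. The function names and the ring tuples (start token, end token, endpoints) are hypothetical stand-ins for the TokenRange entries that describe_ring returns, and the IP addresses are made up.

```python
import hashlib
from collections import defaultdict


def random_partitioner_token(key: bytes) -> int:
    """Token under RandomPartitioner (assumed): abs() of the MD5 digest
    interpreted as a signed big-endian integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def in_range(token: int, start: int, end: int) -> bool:
    """True if token lies in the ring range (start, end]."""
    if start < end:
        return start < token <= end
    # start >= end: the range wraps around the end of the ring
    # (start == end means a single range covering the whole ring).
    return token > start or token <= end


def find_range(token: int, ring):
    """Return the (start, end, endpoints) tuple that owns the token, or None."""
    for start, end, endpoints in ring:
        if in_range(token, start, end):
            return start, end, endpoints
    return None


def group_keys_by_endpoint(keys, ring):
    """Group row keys by the first endpoint of their owning range, so each
    group could be fetched (e.g. with multiget_slice) from a node that
    holds the data."""
    groups = defaultdict(list)
    for key in keys:
        owner = find_range(random_partitioner_token(key), ring)
        if owner is not None:
            groups[owner[2][0]].append(key)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical two-node ring; real values would come from describe_ring.
    ring = [
        (0, 2**126, ["10.0.0.2"]),
        (2**126, 0, ["10.0.0.1"]),
    ]
    token = random_partitioner_token(b"20130314")
    print(token, find_range(token, ring))
    print(group_keys_by_endpoint([b"1", b"2", b"3"], ring))
```

Grouping keys by endpoint only shows where the data lives. Running each group's work on that node is up to the job's input splits, which is what ColumnFamilySplit's endpoint is used for in the flow above.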