Re: MultiInput/MultiGet CF in MapReduce

Alicia Leong Fri, 29 Mar 2013 22:16:17 -0700

This is the current flow for ColumnFamilyInputFormat. Please correct me if
I'm wrong.


1) In ColumnFamilyInputFormat, get the token ranges of all nodes using
*client.describe_ring*.
2) Get the CfSplits for each token range using *client.describe_splits_ex*.
3) Create a new ColumnFamilySplit with the start token, end token and endpoint.
4) In ColumnFamilyRecordReader, query *client.get_range_slices* with the
start and end tokens of the ColumnFamilySplit at the endpoint (datanode).


Suppose I use *client.get_slice*(key) instead. My row key is '20130314', taken
from the Index Table.
Q1) How do I find out which token range and endpoint row key '20130314'
belongs to?
Q2) Even if I manage to find the token range and endpoint, is there a
Thrift API that accepts both (ByteBuffer key, KeyRange range), something like
a merge of client.get_slice and client.get_range_slices?
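
A minimal sketch of how Q1 could be approached on the client side, along the
lines of the describe_ring suggestion quoted below. It assumes the cluster uses
RandomPartitioner (token = absolute value of the MD5 digest read as a signed
big-endian integer); with a different partitioner the token function would
differ. The ring entries and IP addresses are hypothetical stand-ins for what
describe_ring returns (its tokens arrive as strings and would need int()
first), and treating the first listed endpoint as the preferred one is also an
assumption.

```python
import hashlib
from collections import defaultdict


def random_partitioner_token(key: bytes) -> int:
    """Token under the RandomPartitioner assumption: abs() of the MD5
    digest interpreted as a signed big-endian integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def owns(start: int, end: int, token: int) -> bool:
    """True if token falls in the ring range (start, end]."""
    if start < end:
        return start < token <= end
    # The range wraps around the end of the ring (or covers the whole ring).
    return token > start or token <= end


def find_endpoints(ring, key: bytes):
    """Return the endpoint list of the range that owns the key's token."""
    token = random_partitioner_token(key)
    for start, end, endpoints in ring:
        if owns(start, end, token):
            return endpoints
    raise LookupError("no range owns token %d" % token)


def group_keys_by_endpoint(ring, keys):
    """Group row keys by the first listed endpoint of their range, so each
    batch could be sent to a node that holds the data."""
    groups = defaultdict(list)
    for key in keys:
        groups[find_endpoints(ring, key)[0]].append(key)
    return dict(groups)


# Hypothetical 3-node ring as (start_token, end_token, endpoints) tuples.
MAX_TOKEN = 2 ** 127
t0, t1, t2 = 0, MAX_TOKEN // 3, 2 * MAX_TOKEN // 3
ring = [
    (t2, t0, ["10.0.0.1"]),  # wrapping range
    (t0, t1, ["10.0.0.2"]),
    (t1, t2, ["10.0.0.3"]),
]

print(find_endpoints(ring, b"20130314"))
print(group_keys_by_endpoint(ring, [b"1", b"2", b"3", b"4"]))
```

Grouping the row keys by endpoint first would let each batch be sent to a node
that actually owns those rows, which is one way to approximate data locality
for key-based lookups.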


Thanks



On Sat, Mar 30, 2013 at 7:53 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> You can use the output of describe_ring along with partitioner information
> to determine which nodes data lives on.
>
>
> On Fri, Mar 29, 2013 at 12:33 PM, Alicia Leong <lccali...@gmail.com> wrote:
>
>> Hi All
>>
>> I’m thinking of doing it this way:
>>
>> 1) get_slice ( YYYYMMDDHH ) from the Index Table.
>>
>> 2) Take the returned list of row keys.
>>
>> 3) Pass it to multiget_slice ( keys … ).
>>
>>
>>
>> But my question is: how do I ensure data locality?
>>
>>
>> On Tue, Mar 19, 2013 at 3:33 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>>> I would look at Hive or Pig rather than writing the MapReduce job by hand.
>>>
>>> There is an example in the Cassandra source distribution, or you can
>>> look at DataStax Enterprise to start playing with Hive.
>>>
>>> Typically with Hadoop queries you want to query a lot of data. If you
>>> are only querying a few rows, consider writing the code in your favourite
>>> language.
>>>
>>> Cheers
>>>
>>>    -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 18/03/2013, at 1:29 PM, Alicia Leong <lccali...@gmail.com> wrote:
>>>
>>> Hi All
>>>
>>> I have 2 tables
>>>
>>> Data Table
>>> -----------------
>>> RowKey: 1
>>> => (column=name, value=apple)
>>> RowKey: 2
>>> => (column=name, value=orange)
>>> RowKey: 3
>>> => (column=name, value=banana)
>>> RowKey: 4
>>> => (column=name, value=mango)
>>>
>>>
>>> Index Table (YYYYMMDDHH)
>>> ------------------------------------------------
>>> RowKey: 2013030114
>>> => (column=1, value=)
>>> => (column=2, value=)
>>> => (column=3, value=)
>>> RowKey: 2013030115
>>> => (column=4, value=)
>>>
>>>
>>> I would like to know how to implement the following in MapReduce:
>>> 1) First, query the Index Table by RowKey: 2013030114.
>>> 2) Then pass the Index Table column names (1, 2, 3) as row keys to query
>>> the Data Table.
>>>
>>> Thanks in advance.
>>>
>>>
>>>
>>
>
