> If I use client.get_slice(key), where my rowkey is '20130314' from the
> Index Table:
> Q1) How do I know which token range & endpoint rowkey '20130314' falls in?
Calculate the MD5 hash of the key and find the token range that contains it.
This is what is used internally
https://github.com/apac
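To make the hashing step concrete: under RandomPartitioner the token of a key is the absolute value of its MD5 digest read as a 128-bit integer. A minimal sketch (the class name `TokenDemo` is mine; this assumes RandomPartitioner, not an order-preserving partitioner):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class TokenDemo {
    // RandomPartitioner-style token: abs(MD5(key)) as a BigInteger
    static BigInteger token(String key) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(digest).abs();
    }

    public static void main(String[] args) throws Exception {
        // Prints the token your rowkey hashes to; compare it against the
        // start/end tokens returned by describe_ring to find its range.
        System.out.println(token("20130314"));
    }
}
```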
This is the current flow for ColumnFamilyInputFormat. Please correct me if
I'm wrong:
1) In ColumnFamilyInputFormat, get all node token ranges using *
client.describe_ring*
2) Get CfSplits using *client.describe_splits_ex* with each token range
3) Build a new ColumnFamilySplit with the start range, end range and the
list of endpoints
You can use the output of describe_ring along with the partitioner
information to determine which nodes the data lives on.
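The lookup itself is just "which (start, end] interval contains my token", remembering that one range wraps around the top of the ring. A self-contained sketch (the `TokenRange` class here is a stand-in I wrote for the Thrift struct returned by describe_ring; the IPs are made up):

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RingLookup {
    // Stand-in for the Thrift TokenRange struct from describe_ring
    static class TokenRange {
        final BigInteger start, end;    // range covers (start, end]
        final List<String> endpoints;   // replica nodes for this range
        TokenRange(BigInteger s, BigInteger e, List<String> eps) {
            start = s; end = e; endpoints = eps;
        }
    }

    // A token t belongs to (start, end]. The range whose start >= end is
    // the one that wraps past the maximum token back to the minimum.
    static boolean contains(TokenRange r, BigInteger t) {
        if (r.start.compareTo(r.end) < 0)
            return t.compareTo(r.start) > 0 && t.compareTo(r.end) <= 0;
        return t.compareTo(r.start) > 0 || t.compareTo(r.end) <= 0;
    }

    static List<String> endpointsFor(List<TokenRange> ring, BigInteger t) {
        for (TokenRange r : ring)
            if (contains(r, t)) return r.endpoints;
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Toy three-node ring with small tokens for readability
        List<TokenRange> ring = Arrays.asList(
            new TokenRange(BigInteger.ZERO, BigInteger.valueOf(100),
                           Arrays.asList("10.0.0.2")),
            new TokenRange(BigInteger.valueOf(100), BigInteger.valueOf(200),
                           Arrays.asList("10.0.0.3")),
            new TokenRange(BigInteger.valueOf(200), BigInteger.ZERO,
                           Arrays.asList("10.0.0.1")));  // wrapping range
        System.out.println(endpointsFor(ring, BigInteger.valueOf(150)));
        // prints [10.0.0.3]
    }
}
```

Feeding in the token computed from the key's MD5 hash gives you the endpoints for that rowkey.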
On Fri, Mar 29, 2013 at 12:33 PM, Alicia Leong wrote:
Hi All
I'm thinking to do it this way:
1) get_slice ( MMDDHH ) from the Index Table.
2) With the returned list of ROWKEYs,
3) pass it to multiget_slice ( keys … ).
But my question is: how do I ensure 'Data Locality'??
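One common way to get locality with multiget_slice is to group the returned rowkeys by the endpoint that owns them (via their tokens and the ring layout) and send each batch to one of its replicas. A sketch under simplifying assumptions I'm making (single replica per token, `TreeMap` as the ring; the class and method names are mine, not a Cassandra API):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalityGrouping {
    // RandomPartitioner-style token of a row key: abs(MD5(key))
    static BigInteger token(String key) throws Exception {
        return new BigInteger(MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8))).abs();
    }

    // ring maps each node's token to its endpoint. The first node token
    // >= the key's token owns the key, wrapping to the first node.
    static String endpointFor(TreeMap<BigInteger, String> ring, BigInteger t) {
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(t);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Group keys by owning endpoint, so each multiget_slice batch can be
    // sent to a node that actually stores those rows.
    static Map<String, List<String>> groupByEndpoint(
            TreeMap<BigInteger, String> ring, List<String> keys) throws Exception {
        Map<String, List<String>> groups = new HashMap<>();
        for (String k : keys)
            groups.computeIfAbsent(endpointFor(ring, token(k)),
                                   x -> new ArrayList<>()).add(k);
        return groups;
    }

    public static void main(String[] args) throws Exception {
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        ring.put(BigInteger.ONE.shiftLeft(126), "10.0.0.1");
        ring.put(BigInteger.ONE.shiftLeft(127), "10.0.0.2");
        System.out.println(groupByEndpoint(ring, Arrays.asList("1", "2", "3", "4")));
    }
}
```

In practice replication means each key has several valid endpoints, so you can also intersect the replica sets with wherever your task is already running.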
On Tue, Mar 19, 2013 at 3:33 PM, aaron morton wrote:
I would be looking at Hive or Pig, rather than writing the MapReduce job.
There is an example in the Cassandra source distribution, or you can look at
DataStax Enterprise to start playing with Hive.
Typically with Hadoop queries you want to query a lot of data; if you are only
querying a few rows
Hi All
I have 2 tables
Data Table
----------
RowKey: 1
=> (column=name, value=apple)
RowKey: 2
=> (column=name, value=orange)
RowKey: 3
=> (column=name, value=banana)
RowKey: 4
=> (column=name, value=mango)
Index Table (MMDDHH)
RowKey: