> If I would use client.get_slice(key). My rowkey is '20130314' from Index Table.
> Q1) How to know for rowkey '20130314' is in which Token Range & EndPoint.

Calculate the MD5 hash of the key and find the token range that contains it. This is what is used internally:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/FBUtilities.java#L239
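Not from the original thread, but a minimal sketch of that hashing step in Python, assuming the cluster uses the RandomPartitioner (whose token is the absolute value of the MD5 digest read as a big-endian signed integer, mirroring FBUtilities.hashToBigInteger):

```python
import hashlib


def random_partitioner_token(row_key: bytes) -> int:
    """Token for a row key under Cassandra's RandomPartitioner:
    abs() of the 128-bit MD5 digest interpreted as a big-endian
    signed integer (what FBUtilities.hashToBigInteger does in Java)."""
    digest = hashlib.md5(row_key).digest()
    # signed=True reproduces Java's new BigInteger(byte[]) semantics
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


token = random_partitioner_token(b"20130314")
print(token)  # an integer in [0, 2**127]
```

With that token in hand you can walk the ranges returned by describe_ring to find the owning endpoint.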
Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 30/03/2013, at 10:45 AM, Alicia Leong <lccali...@gmail.com> wrote:

> This is the current flow for ColumnFamilyInputFormat. Please correct me if I'm wrong:
>
> 1) In ColumnFamilyInputFormat, get all nodes' token ranges using client.describe_ring
> 2) Get CfSplits using client.describe_splits_ex with each token range
> 3) Create a new ColumnFamilySplit with the start token, end token and endpoint
> 4) In ColumnFamilyRecordReader, query client.get_range_slices with the start and end tokens of the ColumnFamilySplit at the endpoint (data node)
>
> If I would use client.get_slice(key), and my rowkey is '20130314' from the Index Table:
> Q1) How to know for rowkey '20130314' is in which Token Range & EndPoint?
> Even if I manage to find out the Token Range & EndPoint, is there a Thrift API where I can pass (ByteBuffer key, KeyRange range)? Something like a merge of client.get_slice & client.get_range_slices.
>
> Thanks
>
> On Sat, Mar 30, 2013 at 7:53 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> You can use the output of describe_ring along with partitioner information to determine which nodes data lives on.
>>
>> On Fri, Mar 29, 2013 at 12:33 PM, Alicia Leong <lccali...@gmail.com> wrote:
>>
>>> Hi All
>>> I'm thinking to do it this way:
>>>
>>> 1) get_slice(YYYYMMDDHH) from the Index Table
>>> 2) With the returned list of ROWKEYs
>>> 3) Pass them to multiget_slice(keys ...)
>>>
>>> But my question is: how to ensure 'Data Locality'?
>>>
>>> On Tue, Mar 19, 2013 at 3:33 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>>
>>>> I would be looking at Hive or Pig, rather than writing the MapReduce.
>>>>
>>>> There is an example in the source cassandra distribution, or you can look at DataStax Enterprise to start playing with Hive.
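A sketch (not from the thread) of the range lookup Edward describes: describe_ring returns half-open (start, end] token ranges with their replica endpoints, and one range wraps around the top of the ring. Tokens are shown as small integers for readability; the field names follow the Thrift TokenRange struct.

```python
def token_in_range(token: int, start: int, end: int) -> bool:
    """Cassandra token ranges are half-open intervals (start, end].
    A range whose start >= end wraps around the top of the ring."""
    if start < end:
        return start < token <= end
    # wrapping range, e.g. (200, 0] covers (200, MAX] plus (MIN, 0]
    return token > start or token <= end


def endpoints_for_token(token: int, ring: list) -> list:
    """ring: list of (start_token, end_token, endpoints) tuples,
    as reported by describe_ring."""
    for start, end, endpoints in ring:
        if token_in_range(token, start, end):
            return endpoints
    return []


# Hypothetical three-node ring (tokens shrunk for readability)
ring = [
    (0, 100, ["10.0.0.1"]),
    (100, 200, ["10.0.0.2"]),
    (200, 0, ["10.0.0.3"]),  # the wrapping range
]
print(endpoints_for_token(150, ring))  # ['10.0.0.2']
print(endpoints_for_token(250, ring))  # ['10.0.0.3'] (wrapping range)
```

Once you know the endpoints for a key's token, you can direct the read there for locality.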
>>>> Typically with hadoop queries you want to query a lot of data. If you are only querying a few rows, consider writing the code in your favourite language.
>>>>
>>>> Cheers
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Consultant
>>>> New Zealand
>>>>
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 18/03/2013, at 1:29 PM, Alicia Leong <lccali...@gmail.com> wrote:
>>>>
>>>>> Hi All
>>>>>
>>>>> I have 2 tables
>>>>>
>>>>> Data Table
>>>>> -----------------
>>>>> RowKey: 1
>>>>> => (column=name, value=apple)
>>>>> RowKey: 2
>>>>> => (column=name, value=orange)
>>>>> RowKey: 3
>>>>> => (column=name, value=banana)
>>>>> RowKey: 4
>>>>> => (column=name, value=mango)
>>>>>
>>>>> Index Table (YYYYMMDDHH)
>>>>> ------------------------------------------------
>>>>> RowKey: 2013030114
>>>>> => (column=1, value=)
>>>>> => (column=2, value=)
>>>>> => (column=3, value=)
>>>>> RowKey: 2013030115
>>>>> => (column=4, value=)
>>>>>
>>>>> I would like to know how to implement the below in MapReduce:
>>>>> 1) First query the Index Table by RowKey: 2013030114
>>>>> 2) Then pass the Index Table column names (1, 2, 3) to query the Data Table
>>>>>
>>>>> Thanks in advance.
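A toy sketch (not from the thread) of the two-step index lookup described above, using the example tables. Plain dicts stand in for the Thrift calls: the index read stands in for get_slice on the YYYYMMDDHH row (whose column names are the data row keys), and the dict comprehension stands in for multiget_slice against the Data Table.

```python
# Toy in-memory stand-ins for the two column families above
data_table = {
    "1": {"name": "apple"},
    "2": {"name": "orange"},
    "3": {"name": "banana"},
    "4": {"name": "mango"},
}
index_table = {
    "2013030114": ["1", "2", "3"],
    "2013030115": ["4"],
}


def fetch_for_hour(bucket: str) -> dict:
    """Step 1: read the index row; its column names are data row keys.
    Step 2: fetch those rows from the Data Table in one batch."""
    row_keys = index_table.get(bucket, [])       # stands in for get_slice
    return {k: data_table[k] for k in row_keys}  # stands in for multiget_slice


print(fetch_for_hour("2013030114"))
# {'1': {'name': 'apple'}, '2': {'name': 'orange'}, '3': {'name': 'banana'}}
```

In a real MapReduce job, each mapper would be handed one index row (or one slice of one), so the batch fetch happens per task rather than on a single client.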