Would really appreciate any help on this.

On Thu, Sep 22, 2011 at 11:34 PM, Tharindu Mathew <mcclou...@gmail.com> wrote:
> Hi,
>
> I managed to modify the Hadoop-Cassandra integration to start from a column
> of a CF used for indexing. In the map phase, I get keys from different CFs
> and fetch the rows I need. This all works fine on a single node. :)
>
> I'd like to efficiently identify the set of nodes holding a given set of
> rows and pull those rows into Hadoop. My initial design was something like
> this.
>
> Add a new operation to the Thrift interface that allows us to do:
>
> Map<(CF+key), List<endpoints>> client.get_endpoints(List<CF+keys>)
>
> Its functionality would be similar to nodetool's getEndpoints.
>
> Then, during processing, we can look up the endpoint for each CF and key
> from this result, without querying a node for each and every key. Only if
> the key is not found on that endpoint (because a node was added or moved
> while processing) do we calculate the relevant endpoint again.
>
> I'd like to ask the Cassandra devs whether this sounds like the best way to
> do this, or to point out any improvements or flaws in my approach.
>
> Thanks in advance.
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
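A rough client-side sketch of the proposed batched lookup plus the "recompute only on a miss" fallback, in Java. Everything here is hypothetical: `EndpointLookup`, `getEndpoints`, `resolve`, and `CfKey` are illustrative names, not part of Cassandra's Thrift API. The ring model is also an assumption: one token per node, a made-up stable hash in place of the real partitioner, and SimpleStrategy-style placement (the next replication-factor nodes clockwise from the key's token). A real client would have to fill the token map from the cluster, for example from the token ranges Thrift's `describe_ring` returns.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiPredicate;

// Hypothetical client-side sketch of the proposed batched get_endpoints call.
class EndpointLookup {
    // The "(CF+key)" pair from the proposal.
    record CfKey(String columnFamily, String key) {}

    // token -> node address; one token per node in this simplified model.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int replicationFactor;

    EndpointLookup(Map<String, Long> nodeTokens, int replicationFactor) {
        nodeTokens.forEach((node, token) -> ring.put(token, node));
        this.replicationFactor = Math.min(replicationFactor, ring.size());
    }

    // Assumed stable hash standing in for the real partitioner (NOT Cassandra's).
    static long token(String key) {
        long h = 1125899906842597L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h = 31 * h + b;
        }
        return h;
    }

    // Replicas for one key: first node with token >= the key's token,
    // then the following nodes clockwise, wrapping around the ring.
    List<String> endpointsFor(String key) {
        long t = token(key);
        List<String> walk = new ArrayList<>(ring.tailMap(t, true).values());
        walk.addAll(ring.headMap(t, false).values());
        return new ArrayList<>(walk.subList(0, replicationFactor));
    }

    // The proposed batch call: one lookup for many rows instead of a
    // round trip per key. Placement depends only on the row key here
    // (all CFs are assumed to live in one keyspace).
    Map<CfKey, List<String>> getEndpoints(List<CfKey> rows) {
        Map<CfKey, List<String>> result = new LinkedHashMap<>();
        for (CfKey row : rows) {
            result.put(row, endpointsFor(row.key()));
        }
        return result;
    }

    // Fallback from the proposal: reuse the cached endpoints unless none of
    // them holds the row (e.g. the ring changed mid-job); only then recompute
    // and overwrite the cache entry.
    List<String> resolve(Map<CfKey, List<String>> cache, CfKey row,
                         BiPredicate<String, CfKey> nodeHasRow) {
        List<String> cached = cache.get(row);
        if (cached != null) {
            for (String node : cached) {
                if (nodeHasRow.test(node, row)) {
                    return cached;
                }
            }
        }
        List<String> fresh = endpointsFor(row.key());
        cache.put(row, fresh);
        return fresh;
    }
}
```

The point of the design shows up in `resolve`: the cached endpoint list is trusted until a node reports that it doesn't have the row, and only then is placement recomputed. In a real job the ring itself would also need refreshing before that recomputation; this sketch recomputes against whatever tokens it was built with.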