Re: Implementing a input format that splits according to column size

Brandon Williams Mon, 12 Sep 2011 15:44:53 -0700

On Mon, Sep 12, 2011 at 1:54 PM, Tharindu Mathew <mcclou...@gmail.com> wrote:
> Thanks Brandon for the clarification.
>
> I'd like to support a use case where an index is built in a row in a CF.


If you're just _building_ the row, the current state of things will
work just fine.  The trouble starts when you need to read it via
Hadoop.

> So, as a starting point for a query, a known row with a larger number of
> columns will have to be selected. The split to the hadoop nodes should start
> at that level.
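
To make the request concrete, here is a minimal sketch of cutting one
wide row into column ranges that could each be handed to a separate
task. It is only an illustration: column_splits and split_size are
hypothetical names, not part of Cassandra's Hadoop support, and it
assumes the column names are already available as a sorted list.

```python
from typing import List, Tuple


def column_splits(columns: List[str], split_size: int) -> List[Tuple[str, str]]:
    """Partition sorted column names into inclusive (first, last) ranges.

    Hypothetical helper: each range covers at most `split_size` columns
    and could serve as the start/finish bounds of one task's slice.
    """
    if split_size < 1:
        raise ValueError("split_size must be at least 1")
    return [
        # The last range may be shorter than split_size, so clamp its end.
        (columns[i], columns[min(i + split_size, len(columns)) - 1])
        for i in range(0, len(columns), split_size)
    ]


# Five columns in ranges of at most two columns each.
for first, last in column_splits(["a", "b", "c", "d", "e"], 2):
    print(first, last)
```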

The other problem here is that if you want 10 nodes to operate on the
row and have RF=3, you lose locality for 7 of the nodes.  If the task
is heavily CPU-bound this is probably OK; otherwise it may be better
to use only 3 nodes (since each of them will have a local replica).
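
The arithmetic behind that can be sketched as follows. This is a
simplification, not anything from the Cassandra code base: the
function name is hypothetical, and it assumes each task runs on a
distinct node and that exactly RF nodes hold a replica of the row.

```python
from typing import Tuple


def locality_breakdown(tasks: int, replication_factor: int) -> Tuple[int, int]:
    """Return (local, remote) task counts for work on a single row.

    Assumption: one task per node, and only `replication_factor` nodes
    hold a copy of the row, so at most that many tasks read locally.
    """
    if tasks < 0 or replication_factor < 1:
        raise ValueError("tasks must be >= 0 and replication_factor >= 1")
    local = min(tasks, replication_factor)
    return local, tasks - local


local, remote = locality_breakdown(tasks=10, replication_factor=3)
print(f"{local} local, {remote} remote")  # 3 local, 7 remote
```

With 3 or fewer tasks every read can be local, which is the case where
using fewer nodes may win for I/O-bound work.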

> Is this a common use case?

I'm not entirely sure what it is you want to do yet, but maybe I
answered it above.

-Brandon
