Thanks Himanshu,

but that is not quite what I meant.


Yes, a batch operation is broken up into "chunks" per regionserver, and the
chunks are then shipped to the individual regionservers.
But there is no way to interact with those chunks as a whole at the
regionserver through coprocessors.


What I want to do is look at the entire chunk at each regionserver and then
do some other bulk operation based on it.
Currently I only get pre/post hooks for single rows, and there is no way to group
these together later (other than just waiting for a little while and letting
work accumulate).

Say I have a client request with (say) 1000 puts, and let's also say that there
are 5 region servers, each of which happens to host exactly 1/5th of the rowkeys, so
each region server gets a chunk of 200 puts.
Now a coprocessor might have logic that affects another table (for example for
naive secondary indexing). At the coprocessor level I can get an
HTableInterface from the environment, and now I want to do a batch put of 200
rows (of course those will be broken up per region server again, etc.).
Currently I can't do that, because there are only single-row pre/post hooks,
and no way to determine when all operations of a request are done. The end
result is that I have to do 200 single-row puts, one in each call to the pre or
post hooks.
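A rough sketch of the pattern I'm after (plain Java with stand-in types, not the real HBase API — Put is a placeholder for an index mutation, and postRequest is the hypothetical end-of-request hook that is missing today):

```java
import java.util.ArrayList;
import java.util.List;

public class BufferingIndexObserver {
    // Stand-in for an index mutation destined for the secondary table.
    static class Put {
        final String rowKey;
        Put(String rowKey) { this.rowKey = rowKey; }
    }

    private final List<Put> buffered = new ArrayList<>();
    private int batchesFlushed = 0;
    private int rowsFlushed = 0;

    // Called once per row today; instead of issuing a single-row put
    // against the index table, we only buffer the mutation.
    public void prePut(Put indexPut) {
        buffered.add(indexPut);
    }

    // Hypothetical hook, fired once per client request after all per-row
    // hooks, so the buffered puts can go out as a single batch.
    public void postRequest() {
        if (buffered.isEmpty()) return;
        rowsFlushed += buffered.size();
        batchesFlushed++;          // one batch RPC instead of N single puts
        buffered.clear();
    }

    public int getBatchesFlushed() { return batchesFlushed; }
    public int getRowsFlushed() { return rowsFlushed; }

    public static void main(String[] args) {
        BufferingIndexObserver observer = new BufferingIndexObserver();
        // Simulate the 200 per-row prePut calls one region server would
        // see for its chunk of the 1000-put client request.
        for (int i = 0; i < 200; i++) {
            observer.prePut(new Put("row-" + i));
        }
        observer.postRequest();
        System.out.println(observer.getBatchesFlushed()); // 1 batch
        System.out.println(observer.getRowsFlushed());    // 200 rows
    }
}
```

With such a hook, the 200 index puts would be issued as one batch (and then re-chunked per region server by the client machinery, as usual).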


Does that make sense?

-- Lars



________________________________
From: Himanshu Vashishtha <[email protected]>
To: [email protected]; lars hofhansl <[email protected]>
Sent: Wednesday, August 10, 2011 11:21 PM
Subject: Re: Coprocessors and batch processing

Client-side batch processing is done at the RegionServer level, i.e., all Action
objects are grouped together on a per-RS basis and sent in one RPC. Once the
batch arrives at a RS, it gets distributed across the corresponding Regions, and
these Action objects are processed one by one. This includes Coprocessor
Exec objects too.
So a coprocessor works at a "Region" level of granularity.
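A rough sketch of that client-side grouping (plain Java; the hash-based locator is just a stand-in for the real region location lookup):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchGrouping {
    // Group a batch of row keys into per-server chunks, so that each
    // region server receives its chunk in a single RPC.
    public static Map<Integer, List<String>> groupByServer(List<String> rowKeys,
                                                           int serverCount) {
        Map<Integer, List<String>> chunks = new HashMap<>();
        for (String rowKey : rowKeys) {
            // Stand-in locator: hash the key to one of the servers; the
            // real client looks up the hosting region/server instead.
            int server = Math.floorMod(rowKey.hashCode(), serverCount);
            chunks.computeIfAbsent(server, s -> new ArrayList<>()).add(rowKey);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> rowKeys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) rowKeys.add("row-" + i);
        Map<Integer, List<String>> chunks = groupByServer(rowKeys, 5);
        int total = 0;
        for (List<String> chunk : chunks.values()) total += chunk.size();
        System.out.println(chunks.size()); // number of per-server chunks
        System.out.println(total);         // 1000 rows in total
    }
}
```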

If you want to take some action (process a bunch of rows of another table from
a CP), you can get an HTable instance from the Environment instance of the
Coprocessor and use the same mechanism as is used on the client side.
Will that help in your use case?

Thanks,
Himanshu


On Wed, Aug 10, 2011 at 11:46 PM, lars hofhansl <[email protected]> wrote:

> Here's another coprocessor question...
>
> From the client we batch operations in order to reduce the number of round
> trips.
> Currently there is no way (that I can find) to make use of those batches in
> coprocessors.
>
> This is an issue when, for example, sets of puts and gets are (partially)
> forwarded to another table by the coprocessor.
> Right now this would need to use many single puts/deletes/gets from the
> various {pre|post}{put|delete|get} hooks.
>
> There is no useful demarcation, other than maybe waiting a few milliseconds,
> which is awkward.
>
>
> Of course this forwarding could be done directly from the client, but then
> what's the point of coprocessors?
>
> I guess there could either be a {pre|post}Multi on RegionObserver (although
> HRegionServer.multi does a lot of munging).
> Or maybe a general {pre|post}Request with no arguments - in which case it
> would be at least possible to write code in the coprocessor
> to collect the puts/deletes/etc through the normal single
> prePut/preDelete/etc hooks and then batch-process them in postRequest().
>
> -- Lars
>
