w.r.t. ScanType, here is the logic used by DefaultCompactor:
  ScanType scanType =
      request.isAllFiles() ? ScanType.COMPACT_DROP_DELETES
                           : ScanType.COMPACT_RETAIN_DELETES;
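As a self-contained model of that selection (the class and method names here are stand-ins, not HBase API), the reasoning is: delete markers can only be dropped when every store file is in the compaction, since a file outside the set could still hold cells those markers mask:

```java
public class ScanTypeDemo {
    enum ScanType { COMPACT_DROP_DELETES, COMPACT_RETAIN_DELETES }

    static ScanType scanTypeFor(boolean isAllFiles) {
        // Deletes are only safe to drop when every store file is included
        // (a major compaction); otherwise an older file not in the set
        // could still contain cells the delete markers are masking.
        return isAllFiles ? ScanType.COMPACT_DROP_DELETES
                          : ScanType.COMPACT_RETAIN_DELETES;
    }

    public static void main(String[] args) {
        System.out.println(scanTypeFor(true));   // COMPACT_DROP_DELETES
        System.out.println(scanTypeFor(false));  // COMPACT_RETAIN_DELETES
    }
}
```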
BTW, ScanType is currently marked InterfaceAudience.Private.
Should it be marked LimitedPrivate?
Cheers
On Fri, Jan 9, 2015 at 12:19 PM, Gary Helmling <[email protected]> wrote:
> >
> >
> > 2) is more expensive than 1).
> > I'm wondering if we could use Compaction Coprocessor for 2)? HBaseHUT
> > needs to be able to grab N rows and merge them into 1, delete those N
> rows,
> > and just write that 1 new row. This N could be several thousand rows.
> > Could Compaction Coprocessor really be used for that?
> >
> >
> It would depend on the details. If you're simply aggregating the data into
> one row, and:
> * the thousands of rows are contiguous in the scan
> * you can somehow incrementally update or emit the new row that you want to
> create so that you don't need to retain all the old rows in memory
> * the new row you want to emit would sort sequentially into the same
> position
>
> Then overriding the scanner used for compaction could be a good solution.
> This would allow you to transform the cells emitted during compaction,
> including dropping the cells from the old rows and emitting new
> (transformed) cells for the new row.
>
>
> > Also, would that come into play during minor or major compactions or
> both?
> >
> >
> You can distinguish between them in your coprocessor hooks based on
> ScanType. So up to you.
>
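The merge-N-contiguous-rows-into-1 idea above can be sketched outside the HBase API. This plain-Java model (the class and method names are illustrative, not HBase classes) shows the kind of transform a compaction-scanner override would perform: collapse runs of sorted rows sharing a group prefix into one summed row, keyed by the run's first row key so the merged row sorts into the same position:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models the scanner-side merge a compaction coprocessor
// could perform, with rows as sorted (key, value) pairs instead of Cells.
public class MergeDuringCompaction {

    // Collapse contiguous rows sharing groupKey(...) into one row whose key
    // is the first key of the run (so it sorts into the same position) and
    // whose value is the sum of the run's values.
    static Map<String, Integer> mergeContiguous(List<Map.Entry<String, Integer>> rows) {
        Map<String, Integer> out = new LinkedHashMap<>();
        String currentGroup = null, currentKey = null;
        int sum = 0;
        for (Map.Entry<String, Integer> row : rows) {
            String group = groupKey(row.getKey());
            if (!group.equals(currentGroup)) {
                if (currentKey != null) out.put(currentKey, sum); // emit finished run
                currentGroup = group;
                currentKey = row.getKey(); // merged row keeps the run's first key
                sum = 0;
            }
            sum += row.getValue();
        }
        if (currentKey != null) out.put(currentKey, sum); // emit the last run
        return out;
    }

    // Group prefix: everything before the last '#' (an arbitrary convention here).
    static String groupKey(String rowKey) {
        int i = rowKey.lastIndexOf('#');
        return i < 0 ? rowKey : rowKey.substring(0, i);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> rows = List.of(
            Map.entry("user1#001", 2), Map.entry("user1#002", 3),
            Map.entry("user2#001", 5));
        System.out.println(mergeContiguous(rows)); // {user1#001=5, user2#001=5}
    }
}
```

Because only the current run is held in memory, this satisfies the incremental-update condition Gary lists: the thousands of old rows never need to be retained at once.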