[
https://issues.apache.org/jira/browse/HBASE-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656513#comment-15656513
]
Phil Yang edited comment on HBASE-15484 at 11/11/16 8:43 AM:
-------------------------------------------------------------
For caching, we had some discussion in HBASE-16987 and HBASE-16973. A size/time
limit is more direct for users than setCaching, because users usually set a
limit precisely because they want to bound size/time, and caching now defaults
to max_value.
Cell-level paging is a possible scenario. It is different from the "limit" that
Duo mentions: limit means we can stop and close the scanner, whereas batch
means we should pause and wait for the next call. Since we have size/time
limits on the server side, a large row will not cause an OOM on the server even
if users don't call setBatch. If users really do need setBatch to cap the
number of cells returned in one Result, I think we can keep the setBatch
interface but turn it into client-only logic. On the server we would only
consider the size/time limit, and if the server returns more than batch cells,
we can cache the rest of them in the client? With this change, we can reduce
the number of RPC requests without OOM/timeout risk.
[~stack] [~carp84] [~mantonov] FYI, you also had some ideas about scanning in
HBASE-16973 :) Thanks.
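To make the client-only setBatch idea concrete, here is a minimal sketch in
plain Java. It is not HBase code: ClientSideBatcher, onServerResponse, needsRpc
and next are hypothetical names made up for illustration, and the sketch only
handles the cells of a single row. It assumes the server has already bounded
each response by the size/time limit, so the client just slices what it
received into chunks of at most batch cells.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; this is not an HBase class.
final class ClientSideBatcher<C> {
    private final int batch;                               // max cells per chunk handed to the user
    private final Deque<C> buffered = new ArrayDeque<>();  // cells received but not yet handed out

    ClientSideBatcher(int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        this.batch = batch;
    }

    // Called with whatever one RPC returned. The server is assumed to have
    // applied only its size/time limit, so this may hold more than batch cells.
    void onServerResponse(List<C> cells) {
        buffered.addAll(cells);
    }

    // True when the local buffer is empty, i.e. more cells would need another RPC.
    boolean needsRpc() {
        return buffered.isEmpty();
    }

    // Hands out at most batch cells from the local buffer without contacting the server.
    List<C> next() {
        List<C> out = new ArrayList<>(Math.min(batch, buffered.size()));
        while (out.size() < batch && !buffered.isEmpty()) {
            out.add(buffered.poll());
        }
        return out;
    }
}
```

With batch = 2 and one response carrying five cells, three calls to next()
return chunks of 2, 2 and 1 cells, and only then does needsRpc() report that
another round trip is required.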
> Correct the semantic of batch and partial
> -----------------------------------------
>
> Key: HBASE-15484
> URL: https://issues.apache.org/jira/browse/HBASE-15484
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.0, 1.1.3
> Reporter: Phil Yang
> Assignee: Phil Yang
> Fix For: 2.0.0
>
> Attachments: HBASE-15484-v1.patch, HBASE-15484-v2.patch,
> HBASE-15484-v3.patch, HBASE-15484-v4.patch
>
>
> Follow-up to HBASE-15325. As discussed, setBatch and
> setAllowPartialResults should not mean the same thing, and we should not
> treat setBatch as setAllowPartialResults.
> isPartial should also be defined precisely.
> (Assume getBatch == MaxInt if setBatch was not called.) If
> result.rawcells.length < scan.getBatch && result is not the last part of this
> row, isPartial == true; otherwise isPartial == false. So if the user doesn't
> call setAllowPartialResults(true), isPartial should always be false.
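The isPartial rule in the quoted description can be written as a one-line
predicate. The sketch below is plain Java, not HBase code: PartialRule and its
parameter names are hypothetical, and batch is assumed to be Integer.MAX_VALUE
when setBatch was never called, as the description states.

```java
// Hypothetical illustration only; this is not an HBase class.
final class PartialRule {
    // cellCount     : number of cells in this Result
    // batch         : scan batch value, Integer.MAX_VALUE if setBatch was not called (assumption from the description)
    // lastPartOfRow : whether this Result holds the final cells of its row
    static boolean isPartial(int cellCount, int batch, boolean lastPartOfRow) {
        return cellCount < batch && !lastPartOfRow;
    }
}
```

For example, a Result with 3 cells under batch = 10 that is not the end of its
row is partial, while a full batch of 10 cells, or any Result that ends its
row, is not.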