[jira] [Commented] (HBASE-18026) ProtobufUtil seems to do extra array copying

Anoop Sam John (JIRA) Sat, 13 May 2017 00:17:50 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-18026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009186#comment-16009186
 ]


Anoop Sam John commented on HBASE-18026:
----------------------------------------

Well the concern was mainly abt the master branch patch.  I have been working 
in the PB area in master branch.  We have gone to the new version of PB there. 
Also when a request bytes been build up into a PB messages, we work with PB  
CIS.  Latest versions of PB allow to declare the incoming request bytes as 
Immutable.  If so , when we traverse over the bytes and make diff ByteString, 
PB wont do any copy and exactly refer to only those bytes for a ByteString (eg: 
 Row PB object)  Instead it will be a BoundedLiteralBS where we will have ref 
to byte[] and offset and length.  In such cases, the above assumptions wont be 
correct.  Well I realized now that the patch changed 
org.apache.hadoop.hbase.protobuf.ProtobufUtil but in master branch we use, 
org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil mainly (Specially for this 
Get req convert etc).  But the change as such is dangerous. 
I did not check branch-1 based code path.  When we make CIS from bytes, I dont 
think we declare it as Immutable there. Means when PB create ByteString, it 
will do a copy internally and as u said the byte [] will be 0 offset and 
required length only.   Also 
{code}
 public static byte[] zeroCopyGetBytes(final ByteString buf) {
    if (buf instanceof LiteralByteString) {
      return ((LiteralByteString) buf).bytes;
    }
    throw new UnsupportedOperationException("Need a LiteralByteString, got a "
                                            + buf.getClass().getName());
  }
{code}
So iff it is LiteralByteString the same underlying byte[] been returned.  Here 
the offset will be 0 and length is correct.  But any other type of ByteString , 
it throws Exception. May be on safe side we should just do toByteArray() for 
such cases as well?

So functional wise the patch in any branch might not have been affected. In 
master the PB working was is diff as I said but the change is not in a used 
file.  In older branches, the change might be ok as the PB usage is that way 
there.

> ProtobufUtil seems to do extra array copying
> --------------------------------------------
>
>                 Key: HBASE-18026
>                 URL: https://issues.apache.org/jira/browse/HBASE-18026
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.3.2
>            Reporter: Vincent Poon
>            Assignee: Vincent Poon
>            Priority: Minor
>             Fix For: 2.0.0, 1.4.0, 1.2.6, 1.3.2, 1.1.11
>
>         Attachments: HBASE-18026.branch-1.v1.patch, 
> HBASE-18026.master.v1.patch
>
>
> In ProtobufUtil, the protobuf fields are copied into an array using 
> toByteArray().  These are then passed into the KeyValue constructor which 
> does another copy.
> It seems like we can avoid a copy here by using 
> HBaseZeroCopyByteString#zeroCopyGetBytes() ?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-18026) ProtobufUtil seems to do extra array copying

Reply via email to