[ https://issues.apache.org/jira/browse/IMPALA-13648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016391#comment-18016391 ]

ASF subversion and git services commented on IMPALA-13648:
----------------------------------------------------------

Commit b4ad04788676a324cae9427bc6b2f54cd69c2890 in impala's branch 
refs/heads/master from Gabriella Gyorgyevics
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b4ad04788 ]

IMPALA-13648: Implement a decoder and an encoder for Byte Stream Split encoding

The decoder can read one or multiple values at a time from the given
buffer. When reading multiple values at a time, it can read them with
a stride.
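For illustration, the decoding side can be sketched as below. It assumes the Parquet BYTE_STREAM_SPLIT layout: for N values of K bytes, the page holds K streams of N bytes each, so byte b of value i is stored at offset b * N + i. The class name `ByteStreamSplitDecoderSketch` and its methods are hypothetical and are not Impala's actual decoder API. It also assumes that "stride" means the distance in bytes between the start of consecutive output values.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical minimal Byte Stream Split decoder (not Impala's class).
// Byte b of value i lives at data[b * num_values + i].
class ByteStreamSplitDecoderSketch {
 public:
  ByteStreamSplitDecoderSketch(const uint8_t* data, size_t num_values,
                               size_t byte_size)
      : data_(data), num_values_(num_values), byte_size_(byte_size) {}

  // Gathers the next value's bytes into `out` (byte_size_ bytes).
  // Returns false, writing nothing, once all values have been read.
  bool NextValue(uint8_t* out) {
    if (pos_ >= num_values_) return false;
    for (size_t b = 0; b < byte_size_; ++b) {
      out[b] = data_[b * num_values_ + pos_];
    }
    ++pos_;
    return true;
  }

  // Reads up to `count` values; value k is written at out + k * stride.
  // `stride` is assumed to be >= byte_size_. Returns the number read.
  size_t NextValues(size_t count, uint8_t* out, size_t stride) {
    size_t n = 0;
    while (n < count && NextValue(out + n * stride)) ++n;
    return n;
  }

 private:
  const uint8_t* data_;
  size_t num_values_;
  size_t byte_size_;
  size_t pos_ = 0;  // index of the next value to decode
};
```

A stride larger than the value size lets decoded values land directly in a wider destination, such as one field of an array of records, without a separate copy step.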

The encoder adds values one by one until there are no more values to
add or the given output cannot fit any more. Optionally, the encoder
can receive a buffer already filled with values. The encoding happens
upon calling `FinalizePage()`.
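The encoder flow can be sketched in a similar way. Values are buffered in plain layout, and the byte transposition into K streams happens only at finalization. The class `ByteStreamSplitEncoderSketch` and its `Put`/`PutBuffer` methods are hypothetical. Only the name `FinalizePage()` comes from the commit message, and its signature here is an assumption. The capacity limit (`max_values`) stands in for "the output cannot fit any more".

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoder sketch (not Impala's API).
class ByteStreamSplitEncoderSketch {
 public:
  ByteStreamSplitEncoderSketch(size_t byte_size, size_t max_values)
      : byte_size_(byte_size), max_values_(max_values) {}

  // Appends one value of byte_size_ bytes; false if the page is full.
  bool Put(const uint8_t* value) {
    if (num_values_ >= max_values_) return false;
    plain_.insert(plain_.end(), value, value + byte_size_);
    ++num_values_;
    return true;
  }

  // Accepts a prepopulated buffer of `count` values; returns how many fit.
  size_t PutBuffer(const uint8_t* values, size_t count) {
    size_t accepted = 0;
    while (accepted < count && Put(values + accepted * byte_size_)) ++accepted;
    return accepted;
  }

  // Scatters byte b of value i to out[b * num_values_ + i].
  // `out` must hold num_values_ * byte_size_ bytes; returns bytes written.
  size_t FinalizePage(uint8_t* out) const {
    for (size_t i = 0; i < num_values_; ++i) {
      for (size_t b = 0; b < byte_size_; ++b) {
        out[b * num_values_ + i] = plain_[i * byte_size_ + b];
      }
    }
    return num_values_ * byte_size_;
  }

 private:
  size_t byte_size_;
  size_t max_values_;
  size_t num_values_ = 0;
  std::vector<uint8_t> plain_;  // values in their original (unsplit) layout
};
```

Deferring the transposition to finalization is one plausible reason the encoding happens at `FinalizePage()`. The stream offsets depend on the total value count N, which is only known once the page is complete.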

Both the encoder and the decoder take the size of the coded type, in
bytes, either as a `size_t` template parameter or as a constructor
argument.
* The template option is more optimized, but it only supports 4- and
8-byte types.
* The constructor option is less optimized, but it accepts any byte
size.
To use the byte size passed to the constructor, set the template
parameter to 0; otherwise, pass the number of bytes as the template
parameter.
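One way to express the "template size, or 0 for a constructor size" convention in C++17 is shown below. It uses `if constexpr` and a `static_assert` that limits the template value to 0, 4, or 8. The class `BssValueReader` is a hypothetical illustration of the pattern, not Impala's code. With a nonzero template size, the loop bound is a compile-time constant, which lets the compiler unroll or vectorize the loop. That is one likely source of the extra optimization.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical illustration: kByteSize == 0 selects the runtime size.
template <size_t kByteSize>
class BssValueReader {
  static_assert(kByteSize == 0 || kByteSize == 4 || kByteSize == 8,
                "template byte size must be 4 or 8; use 0 for a runtime size");

 public:
  explicit BssValueReader(size_t runtime_byte_size = kByteSize)
      : runtime_byte_size_(runtime_byte_size) {}

  // Compile-time constant for 4/8; the constructor value when kByteSize == 0.
  size_t byte_size() const {
    if constexpr (kByteSize == 0) {
      return runtime_byte_size_;
    } else {
      return kByteSize;
    }
  }

  // Gathers value `index` from num_values split values into `out`.
  void Read(const uint8_t* data, size_t num_values, size_t index,
            uint8_t* out) const {
    const size_t n = byte_size();
    for (size_t b = 0; b < n; ++b) out[b] = data[b * num_values + index];
  }

 private:
  size_t runtime_byte_size_;
};
```

For example, `BssValueReader<4>` would serve 4-byte types such as FLOAT, while `BssValueReader<0> r(3)` handles a 3-byte width chosen at runtime.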

Note that neither the encoder nor the decoder is integrated with
Impala yet, so reading or writing data with Byte Stream Split encoding
is not yet possible.

-------------------------------- Tests ---------------------------------

Created decoder tests for
* basic functionality
* decoding values one by one
* decoding values in batch
* decoding values with a mix of the previous two
* the stride feature
* skipping a number of values

Created encoder tests for
* basic functionality
* putting values in one by one
* giving the encoder a prepopulated buffer
* finalizing the page

Created two-way tests for the following cases:
* encoding then decoding one by one
* encoding then decoding in batch
* encoding then decoding with stride
* decoding one by one then encoding
* decoding in batch then encoding
* decoding with stride then encoding
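An "encoding then decoding with stride" round trip might look roughly like the sketch below. The free functions `BssEncode` and `BssDecodeStrided` and the `Row` struct are hypothetical, and stride is again assumed to be the byte distance between output values. The test decodes doubles straight into the `value` field of an array of structs and checks that the neighbouring `id` fields are left untouched.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helpers (not Impala's API): n values of k bytes each.
std::vector<uint8_t> BssEncode(const uint8_t* values, size_t n, size_t k) {
  std::vector<uint8_t> out(n * k);
  for (size_t i = 0; i < n; ++i)
    for (size_t b = 0; b < k; ++b) out[b * n + i] = values[i * k + b];
  return out;
}

// Decodes value i to out + i * stride.
void BssDecodeStrided(const uint8_t* enc, size_t n, size_t k, uint8_t* out,
                      size_t stride) {
  for (size_t i = 0; i < n; ++i)
    for (size_t b = 0; b < k; ++b) out[i * stride + b] = enc[b * n + i];
}

struct Row {
  int32_t id;
  double value;  // decoded in place with stride = sizeof(Row)
};

// Encodes `input`, decodes it with a stride into Row::value, and checks
// that values match byte-for-byte and that the `id` fields are untouched.
bool RoundTripWithStride(const std::vector<double>& input) {
  const size_t n = input.size();
  const size_t k = sizeof(double);
  if (n == 0) return true;
  std::vector<uint8_t> enc =
      BssEncode(reinterpret_cast<const uint8_t*>(input.data()), n, k);
  std::vector<Row> rows(n);
  for (size_t i = 0; i < n; ++i) rows[i].id = static_cast<int32_t>(i);
  uint8_t* first_value =
      reinterpret_cast<uint8_t*>(rows.data()) + offsetof(Row, value);
  BssDecodeStrided(enc.data(), n, k, first_value, sizeof(Row));
  for (size_t i = 0; i < n; ++i) {
    if (rows[i].id != static_cast<int32_t>(i)) return false;
    if (std::memcmp(&rows[i].value, &input[i], k) != 0) return false;
  }
  return true;
}
```

Comparing with `memcmp` rather than `==` keeps the check byte-exact, so a NaN payload would also round-trip correctly.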

Each of these tests is run on a data set of up to 200 values.

These tests are run on every supported type.

Change-Id: I71755d992536d70a22b8fdbfee1afce3aec81c26
Reviewed-on: http://gerrit.cloudera.org:8080/23239
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Implement a decoder and an encoder for Byte Stream Split encoding
> -----------------------------------------------------------------
>
>                 Key: IMPALA-13648
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13648
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Daniel Becker
>            Assignee: Gyorgyevics Gabriella
>            Priority: Major
>
> Implement a decoder and an encoder for Parquet Byte Stream Split encoding.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
