Gabriella Gyorgyevics has uploaded a new patch set (#27). ( 
http://gerrit.cloudera.org:8080/22165 )

Change subject: IMPALA-13648: Implement a decoder and an encoder for Byte 
Stream Split encoding
......................................................................

IMPALA-13648: Implement a decoder and an encoder for Byte Stream Split encoding

The decoder can read one value or multiple values at a time from the
given buffer. When reading multiple values at once, they can be read with
a stride.
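
As an illustration of strided batch decoding, here is a minimal sketch. It
assumes the standard Byte Stream Split layout (stream b holds byte b of every
value) and assumes the stride is the distance in bytes between consecutive
output slots. `BssDecodeStrided`, `Row` and `DecodeFloatsIntoRows` are
hypothetical names for this example, not the API added by the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not the patch's API. Decodes `count` values of
// `byte_size` bytes each, starting at value index `first`, from a page that
// holds `num_values` values. Value i is written to out + i * stride.
inline void BssDecodeStrided(const uint8_t* page, size_t num_values,
    size_t byte_size, size_t first, size_t count, uint8_t* out, size_t stride) {
  assert(stride >= byte_size);
  assert(first + count <= num_values);
  for (size_t i = 0; i < count; ++i) {
    for (size_t b = 0; b < byte_size; ++b) {
      // Stream b stores byte b of every value back to back.
      out[i * stride + b] = page[b * num_values + first + i];
    }
  }
}

// Example target: a row wider than the decoded value.
struct Row {
  int32_t id;
  float f;
};

// Decodes every float in `page` directly into the `f` member of each Row.
inline std::vector<Row> DecodeFloatsIntoRows(const std::vector<uint8_t>& page) {
  const size_t n = page.size() / sizeof(float);
  std::vector<Row> rows(n, Row{0, 0.0f});
  if (n == 0) return rows;
  uint8_t* first_slot =
      reinterpret_cast<uint8_t*>(rows.data()) + offsetof(Row, f);
  BssDecodeStrided(page.data(), n, sizeof(float), 0, n, first_slot, sizeof(Row));
  return rows;
}
```

With a stride equal to the value size, the same helper fills a dense array;
a non-zero `first` plays the role of skipping values.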

The encoder accepts values one at a time until there are no more values
to add or the given output buffer cannot fit any more. The actual
encoding happens when `FinalizePage()` is called.
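
The put-then-finalize flow could look roughly like the sketch below. It is
only an assumption about the shape of such an encoder: `SketchBssEncoder`,
`Put()` and the return type of `FinalizePage()` are made up for this example,
and staging the values in the output buffer before transposing them is one
possible strategy, not necessarily the one used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch, not the patch's class. Values are staged in the
// caller's output buffer in plain layout and split into byte streams only
// when the page is finalized.
class SketchBssEncoder {
 public:
  SketchBssEncoder(uint8_t* out, size_t out_len, size_t byte_size)
    : out_(out), byte_size_(byte_size) {
    assert(byte_size > 0);
    capacity_ = out_len / byte_size;
  }

  // Stages one value. Returns false if the output buffer is already full.
  bool Put(const void* value) {
    if (num_values_ == capacity_) return false;
    std::memcpy(out_ + num_values_ * byte_size_, value, byte_size_);
    ++num_values_;
    return true;
  }

  // Rearranges the staged values into byte streams in place (via a temporary
  // copy) and returns the number of bytes written.
  size_t FinalizePage() {
    const size_t total = num_values_ * byte_size_;
    std::vector<uint8_t> plain(out_, out_ + total);
    for (size_t i = 0; i < num_values_; ++i) {
      for (size_t b = 0; b < byte_size_; ++b) {
        out_[b * num_values_ + i] = plain[i * byte_size_ + b];
      }
    }
    num_values_ = 0;
    return total;
  }

 private:
  uint8_t* out_;
  size_t byte_size_;
  size_t capacity_ = 0;
  size_t num_values_ = 0;
};
```

Deferring the transposition to `FinalizePage()` is needed because the offset
of each stream depends on the final number of values in the page.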

Both the encoder and the decoder can take the byte size of the type to
be coded either as a size_t template parameter or as a constructor
argument.
* The template option is more optimized, but it supports only 4- and
8-byte types.
* The constructor option is less optimized, but it can receive any
number as the byte size.
To use the size passed to the constructor, set the template parameter
to 0; otherwise pass the number of bytes as the template parameter.
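
One way to express this "0 means runtime size" convention is sketched below.
`SketchBssCoder` and its members are hypothetical and only illustrate the
selection scheme; the actual class names and checks in the patch may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the size-selection scheme described above.
template <size_t T_SIZE>
class SketchBssCoder {
 public:
  static_assert(T_SIZE == 0 || T_SIZE == 4 || T_SIZE == 8,
      "template size must be 0 (use constructor value), 4 or 8");

  // Compile-time size: only meaningful when T_SIZE is 4 or 8.
  SketchBssCoder() : runtime_size_(T_SIZE) { assert(T_SIZE != 0); }

  // Runtime size: only meaningful when T_SIZE is 0.
  explicit SketchBssCoder(size_t byte_size) : runtime_size_(byte_size) {
    assert(T_SIZE == 0 && byte_size > 0);
  }

  // A compile-time constant when T_SIZE != 0, which lets the compiler unroll
  // and vectorize loops over the bytes of a value.
  size_t byte_size() const { return T_SIZE != 0 ? T_SIZE : runtime_size_; }

 private:
  size_t runtime_size_;
};
```

For example, `SketchBssCoder<4>` would cover FLOAT and INT32 columns, while
`SketchBssCoder<0>(12)` would handle a 12-byte fixed-length type.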

Note that neither the encoder nor the decoder is integrated with
Impala yet, so reading or writing data with Byte Stream Split encoding
is not yet possible.

Created decoder tests for
* basic functionality
* decoding values one by one
* decoding values in batch
* decoding values combining the previous two
* the stride feature
* skipping a number of values

Created encoder tests for
* basic functionality
* putting values in one by one
* finalizing the page

Created two-way tests for the following cases:
* encoding then decoding one by one
* encoding then decoding in batch
* encoding then decoding with stride
* decoding one by one then encoding
* decoding in batch then encoding
* decoding with stride then encoding

Each of these tests is run on a data set of up to 200 values.

These tests are run on every supported type.

Change-Id: Icea60894ae22b8ddb7616aeda6d69012cc69972c
---
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-byte-stream-split-coder-test-data.h
A be/src/exec/parquet/parquet-byte-stream-split-decoder.cc
A be/src/exec/parquet/parquet-byte-stream-split-decoder.h
A be/src/exec/parquet/parquet-byte-stream-split-encoder.cc
A be/src/exec/parquet/parquet-byte-stream-split-encoder.h
A be/src/exec/parquet/parquet-byte-stream-split-test.cc
7 files changed, 2,304 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/22165/27
--
To view, visit http://gerrit.cloudera.org:8080/22165
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icea60894ae22b8ddb7616aeda6d69012cc69972c
Gerrit-Change-Number: 22165
Gerrit-PatchSet: 27
Gerrit-Owner: Gabriella Gyorgyevics <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabriella Gyorgyevics <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
