Gabriella Gyorgyevics has uploaded a new patch set (#27). ( http://gerrit.cloudera.org:8080/22165 )
Change subject: IMPALA-13648: Implement a decoder and an encoder for Byte Stream Split encoding
......................................................................

IMPALA-13648: Implement a decoder and an encoder for Byte Stream Split encoding

The decoder can read one or multiple values at a time from the given
buffer. When reading multiple values at a time, they can be read with
a stride.

The encoder adds values one by one until there are no more values to
add or the given output cannot fit any more. The encoding happens upon
calling `FinalizePage()`.

Both the encoder and the decoder can be used with either a template
size_t value or a value given in the constructor. This value is the
size in bytes of the type to be coded.
* The template option is more optimized, but it only supports 4 and
  8 byte types.
* The constructor option is less optimized, but it can receive any
  number as the byte size.

To use the number passed to the constructor, set the template
parameter to 0; otherwise pass the number of bytes in the template.

Note that neither the encoder nor the decoder is integrated with
Impala yet, so reading or writing data with byte stream split encoding
is not yet possible.

Created decoder tests for
* basic functionality
* decoding values one by one
* decoding values in batch
* decoding values combining the previous two
* the stride feature
* skipping a number of values

Created encoder tests for
* basic functionality
* putting values in one by one
* finalizing the page

Created two-way tests for the following cases:
* encoding then decoding one by one
* encoding then decoding in batch
* encoding then decoding with stride
* decoding one by one then encoding
* decoding in batch then encoding
* decoding with stride then encoding

Each of these tests is run on a data set of up to 200 values. These
tests are run on every supported type.
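
For readers unfamiliar with the layout, the following is a minimal sketch
of Byte Stream Split, not the code from this patch. For values that are W
bytes wide, byte k of every value goes into stream k, and the W streams
are stored back to back. The names `BssEncode` and `BssDecode` are
hypothetical, and measuring the stride in bytes between output slots is
an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not the patch's API): scatters byte k of value i to
// position k * n + i, which yields sizeof(T) concatenated byte streams.
template <typename T>
std::vector<uint8_t> BssEncode(const std::vector<T>& values) {
  const size_t n = values.size();
  std::vector<uint8_t> out(n * sizeof(T));
  for (size_t i = 0; i < n; ++i) {
    uint8_t bytes[sizeof(T)];
    std::memcpy(bytes, &values[i], sizeof(T));
    for (size_t k = 0; k < sizeof(T); ++k) out[k * n + i] = bytes[k];
  }
  return out;
}

// Hypothetical helper: gathers `count` values starting at index `first` and
// writes value i to dst + i * stride_bytes. The stride being expressed in
// bytes is an assumption of this sketch.
template <typename T>
void BssDecode(const std::vector<uint8_t>& encoded, size_t first, size_t count,
               uint8_t* dst, size_t stride_bytes) {
  const size_t n = encoded.size() / sizeof(T);
  assert(first + count <= n);
  for (size_t i = 0; i < count; ++i) {
    uint8_t bytes[sizeof(T)];
    for (size_t k = 0; k < sizeof(T); ++k) bytes[k] = encoded[k * n + first + i];
    std::memcpy(dst + i * stride_bytes, bytes, sizeof(T));
  }
}
```

With a stride larger than `sizeof(T)`, a decoder of this shape can write
directly into one field of an array of wider rows, which is presumably
the use case for the stride feature described above.
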
Change-Id: Icea60894ae22b8ddb7616aeda6d69012cc69972c
---
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-byte-stream-split-coder-test-data.h
A be/src/exec/parquet/parquet-byte-stream-split-decoder.cc
A be/src/exec/parquet/parquet-byte-stream-split-decoder.h
A be/src/exec/parquet/parquet-byte-stream-split-encoder.cc
A be/src/exec/parquet/parquet-byte-stream-split-encoder.h
A be/src/exec/parquet/parquet-byte-stream-split-test.cc
7 files changed, 2,304 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/22165/27
--
To view, visit http://gerrit.cloudera.org:8080/22165
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icea60894ae22b8ddb7616aeda6d69012cc69972c
Gerrit-Change-Number: 22165
Gerrit-PatchSet: 27
Gerrit-Owner: Gabriella Gyorgyevics <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Gabriella Gyorgyevics <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
