-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31800/#review75583
-----------------------------------------------------------

I just have one major question in the vectorization code. Otherwise this looks great! Thanks Sergio!


ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/31800/#comment122781>

    If the incoming objects aren't necessarily writables, why doesn't this require cases for `values[i] instanceof Double`, `Boolean`, `Float`, `Integer`, and `Long`?


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/Repeated.java
<https://reviews.apache.org/r/31800/#comment122773>

    Is this not used?


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java
<https://reviews.apache.org/r/31800/#comment122776>

    Should this be named `arrObjects`?


- Ryan Blue


On March 6, 2015, 9:18 a.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31800/
> -----------------------------------------------------------
> 
> (Updated March 6, 2015, 9:18 a.m.)
> 
> 
> Review request for hive, Ryan Blue and cheng xu.
> 
> 
> Bugs: HIVE-9658
>     https://issues.apache.org/jira/browse/HIVE-9658
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch bypasses primitive java objects to hive object inspectors without using primitive Writable objects.
> It helps to reduce memory usage.
> 
> I did not bypass other complex objects, such as binaries, decimal and date/timestamp, because their Writable objects are needed in other parts of the code,
> and creating them later takes more ops/s to do it. Better save time at the beginning.
> 
> 
> Diffs
> -----
> 
>   itests/hive-jmh/src/main/java/org/apache/hive/benchmark/storage/ColumnarStorageBench.java 4f6985cd13017ce37f4f0c100b16a27aa5b02f8b
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java c915f728fc9b27da0fabefab5d8f5faa53640b78
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 0391229723cc3ecef551fa44b8456b0d2ac93fb5
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java d7edd52614771857d1b21971a66894841c248ef9
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ConverterParent.java 6ff6b473c9f1867bc14bb597094ddb92487cc954
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableRecordConverter.java a43661eb54ba29692c07c264584b5aecf648ef99
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 377e362979156b8d52d103192b22bd7f19fa683b
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveCollectionConverter.java f1c8b6f13718b37f590263e5b35ed6c327f5cf4f
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveGroupConverter.java c6d03a19029d5bcc86b998dd7a8609973648c103
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java f95d15eddc21bc432fa53572de5756751a13341a
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/Repeated.java ee57b31dac53d99af0c5a520f51102796ca32fd3
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 47cd68200d3be9260aa35385d0dade74d7dc215d
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 6dc85faecabd59dfc616e908926c1f6b6db372de
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 49bf1c5325833993f4c09efdf1546af560783c28
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java bb066afd38aea6b2eb119b0f8ec8d00af57dc187
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/DeepParquetHiveMapInspector.java 143d72e76502d4877e8208181d9743259051dcea
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveArrayInspector.java bde0dcbb3978ba47b15ae2c9bbe2f87ed3984ab1
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 7fd5e9612d4e3c9bf3b816bc48dbdbe59fb8a5a8
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/StandardParquetHiveMapInspector.java 22250b30a14d52907fb22d4f44b93c7633c6a89e
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetByteInspector.java 864f56292fa4856df155f546064e4a6732cc663f
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetShortInspector.java 39f265777c7e164382117e3902c3b6e491295f70
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java 3a476731e31bf38822f0d530f0aea2eadb675a49
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestArrayCompatibility.java d45d8eeb9e8a61f254098ab15d0305fc71152abd
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 8f03c5b403332f7b36b2271a2246a0fc90b3bfba
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapStructures.java 3c7401ffbe88ce66b96f9cceab4e9c3d6267f8fe
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetInputFormat.java 1a54bf5797efd5859c9e665bcc7134168e5d193f
>   serde/src/java/org/apache/hadoop/hive/serde2/io/HiveArrayWritable.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/31800/diff/
> 
> 
> Testing
> -------
> 
> Some performance tests were done to validate this.
> 
> Schema:
>     int,double,boolean,string,array<int>,map<string,string>,struct<a:int,b:int>
> 
> - JMH (Microbenchmarks) calls on parquet reads.
> 
>   Before: 579 ops/s
>   After: 651 ops/s
> 
> - YourKit Java Profiler to measure memory objects recorded.
>   Reading 20,000 random rows (10 times)
> 
>   Before:
>      Objects recorded: 1,863,610
>      Objects size: 42,373,808
>      Total memory usage: 29%
> 
>   After:
>      Objects recorded: 1,596,804
>      Objects size: 34,192,832
>      Total memory usage: 24%
> 
> All tests were run multiple times to get same results.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>
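
To make the question on VectorColumnAssignFactory.java concrete, here is a minimal, self-contained sketch of the kind of dispatch it is asking about. This is not the Hive code: the class `BoxedAssignSketch`, the methods `assignLong` and `assignDouble`, and the `LongBox` stand-in for a Writable wrapper are all hypothetical, and storing booleans as 0/1 in a long vector is an assumption made only for illustration.

```java
import java.util.Arrays;

public class BoxedAssignSketch {

    /** Hypothetical stand-in for a Writable wrapper (not a Hadoop class). */
    public static final class LongBox {
        private final long value;
        public LongBox(long value) { this.value = value; }
        public long get() { return value; }
    }

    /** Copies one incoming value into a long-typed column at the given row. */
    public static void assignLong(long[] vector, int row, Object value) {
        if (value instanceof LongBox) {
            vector[row] = ((LongBox) value).get();      // wrapper path
        } else if (value instanceof Long) {
            vector[row] = (Long) value;                 // plain boxed long
        } else if (value instanceof Integer) {
            vector[row] = (Integer) value;              // unboxed, then widened to long
        } else if (value instanceof Boolean) {
            vector[row] = ((Boolean) value) ? 1L : 0L;  // assumption: booleans stored as 0/1
        } else {
            // null also lands here, because instanceof is false for null
            throw new IllegalArgumentException("Unsupported value for long column: " + value);
        }
    }

    /** Copies one incoming value into a double-typed column at the given row. */
    public static void assignDouble(double[] vector, int row, Object value) {
        if (value instanceof Double) {
            vector[row] = (Double) value;
        } else if (value instanceof Float) {
            vector[row] = (Float) value;                // unboxed, then widened to double
        } else {
            throw new IllegalArgumentException("Unsupported value for double column: " + value);
        }
    }

    public static void main(String[] args) {
        Object[] values = { new LongBox(7L), 8L, 9, true };
        long[] longs = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            assignLong(longs, i, values[i]);
        }
        System.out.println(Arrays.toString(longs)); // prints [7, 8, 9, 1]
    }
}
```

If the assigner only checked for the wrapper type, the plain `Long`, `Integer`, and `Boolean` inputs in `main` would fall through to the exception branch, which is the failure mode the question is probing for.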