-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31800/#review75583
-----------------------------------------------------------


I just have one major question about the vectorization code. Otherwise, this
looks great! Thanks, Sergio!


ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/31800/#comment122781>

    If the incoming objects aren't necessarily Writables, why doesn't this
    require cases for `values[i] instanceof Double`, `Boolean`, `Float`,
    `Integer`, and `Long`?
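
    To illustrate the question, here is a minimal sketch of a dispatch that accepts both Writable wrappers and plain boxed Java values. It is not the actual `VectorColumnAssignFactory` code: `AssignSketch`, `LongWritableStub`, `DoubleWritableStub`, `assignLong`, and `assignDouble` are hypothetical names, the stubs only stand in for the Hadoop Writable classes so the example compiles on its own, and storing booleans as 0/1 is an assumption of this sketch.

```java
import java.util.Arrays;

public class AssignSketch {
    // Hypothetical stand-ins for Hadoop's Writable wrappers, defined here
    // only so the sketch compiles without Hive or Hadoop on the classpath.
    static final class LongWritableStub {
        private final long value;
        LongWritableStub(long value) { this.value = value; }
        long get() { return value; }
    }

    static final class DoubleWritableStub {
        private final double value;
        DoubleWritableStub(double value) { this.value = value; }
        double get() { return value; }
    }

    // Writes one value into slot i of a long-typed column, accepting either
    // the Writable stand-in or a plain boxed Java primitive.
    static void assignLong(long[] vector, int i, Object value) {
        if (value instanceof LongWritableStub) {
            vector[i] = ((LongWritableStub) value).get();
        } else if (value instanceof Long) {
            vector[i] = ((Long) value).longValue();
        } else if (value instanceof Integer) {
            vector[i] = ((Integer) value).longValue();
        } else if (value instanceof Boolean) {
            // Assumption of this sketch: booleans are stored as 0/1.
            vector[i] = ((Boolean) value).booleanValue() ? 1L : 0L;
        } else {
            throw new IllegalArgumentException("Unsupported long value: " + value);
        }
    }

    // Same idea for a double-typed column.
    static void assignDouble(double[] vector, int i, Object value) {
        if (value instanceof DoubleWritableStub) {
            vector[i] = ((DoubleWritableStub) value).get();
        } else if (value instanceof Double) {
            vector[i] = ((Double) value).doubleValue();
        } else if (value instanceof Float) {
            vector[i] = ((Float) value).doubleValue();
        } else {
            throw new IllegalArgumentException("Unsupported double value: " + value);
        }
    }

    public static void main(String[] args) {
        long[] longs = new long[4];
        Object[] values = { new LongWritableStub(7L), 8L, 9, Boolean.TRUE };
        for (int i = 0; i < values.length; i++) {
            assignLong(longs, i, values[i]);
        }
        System.out.println(Arrays.toString(longs));
    }
}
```

    Without the boxed-type branches, a plain `Long` or `Integer` would fall through to the error case, which is the gap the question is about.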



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/Repeated.java
<https://reviews.apache.org/r/31800/#comment122773>

    Is this not used?



ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java
<https://reviews.apache.org/r/31800/#comment122776>

    Should this be named `arrObjects`?


- Ryan Blue


On March 6, 2015, 9:18 a.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31800/
> -----------------------------------------------------------
> 
> (Updated March 6, 2015, 9:18 a.m.)
> 
> 
> Review request for hive, Ryan Blue and cheng xu.
> 
> 
> Bugs: HIVE-9658
>     https://issues.apache.org/jira/browse/HIVE-9658
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch passes primitive Java objects directly to the Hive object
> inspectors instead of wrapping them in primitive Writable objects, which
> helps to reduce memory usage.
> 
> I did not bypass the Writable objects for other, more complex types, such as
> binary, decimal, and date/timestamp, because those Writables are needed in
> other parts of the code, and creating them later costs more time than
> creating them up front.
> 
> 
> Diffs
> -----
> 
>   itests/hive-jmh/src/main/java/org/apache/hive/benchmark/storage/ColumnarStorageBench.java 4f6985cd13017ce37f4f0c100b16a27aa5b02f8b
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java c915f728fc9b27da0fabefab5d8f5faa53640b78
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java 0391229723cc3ecef551fa44b8456b0d2ac93fb5
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java d7edd52614771857d1b21971a66894841c248ef9
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ConverterParent.java 6ff6b473c9f1867bc14bb597094ddb92487cc954
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableRecordConverter.java a43661eb54ba29692c07c264584b5aecf648ef99
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java 377e362979156b8d52d103192b22bd7f19fa683b
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveCollectionConverter.java f1c8b6f13718b37f590263e5b35ed6c327f5cf4f
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveGroupConverter.java c6d03a19029d5bcc86b998dd7a8609973648c103
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveStructConverter.java f95d15eddc21bc432fa53572de5756751a13341a
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/Repeated.java ee57b31dac53d99af0c5a520f51102796ca32fd3
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 47cd68200d3be9260aa35385d0dade74d7dc215d
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java 6dc85faecabd59dfc616e908926c1f6b6db372de
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java 49bf1c5325833993f4c09efdf1546af560783c28
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java bb066afd38aea6b2eb119b0f8ec8d00af57dc187
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/DeepParquetHiveMapInspector.java 143d72e76502d4877e8208181d9743259051dcea
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveArrayInspector.java bde0dcbb3978ba47b15ae2c9bbe2f87ed3984ab1
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 7fd5e9612d4e3c9bf3b816bc48dbdbe59fb8a5a8
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/StandardParquetHiveMapInspector.java 22250b30a14d52907fb22d4f44b93c7633c6a89e
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetByteInspector.java 864f56292fa4856df155f546064e4a6732cc663f
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetShortInspector.java 39f265777c7e164382117e3902c3b6e491295f70
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/AbstractTestParquetDirect.java 3a476731e31bf38822f0d530f0aea2eadb675a49
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestArrayCompatibility.java d45d8eeb9e8a61f254098ab15d0305fc71152abd
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestDataWritableWriter.java 8f03c5b403332f7b36b2271a2246a0fc90b3bfba
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapStructures.java 3c7401ffbe88ce66b96f9cceab4e9c3d6267f8fe
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetInputFormat.java 1a54bf5797efd5859c9e665bcc7134168e5d193f
>   serde/src/java/org/apache/hadoop/hive/serde2/io/HiveArrayWritable.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/31800/diff/
> 
> 
> Testing
> -------
> 
> Some performance tests were done to validate this.
> 
> Schema: int,double,boolean,string,array<int>,map<string,string>,struct<a:int,b:int>
>   
> - JMH (Microbenchmarks) calls on parquet reads.
>   
>   Before: 579 ops/s
>   After:  651 ops/s
> 
> - YourKit Java Profiler to measure memory objects recorded.
>   Reading 20,000 random rows (10 times)
>   
>   Before:
>      Objects recorded:   1,863,610
>      Objects size:       42,373,808
>      Total memory usage: 29%
>      
>   After:
>      Objects recorded:   1,596,804
>      Objects size:       34,192,832
>      Total memory usage: 24%
> 
> All tests were run multiple times and produced consistent results.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>
