I've run some tests with both real and synthetic Parquet data containing nested 
columns, with and without the Hive metastore, on our Spark 1.5, 1.6, and 2.0 
versions. I haven't seen any unexpected performance surprises, except that 
Spark 2.0 now does schema inference across all files in a partitioned Parquet 
metastore table. Granted, you aren't using a metastore table, but maybe Spark 
does the same for partitioned non-metastore tables as well.
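If that inference pass turns out to be the cost, one possible workaround is to 
supply the schema explicitly so Spark can skip inferring it from the files. A 
minimal sketch (the path and the nested column names here are hypothetical, 
just to illustrate the standard DataFrameReader API):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    val spark = SparkSession.builder().appName("parquet-read").getOrCreate()

    // Hypothetical schema matching a nested Parquet layout.
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("payload", StructType(Seq(
        StructField("value", StringType))))))

    // With an explicit schema, Spark doesn't need to infer one by
    // reading the footer of every file under the path.
    val df = spark.read.schema(schema).parquet("/data/events")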

Michael

> On Jul 20, 2016, at 2:16 PM, Maciej Bryński <mac...@brynski.pl> wrote:
> 
> @Michael,
> I answered in JIRA and can repeat it here.
> I think my problem is unrelated to Hive, because I'm using the read.parquet 
> method.
> I also attached some VisualVM snapshots to SPARK-16321 (I think I should 
> merge both issues).
> Code profiling suggests the bottleneck is in reading the Parquet files.
> 
> I wonder if there are any other benchmarks related to Parquet performance.
> 
> Regards,
> -- 
> Maciek Bryński

