Hi
Could it be due to GC? I read that this may happen if your program starts
with a small heap. What are your -Xms and -Xmx values?
Print GC stats with -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
Guillaume
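To check the heap question from inside the JVM itself, something like the following might help. It is only a minimal sketch using the standard java.lang.management API; the class name HeapGcReport and its method names are made up for illustration and are not part of Spark. It reports the maximum heap (roughly the effective -Xmx), the currently committed heap (which starts near -Xms), and cumulative GC counts and times.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not a Spark class: reports heap limits and GC totals.
public class HeapGcReport {
    // Maximum heap the JVM will try to use (roughly the effective -Xmx), in bytes.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap currently reserved by the JVM, in bytes; it starts near -Xms and can grow.
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // One line per collector: cumulative collection count and time since JVM start.
    static String gcSummary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("max heap (MB): " + maxHeapBytes() / mb);
        System.out.println("committed heap (MB): " + committedHeapBytes() / mb);
        System.out.print(gcSummary());
    }
}
```

Calling gcSummary() before and after the first query would show how much of the delay, if any, was spent in collections.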
Hey Sean and spark users!
Thanks for the reply. I tried -Xcomp just now and the start time was a few
minutes (as expected), but the first query was as slow as before:
Oct 10, 2014 3:03:41 PM INFO: parquet.hadoop.InternalParquetRecordReader:
Assembled and processed 1568899 records from 30 columns in 12897 ms
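One way to see whether the cost is paid only once is to time the same work several times in a single JVM and compare the first run with the later ones. The following is a rough sketch under that assumption; WarmupTimer and timeRuns are hypothetical names, and the loop in main is only a placeholder workload standing in for the real query.

```java
// Hypothetical helper, not a Spark class: times repeated runs of the same task.
public class WarmupTimer {
    // Runs the task `runs` times and returns each run's wall-clock time in milliseconds.
    static long[] timeRuns(Runnable task, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return millis;
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption); replace with the query under test.
        Runnable placeholder = () -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += i % 7;
            }
            if (sum < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        };
        long[] times = timeRuns(placeholder, 5);
        for (int i = 0; i < times.length; i++) {
            System.out.println("run " + (i + 1) + ": " + times[i] + " ms");
        }
    }
}
```

If only the first run is slow both with and without -Xcomp, that would point away from JIT compilation and toward one-time setup or caching.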
You could try setting "-Xcomp" for executors to force JIT compilation
upfront. I don't know if it's a good idea overall but might show
whether the upfront compilation really helps. I doubt it.
However, isn't this almost surely due to caching somewhere, in Spark SQL
or HDFS? I really doubt hotspot make
Hello spark users and developers!
I am using hdfs + spark sql + hive schema + parquet as storage format. I
have a lot of parquet files: one file fits one hdfs block for one day. The
strange thing is a very slow first query for spark sql.
To reproduce the situation I use only one core and I have 97sec f