The file size is very small (< 1M). The stage launches every time I call:

    sqlContext.read.parquet(path_to_file)
These are the Parquet-specific configurations I set:

    spark.sql.parquet.filterPushdown: true
    spark.sql.parquet.mergeSchema: true

Thanks,
J.

On Sat, May 7, 2016 at 4:20 PM, Ashish Dubey <ashish....@gmail.com> wrote:

> How big is your file, and can you also share the code snippet?
>
> On Saturday, May 7, 2016, Johnny W. <jzw.ser...@gmail.com> wrote:
>
>> Hi spark-user,
>>
>> I am using Spark 1.6.0. When I call sqlCtx.read.parquet to create a
>> DataFrame from a Parquet data source with a single Parquet file, it
>> yields a stage with lots of small tasks. The number of tasks seems to
>> depend on how many executors I have rather than on how many Parquet
>> files/partitions I have. In fact, it launches 5 tasks on each executor.
>>
>> This behavior is quite strange, and it could cause problems if there
>> is a slow executor. What is this "parquet" stage for, and why does it
>> launch 5 tasks on each executor?
>>
>> Thanks,
>> J.
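
[Editor's sketch: a minimal, self-contained reproduction of the setup described above, assuming the Spark 1.6 Scala API. path_to_file is a placeholder as in the original message, and the app name and local master are illustrative only, not taken from the thread.]

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object ParquetReadRepro {
      def main(args: Array[String]): Unit = {
        // Illustrative app name and master; any cluster manager applies.
        val sc = new SparkContext(
          new SparkConf().setAppName("parquet-read-repro").setMaster("local[4]"))
        val sqlContext = new SQLContext(sc)

        // The two Parquet-specific settings mentioned in the thread.
        sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")
        sqlContext.setConf("spark.sql.parquet.mergeSchema", "true")

        // Placeholder path; per the report, this single call on one small
        // (< 1M) file launches a "parquet" stage with ~5 tasks per executor.
        val path_to_file = "/tmp/data/single-file.parquet"
        val df = sqlContext.read.parquet(path_to_file)
        df.printSchema()
      }
    }

[One hedged note: with spark.sql.parquet.mergeSchema set to true, Spark 1.6 can run a distributed schema-merging job when a Parquet source is first read, which is a plausible, though unconfirmed in this thread, origin of the extra stage.]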