The file size is very small (< 1M). The stage launches every time I call:

    sqlContext.read.parquet(path_to_file)
These are the Parquet-specific configurations I set:

    spark.sql.parquet.filterPushdown: true
    spark.sql.parquet.mergeSchema: true

Thanks,
J.

On Sat, May 7, 2016 at 4:20 PM, Ashish Dubey <ashish....@gmail.com> wrote:

> How big is your file, and can you also share the code snippet?
>
> On Saturday, May 7, 2016, Johnny W. <jzw.ser...@gmail.com> wrote:
>
>> Hi spark-user,
>>
>> I am using Spark 1.6.0. When I call sqlCtx.read.parquet to create a
>> DataFrame from a Parquet data source with a single Parquet file, it
>> yields a stage with lots of small tasks. The number of tasks seems to
>> depend on how many executors I have rather than on how many Parquet
>> files/partitions I have. In fact, it launches 5 tasks on each executor.
>>
>> This behavior is quite strange, and it could cause problems if there
>> is a slow executor. What is this "parquet" stage for, and why does it
>> launch 5 tasks on each executor?
>>
>> Thanks,
>> J.
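
[Editor's sketch: a minimal, self-contained reproduction of the setup described above, assuming the Spark 1.6 Scala API. path_to_file is a placeholder as in the original message, and the app name and local master are illustrative only, not taken from the thread.]

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object ParquetReadRepro {
      def main(args: Array[String]): Unit = {
        // Illustrative app name and master; any cluster manager applies.
        val sc = new SparkContext(
          new SparkConf().setAppName("parquet-read-repro").setMaster("local[4]"))
        val sqlContext = new SQLContext(sc)

        // The two Parquet-specific settings mentioned in the thread.
        sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")
        sqlContext.setConf("spark.sql.parquet.mergeSchema", "true")

        // Placeholder path; per the report, this single call on one small
        // (< 1M) file launches a "parquet" stage with ~5 tasks per executor.
        val path_to_file = "/tmp/data/single-file.parquet"
        val df = sqlContext.read.parquet(path_to_file)
        df.printSchema()
      }
    }

[One hedged note: with spark.sql.parquet.mergeSchema set to true, Spark 1.6 can run a distributed schema-merging job when a Parquet source is first read, which is a plausible, though unconfirmed in this thread, origin of the extra stage.]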