On 16 Nov 2017, at 10:22, Michael Shtelma <mshte...@gmail.com> wrote:
> you call repartition(1) before starting processing your files. This
> will ensure that you end up with just one partition.

One question and one remark:

Q) val ds = sqlContext.read.parquet(path).repartition(1)

Can I be certain that the file here is read by a single executor, and that no
data shuffling takes place afterwards to arrive at that single partition?

R) This approach did not work for me.

    val ds = sqlContext.read.parquet(path).repartition(1)
    
    // ds on a single partition

    ds.createOrReplaceTempView("ds")

    val result = sqlContext.sql("... from ds")

    // result on 166 partitions... How to force the processing on a
    // single executor?

    result.write.csv(...)

    // 166 files :-/
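For what it's worth, the behaviour above is expected: the SQL query introduces a shuffle, and the shuffle re-expands the data across `spark.sql.shuffle.partitions` partitions regardless of how the input was partitioned. A minimal sketch of one workaround, collapsing the *result* with `coalesce(1)` just before the write (using a small in-memory DataFrame as a stand-in for the parquet read, and assuming a local Spark 2.x `SparkSession`):

```scala
import org.apache.spark.sql.SparkSession

object SingleFileSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[2]")
      .appName("single-output-file")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for sqlContext.read.parquet(path).repartition(1)
    val ds = Seq(1, 2, 2, 3).toDF("x").repartition(1)
    ds.createOrReplaceTempView("ds")

    // The GROUP BY forces a shuffle, so the result is spread over
    // spark.sql.shuffle.partitions partitions, not 1.
    val result = spark.sql("SELECT x, count(*) AS c FROM ds GROUP BY x")
    println(result.rdd.getNumPartitions)  // many partitions again

    // coalesce(1) narrows the result to one partition without another
    // full shuffle, so write.csv(...) emits a single part file.
    val single = result.coalesce(1)
    println(single.rdd.getNumPartitions)  // 1

    spark.stop()
  }
}
```

Note this does not make the whole query run on one executor; the aggregation still runs in parallel, and only the final write is funneled through one task. Lowering `spark.sql.shuffle.partitions` is the other common knob if the 166-partition fan-out itself is the problem.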

Jeroen


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org