Dear Sparkers,

A while back, I asked how to process non-splittable files in parallel, one file per executor. The "scheduling within an application" approach Vadim suggested worked out beautifully.
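For readers who missed that thread, here is a minimal sketch of the scheduling-within-an-application pattern using a plain thread pool. `process_one_file` is a hypothetical stand-in for the per-file Spark job (in the real application, each thread would submit its own read/write job to the shared SparkContext, and the scheduler would run the jobs concurrently); the file names are made up:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile

def process_one_file(src: str, out_dir: Path) -> Path:
    # Stand-in for a per-file Spark job: in a real application this thread
    # would trigger something like spark.read.parquet(src)...write(...),
    # and Spark's scheduler would run each submitted job concurrently.
    dst = out_dir / (Path(src).stem + ".out")
    dst.write_text(f"processed {src}\n")
    return dst

def run_all(files, out_dir, max_workers=4):
    # One task per input file; tasks run concurrently, one file per worker,
    # which is the "scheduling within an application" idea in miniature.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda f: process_one_file(f, out_dir), files))

if __name__ == "__main__":
    out = Path(tempfile.mkdtemp())
    results = run_all(["a.parquet", "b.parquet", "c.parquet"], out)
    print([p.name for p in results])  # one output file per input file
```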
I am now facing the 'opposite' problem:

- I have a bunch of Parquet files to process
- Once processed, I need to output a /single/ file for each input file
- When I read a Parquet file, it gets partitioned over several executors
- If I want a single output file, I would need to coalesce(1), with potential performance issues

Since my files are relatively small, a single file could be handled by a single executor, and several files could be read in parallel, one per executor.

My question is: how can I force a Parquet file to be read by a single executor, without repartitioning or coalescing, of course?

Regards,
Jeroen

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org