Hi all,

Many Spark users at my company are asking for a way to control the number
of output files in Spark SQL. There are use cases for both reducing and
increasing the number. The users prefer not to use the functions
*repartition*(n) or *coalesce*(n, shuffle), which require them to write and
deploy Scala/Java/Python code.
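
For context, the kind of code they would rather not write looks roughly like
the PySpark sketch below. The helper name write_with_file_count, its
parameters, and the Parquet output format are made up for illustration; only
repartition, coalesce, and the DataFrame writer calls are existing APIs.

```python
# Sketch only: the DataFrame-API workaround that SQL-only users want to avoid.
# `write_with_file_count` and its parameters are hypothetical names.

def write_with_file_count(df, path, n, shuffle=False):
    """Write `df` to `path` using `n` partitions (about one file per non-empty partition)."""
    if shuffle:
        # repartition(n) always shuffles, so it can raise or lower the
        # partition count.
        out = df.repartition(n)
    else:
        # DataFrame.coalesce(n) takes only the partition count and avoids a
        # shuffle, so it can only lower the partition count.
        out = df.coalesce(n)
    # Assumption: Parquet output, overwriting any existing data at `path`.
    out.write.mode("overwrite").parquet(path)
    return out
```

With a real SparkSession this would be called as, for example,
write_with_file_count(spark.table("some_table"), "/some/output/path", 10),
where the table name and path are placeholders.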

Could we introduce a query hint for this purpose (similar to Broadcast Join
Hints)?

    /*+ COALESCE(n, shuffle) */

In general, is a query hint the best way to bring DataFrame functionality to
SQL without extending SQL syntax? Any suggestions are highly appreciated.

This requirement is not the same as SPARK-6221, which asked for auto-merging
of output files.

Thanks,
John Zhuge
