Hi,
... Most of our jobs end up with a shuffle stage based on a partition
column value before writing to Parquet, and most of the time we have
data skew in the partitions.
Have you considered the causes of these recurring issues and some potential
alternative strategies?
1.
- Tunin

Hi Neil,
Thanks for putting this together. +1 to the proposal of enhancing the
console sink further. I think it will help new users understand some of the
streaming/micro-batch semantics a bit better in Spark.
Agree with not having verbose mode enabled by default. I think step 1
described above s

I am generally a +1 on this as we can use this information in our docs to
demonstrate certain concepts to potential users.
I am in agreement with other reviewers that we should keep the existing
default behavior of the console sink. This new style of output should be
enabled behind a flag.
As f