Re: Shuffle write and read phase optimizations for parquet+zstd write

2024-02-08 Thread Mich Talebzadeh
Hi, ... Most of our jobs end up with a shuffle stage based on a partition column value before writing into a parquet, and most of the time we have data skewness in partitions. Have you considered the causes of these recurring issues and some potential alternative strategies? 1. - Tunin

Re: Enhanced Console Sink for Structured Streaming

2024-02-08 Thread Anish Shrigondekar
Hi Neil, Thanks for putting this together. +1 to the proposal of enhancing the console sink further. I think it will help new users understand some of the streaming/micro-batch semantics a bit better in Spark. Agree with not having verbose mode enabled by default. I think step 1 described above s

Re: Enhanced Console Sink for Structured Streaming

2024-02-08 Thread Jerry Peng
I am generally a +1 on this as we can use this information in our docs to demonstrate certain concepts to potential users. I am in agreement with other reviewers that we should keep the existing default behavior of the console sink. This new style of output should be enabled behind a flag. As f