Hi, thanks for starting a discussion about data skew! I agree, it's an important issue that can cause a lot of problems. I'll have a look at your proposal and add comments soon.
Thanks,
Fabian

2015-10-15 12:24 GMT+02:00 Li, Chengxiang <chengxiang...@intel.com>:

> Dear all,
> In many real-world use cases, data is naturally skewed. For example, in a
> social network, famous people get many more "follows" than others, and a hot
> tweet may be forwarded millions of times. Likewise, the purchase records of
> ordinary products can never compare to those of hot products. At the same
> time, the Flink runtime assumes that all tasks consume the same amount of
> resources, which is not always true. Skewed data handling tries to make
> skewed data fit into Flink's runtime.
> I have written a proposal about skewed data handling in Flink. You can read it at
> https://docs.google.com/document/d/1ma060BUlhXDqeFmviEO7Io4CXLKgrAXIfeDYldvZsKI/edit?usp=sharing
> Any comments and feedback are welcome. You can comment on the Google doc
> or reply to this email thread directly.
>
> Thanks
> Chengxiang