Re: What's the root cause of not supporting multiple aggregations in structured streaming?

2020-11-26 Thread Etienne Chauchot
ct, even across the major versions along the features/APIs. It is great for end users, who can migrate between versions easily, but it also blocks devs from fixing a bad design once it ships. I'm the one complaining about these issues on the dev list, and I don't see willingness to correct

Re: What's the root cause of not supporting multiple aggregations in structured streaming?

2020-09-04 Thread Etienne Chauchot
c when I took a look at 3.0 preview 2 searching for this particular feature. And regarding the workaround, I'm not sure it meets my needs, as it will add delays and may also interfere with watermarks. Best, Etienne Chauchot On 04/09/2020 08:06, Jungtaek Lim wrote: Unfortunately I don

Re: What's the root cause of not supporting multiple aggregations in structured streaming?

2020-08-31 Thread Etienne Chauchot
Hi all, I'm also very interested in this feature, but the PR has been open since January 2019 and has not been updated. It raised a design discussion around watermarks, and a design doc was written (https://docs.google.com/document/d/1IAH9UQJPUiUCLd7H6dazRK2k1szDX38SnM6GVNZYvUo/edit#heading=h.npkueh4bbkz1)

Re: CombinePerKey and GroupByKey

2019-03-01 Thread Etienne Chauchot
That's good to know. Thanks, Etienne On Thursday 28 February 2019 at 10:05 -0800, Reynold Xin wrote: > This should be fine. Dataset.groupByKey is a logical operation, not a > physical one (as in Spark wouldn’t always > materialize all the groups in memory). > On Thu, Feb 28, 2019 a

CombinePerKey and GroupByKey

2019-02-28 Thread Etienne Chauchot
Hi all, I'm migrating RDD pipelines to Dataset and I saw that Combine.PerKey is no longer available in the Dataset API. So, I translated it to: KeyValueGroupedDataset> groupedDataset = keyedDataset.groupByKey(KVHelpers.extractKey(), EncoderHelpers.genericEncoder()); Dataset> combinedDataset =

Re: [Apache Beam] Custom DataSourceV2 instanciation: parameters passing and Encoders

2018-12-18 Thread Etienne Chauchot
Hi everyone, Does anyone have comments on this question? CCing user ML. Thanks, Etienne On Tuesday 11 December 2018 at 19:02 +0100, Etienne Chauchot wrote: > Hi Spark guys, > I'm Etienne Chauchot and I'm a committer on the Apache Beam project. > We have what we call runners.

[Apache Beam] Custom DataSourceV2 instanciation: parameters passing and Encoders

2018-12-11 Thread Etienne Chauchot
Hi Spark guys, I'm Etienne Chauchot and I'm a committer on the Apache Beam project. We have what we call runners. They are pieces of software that translate pipelines written using the Beam API into pipelines that use the native execution engine's API. Currently, the Spark runner uses old RDD