Re: Achieving MapState Equivalent in Dataflow Runner

2020-02-14 Thread Ziyad Muhammed
Thanks Luke! I will try that. Best Ziyad On Thu, Feb 13, 2020 at 6:16 PM Luke Cwik wrote: > If the map/set can fit in memory then you can use a value state containing > a Java Map/Set. > > On Thu, Feb 13, 2020 at 5:05 AM Ziyad Muhammed wrote: > >> Hi All >> >> I'm developing a beam pipeline t

Aggregate-and-join in Beam

2020-02-14 Thread Pawel Kordek
Hello I am working on a relatively simple streaming data pipeline. Imagine the elements of PCollection represent something like sensor readings (with event timestamps): Reading(id: String, value: Int) For each incoming element I want to have at the end of the pipeline: Reading(id: String, val

Re: Aggregate-and-join in Beam

2020-02-14 Thread Paweł Kordek
I've just read this message again and realized that the requirement "in an hour preceding the timestamp" is not feasible nor necessary, for this part I will use sliding windows, let's say by 5 minutes. The rest of the question still stands, I still haven't wrapped my head around it. The importan

Re: Aggregate-and-join in Beam

2020-02-14 Thread Luke Cwik
So you have your input PCollection containing KV that has been windowed for lets say sliding windows 5 mins and you have your PCollection after the CombinePerKey stream containing KV that has also been windowed using sliding windows of 5 mins. You can use CoGroupByKey[1] to join these two PCollect

Re: GCS numShards doubt

2020-02-14 Thread Luke Cwik
Prefer to never specify num shards since this allows the runner the greatest flexibility in how it executes and is the most performant as well. Increasing num shards enables more workers to do the work in parallel but there is no guarantee that it will be significantly faster since you could have

Re: GCS numShards doubt

2020-02-14 Thread Robert Bradshaw
To let Dataflow choose the optimal number shards and maximize performance, it's often significantly better to simply leave it unspecified. A higher numShards only helps if you have at least that many workers. On Thu, Feb 13, 2020 at 10:24 PM vivek chaurasiya wrote: > > hi folks, I have this in co

Re: Beam SQL Nested Rows are flatten by Calcite

2020-02-14 Thread Rui Wang
Calcite has improved to reconstruct ROW back in the output. See [1]. Beam need to update Calcite dependency to > 1.21 to adopt that. [1]: https://jira.apache.org/jira/browse/CALCITE-3138 -Rui On Thu, Feb 13, 2020 at 9:05 PM Talat Uyarer wrote: > Hi, > > I am trying to Beam SQL. But somethin

Re: Beam SQL Nested Rows are flatten by Calcite

2020-02-14 Thread Talat Uyarer
Thank you for your response. I saw it and applied patch on calcite 1.20. However I realized BeamCalRel does not generate right code [1]to turn back Beam types. I am working on that now. Please let me know if apache beam support nested row types but I miss it. [1] https://github.com/apache/beam/bl

Re: Beam SQL Nested Rows are flatten by Calcite

2020-02-14 Thread Rui Wang
Nested row types might be less well supported (r.g. Row) because they were flattened before anyway. -Rui On Fri, Feb 14, 2020 at 12:14 PM Talat Uyarer wrote: > Thank you for your response. > I saw it and applied patch on calcite 1.20. However I realized BeamCalRel > does not generate right cod

Re: Apache Beam with Hive

2020-02-14 Thread Tomo Suzuki
I see the Beam dependency 2.15 is a bit outdated. Would you try 2.19.0? On Thu, Feb 13, 2020 at 3:52 AM Gershi, Noam wrote: > > Hi > Pom attached. > > -Original Message- > From: [google.com] Tomo Suzuki > Sent: Wednesday, February 12, 2020 4:56 PM > To: user@beam.apache.org > Subject: Re

Re: Beam SQL Nested Rows are flatten by Calcite

2020-02-14 Thread Talat Uyarer
Do you mean they were flattened before by calcite or Does beam flatten them too ? On Fri, Feb 14, 2020 at 1:21 PM Rui Wang wrote: > Nested row types might be less well supported (r.g. Row) because they were > flattened before anyway. > > > -Rui > > On Fri, Feb 14, 2020 at 12:14 PM Talat Uyarer