Contributing to Beam

2019-05-03 Thread Shehzaad Nakhoda
Hello, I’m hoping to work with Reuven Lax (Google) on some enhancements and existing issues. I would appreciate the ability to create and assign tickets to myself. My JIRA ID is shehzaadn. Thanks in advance! -- *Shehzaad Nakhoda* Chief Technology Officer shehz...@ventured

Re: Better naming for runner specific options

2019-05-03 Thread Reza Rokni
Great point Lukasz, worker machine could be relevant to multiple runners. Perhaps for parameters that could have multiple runner relevance, the doc could be rephrased to reflect its potential multiple uses. For example change the help information to start with a generic reference " worker type on

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Heejong Lee
Congratulations! On Fri, May 3, 2019 at 3:53 PM Reza Rokni wrote: > Congratulations ! > > *From: *Reuven Lax > *Date: *Sat, 4 May 2019, 06:42 > *To: *dev > > Thank you! >> >> On Fri, May 3, 2019 at 3:15 PM Ankur Goenka wrote: >> >>> Congratulations Udi! >>> >>> On Fri, May 3, 2019 at 3:00 PM C

Kotlin iterator error

2019-05-03 Thread Ankur Goenka
Hi, A Beam user on Stack Overflow has posted an issue encountered while using the Kotlin SDK. https://stackoverflow.com/questions/55908999/kotlin-iterable-not-supported-in-apache-beam/55911859#55911859 I am not very familiar with Kotlin, so can someone please take a look? Thanks, Ankur

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Reza Rokni
Congratulations ! *From: *Reuven Lax *Date: *Sat, 4 May 2019, 06:42 *To: *dev Thank you! > > On Fri, May 3, 2019 at 3:15 PM Ankur Goenka wrote: > >> Congratulations Udi! >> >> On Fri, May 3, 2019 at 3:00 PM Connell O'Callaghan >> wrote: >> >>> Well done Udi!!! Congratulations and thank you for

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Rui Wang
Congrats! Thank you for your contributions! -Rui On Fri, May 3, 2019 at 3:45 PM Chamikara Jayalath wrote: > Congrats Udi! > > On Fri, May 3, 2019 at 3:42 PM Reuven Lax wrote: > >> Thank you! >> >> On Fri, May 3, 2019 at 3:15 PM Ankur Goenka wrote: >> >>> Congratulations Udi! >>> >>> On Fri, M

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Chamikara Jayalath
Congrats Udi! On Fri, May 3, 2019 at 3:42 PM Reuven Lax wrote: > Thank you! > > On Fri, May 3, 2019 at 3:15 PM Ankur Goenka wrote: > >> Congratulations Udi! >> >> On Fri, May 3, 2019 at 3:00 PM Connell O'Callaghan >> wrote: >> >>> Well done Udi!!! Congratulations and thank you for your contrib

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Reuven Lax
Thank you! On Fri, May 3, 2019 at 3:15 PM Ankur Goenka wrote: > Congratulations Udi! > > On Fri, May 3, 2019 at 3:00 PM Connell O'Callaghan > wrote: > >> Well done Udi!!! Congratulations and thank you for your contributions!!! >> >> Kenn thank you for sharing!!! >> >> On Fri, May 3, 2019 at 2:4

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Ankur Goenka
Congratulations Udi! On Fri, May 3, 2019 at 3:00 PM Connell O'Callaghan wrote: > Well done Udi!!! Congratulations and thank you for your contributions!!! > > Kenn thank you for sharing!!! > > On Fri, May 3, 2019 at 2:49 PM Yifan Zou wrote: > >> Thanks Udi and congratulations! >> >> On Fri, May

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Connell O'Callaghan
Well done Udi!!! Congratulations and thank you for your contributions!!! Kenn thank you for sharing!!! On Fri, May 3, 2019 at 2:49 PM Yifan Zou wrote: > Thanks Udi and congratulations! > > On Fri, May 3, 2019 at 2:47 PM Robin Qiu wrote: > >> Congratulations Udi!!! >> >> *From: *Ruoyun Huang >

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Yifan Zou
Thanks Udi and congratulations! On Fri, May 3, 2019 at 2:47 PM Robin Qiu wrote: > Congratulations Udi!!! > > *From: *Ruoyun Huang > *Date: *Fri, May 3, 2019 at 2:39 PM > *To: * > > Congratulations Udi! >> >> On Fri, May 3, 2019 at 2:30 PM Ahmet Altay wrote: >> >>> Congratulations, Udi! >>> >>

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Robin Qiu
Congratulations Udi!!! *From: *Ruoyun Huang *Date: *Fri, May 3, 2019 at 2:39 PM *To: * Congratulations Udi! > > On Fri, May 3, 2019 at 2:30 PM Ahmet Altay wrote: > >> Congratulations, Udi! >> >> *From: *Kyle Weaver >> *Date: *Fri, May 3, 2019 at 2:11 PM >> *To: * >> >> Congratulations Udi! I

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Ruoyun Huang
Congratulations Udi! On Fri, May 3, 2019 at 2:30 PM Ahmet Altay wrote: > Congratulations, Udi! > > *From: *Kyle Weaver > *Date: *Fri, May 3, 2019 at 2:11 PM > *To: * > > Congratulations Udi! I look forward to sending you all my reviews for >> the next month (just kidding :) >> >> Kyle Weaver |

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-03 Thread Rui Wang
A compromise solution would be using SELECT DISTINCT or GROUP BY to deduplicate before applying aggregations. It takes two shuffles and works on non-floating-point columns. The good thing is that no code change is needed, but the downsides are that users need to write more complicated queries and floating point data is not
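The deduplicate-then-aggregate idea in the message above can be sketched in plain Python. This is only an illustration, not Beam SQL code: the function names and sample rows are hypothetical, and each stage stands in for what would be one shuffle in a distributed runner.

```python
from collections import defaultdict


def dedupe(rows):
    """Stage 1: keep one copy of each (key, value) pair, like SELECT DISTINCT."""
    return set(rows)


def count_per_key(rows):
    """Stage 2: count the values seen for each key, like GROUP BY key."""
    counts = defaultdict(int)
    for key, _value in rows:
        counts[key] += 1
    return dict(counts)


def count_distinct(rows):
    """COUNT(DISTINCT value) per key, expressed as dedupe followed by count."""
    return count_per_key(dedupe(rows))


# Hypothetical sample data: (key, value) pairs with duplicates.
rows = [("a", 1), ("a", 1), ("a", 2), ("b", 7), ("b", 7)]

distinct_counts = count_distinct(rows)  # duplicates removed before counting
all_counts = count_per_key(rows)        # what ignoring DISTINCT would compute
```

Comparing `distinct_counts` with `all_counts` shows why silently treating a DISTINCT aggregation as a non-distinct one gives a different answer whenever the input contains duplicates.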

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Ahmet Altay
Congratulations, Udi! *From: *Kyle Weaver *Date: *Fri, May 3, 2019 at 2:11 PM *To: * Congratulations Udi! I look forward to sending you all my reviews for > the next month (just kidding :) > > Kyle Weaver | Software Engineer | github.com/ibzib | > kcwea...@google.com | +1650203 > > On Fri,

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Kyle Weaver
Congratulations Udi! I look forward to sending you all my reviews for the next month (just kidding :) Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com | +1650203 On Fri, May 3, 2019 at 1:52 PM Charles Chen wrote: > > Thank you Udi! > > On Fri, May 3, 2019, 1:51 PM Aiz

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Charles Chen
Thank you Udi! On Fri, May 3, 2019, 1:51 PM Aizhamal Nurmamat kyzy wrote: > Congratulations, Udi! Thank you for all your contributions!!! > > *From: *Pablo Estrada > *Date: *Fri, May 3, 2019 at 1:45 PM > *To: *dev > > Thanks Udi and congrats! >> >> On Fri, May 3, 2019 at 1:44 PM Kenneth Knowles

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Aizhamal Nurmamat kyzy
Congratulations, Udi! Thank you for all your contributions!!! *From: *Pablo Estrada *Date: *Fri, May 3, 2019 at 1:45 PM *To: *dev Thanks Udi and congrats! > > On Fri, May 3, 2019 at 1:44 PM Kenneth Knowles wrote: > >> Hi all, >> >> Please join me and the rest of the Beam PMC in welcoming a new

Re: [ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Pablo Estrada
Thanks Udi and congrats! On Fri, May 3, 2019 at 1:44 PM Kenneth Knowles wrote: > Hi all, > > Please join me and the rest of the Beam PMC in welcoming a new committer: > Udi Meiri. > > Udi has been contributing to Beam since late 2017, starting with HDFS > support in the Python SDK and continuing

[ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Kenneth Knowles
Hi all, Please join me and the rest of the Beam PMC in welcoming a new committer: Udi Meiri. Udi has been contributing to Beam since late 2017, starting with HDFS support in the Python SDK and continuing with a ton of Python work. I also will highlight his work on community-building infrastructur

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-03 Thread Rui Wang
Fair point. BeamSQL lacks proper benchmarks to test the performance and scalability of implementations. -Rui On Fri, May 3, 2019 at 12:56 PM Reuven Lax wrote: > Back to the original point: I'm very skeptical of adding something that > does not scale at all. In our experience, users get f

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-03 Thread Reuven Lax
Back to the original point: I'm very skeptical of adding something that does not scale at all. In our experience, users get far more upset with an advertised feature that doesn't work for them (e.g. their workers OOM) than with a missing feature. Reuven On Fri, May 3, 2019 at 12:41 PM Kenneth Kno

Re: Better naming for runner specific options

2019-05-03 Thread Kenneth Knowles
Even though they are in classes named for specific runners, they are not namespaced. All PipelineOptions exist in a global namespace so they need to be careful to be very precise. It is a good point that even though there may be multiple uses for "machine type" they are probably not going to both h
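The naming concern in a flat option namespace can be illustrated with Python's standard `argparse` module. This is an analogy only and does not show how Beam's PipelineOptions is implemented; the flag names and the `runner_a`/`runner_b` prefixes are hypothetical.

```python
import argparse


def build_flat_parser():
    """Register the same flag twice in one flat namespace and report the clash."""
    parser = argparse.ArgumentParser(prog="pipeline")
    # Hypothetical option contributed by one runner.
    parser.add_argument("--worker_machine_type", help="machine type for runner A")
    try:
        # A second runner registering the same name collides with the first.
        parser.add_argument("--worker_machine_type", help="machine type for runner B")
    except argparse.ArgumentError as err:
        return str(err)
    return None


def build_prefixed_parser():
    """Give each runner its own prefix so the two options can coexist."""
    parser = argparse.ArgumentParser(prog="pipeline")
    parser.add_argument("--runner_a_worker_machine_type")
    parser.add_argument("--runner_b_worker_machine_type")
    return parser


collision = build_flat_parser()
args = build_prefixed_parser().parse_args(
    ["--runner_a_worker_machine_type", "small",
     "--runner_b_worker_machine_type", "large"]
)
```

A prefix avoids the clash, at the cost of renaming existing flags, which is the backward-compatibility concern raised elsewhere in this thread.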

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-05-03 Thread Ismaël Mejía
For info, both the AvroIO ReadAll/ParseAll and TextIO ReadAll deprecations were merged into master today and will be part of 2.13.0. For those working in other SDKs (Python, Go), please take care not to implement such transforms (or deprecate them too if already done) to keep the API ideas coherent.

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-03 Thread Kenneth Knowles
All good points. My version of the two shuffle approach does not work at all. On Fri, May 3, 2019 at 11:38 AM Brian Hulette wrote: > Rui's point about FLOAT/DOUBLE columns is interesting as well. We couldn't > support distinct aggregations on floating point columns with the > two-shuffle approac

Re: kafka client interoperability

2019-05-03 Thread Moorhead,Richard
We attempted a downgrade to beam-sdks-java-io-kafka 2.9 while using 2.10 for the rest and ran into issues. I still see checks to the ConsumerSpEL throughout ProducerRecordCoder and I am beginning to think this is a bug. From: Juan Carlos Garcia Sent: Thursday, M

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-03 Thread Brian Hulette
> As to the distinct aggregations: At the least, these queries should be rejected, not evaluated incorrectly. Absolutely agree. If we don't support DISTINCT aggregations with one of these approaches soon we should reject these queries rather than just treating them as non-distinct queries. > The t

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-03 Thread Rui Wang
To clarify what I said "So two shuffle approach will lead to two different implementation for tables with and without FLOAT/DOUBLE column.": Basically I wanted to say that the two-shuffle approach will be an implementation for some cases, and it will co-exist with the CombineFn approach. In the future,

Re: Better naming for runner specific options

2019-05-03 Thread Chamikara Jayalath
Also, we do have runner specific options classes where truly runner specific options can go. https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java https://github.com/apache/beam/blob/master/

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-03 Thread Rui Wang
> > > As to the distinct aggregations: At the least, these queries should be > rejected, not evaluated incorrectly. > Yes. The least is not to support it, and throws clear message to say no. (current implementation ignores DISTINCT and executes all aggregations as ALL). > The term "stateful Comb

Re: kafka client interoperability

2019-05-03 Thread Juan Carlos Garcia
Downgrade only the KafkaIO module to the version that works for you (also excluding any transitive dependencies of it); that works for us. JC. Lukasz Cwik wrote on Thu, May 2, 2019, 20:05: > +dev > > On Thu, May 2, 2019 at 10:34 AM Moorhead,Richard < > richard.moorhe...@cerner.com> wrote: > >> I

Re: Better naming for runner specific options

2019-05-03 Thread Ahmet Altay
I agree, that is a good point. *From: *Lukasz Cwik *Date: *Fri, May 3, 2019 at 9:37 AM *To: *dev The concept of a machine type isn't necessarily limited to Dataflow. If it > made sense for a runner, they could use AWS/Azure machine types as well. > > On Fri, May 3, 2019 at 9:32 AM Ahmet Altay w

Re: Better naming for runner specific options

2019-05-03 Thread Lukasz Cwik
The concept of a machine type isn't necessarily limited to Dataflow. If it made sense for a runner, they could use AWS/Azure machine types as well. On Fri, May 3, 2019 at 9:32 AM Ahmet Altay wrote: > This idea was discussed in a PR a few months ago, and JIRA was filed as a > follow up [1]. IMO,

Re: Better naming for runner specific options

2019-05-03 Thread Ahmet Altay
This idea was discussed in a PR a few months ago, and a JIRA was filed as a follow-up [1]. IMO, it makes sense to use a namespace prefix. The primary issue here is that such a change will very likely be backward incompatible and would be hard to do before the next major version. [1] https:

Re: :beam-sdks-java-io-hadoop-input-format:test is extremely flaky

2019-05-03 Thread Alexey Romanenko
FYI: In the end, the module “hadoop-input-format” was removed in favour of using “hadoop-format” instead. > On 29 Apr 2019, at 15:50, Jean-Baptiste Onofré wrote: > > Agree, +1 > > Regards > JB > > On 29/04/2019 15:30, Ismaël Mejía wrote: >> +1 to remove it on this release, this is a maintenan

Re: kafka client interoperability

2019-05-03 Thread Alexey Romanenko
Oops, I see that Richard already created a Jira about that, so I close mine as a duplicate. > On 3 May 2019, at 15:58, Alexey Romanenko wrote: > > Thank you for reporting this. > > Seems like it’s a bug there (since ProducerRecord from kafka-clients:0.10.2.1 > doesn’t support headers), so I

Re: kafka client interoperability

2019-05-03 Thread Alexey Romanenko
Thank you for reporting this. Seems like it’s a bug there (since ProducerRecord from kafka-clients:0.10.2.1 doesn’t support headers), so I created a Jira for that: https://issues.apache.org/jira/browse/BEAM-7217 Unfortunately, I can’t reproduce

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-03 Thread Maximilian Michels
Misread your post. You're saying that Kryo is more efficient than a roundtrip obj->bytes->obj_copy. Still, most types use Flink's serializers which also do the above roundtrip. So I'm not sure this performance advantage holds true for other Flink jobs. On 02.05.19 20:01, Maximilian Michels wro

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-03 Thread Robert Bradshaw
On Fri, May 3, 2019 at 9:29 AM Viliam Durina wrote: > > > you MUST NOT mutate your inputs > I think it's enough to not mutate the inputs after you emit them. From this > follows that when you receive an input, the upstream vertex will not try to > mutate it in parallel. This is what Hazelcast Je
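The hazard behind the "do not mutate after emitting" rule quoted above can be sketched in plain Python. This is not Beam or Hazelcast Jet code: the `Downstream` class and both emit functions are hypothetical, and they assume a downstream stage that keeps a reference to what it receives rather than a copy.

```python
import copy


class Downstream:
    """Hypothetical downstream stage that buffers items without copying them."""

    def __init__(self):
        self.buffer = []

    def receive(self, item):
        # Stores a reference to the caller's object, not a copy.
        self.buffer.append(item)


def emit_then_mutate(downstream):
    """Emit a record, then mutate it: the downstream buffer changes too."""
    record = {"count": 1}
    downstream.receive(record)
    record["count"] = 99  # mutation after emit is visible downstream


def emit_copy_then_mutate(downstream):
    """Emit a defensive copy, so later mutation stays local to the sender."""
    record = {"count": 1}
    downstream.receive(copy.deepcopy(record))
    record["count"] = 99  # only the sender's own object changes


unsafe = Downstream()
emit_then_mutate(unsafe)

safe = Downstream()
emit_copy_then_mutate(safe)
```

The defensive copy is what costs time on every element, which is why a runner that cannot clone objects automatically relies on user code not mutating what it has already emitted.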

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-05-03 Thread Viliam Durina
> you MUST NOT mutate your inputs I think it's enough to not mutate the inputs after you emit them. From this follows that when you receive an input, the upstream vertex will not try to mutate it in parallel. This is what Hazelcast Jet expects. We have no option to automatically clone objects after