Re: RabbitMqIO issues and open PRs

2019-10-31 Thread Jean-Baptiste Onofré
Hi, I just provided feedback in the PRs. Let me know if you want to chat about some initial implementation (as I'm the original author of the IO, I remember some discussion in the past ;) ). Regards JB On 31/10/2019 21:38, Daniel Robert wrote: > I'm pretty new to the Beam ecosystem, so apologie

Re: RabbitMqIO issues and open PRs

2019-10-31 Thread Jean-Baptiste Onofré
Hi Daniel, I will review and work on this IO. Sorry for the delay, I wasn't super active on Beam recently, but now back ;) Regards JB On 31/10/2019 21:38, Daniel Robert wrote: > I'm pretty new to the Beam ecosystem, so apologies if this is not the > right forum for this. > > My team has been l

Re: RabbitMqIO issues and open PRs

2019-10-31 Thread Eugene Kirpichov
Regarding review latency, FWIW I'm not super active on Beam these days but I'll be happy to review 1-2 PRs for RabbitMqIO (I'm @jkff). On Thu, Oct 31, 2019 at 8:47 PM Kenneth Knowles wrote: > Yes, thanks for emailing! We very much value sharing your intentions with > the community. For small cha

Re: aggregating over triggered results

2019-10-31 Thread Aaron Dixon
First of all thank you for taking the time on this very clear and helpful message. Much appreciated. >I suppose one could avoid doing any pre-aggregation, and emit all of the events (with reified timestamp) in 60/30-day windows, then have a DoFn that filters on the events and computes each of the

Re: RabbitMqIO issues and open PRs

2019-10-31 Thread Kenneth Knowles
Yes, thanks for emailing! We very much value sharing your intentions with the community. For small changes or fixes, you can just open a PR. For larger changes that could use feedback from the community (versus just the code reviewer) this list is the right place to go. If it is truly complex, a sh

published containers overwrite locally built containers

2019-10-31 Thread Heejong Lee
Hi, happy halloween! I'm looking into failing cross language post commit tests: https://issues.apache.org/jira/browse/BEAM-8534 After a few runs, I've found that published SDK harness containers overwrite locally built containers when do

Re: Strict timer ordering in Samza and Portable Flink Runners

2019-10-31 Thread Kenneth Knowles
It is because Dataflow does not support TestStream, so one test is disabled, and because the other test has only bounded inputs it is run in batch mode. In this case we need to do either: force streaming mode on Dataflow or have an unbounded input. We used to run two Validates Runner suites, where

Re: Triggers still finish and drop all data

2019-10-31 Thread Kenneth Knowles
Opened https://github.com/apache/beam/pull/9960 for this idea. This will alert users to broken pipelines and force them to alter them. Kenn On Thu, Oct 31, 2019 at 2:12 PM Kenneth Knowles wrote: > On Thu, Oct 31, 2019 at 2:11 AM Jan Lukavský wrote: > >> Hi Kenn, >> >> does there still remain s

Re: Triggers still finish and drop all data

2019-10-31 Thread Kenneth Knowles
On Thu, Oct 31, 2019 at 2:11 AM Jan Lukavský wrote: > Hi Kenn, > > does there still remain some use for trigger to finish? If we don't drop > data, would it still be of any use to users? If not, would it be better > to just remove the functionality completely, so that users who use it > (and it w

Re: RabbitMqIO issues and open PRs

2019-10-31 Thread Reuven Lax
I think we would be happy to see improvements and contributions to .this component. Emailing this list is definitely the right first step - it gives anyone with knowledge of the RabbitMqIO component a chance to weigh in. You don't necessarily have to talk to component authors before submitting a PR

RabbitMqIO issues and open PRs

2019-10-31 Thread Daniel Robert
I'm pretty new to the Beam ecosystem, so apologies if this is not the right forum for this. My team has been learning and starting to use Beam for the past few months and have run into myriad problems with the RabbitIO connector for java, aspects of which seem perhaps fundamentally broken or i

Outreachy projects

2019-10-31 Thread Katia Rojas
Hello folks, We notice that these two projects for the Outreachy program didn't get contributions recorded. 1. Extend the Nexmark Benchmarking Suite in Apache Beam to include Python and Portable runners 2. Improve Apache BeamSQL to allow users better write big data processing pipelines We thoug

Re: Rethinking the Flink Runner modes

2019-10-31 Thread Robert Bradshaw
Yes. If someone starts up the job server manually, they would have to manually specify LOOPBACK if they want it. Python's FlinkRunner does not use a pre-configured job server, it starts one up itself (making the default scenario simple). On Thu, Oct 31, 2019 at 10:19 AM Thomas Weise wrote: > > Tr

Re: Strict timer ordering in Samza and Portable Flink Runners

2019-10-31 Thread Jan Lukavský
That is quite strange. The timer ordering tests were quite stable on DirectRunner. Prior to the fix it failed consistently. Dataflow on the other hand seems to consistently pass. On 10/31/19 6:28 PM, Kenneth Knowles wrote: Hmm, classical Dataflow should fail.  - all user timers in a bundle pr

Re: Strict timer ordering in Samza and Portable Flink Runners

2019-10-31 Thread Kenneth Knowles
Hmm, classical Dataflow should fail. - all user timers in a bundle processed first: https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/SimpleParDoFn.java#L353 - processed in a loop that drains the StepContext

Re: Rethinking the Flink Runner modes

2019-10-31 Thread Thomas Weise
True, but it would probably be a good tradeoff to make the default scenario simple (just for FlinkRunner, not for PortableRunner). If someone configures the job server with options, they probably also know how to control the environment? On Thu, Oct 31, 2019 at 9:09 AM Maximilian Michels wrote:

Re: Python SDK timestamp precision

2019-10-31 Thread Robert Bradshaw
On Thu, Oct 31, 2019 at 1:49 AM Jan Lukavský wrote: > > This is quite an interesting idea. In some sense, timestamps become > like interval windows, and window assignment is similar to the window > mapping fns that we use for side inputs. I still think the idea of a > timestmap for an element an

Re: Strict timer ordering in Samza and Portable Flink Runners

2019-10-31 Thread Jan Lukavský
Hi, just today I noticed failures on portable dataflow [1] [2]. "Classical" dataflow seems to pass. Jan [1] https://issues.apache.org/jira/browse/BEAM-8530 [2] https://github.com/apache/beam/pull/9951 On 10/31/19 5:29 PM, Reuven Lax wrote: Have you seen these failures on Dataflow as well? F

Re: Strict timer ordering in Samza and Portable Flink Runners

2019-10-31 Thread Pablo Estrada
I remember +Reza Rokni +Maximilian Michels knowing / talking about this. Best -P. On Thu, Oct 31, 2019 at 9:30 AM Reuven Lax wrote: > Have you seen these failures on Dataflow as well? From code examination I > would expect Dataflow to have some bugs in this area as well (especially if > a time

Re: Strict timer ordering in Samza and Portable Flink Runners

2019-10-31 Thread Reuven Lax
Have you seen these failures on Dataflow as well? From code examination I would expect Dataflow to have some bugs in this area as well (especially if a timer is set while processing a bundle). If the tests are passing on Dataflow this might mean that we need different tests (or it might mean that D

Re: Rethinking the Flink Runner modes

2019-10-31 Thread Maximilian Michels
When the FlinkRunner (python client) sees flink_master as [auto] or [local], then it could set the default environment to LOOPBACK before the pipeline is constructed and provide the loopback environment. Isn't that fully client side controlled? There is the case of a pre-configured job server

Re: why are so many transformation needed for a simple TextIO.write() operation

2019-10-31 Thread Pulasthi Supun Wickramasinghe
Hi Luke, Thanks for the explanation, that does make sense, I was just curious as to why. Best Regards, Pulasthi On Wed, Oct 30, 2019 at 1:40 PM Luke Cwik wrote: > A lot of the logic is around handling various error scenarios. > > You should notice that the majority of that graph is about passi

Re: Rethinking the Flink Runner modes

2019-10-31 Thread Thomas Weise
On Thu, Oct 31, 2019 at 3:55 AM Maximilian Michels wrote: > > Thanks for clarifying. So when I run "./flink my_pipeline.jar" or > > upload the jar via the REST API (and its main method invoked on the > > master) then [auto] reads the config and does the right thing, but if > > I do java my_pipel

Proposal: @RequiresTimeSortedInput

2019-10-31 Thread Jan Lukavský
Hi, as a follow-up from previous design draft, I'd like to promote the document [1] and associated PR [2] to proposal. The PR contains working implementation for:  - non-portable batch flink and batch spark (legacy)  - all non-portable streaming runners that use StatefulDoFnRunner (direct,

Re: Rethinking the Flink Runner modes

2019-10-31 Thread Maximilian Michels
Thanks for clarifying. So when I run "./flink my_pipeline.jar" or upload the jar via the REST API (and its main method invoked on the master) then [auto] reads the config and does the right thing, but if I do java my_pipeline.jar it'll run locally. Correct. Python needs to know even whether t

Re: Triggers still finish and drop all data

2019-10-31 Thread Jan Lukavský
Hi Kenn, does there still remain some use for trigger to finish? If we don't drop data, would it still be of any use to users? If not, would it be better to just remove the functionality completely, so that users who use it (and it will possibly break for them) are aware of it at compile time?

Re: Python SDK timestamp precision

2019-10-31 Thread Jan Lukavský
> I resolve this theoretical issue by working in a space with a single time dimension and zero spatial dimensions. That is, the location of an event is completely determined by a single coordinate in time, and distance and causality are hence well defined for all possible observers :) Yeah, th