Re: PipelineOptions at execution time from DirectRunner

2019-03-21 Thread Pablo Estrada
Thanks Ahmet! These are illustrative explanations. I still wonder about one question: > >> Getting it as pcoll.pipeline.options in the expand(self, pcoll) call is a >> possiblity, but it seems like that's not ideal. Any other suggestions? >> > Is this an appropriate way of obtaining an option tha

Re: PipelineOptions at execution time from DirectRunner

2019-03-21 Thread Ahmet Altay
On Thu, Mar 21, 2019 at 4:20 PM Pablo Estrada wrote: > Hi all, > The DirectRunner does not seem to support RuntimeValueProvider. Is there a > suggestion for DirectRunner pipelines to access arguments passed in as > pipeline options(but not necessarily passed explicitly by users) at > pipeline exe

PipelineOptions at execution time from DirectRunner

2019-03-21 Thread Pablo Estrada
Hi all, The DirectRunner does not seem to support RuntimeValueProvider. Is there a suggestion for DirectRunner pipelines to access arguments passed in as pipeline options(but not necessarily passed explicitly by users) at pipeline execution time? Getting it as pcoll.pipeline.options in the expand(

Re: [spark runner dataset POC] workCount works !

2019-03-21 Thread Kenneth Knowles
Nice milestone! On Thu, Mar 21, 2019 at 10:49 AM Pablo Estrada wrote: > This is pretty cool. Thanks for working on this and for sharing:) > Best > -P. > > On Thu, Mar 21, 2019, 8:18 AM Alexey Romanenko > wrote: > >> Good job! =) >> Congrats to all who was involved to move this forward! >> >> Bt

Re: [spark runner dataset POC] workCount works !

2019-03-21 Thread Pablo Estrada
This is pretty cool. Thanks for working on this and for sharing:) Best -P. On Thu, Mar 21, 2019, 8:18 AM Alexey Romanenko wrote: > Good job! =) > Congrats to all who was involved to move this forward! > > Btw, for all who is interested in a progress of work on this runner, I > wanted to remind t

Re: joda-time dependency version

2019-03-21 Thread Kenneth Knowles
+dev@ I don't know of any special reason we are using an old version. Kenn On Thu, Mar 21, 2019, 09:38 Ismaël Mejía wrote: > Does anyone have any context on why we have such an old version of > Joda time (2.4 released on 2014!) and if there is any possible issue > upgrading it? If not maybe w

Re: [BEAM-6862] Adding pyhamcrest library to python container

2019-03-21 Thread Ahmet Altay
This is fine, thank you for sending a note. In the future, when we are ready to make container releases along with Beam releases, we can have a cleaned up version of this container. On Thu, Mar 21, 2019 at 9:36 AM Mikhail Gryzykhin wrote: > Hi everyone, > > Recently, there was added a test for

Re: Python36/37 not installed on Beam2 and Beam12?

2019-03-21 Thread Mark Liu
Thanks for the efforts! I followed up on INFRA-17335 and slack thread. And also created https://issues.apache.org/jira/browse/INFRA-18070 to track this issue separately. Hope this can help! Mark On Wed, Mar 20, 2019 at 7:43 PM Valentyn Tymofieie

[BEAM-6862] Adding pyhamcrest library to python container

2019-03-21 Thread Mikhail Gryzykhin
Hi everyone, Recently, there was added a test for verifying metrics in python SDK ( PR-8038 ). This PR causes beam_PostCommit_Py_ValCont job to fail due to lack of pyhamcrest libr

Re: KafkaIO Exactly-Once & Flink Runner

2019-03-21 Thread Thomas Weise
Tracked as https://issues.apache.org/jira/browse/BEAM-6879 On Fri, Mar 15, 2019 at 10:13 AM Kenneth Knowles wrote: > Yes, the ParDoPayload has to contain most of the information that is on > DoFnSignature. Everything except the details for feeding the bits to the > Java DoFn. > > Kenn > > On Fr

Re: [Announcement] New Website for Beam Summits

2019-03-21 Thread Alexey Romanenko
Great initiative, thanks for creating this! Btw, are any plans to add there information about previous Beam-related events, especially London Beam summit last year? > On 20 Mar 2019, at 19:30, David Morávek wrote: > > This is great! Thanks for all of the hard work you're putting into this. >

Re: [spark runner dataset POC] workCount works !

2019-03-21 Thread Alexey Romanenko
Good job! =) Congrats to all who was involved to move this forward! Btw, for all who is interested in a progress of work on this runner, I wanted to remind that we have #beam-spark channel on Slack where we discuss all ongoing questions. Feel free to join! Alexey > On 21 Mar 2019, at 15:51, Je

Re: [spark runner dataset POC] workCount works !

2019-03-21 Thread Jean-Baptiste Onofré
Congrats and huge thanks ! (I'm glad to be one of the little "launcher" to this effort ;) ) Regards JB On 21/03/2019 15:47, Ismaël Mejía wrote: This is excellent news. Congrats Etienne, Alexey and the others involved for the great work! On Thu, Mar 21, 2019 at 3:10 PM Etienne Chauchot wrote:

Re: [spark runner dataset POC] workCount works !

2019-03-21 Thread Ismaël Mejía
This is excellent news. Congrats Etienne, Alexey and the others involved for the great work! On Thu, Mar 21, 2019 at 3:10 PM Etienne Chauchot wrote: > > Hi guys, > > We are glad to announce that the spark runner POC that was re-written from > scratch using the structured-streaming framework and

Re: User state cleanup

2019-03-21 Thread Maximilian Michels
Yes, to be extra clear: User state cleanup works correctly in the FlinkRunner. The portable Flink Runner needs a fix to behave the same way. Thanks for opening the issue Thomas. -Max On 21.03.19 14:25, Thomas Weise wrote: Created https://issues.apache.org/jira/browse/BEAM-6876 for the portab

[spark runner dataset POC] workCount works !

2019-03-21 Thread Etienne Chauchot
Hi guys, We are glad to announce that the spark runner POC that was re-written from scratch using the structured-streaming framework and the dataset API can now run WordCount ! It is still embryonic. For now it only runs in batch mode and there is no fancy stuff like state, timer, SDF, metrics,

Re: User state cleanup

2019-03-21 Thread Thomas Weise
Created https://issues.apache.org/jira/browse/BEAM-6876 for the portable Flink runner issue. On Wed, Mar 20, 2019 at 11:10 AM Kenneth Knowles wrote: > > > On Wed, Mar 20, 2019 at 6:23 AM Maximilian Michels wrote: > >> Hi, >> >> I just realized that user state acquired via StateInternals in the

PubsubIO and projectId

2019-03-21 Thread Jan Lukavský
Hi, I have come across an issue using PubsubIO with flink runner. The problem is described at [1]. I also created PR for this: [2], but there are some doubts described in comment in the JIRA issue. Would someone have time to walk through it and/or provide some insights? Thanks,  Jan [1] ht