Re: Issue while submitting python beam pipeline on flink - local

2020-06-01 Thread Maximilian Michels
The logs indicate that you are not running the Docker-based execution but the `LOOPBACK` mode. In this mode the Flink cluster needs to connect to the machine that started the pipeline. That will not be possible unless you are running the Flink cluster on the same machine (we bind to `localhost` whi

RE: Issue while submitting python beam pipeline on flink - local

2020-06-01 Thread Ashish Raghav
Hello Michels, I am following the documentation below and doing local testing only. Both the job server and the Flink cluster are on the same machine, and the pipeline is submitted from the same machine as well. I will drop the `--environment_type=LOOPBACK` flag and test again. https://beam.apache.org/documentatio

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-06-01 Thread Alexey Romanenko
Thanks! It was an issue with setting up the virtualenv for the worker console where it should be running. It would be useful to print out such errors at ERROR log level, I think. > On 29 May 2020, at 18:55, Kyle Weaver wrote: > > That's probably a problem with your worker. You'll need to get additi

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-06-01 Thread Kyle Weaver
> It would be useful to print out such errors with Error level log, I think. I agree, using environment_type=PROCESS is difficult enough without hiding the logs by default. I re-opened the issue. On Mon, Jun 1, 2020 at 11:01 AM Alexey Romanenko wrote: > Thanks! It was an issue with a setting *v

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-06-01 Thread Chamikara Jayalath
To clarify, is the error resolved with the cross-language transform as well? If not, please file a Jira. On Mon, Jun 1, 2020 at 8:24 AM Kyle Weaver wrote: > > It would be useful to print out such errors with Error level log, I > think. > > I agree, using environment_type=PROCESS is difficult eno

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-06-01 Thread Alexey Romanenko
Yes, I tested it with the cross-language transform (Java pipeline with Python external transform). > On 1 Jun 2020, at 17:49, Chamikara Jayalath wrote: > > To clarify, is the error resolved with the cross-language transform as well ? > If not please file a Jira. > > On Mon, Jun 1, 2020 at 8:2

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-06-01 Thread Chamikara Jayalath
Great. Thanks. On Mon, Jun 1, 2020 at 9:14 AM Alexey Romanenko wrote: > Yes, I tested it with the cross-language transform (Java pipeline with > Python external transform). > > On 1 Jun 2020, at 17:49, Chamikara Jayalath wrote: > > To clarify, is the error resolved with the cross-language trans

Exceptions PubsubUnbounded source ACK

2020-06-01 Thread KV 59
Hi, I have a Dataflow pipeline with a PubSub UnboundedSource. The pipeline transforms the data and writes to another PubSub topic. I have a question regarding exceptions in DoFns. If I choose to ignore an exception while processing an element, does it ACK the bundle? Also, if I were to just throw the excep

Re: Exceptions PubsubUnbounded source ACK

2020-06-01 Thread Jeff Klukas
Is this a Python or Java pipeline? I'm familiar with PubsubIO in Java, though I expect the behavior in Python is similar. It will ack messages at the first checkpoint step in the pipeline, so the behavior in your case depends on whether there is a GroupByKey operation happening before the exceptio
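
A minimal Java sketch of the checkpoint placement being described (the subscription, topic, and transform names are hypothetical, and Reshuffle stands in for any GroupByKey-style step that forces a checkpoint on a Dataflow-style runner):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.values.PCollection;

public class AckBoundarySketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<PubsubMessage> messages =
        p.apply("Read", PubsubIO.readMessages()
            .fromSubscription("projects/my-project/subscriptions/my-sub"));  // hypothetical

    // Reshuffle forces a checkpoint (like a GroupByKey). Messages are ack'd once this
    // checkpoint is durably committed, so an exception thrown *after* it no longer
    // prevents the ack. Without such a step, a failure in the DoFn causes the bundle
    // to be retried and the messages to be redelivered.
    messages
        .apply("Checkpoint", Reshuffle.viaRandomKey())
        .apply("MaybeFail", ParDo.of(new DoFn<PubsubMessage, PubsubMessage>() {
          @ProcessElement
          public void processElement(@Element PubsubMessage msg,
                                     OutputReceiver<PubsubMessage> out) {
            // Transformation that may throw; its position relative to the checkpoint
            // decides whether the source message has already been ack'd.
            out.output(msg);
          }
        }))
        .apply("Write", PubsubIO.writeMessages().to("projects/my-project/topics/out"));  // hypothetical

    p.run();
  }
}
```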

Re: Exceptions PubsubUnbounded source ACK

2020-06-01 Thread KV 59
Hi Jeff, thanks for the response. Yes, I have a Java pipeline, and yes, it is a simple transformation. DoFns work on bundles, so if a single element in the bundle fails and we ignore the error on that single element, then the bundle is still considered successfully processed, am I correct? Then i

Re: Exceptions PubsubUnbounded source ACK

2020-06-01 Thread Jeff Klukas
Correct. If you ignore the error on the single element, the corresponding PubSub message will be ack'd just like everything else in the bundle. PubsubIO provides no handle for preventing acks per-message. In practice, if you have some messages that cause errors that are not retryable, you may want
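
One common way to handle non-retryable elements without losing them (a sketch, not necessarily what this reply goes on to recommend) is a multi-output DoFn that swallows the exception and emits the failing payload to a dead-letter output; all names below are hypothetical.

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.TupleTag;

// Catch the non-retryable error so the bundle (and hence the Pub/Sub ack) succeeds,
// and route the raw payload to a dead-letter output for later inspection.
class TransformWithDeadLetter extends DoFn<String, String> {
  static final TupleTag<String> MAIN = new TupleTag<String>() {};
  static final TupleTag<String> DEAD_LETTER = new TupleTag<String>() {};

  @ProcessElement
  public void processElement(@Element String raw, MultiOutputReceiver out) {
    try {
      out.get(MAIN).output(raw.toUpperCase());  // placeholder for the real transformation
    } catch (RuntimeException e) {
      // Swallowing the exception lets the bundle complete, so the message is ack'd;
      // the failing payload is preserved here instead of being silently dropped.
      out.get(DEAD_LETTER).output(raw);
    }
  }
}

// Wiring (sketch): obtain both outputs with withOutputTags and write the dead-letter
// collection somewhere durable, e.g. another Pub/Sub topic or GCS.
//
// PCollectionTuple results = input.apply(ParDo.of(new TransformWithDeadLetter())
//     .withOutputTags(TransformWithDeadLetter.MAIN,
//                     TupleTagList.of(TransformWithDeadLetter.DEAD_LETTER)));
```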

Re: Exceptions PubsubUnbounded source ACK

2020-06-01 Thread KV 59
Thanks a lot, Jeff. On Mon, Jun 1, 2020 at 11:17 AM Jeff Klukas wrote: > Correct. If you ignore the error on the single element, the corresponding > PubSub message will be ack'd just like everything else in the bundle. > PubsubIO provides no handle for preventing acks per-message. > > In practice,

Re: Pipeline Processing Time

2020-06-01 Thread Talat Uyarer
Sorry for the late response. Where does Beam set that timestamp field on the element? Is it set whenever KafkaIO reads the element? Also, I have a windowing function in my pipeline. Does the timestamp field change for any kind of operation? In the pipeline I have the following steps: KafkaIO ->

Re: Pipeline Processing Time

2020-06-01 Thread Luke Cwik
You can configure KafkaIO to use some data from the record as the element's timestamp. See the KafkaIO javadoc around the TimestampPolicy [1]; the default is the current processing time. You can access the timestamp of the element by adding "org.joda.time.Instant timestamp" as a parameter to your @Proces
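
A short Java sketch of both points, with hypothetical broker, topic, and class names; check the KafkaIO javadoc for the TimestampPolicyFactory variants available in your Beam version.

```java
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.io.kafka.TimestampPolicyFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.joda.time.Duration;
import org.joda.time.Instant;

public class KafkaTimestampSketch {

  // 1) Tell KafkaIO to derive element timestamps from the record rather than processing
  //    time, e.g. from the Kafka record's create (producer) time.
  static KafkaIO.Read<String, String> read() {
    return KafkaIO.<String, String>read()
        .withBootstrapServers("broker:9092")   // hypothetical
        .withTopic("my-topic")                 // hypothetical
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .withTimestampPolicyFactory(
            TimestampPolicyFactory.withCreateTime(Duration.standardMinutes(1)));
  }

  // 2) Read the element's timestamp inside a DoFn by declaring a @Timestamp Instant
  //    parameter. Windowing assigns elements to windows based on this timestamp; it
  //    does not change it.
  static class InspectTimestampFn extends DoFn<KafkaRecord<String, String>, String> {
    @ProcessElement
    public void processElement(
        @Element KafkaRecord<String, String> record,
        @Timestamp Instant timestamp,
        OutputReceiver<String> out) {
      out.output(record.getKV().getValue() + " @ " + timestamp);
    }
  }
}
```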

Re: Need suggestion/help for use case (usage of the side input pattern and sliding window)

2020-06-01 Thread Luke Cwik
Your allowed lateness is 360 days, and since the trigger you have doesn't emit speculative results, you'll have to wait until the watermark advances to the end-of-window timestamp + 360 days before something is output from the grouping aggregation and becomes available at the side input. On Sat, May 30, 2020
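
A sketch of adding early (speculative) firings so the aggregation becomes available as a side input before the window closes; the window sizes, delays, Mean aggregation, and names are placeholders, not the poster's actual pipeline.

```java
import java.util.Map;
import org.apache.beam.sdk.transforms.Mean;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.SlidingWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.joda.time.Duration;

public class EarlyFiringSideInputSketch {

  // Emit early panes so the running aggregation materializes as a side input before the
  // watermark reaches the end of the window; without early firings, the side input only
  // becomes available once the window closes (plus any allowed lateness).
  static PCollectionView<Map<String, Double>> runningScores(
      PCollection<KV<String, Double>> scores) {
    return scores
        .apply(Window.<KV<String, Double>>into(
                SlidingWindows.of(Duration.standardHours(1)).every(Duration.standardMinutes(5)))
            .triggering(AfterWatermark.pastEndOfWindow()
                .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                    .plusDelayOf(Duration.standardMinutes(1))))
            .withAllowedLateness(Duration.standardMinutes(10))
            .accumulatingFiredPanes())
        .apply(Mean.<String, Double>perKey())
        .apply(View.asMap());
  }
}
```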

KafkaIO write in case on topic name present in PCollection

2020-06-01 Thread Mohil Khare
Hello everyone, Does anyone know if it is possible to provide a topic name embedded in a PCollection object to KafkaIO while writing? We have a use case where we have a team-specific Kafka topic, e.g. teamA_topicname, teamB_topicname. From Beam, we create a PCollection<...> and we need to send thi
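
One possible approach is a sketch along these lines, assuming that KafkaIO.writeRecords() uses the topic set on each ProducerRecord when no default topic is configured (verify against the KafkaIO javadoc for your Beam version); broker address and names are hypothetical.

```java
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DynamicTopicWriteSketch {

  // Map each element to a ProducerRecord carrying its own topic name (here derived from a
  // hypothetical KV<team, payload>), then hand the records to KafkaIO.writeRecords().
  static class ToProducerRecordFn extends DoFn<KV<String, String>, ProducerRecord<String, String>> {
    @ProcessElement
    public void processElement(@Element KV<String, String> element,
                               OutputReceiver<ProducerRecord<String, String>> out) {
      String topic = element.getKey() + "_topicname";  // e.g. teamA_topicname
      out.output(new ProducerRecord<>(topic, element.getKey(), element.getValue()));
    }
  }

  static void write(PCollection<KV<String, String>> perTeam) {
    // Depending on the Beam version, you may need to set a coder for ProducerRecord
    // explicitly on the intermediate PCollection.
    perTeam
        .apply("ToProducerRecord", ParDo.of(new ToProducerRecordFn()))
        .apply("WriteToKafka", KafkaIO.<String, String>writeRecords()
            .withBootstrapServers("broker:9092")   // hypothetical
            .withKeySerializer(StringSerializer.class)
            .withValueSerializer(StringSerializer.class));
  }
}
```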

Re: Need suggestion/help for use case (usage of the side input pattern and sliding window)

2020-06-01 Thread Mohil Khare
Thanks, Luke, for your reply. I see. I am trying to recall why I set allowedLateness to 360 days. Anyway, I will try without that. But do you think the approach I am using, keeping a running score in a sliding window and then using it as a side input to decorate the main log, is correct? O
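
For reference, a hypothetical sketch of the decoration step itself: looking up the running score from the side-input view inside a ParDo. Names are placeholders, and the main input must be windowed so its windows can be mapped onto the side input's windows.

```java
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

public class DecorateWithSideInputSketch {

  // Decorate each main-input log line with the latest running score looked up from the
  // side-input map (the view produced by the windowed aggregation sketched earlier).
  static PCollection<String> decorate(
      PCollection<KV<String, String>> logs,                    // hypothetical: key -> log line
      PCollectionView<Map<String, Double>> runningScores) {    // side-input view
    return logs.apply("Decorate", ParDo.of(
        new DoFn<KV<String, String>, String>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            Map<String, Double> scores = c.sideInput(runningScores);
            Double score = scores.getOrDefault(c.element().getKey(), 0.0);
            c.output(c.element().getValue() + " score=" + score);
          }
        }).withSideInputs(runningScores));
  }
}
```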