Issues running Kafka streaming pipeline in Python

2021-06-01 Thread Alex Koay
Hi all, I have created a simple snippet as such: import apache_beam as beam from apache_beam.io.kafka import ReadFromKafka from apache_beam.options.pipeline_options import PipelineOptions import logging logging.basicConfig(level=logging.WARNING) opts = direct_opts with beam.Pipeline(options=Pip

Re: Issues running Kafka streaming pipeline in Python

2021-06-02 Thread Alex Koay
steps. This is tracked in > https://issues.apache.org/jira/browse/BEAM-11998. > > Thanks, > Cham > > On Wed, Jun 2, 2021 at 8:28 AM Ahmet Altay wrote: > >> /cc @Boyuan Zhang for kafka @Chamikara Jayalath >> for multi language might be able to help. >&g

Re: Issues running Kafka streaming pipeline in Python

2021-06-02 Thread Alex Koay
CC-ing Chamikara as he got omitted from the reply all I did earlier. On Thu, Jun 3, 2021 at 12:43 AM Alex Koay wrote: > Yeah, I figured it wasn't supported correctly on DirectRunner. Stumbled > upon several threads saying so. > > On Dataflow, I've encountered a few dif

Re: Issues running Kafka streaming pipeline in Python

2021-06-02 Thread Alex Koay
ort>. > > Thanks, > Cham > > On Wed, Jun 2, 2021 at 9:45 AM Alex Koay wrote: > >> CC-ing Chamikara as he got omitted from the reply all I did earlier. >> >> On Thu, Jun 3, 2021 at 12:43 AM Alex Koay wrote: >> >>> Yeah, I figured it wasn

Debugging External Transforms on Dataflow (Python)

2021-06-15 Thread Alex Koay
Several questions: 1. Is there any way to set the log level for the Java workers via a Python Dataflow pipeline? 2. What is the easiest way to debug an external transform in Java? My main pipeline code is in Python. 3. Are there any edge cases with the UnboundedSourceWrapperFn SDF that I should

Re: Debugging External Transforms on Dataflow (Python)

2021-06-15 Thread Alex Koay
share the implementation of the source and the pipeline, that > would be really helpful. > > +Lukasz Cwik for awareness. > > On Tue, Jun 15, 2021 at 9:50 AM Chamikara Jayalath > wrote: > >> >> >> On Tue, Jun 15, 2021 at 3:20 AM Alex Koay wrote: >

Re: Debugging External Transforms on Dataflow (Python)

2021-06-16 Thread Alex Koay
ad version even when using 2.24.0 (I'm not entirely sure why this is the case). I feel like I'm almost at the verge of fixing the problem, but at this point I'm still far from it. On Wed, Jun 16, 2021 at 11:24 AM Alex Koay wrote: > 1. I'm building a streaming pipeline. &

Re: Debugging External Transforms on Dataflow (Python)

2021-06-17 Thread Alex Koay
process and instead reusing existing sources over and over I'd be happy to send pull requests to help fix this issue but perhaps will need some direction as to how I should fix this. On Wed, Jun 16, 2021 at 8:32 PM Alex Koay wrote: > Alright, some updates. > > Using DirectRunner hel

Re: Debugging External Transforms on Dataflow (Python)

2021-06-17 Thread Alex Koay
to do two things: > 1) Finalize the current bundle > 2) Schedule the continuation for the checkpoint mark > > Based upon your description it looks like for some reason the runner is > unable to complete the current bundle. > > On Thu, Jun 17, 2021 at 2:48 AM Alex Koay wrote: &g

Re: Debugging External Transforms on Dataflow (Python)

2021-06-24 Thread Alex Koay
> 2: > https://github.com/apache/beam/blob/7494d1450f2fc19344c385435496634e4b28a997/runners/direct-java/src/main/java/org/apache/beam/runners/direct/EvaluationContext.java#L157 > > On Thu, Jun 17, 2021 at 10:42 AM Alex Koay wrote: > >> Could you be referring to this part

Re: Debugging External Transforms on Dataflow (Python)

2021-06-24 Thread Alex Koay
rt of this PR (https://github.com/apache/beam/pull/13592), do you know if there was something behind this? Thanks! Cheers Alex On Thu, Jun 24, 2021 at 4:01 PM Alex Koay wrote: > Good news for anybody who's following this. > I finally had some time today to look into the problem again w

DataflowRunner with External Transform creates UnboundedSourceAsSDFWrapperFn repeatedly

2021-07-11 Thread Alex Koay
Can any Dataflow experts / Googlers enlighten me as to why this happens? I shimmed the FnHarness for a Python pipeline with a Java external transform, and it seems that the ProcessBundleHandler receives different process-bundle-descriptor-%d for the same processor. This leads to the system defeatin

Re: When to use Pipeline object

2021-07-11 Thread Alex Koay
>From my understanding, you need the Pipeline for mainly two things: 1. Marking the start of any processing flows (it serves as the PBegin "PCollection") so any sources that follows it will run. 2. Running / executing / deploying the pipeline -- this happens automatically with the context manager i