How to configure external service for Kafka IO to run the flink job in k8s

2022-06-06 Thread Lydian
Hi Folks, I am trying to set up the Beam environment to run our Python pipeline which reads data from Kafka. According to some previous thread , it seems like the Java SDK harness doesn't work with PROCESS environment_type, and thu

UNIMPLEMENTED method: org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool/StartWorker

2022-09-20 Thread Lydian
ask.java:766) ~[flink-dist_2.12-1.14.5.jar:1.14.5] at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575) ~[flink-dist_2.12-1.14.5.jar:1.14.5] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292] ``` Does anyone know how I could fix this issue? Thanks! Sincerely, Lydian Lee

Re: UNIMPLEMENTED method: org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool/StartWorker

2022-09-20 Thread Lydian
lExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) ``` Does anyone know which things I would also need to configure? Thanks Sincerely, Lydian Lee On Tue, Sep 20, 2022 at 1:57 AM Jan Lukavský wrote: > Hi Lydian, > > there are two parts involved. > &g

Re: UNIMPLEMENTED method: org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool/StartWorker

2022-09-21 Thread Lydian
uot;/opt/apache/beam/java_boot\"} ``` I am not familiar with Java, would you mind recommend me on how to pack all the dependencies (or how do I find what are the dependencies) in a single jar? Thanks! Sincerely, Lydian Lee On Wed, Sep 21, 2022 at 1:11 AM Jan Lukavský wrote: > Hi Lydi

Re: UNIMPLEMENTED method: org.apache.beam.model.fn_execution.v1.BeamFnExternalWorkerPool/StartWorker

2022-09-23 Thread Lydian
;, ]) ``` which according to the doc is going to build the shadow jar for me, but that also failed with the same error unless I ensure the task manager and the job server are using the same PVC. Sincerely, Lydian Lee On Wed, Sep 21, 2022 at 2:14 PM Lydian wrote: > Yes, the logs is on

ReadFromKafka doesn't progress

2022-09-30 Thread Lydian
also found that there are multiple log lines saying: ``` Resetting offset for partition test_topic-1 to offset 8706. ``` the reset offset keeps the same all the time, which looks like the kafka IO never actually progress? But I also unable to find any other error log as well. Wondering if anyone can help me with this issue ? Thanks Sincerely, Lydian Lee

Re: Java + Python Xlang pipeline

2022-10-11 Thread Lydian
to PROCESS, I don't need to use DinD or DooD at all. You can also find my full settings in: https://stackoverflow.com/a/74035035/19259600 Sincerely, Lydian Lee On Tue, Oct 11, 2022 at 11:34 AM Xiao Ma wrote: > The worker pool means `starting a java or python sdk`, to accept t

Re: KafkaIO not write checkpoints with 0 lens

2022-10-13 Thread Lydian
24:34,597 DEBUG org.apache.hadoop.fs.s3a.S3AFileSystem.Progress [] - PUT flink/expansion-service-example/savepoints/: 0 bytes ``` Sincerely, Lydian Lee On Thu, Oct 13, 2022 at 1:34 PM Lydian wrote: > Hi, > > I have a simple python pipeline running on top of flink runner,

KafkaIO not write checkpoints with 0 lens

2022-10-13 Thread Lydian
.apache.hadoop.fs.s3a.S3AStorageStatistics [] - object_delete_requests += 1 -> 5 2022-10-05 17:24:34,597 DEBUG org.apache.hadoop.fs.s3a.S3AFileSystem.Progress [] - PUT flink/expansion-service-example/savepoints/: 0 bytes ``` Sincerely, Lydian Lee

Re: KafkaIO not write checkpoints with 0 lens

2022-10-14 Thread Lydian
g the same thing and how do I avoid this happening? Thanks! Sincerely, Lydian Lee On Thu, Oct 13, 2022 at 1:35 PM Lydian wrote: > oops, the log seems not formatting well: > ``` > > 2022-10-05 17:05:32,027 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph []

Use s3 as artifacts_dir for portable flink runner?

2022-10-21 Thread Lydian
g my pipeline. ``` However, the job is able to run, but I couldn't find the artifacts in my s3 bucket, and looks like the artifacts are still stored in the default `/tmp/beam-artifact-staging` . Wondering how I can configure to use the s3 as artifact directory? Thanks! Sincerely, Lydian Lee

Local Test Beam Portable Flink Runner in mac

2022-11-14 Thread Lydian
the Java Worker talk to the "Host" with the random port. I know the problem could be easily fixed by setting up network_mode to host. However, we are using Mac for development, and the host network is not supported for Docker on Mac. Wondering if anyone tried the same thing before and if

Re: Python pipeline: unable to read from Kafka in a Flink cluster with the expansion service

2023-01-06 Thread Lydian
ut a day, but it would be totally working well if I add this use_deprecated_read. Sincerely, Lydian Lee On Thu, Jan 5, 2023 at 2:37 AM Florentin Hennecker wrote: > Hello, > > I’ve been trying to write a pipeline in Python with apache-beam==2.38.0, > which starts with a ReadFromKafka

KafkaIo Metrics

2023-01-18 Thread Lydian
ze? Thanks! Sincerely, Lydian Lee

Beam app submit randomly failed with java.lang.ClassNotFoundException: nonapi.io.github.classgraph.utils.URLPathEncoder

2023-01-26 Thread Lydian
ThenScan(Scanner.java:1112) ~[?:?] at io.github.classgraph.Scanner.call(Scanner.java:1146) ~[?:?] at io.github.classgraph.Scanner.call(Scanner.java:83) ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_342] ... 3 more ``` Thanks for helping! Sincerely, Lydian Lee

Backup event from Kafka to S3 in parquet format every minute

2023-02-16 Thread Lydian
afka (Unbounded) to S3. Thanks! btw, I am running with portableRunner with Flink. Beam Version is 2.41.0 (the latest version seems to have the same code) and Flink version is 1.14.5 Sincerely, Lydian Lee

Re: Backup event from Kafka to S3 in parquet format every minute

2023-02-22 Thread Lydian
Sincerely, Lydian Lee On Fri, Feb 17, 2023 at 2:12 AM Pavel Solomin wrote: > For me this use-case worked with the following window definition, which > was a bit of try-and-fail, and I can't claim I got 100% understanding of > windowing logic. > > Here's my java code for

How Beam Pipeline Handle late events

2023-04-18 Thread Lydian
can do similar thing in Beam as well? Would someone provide some code example? Could someone help us debugging this? Thanks! --- * Flink's documentation about late event as side output: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/#getting-late-data-as-a-side-output Sincerely, Lydian Lee

Re: How Beam Pipeline Handle late events

2023-04-19 Thread Lydian
allel (Beam vs Spark) we are seeing some days with no event > loss and some days with some, but it's always less than 0.05% > > > > > > On Wed, Apr 19, 2023 at 8:07 AM Pavel Solomin > wrote: > >> > >> Hello Lydian, > >> Do you always observe da

Re: [Question] check if pipeline is still running in pipeline runner

2023-07-07 Thread Lydian
which initializes an AppName. This seems promising to me, but it is >> written in Java and I am looking to do it in Python. >> >> >> Is there a way to specify the Flink job name via the Beam Python SDK? Or >> is there a simpler way to know that a particular Beam pipeline is running, >> and therefore not resubmit it? >> >> >> Please let me know if you have any suggestions - either about how to >> execute the approaches I've described or if there's a simpler solution that >> I am overlooking. Thank you for your help! >> >> >> Best, >> >> Adlae D'Orazio >> >> -- Sincerely, Lydian Lee

How to debug ArtifactStagingService ?

2024-01-03 Thread Lydian
the ArtifactStagingService, but I am having trouble identifying the root cause. Wondering if someone would be able to help me figure out how to pull more informative debug logging to fix this issue. Thanks! Sincerely, Lydian Lee

Monitor and limit Beam Harness Memory usage with Flink Runner

2024-02-07 Thread Lydian
runner related to the resource hint. Wondering if that is the best way for us to fix the memory usage or is there any other approach that we can do to avoid the OOM on python task runs on flink runner? Thanks Sincerely, Lydian Lee

Re: Monitor and limit Beam Harness Memory usage with Flink Runner

2024-02-07 Thread Lydian
We are actually in an even older version: 2.41.0 and flink 1.14.5 Sincerely, Lydian Lee On Wed, Feb 7, 2024 at 12:40 PM Valentyn Tymofieiev via user < user@beam.apache.org> wrote: > Hi Lydian, > > note that there was a memory leak in certain versions of Beam: > https://gith

Request to join slack channel

2024-02-20 Thread Lydian Lee
Hi, Can I get the invitation to join slack channel ? The ASF slack seems required invitation to be able to join. Thanks

how to enable debugging mode for python worker harness

2024-03-17 Thread Lydian Lee
5 ``` The full settings is available in: https://github.com/lydian/beam-python-flink-runner-examples Thanks for your help

Re: how to enable debugging mode for python worker harness

2024-03-17 Thread Lydian Lee
the flink task manager. Otherwise, it > won't be able to launch a Docker container for reading kafka messages. > > Cheers, > Jaehyeon > > > On Sun, 17 Mar 2024 at 18:21, Lydian Lee wrote: > >> Hi, >> >> I have an issue when setting up a POC of Pytho

Re: how to enable debugging mode for python worker harness

2024-03-17 Thread Lydian Lee
also runs well, > > Maybe this is related to your Mac setting? > > On Sun, Mar 17, 2024 at 8:34 AM Lydian Lee > wrote: > >> >> Hi, >> >> Just FYI, the similar things works on a different image with the one I >> built using my company’s image as base im

Creating custom metrics with labels (python SDK + Flink Runner)

2024-03-25 Thread Lydian Lee
Hi, I am using beam python SDK with flink runner, and I am trying to add custom labels to the metrics. It seems like the provided function (link ) doesn't allow me to add labels. ``` @staticmethod def co

Re: how to enable debugging mode for python worker harness

2024-03-31 Thread Lydian Lee
one. Thanks! On Mon, Mar 18, 2024 at 6:24 PM XQ Hu via user wrote: > I did not do anything special but ran `docker-compose -f > docker-compose.yaml up` from your repo. > > On Sun, Mar 17, 2024 at 11:38 PM Lydian Lee > wrote: > >> Hi XQ, >> >> The code is sim

Beam dropping events from Kafka after reshuffle ?

2024-09-17 Thread Lydian Lee
Hi, We are using Beam Python SDK with Flink Runner, the Beam version is 2.41.0 and the Flink version is 1.15.4. We have a pipeline that has 2 stages: 1. read from kafka and fixed window for every 1 minute 2. aggregate the data for the past 1 minute and reshuffle so that we have less partition cou

Re: Beam dropping events from Kafka after reshuffle ?

2024-09-18 Thread Lydian Lee
12:33 AM Jan Lukavský wrote: > Hi Lydian, > > in that case, there is only a generic advice you can look into. Reshuffle > is a stateless operation that should not cause dropping data. A GroupByKey > on the other hand is stateful and thus can - when dealing with late data - >

Re: Beam dropping events from Kafka after reshuffle ?

2024-09-18 Thread Lydian Lee
he events into S3 with parquet format ) Thanks! On Wed, Sep 18, 2024 at 3:16 PM Reuven Lax via user wrote: > How are you doing this aggregation? > > On Wed, Sep 18, 2024 at 3:11 PM Lydian Lee > wrote: > >> Hi Jan, >> >> Thanks for the recommendation. In our c

Re: Beam dropping events from Kafka after reshuffle ?

2024-09-23 Thread Lydian Lee
sion="SNAPPY", use_deprecated_int96_timestamps=True ) as parquet_writer: merged_tables = pa.concat_tables(tables) parquet_writer.write_table(merged_tables) On Fri, Sep 20, 2024 at 12:02 AM Jan Lukavský wrote: > Hi Lydian, > > because you do not specify 'timestamp_policy' it

Re: Beam dropping events from Kafka after reshuffle ?

2024-09-23 Thread Lydian Lee
times. The duplicates might > also appear different if the Iterables are slightly different on retries, > especially in the case when Flink restarts a checkpoint. > > Reuven > > On Mon, Sep 23, 2024 at 1:40 PM Lydian Lee > wrote: > >> Hi Jan, >> >> Thanks so

Re: Beam dropping events from Kafka after reshuffle ?

2024-09-19 Thread Lydian Lee
t;Adding 'trigger_processing_time' timestamp" >> beam.Map(lambda event: > window.TimestampedValue(event, time.time())) > > This does not change the assigned timestamp of an element, but creates a > new element which contains processing time. It will not be used for &g

Re: Beam dropping events from Kafka after reshuffle ?

2024-09-17 Thread Lydian Lee
I would love to, but there are some limitations on our ends that the version bump won’t be happened soon. Thus I need to figure out what might be the root cause though. On Tue, Sep 17, 2024 at 11:26 PM Jan Lukavský wrote: > Hi Lydian, > > 2.41.0 is quite old, can you please try curren

Re: Beam dropping events from Kafka after reshuffle ?

2024-09-25 Thread Lydian Lee
ly better to rely on the timestamps > assigned by the source rather than reassigning them, as they should already > be processing timestamps. > > I also have a question for the Beam developers, or anyone who might know: > > Assuming that Lydian does not reassign processing timesta

Unable to run in python-worker-harness after bumpping from 2.41.0 to 2.60.0

2025-01-26 Thread Lydian Lee
Hi, We are trying to bump an old pipeline using flink runner and beam python SDK. The version changes are: - Flink: 1.15.4 -> 1.18.0 - Beam: 2.41.0 -> 2.60.0 - python: 3.9. (no change) However, after bumping the version, we noticed that the python-worker-harness is unable to process function prop

Re: Unable to run in python-worker-harness after bumpping from 2.41.0 to 2.60.0

2025-01-27 Thread Lydian Lee
Hi, It seems like there's also similar issue from others: https://github.com/apache/beam/issues/29683 For me, here's how I launch the job with the following argos: --runner=portableRunner --streaming --environment_type=EXTERNAL --environment_config=localhost:5 --experiments=disable_logging_su

Re: Unable to run in python-worker-harness after bumpping from 2.41.0 to 2.60.0

2025-01-29 Thread Lydian Lee
apache/beam/blob/master/sdks/python/container/boot.go#L253 > . > > It may be due to failure of the Python harness process. > > On Tue, 28 Jan 2025 at 06:22, Lydian Lee wrote: > >> Hi, >> >> It seems like there's also similar issue from others: >> htt