Deployment of beam pipelines on flink cluster

2022-02-17 Thread Sigalit Eliazov
Hi all, we are currently using Beam to create a few pipelines and then deploy them on our on-prem Flink cluster. We have a few questions regarding the automation of the pipeline deployment: we have Beam running as a k8s pod which starts a Java process for each pipeline and has the flinkrun

KafkaIO consumer rate

2022-04-10 Thread Sigalit Eliazov
Hi all, I saw a very low rate when consuming messages from Kafka in our different jobs. In order to find the bottleneck I created a very simple pipeline that reads string messages from Kafka and just prints the output. The pipeline runs on a Flink cluster with the following setup: 1 task manager, 3
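
A minimal sketch of the kind of throughput test described above, assuming placeholder broker and topic names; it just reads strings from Kafka and prints each value.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaReadRateTest {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadFromKafka", KafkaIO.<String, String>read()
            .withBootstrapServers("kafka:9092")           // placeholder broker
            .withTopic("input-topic")                     // placeholder topic
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())
        .apply("PrintValues", ParDo.of(new DoFn<KV<String, String>, Void>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            // Print only; no further processing, to isolate the consumer rate.
            System.out.println(c.element().getValue());
          }
        }));

    p.run().waitUntilFinish();
  }
}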

session window question

2022-04-27 Thread Sigalit Eliazov
Hi all, I have the following scenario: a. a pipeline that reads messages from Kafka and applies a session window with a 1-minute duration; b. GroupByKey in order to aggregate the data; c. for each 'group' I do some calculation and build a new event to send to Kafka. The output of this cycle is key1 - value1
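
A hedged sketch of steps (a) and (b) above: a 1-minute session window followed by GroupByKey. The keyed input collection and its String types are placeholders.

import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.windowing.Sessions;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

/** Applies a 1-minute session window and groups by key, as in steps (a)-(b). */
static PCollection<KV<String, Iterable<String>>> sessionGroup(
    PCollection<KV<String, String>> keyedEvents) {
  return keyedEvents
      .apply("SessionWindow",
          Window.<KV<String, String>>into(
              Sessions.withGapDuration(Duration.standardMinutes(1))))
      .apply("GroupByKey", GroupByKey.create());
  // Step (c), the per-group calculation, would be a ParDo over the grouped output.
}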

Re: session window question

2022-04-27 Thread Sigalit Eliazov
> holding the last value seen per key in a global window. There is an implementation of this approach in Deduplicate.KeyedValues [1]. > Jan > [1] https://beam.apache.org/releases/javadoc/2.38.0/org/apache/beam/sdk/transforms/Deduplicate.KeyedValues.html > On 4/2
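
For reference, a minimal sketch of the transform cited in [1]; the key/value types and the 10-minute deduplication duration are placeholders.

import org.apache.beam.sdk.transforms.Deduplicate;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

/** Drops repeated values per key seen within the configured duration. */
static PCollection<KV<String, String>> dedupePerKey(PCollection<KV<String, String>> input) {
  return input.apply("DedupPerKey",
      Deduplicate.<String, String>keyedValues()
          .withDuration(Duration.standardMinutes(10)));
}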

KeyedBroadcastProcessFunction

2022-05-15 Thread Sigalit Eliazov
Hello, does Beam have support for something similar to KeyedBroadcastProcessFunction, which exists in Flink? I am looking for an option to have broadcast state in Beam so it can be shared between different operators. Thanks, Sigalit

Re: KeyedBroadcastProcessFunction

2022-05-15 Thread Sigalit Eliazov
the state was not available for the operator which reads from the scheduler. Thanks, Sigalit On Sun, May 15, 2022 at 6:03 PM Reuven Lax wrote: > Beam supports side inputs, which might help you. Can you describe your use case? > On Sun, May 15, 2022 at 7:34 AM Sigalit Eliazov wrote:
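
The side-input suggestion could look roughly like the sketch below: the shared ("broadcast-like") collection is turned into a map view and read from another branch. Names are placeholders; for an unbounded state stream the side-input collection also needs a window/trigger so the view refreshes.

import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

/** Shares a keyed "configuration" collection with another branch via a map side input. */
static PCollection<String> enrichWithSharedState(
    PCollection<KV<String, String>> sharedState, PCollection<String> mainInput) {

  PCollectionView<Map<String, String>> stateView =
      sharedState.apply("AsMap", View.asMap());

  return mainInput.apply("LookupState",
      ParDo.of(new DoFn<String, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
          Map<String, String> state = c.sideInput(stateView);
          c.output(c.element() + " -> " + state.getOrDefault(c.element(), "unknown"));
        }
      }).withSideInputs(stateView));
}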

Re: KeyedBroadcastProcessFunction

2022-05-16 Thread Sigalit Eliazov
About 2 GB, and it should be distributed. On Sun, May 15, 2022 at 19:50, Reuven Lax wrote: > How large is this state? Is it distributed? > On Sun, May 15, 2022 at 8:12 AM Sigalit Eliazov wrote: >> Thanks for your response. The use case is 2 pipelines:

sync between two pcollections with different windows

2022-07-20 Thread Sigalit Eliazov
Hi all, I need some advice regarding window usage. I am sure this is a very basic question; any guidance will be much appreciated. I am using: - an unbounded pcollectionA with a FixedWindow of 1 minute, from which I eventually create state and use it as a side input. PCollection> pcollectionA = x.

sink triggers

2022-07-25 Thread Sigalit Eliazov
Hi all, I have a pipeline that reads input from a few sources, combines them and creates a view of the data. I need to send output to Kafka every X minutes. What would be the best way to implement this? Thanks, Sigalit
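
One common pattern for "emit every X minutes" is a global window with a repeated processing-time trigger; a hedged sketch follows, with a placeholder 5-minute delay and String elements. The Kafka write would be applied to the returned collection.

import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

/** Re-windows the combined stream so a pane fires roughly every 5 minutes. */
static PCollection<String> emitPeriodically(PCollection<String> combined) {
  return combined.apply("PeriodicPanes",
      Window.<String>into(new GlobalWindows())
          .triggering(Repeatedly.forever(
              AfterProcessingTime.pastFirstElementInPane()
                  .plusDelayOf(Duration.standardMinutes(5))))
          .discardingFiredPanes()
          .withAllowedLateness(Duration.ZERO));
}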

read messages from Kafka: 2 different message types in a Kafka topic

2022-08-09 Thread Sigalit Eliazov
Hi all, we have a single Kafka topic which is used to receive 2 different types of messages. Both message types are Avro. So when reading messages from Kafka I used GenericRecord: KafkaIO.read() .withBootstrapServers(bootstrapServers) .withTopic(topic) .withConsumerConfigUp
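
Once the topic is read as GenericRecord, the two event types can be split downstream by inspecting the record's Avro schema name. A hedged sketch follows; the schema name "EventA" and the tag names are assumptions, and coders for the two outputs (e.g. AvroCoder with the respective schema) still need to be set.

import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

// Tags for the two hypothetical event types sharing the topic.
static final TupleTag<GenericRecord> TYPE_A = new TupleTag<GenericRecord>() {};
static final TupleTag<GenericRecord> TYPE_B = new TupleTag<GenericRecord>() {};

/** Routes GenericRecords to one of two outputs based on the Avro schema name. */
static PCollectionTuple splitByType(PCollection<GenericRecord> records) {
  return records.apply("SplitBySchema",
      ParDo.of(new DoFn<GenericRecord, GenericRecord>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
          GenericRecord r = c.element();
          if ("EventA".equals(r.getSchema().getName())) {   // assumed schema name
            c.output(TYPE_A, r);
          } else {
            c.output(TYPE_B, r);
          }
        }
      }).withOutputTags(TYPE_A, TupleTagList.of(TYPE_B)));
}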

Re: read messages from Kafka: 2 different message types in a Kafka topic

2022-08-09 Thread Sigalit Eliazov
> have to implement a similar DeserializerProvider for Apicurio. You could also try using apicurio.registry.as-confluent, but that requires configuring your producers accordingly. In any case, I suggest you study ConfluentSchemaRegistryDeserializerProvider. That should lead you down a path

clear State using business logic

2022-11-23 Thread Sigalit Eliazov
Hi all, the flow in our pipeline is: 1. read event X from Kafka; open a fixed window of 30 sec. 2. read subscription events from Kafka; open a GlobalWindow and store a state of all subscriptions. 3. match X and Y using a key and, if there is a match, send an event to another Kafka topic. (We use the state
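
A hedged sketch of clearing state from business logic with a stateful DoFn, in the spirit of the flow above. The "UNSUBSCRIBE" signal, String types and matching rule are placeholders, not the thread's actual code.

import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

/** Keeps the latest subscription per key and clears it on an explicit business signal. */
class MatchAndClearFn extends DoFn<KV<String, String>, KV<String, String>> {

  @StateId("subscription")
  private final StateSpec<ValueState<String>> subscriptionSpec =
      StateSpecs.value(StringUtf8Coder.of());

  @ProcessElement
  public void processElement(ProcessContext c,
      @StateId("subscription") ValueState<String> subscription) {
    String event = c.element().getValue();
    if ("UNSUBSCRIBE".equals(event)) {        // assumed business signal
      subscription.clear();                   // state removed by business logic
      return;
    }
    String stored = subscription.read();
    if (stored != null) {
      c.output(KV.of(c.element().getKey(), stored + "|" + event));  // "match"
    } else {
      subscription.write(event);
    }
  }
}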

RedisIO readKeyPatterns

2023-03-26 Thread Sigalit Eliazov
Hi all, I have a pipeline that reads messages from Kafka, generates a PCollection of keys (using Keys.create) and then accesses Redis with these keys: PCollection> redisOutput = pcollectionKeys.apply(RedisIO.readKeyPatterns().withEndpoint(PipelineUtil.REDIS_HOST, PipelineUtil.REDIS_PORT)); When usi
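
A cleaned-up sketch of the lookup described above, assuming placeholder Redis host/port and String-keyed Kafka records.

import org.apache.beam.sdk.io.redis.RedisIO;
import org.apache.beam.sdk.transforms.Keys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

/** Looks up the Kafka-derived keys in Redis; host and port are placeholders. */
static PCollection<KV<String, String>> lookupInRedis(
    PCollection<KV<String, String>> kafkaRecords) {
  PCollection<String> keys = kafkaRecords.apply("Keys", Keys.create());
  return keys.apply("ReadFromRedis",
      RedisIO.readKeyPatterns()
          .withEndpoint("redis-host", 6379));   // placeholder endpoint
}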

Custom TimestampPolicy using event time

2023-03-27 Thread Sigalit Eliazov
Hi all, I am trying to implement a custom TimestampPolicy and use the event time instead of the processing time. The flow reads from Kafka using .withTimestampPolicyFactory((tp, previousWatermark) -> new EventTimestampPolicy(previousWatermark)); and in the custom class with the implementation I saw m
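
A hedged alternative to a hand-rolled policy is the built-in CustomTimestampPolicyWithLimitedDelay; a sketch follows. MyEvent and extractEventTime are assumed placeholders, and bootstrapServers/topic come from the thread's own snippet.

KafkaIO.<String, MyEvent>read()
    .withBootstrapServers(bootstrapServers)
    .withTopic(topic)
    .withTimestampPolicyFactory(
        (tp, previousWatermark) ->
            new CustomTimestampPolicyWithLimitedDelay<>(
                // Pull the event time out of the record value (assumed helper).
                record -> extractEventTime(record.getKV().getValue()),
                Duration.standardMinutes(1),     // max expected out-of-orderness
                previousWatermark));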

major reduction in performance when using schema registry - KafkaIO

2023-04-09 Thread Sigalit Eliazov
Hello, I am trying to understand the effect of the schema registry on our pipeline's performance. In order to do so we created a very simple pipeline that reads from Kafka, runs a simple transformation of adding a new field and writes back to Kafka. The messages are in Avro format. I ran this pipeline with 3

Re: major reduction in performance when using schema registry - KafkaIO

2023-04-09 Thread Sigalit Eliazov
> How are you using the schema registry? Do you have a code sample? > On Sun, Apr 9, 2023 at 3:06 AM Sigalit Eliazov wrote: >> Hello, I am trying to understand the effect of the schema registry on our pipeline's performance. In order to do so we created

Re: major reduction in performance when using schema registry - KafkaIO

2023-04-13 Thread Sigalit Eliazov
“ConfluentSchemaRegistryDeserializerProvider” instead of KafkaAvroDeserializer and AvroCoder? More details and an example of how to use it are here: https://beam.apache.org/releases/javadoc/2.46.0/org/apache/beam/sdk/io/kafka/KafkaIO.html (go to “Use Avro schema with Con
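
The suggested approach looks roughly like the sketch below (following the KafkaIO javadoc example); broker, topic and registry URL are placeholders.

import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProvider;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.LongDeserializer;

KafkaIO.<Long, GenericRecord>read()
    .withBootstrapServers("broker_1:9092,broker_2:9092")
    .withTopic("my_topic")
    .withKeyDeserializer(LongDeserializer.class)
    // The provider fetches the writer schema from the registry and infers a matching
    // coder, so no explicit KafkaAvroDeserializer/AvroCoder pair is needed.
    .withValueDeserializer(
        ConfluentSchemaRegistryDeserializerProvider.of(
            "http://localhost:8081", "my_topic-value"));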

Re: state broadcasting in flink

2023-05-18 Thread Sigalit Eliazov
Hi, I suggest reviewing https://beam.apache.org/blog/timely-processing/ - hope this helps. Sigalit On Thu, May 18, 2023 at 9:27 AM Zheng Ni wrote: > Hi there, does Beam support Flink's state broadcasting feature mentioned in the link below? If yes, is there any Beam doc/example available?

"No fields were detected for class org.apache.beam.sdk.util.WindowedValue" - flink runner

2023-06-11 Thread Sigalit Eliazov
Hi all, I'm encountering an issue while deploying the pipeline to the Flink runner. The pipeline reads an Avro message from Kafka, applies a global window, performs a simple transformation, and then sends the data back to Kafka. The problem arises when I see the following error in the job manager

Pipeline Stalls at GroupByKey Step

2023-11-16 Thread Sigalit Eliazov
Hi, in our pipeline we've encountered an issue with the GroupByKey step. After running for some time, it seems that messages are not progressing through the GroupByKey step, causing the pipeline to stall. To troubleshoot this issue, we added debug logging before and after the Gr

Re: Pipeline Stalls at GroupByKey Step

2023-11-16 Thread Sigalit Eliazov
Sigalit On Fri, Nov 17, 2023 at 4:36 AM Sachin Mittal wrote: > Do you add a timestamp to every record you output in the ConvertFromKafkaRecord step, or any step before that? > On Fri, 17 Nov 2023 at 4:07 AM, Sigalit Eliazov wrote: >> Hi, in our pipeline,
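
A hedged sketch of what the suggestion amounts to: assigning an event timestamp while converting Kafka records, so the watermark can advance and the GroupByKey panes can fire. The class name matches the step mentioned in the reply, but the body, MyEvent and getEventTimeMillis are assumptions.

class ConvertFromKafkaRecord extends DoFn<KafkaRecord<String, MyEvent>, KV<String, MyEvent>> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    MyEvent event = c.element().getKV().getValue();
    // Note: emitting a timestamp earlier than the one the Kafka read assigned will be
    // rejected; in that case prefer setting event time at the source with
    // withTimestampPolicyFactory (see the "Custom TimestampPolicy" thread above).
    c.outputWithTimestamp(
        KV.of(c.element().getKV().getKey(), event),
        new Instant(event.getEventTimeMillis()));
  }
}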

usage of dynamic schema in Beam SQL

2024-01-28 Thread Sigalit Eliazov
Hello, In the upcoming process, we extract Avro messages from Kafka utilizing the Confluent Schema Registry. Our intention is to implement SQL queries on the streaming data. As far as I understand, since I am using the Flink runner, when creating the features PCollection, I must specify the ro
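
For the static-schema case, the setup looks roughly like the sketch below: convert the GenericRecords to Rows, attach the schema, and run SqlTransform. The conversion uses AvroUtils (core SDK or the Avro extension, depending on Beam version); the query is a placeholder, and the thread's actual question concerns schemas known only at runtime.

// avroSchema is the compile-time-known org.apache.avro.Schema of the records.
Schema beamSchema = AvroUtils.toBeamSchema(avroSchema);

PCollection<Row> features = avroRecords
    .apply("ToRows", MapElements.into(TypeDescriptors.rows())
        .via((GenericRecord record) -> AvroUtils.toBeamRowStrict(record, beamSchema)))
    .setRowSchema(beamSchema);

PCollection<Row> result = features.apply("Query",
    SqlTransform.query("SELECT * FROM PCOLLECTION"));   // placeholder query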

Re: usage of dynamic schema in BEAM SQL

2024-01-28 Thread Sigalit Eliazov
> Can you explain the use case a bit more? In order to write a SQL statement (at least one that doesn't use wildcard selection) you also need to know the schema ahead of time. What are you trying to accomplish with these dynamic schemas? > Reuven > On Sun, Jan 28, 2024

Beam SQL JOIN with Side Inputs

2024-02-14 Thread Sigalit Eliazov
hi, We are currently working on a use case that involves streaming usage data arriving every 5 minutes, and we have a few dimension tables that undergo changes once a day. Our attempt to implement a join between these tables using Beam SQL encountered a limitation. Specifically, Beam SQL requires

On timer methods are not triggered

2024-03-26 Thread Sigalit Eliazov
Hi all, we encountered an issue with timers starting from version 2.52: we saw that the timers are not triggered. https://github.com/apache/beam/issues/29816 Did anyone else encounter such problems as well? Thanks, Sigalit

Re: On timer methods are not triggered

2024-03-27 Thread Sigalit Eliazov
On Wed, Mar 27, 2024 at 9:54 AM Jan Lukavský wrote: > Hi, what is your runner - is it Flink in the issue as well? What is the source of your pipeline? Do you use any additional flags, e.g. --experiments? Do you see this with the classic or the portable runner? > Jan

Re: On timer methods are not triggered

2024-03-27 Thread Sigalit Eliazov
We are running with the Flink runner; I also tested with the direct runner - same results. On Wed, Mar 27, 2024 at 2:51 PM Sigalit Eliazov wrote: > Hi, this is the pipeline, a very simple one; the onTimer is not fired. We are not using any experimental variables. > public class
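
For reference, a minimal processing-time timer DoFn of the shape being discussed; the names, types and one-minute delay are placeholders, not the pipeline from the issue.

import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

class TimerFn extends DoFn<KV<String, String>, String> {

  @TimerId("flush")
  private final TimerSpec flushTimer = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

  @ProcessElement
  public void processElement(ProcessContext c, @TimerId("flush") Timer timer) {
    // (Re)arm the timer one minute after the current processing time.
    timer.offset(Duration.standardMinutes(1)).setRelative();
  }

  @OnTimer("flush")
  public void onFlush(OnTimerContext c) {
    c.output("timer fired for window " + c.window());
  }
}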

Very high memory consumption using KafkaIO

2024-06-04 Thread Sigalit Eliazov
Hi, we have a pipeline that reads data from Kafka using KafkaIO, applies a simple transformation and writes the data back to Kafka. We are using the Flink runner with the following resources: parallelism: 50, task managers: 5, memory per each: 20G, CPU: 10. This pipeline should handle around 400K m