Re: Fwd: Can `DataStream`s "fan-in" to a single sink?

2021-09-22 Thread Chesnay Schepler
> The main concern I have is that concurrent calls from the different pipelines might step on each other in some way In your current solution this wouldn't happen because you'd actually end up with 3 separate sink operators. > job1.union(job2).union(job3).addSink(sink) This is the way to go.

Fwd: Can `DataStream`s "fan-in" to a single sink?

2021-09-22 Thread Antony Southworth
Hi Firstly, apologies if I commit any faux-pas, I have never used a mailing list before. At least from Googling, reading Flink docs, and searching the mailing list archives for "fan-in" didn't turn up much so hoping someone can enlighten me here. My use-case is similar to the following: Data

Resource leak would happen if exception thrown when flink redisson

2021-09-22 Thread a773807943
I encountered a problem in the process of integrating Flink and Redisson. When the task encounters abnormalities and keeps retries, it will cause the number of Redis Clients to increase volatility (sometimes the number increases, sometimes the number decreases, but the overall tr

Kafka Partition Discovery

2021-09-22 Thread Mason Chen
Hi all, We are sometimes facing a connection issue with Kafka when a broker restarts ``` java.lang.RuntimeException: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.runWithPartitio

Re: Not able to avoid Dynamic Class Loading

2021-09-22 Thread Kevin Lam
Sorry for the late reply here, I'm just returning to this now. Interesting re: the avro version, we're using 1.10.0 in our application jar. But maybe this is somehow being clobbered when we try to move it into /lib vs. /usrlib to avoid dynamic class loading. Is it possible that's happening? On Fr

Re: Support ARM architecture

2021-09-22 Thread Robert Metzger
Hi, afaik the only real blocker for ARM support was a rocksdb binary for arm. This has been resolved and is scheduled to be released with 1.14.0: https://issues.apache.org/jira/browse/FLINK-13598 If you have an ARM machine available, you could even help the community in the release verification p

Re: Many S3V4AuthErrorRetryStrategy warn logs while reading/writing from S3

2021-09-22 Thread Robert Metzger
Hey Andreas, This could be related too https://github.com/apache/hadoop/pull/110/files#diff-0a2e55a2f79ea4079eb7b77b0dc3ee562b383076fa0ac168894d50c80a95131dR950 I guess in Flink this would be s3.endpoint: your-endpoint-hostname Where your-endpoint-hostname is a region-specific endpoint, which y

Re: flink rest endpoint creation failure

2021-09-22 Thread Robert Metzger
Hi, Yes, "rest.bind-port" seems to be set to "35485" on the JobManager instance. Can you double check the configuration that is used by Flink? The jobManager is also printing the effective configuration on start up. You'll probably see the value there as well. On Wed, Sep 22, 2021 at 6:48 PM Cur

Support ARM architecture

2021-09-22 Thread Patrick Angeles
Hey all, Trying to follow FLINK-13448. Seems like all the subtasks, save for one on documentation, are completed... does this mean there will be an arm64 binary available in the next release (1.14)?

Many S3V4AuthErrorRetryStrategy warn logs while reading/writing from S3

2021-09-22 Thread Hailu, Andreas
Hi, When reading/writing to and from S3 using the flink-fs-s3-hadoop plugin on 1.11.2, we observe a lot of these WARN log statements in the logs: WARN S3V4AuthErrorRetryStrategy - Attempting to re-send the request to s3.amazonaws.com with AWS V4 authentication. To avoid this warning in the fu

flink rest endpoint creation failure

2021-09-22 Thread Curt Buechter
Hi, I'm getting an error that happens randomly when starting a flink application. For context, this is running in YARN on AWS. This application is one that converts from the Table API to the Stream API, so two flink applications/jobmanagers are trying to start up. I think what happens is that the

pyflink keyed stream checkpoint error

2021-09-22 Thread Curt Buechter
Hi, I'm getting an error after enabling checkpointing in my pyflink application that uses a keyed stream and rocksdb state. Here is the error message: 2021-09-22 16:18:14,408 INFO org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend [] - Closed RocksDB State Backend. Cleaning up Rock

Re: Unbounded Kafka Source

2021-09-22 Thread Robert Cullen
Robert, So removing the setUnbounded(OffsetInitializer.latest) fixed the issue. Thanks! On Wed, Sep 22, 2021 at 11:51 AM Robert Metzger wrote: > Hi, > > What happens if you do not set any boundedness on the KafkaSource? > For a DataStream job in streaming mode, the Kafka source should be > unbo

RE: Unbounded Kafka Source

2021-09-22 Thread Schwalbe Matthias
Hi, If I remember right, this is actually the intended behaviour: In batch mode: .setBounded(…) In streaming mode: source that finishes anyway at set offset: use .setUnbounded(…) In streaming mode: source that never finishes: don’t set a final offset (don’t .setUnbounded(…)) I might be mistake

Re: Unbounded Kafka Source

2021-09-22 Thread Robert Metzger
Hi, What happens if you do not set any boundedness on the KafkaSource? For a DataStream job in streaming mode, the Kafka source should be unbounded. >From reading the code, it seems that setting unbounded(latest) should not trigger the behavior you mention ... but the Flink docs are not clearly w

Unbounded Kafka Source

2021-09-22 Thread Robert Cullen
I have an unbounded kafka source that has records written to it every second. Instead of the job waiting to process the new messages it closes. How do I keep the stream open? KafkaSource dataSource = KafkaSource .builder() .setBootstrapServers(kafkaServer) .setTopics(Array

Re: Flink Performance Issue

2021-09-22 Thread Robert Metzger
Hi Kamaal, I would first suggest understanding the performance bottleneck, before applying any optimizations. Idea 1: Are your CPUs fully utilized? if yes, good, then scaling up will probably help If not, then there's another inefficiency Idea 2: How fast can you get the data into your job, with

Re: Observability tools on top of Flink

2021-09-22 Thread Deepak Sharma
I am interested in learning this fact as well as I need to put in observability in a flink pipeline. On Wed, 22 Sep 2021 at 8:40 PM, Dan Hill wrote: > Hi! > > I saw a recent Medium article >

Can't access Debezium metadata fields in Kafka table

2021-09-22 Thread Harshvardhan Shinde
Hi, I'm trying to access the metadata columns from the debezium source connector as documented here . However I'm getting the following error when I try to select the rows from

Observability tools on top of Flink

2021-09-22 Thread Dan Hill
Hi! I saw a recent Medium article

Re: S3 access permission error

2021-09-22 Thread Harshvardhan Shinde
Hi, I was facing the same issue, the best way to solve this is to use the IAM role (which is the recommended way) instead of the access keys. Hope this helps. On Wed, Sep 22, 2021 at 1:32 PM Yangze Guo wrote: > I'm not an expert on S3. If it is not a credential issue, have you > finish the check

Re: Flink Performance Issue

2021-09-22 Thread Mohammed Kamaal
Hi Arvid, The throughput has decreased further after I removed all the rebalance(). The performance has decreased from 14 minutes for 20K messages to 20 minutes for 20K messages. Below are the tasks that the flink application is performing. I am using keyBy and Window operation. Do you think a

Re: Stream join with (changing) dimension in Kafka

2021-09-22 Thread Caizhi Weng
Hi! What type of time attribute is u_ts? If it is an event time attribute then this query you're running is an event time temporal table join, which will pause outputting records until the watermark from both inputs has risen above the row time of that record. As the dimension table is changing q

Stream join with (changing) dimension in Kafka

2021-09-22 Thread John Smith
Hi, I'm trying to use temporal join in Table API to enrich a stream of pageview events with a slowly changing dimension of user information. The pageview events are in a kafka topic called *pageviews* and the user information are in a kafka topic keyed by *userid* and whenever there is an updated

Re: S3 access permission error

2021-09-22 Thread Yangze Guo
I'm not an expert on S3. If it is not a credential issue, have you finish the checklist of this doc[1]? [1] https://aws.amazon.com/premiumsupport/knowledge-center/emr-s3-403-access-denied/?nc1=h_ls Best, Yangze Guo On Wed, Sep 22, 2021 at 3:39 PM Dhiru wrote: > > > Not sure @yangze ... but ot

Re: S3 access permission error

2021-09-22 Thread Dhiru
Not sure @yangze ...  but other services which are deployed in same places we are able to access s3 bucket, the link which you share are recommended way, if we have access to s3 then we should not pass credentials ? On Wednesday, September 22, 2021, 02:59:05 AM EDT, Yangze Guo wrote: