KeyedCoProcessFunction, processElement1, processElement2, onTimer timeout

2020-09-15 Thread Mazen Ezzeddine
Hey all, I am using the KeyedCoProcessFunction class in the Flink DataStream API to implement a timeout-like use case. The scenario is as follows: I have an input Kafka topic and an output Kafka topic; a service reads from the input topic, processes it (for a variable amount of time) and then publishes

Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-15 Thread Lian Jiang
Hi, I am using Avro 1.9.1 + Flink 1.10.1 + Confluent Kafka 5.5. In IntelliJ, I can see that the FlinkKafkaConsumer has already deserialized the upstream Kafka message. However, I got the error below when this message is serialized during pushToOperator. Per the stack trace, the reason is that AvroSerializer is

Disable WAL in RocksDB recovery

2020-09-15 Thread Juha Mynttinen
Hello there, I'd like to bring to discussion a previously discussed topic - disabling WAL in RocksDB recovery. It's clear that WAL is not needed during the process, the reason being that the WAL is never read, so there's no need to write it. AFAIK the last thing that was done with WAL during r

Flink Table SQL, Kafka, partitions and unnecessary shuffling

2020-09-15 Thread Dan Hill
How do I avoid unnecessary reshuffles when using Kafka as input? My keys in Kafka are ~userId. The first few stages do joins that are usually (userId, someOtherKeyId). It makes sense for these joins to stay on the same machine and avoid unnecessary shuffling. What's the best way to avoid unnece

Re: Why setAllVerticesInSameSlotSharingGroupByDefault is set to false in batch mode

2020-09-15 Thread Zhu Zhu
Hi Zheng, To divide managed memory for operators[1], we need to consider which tasks will run in the same slot. In batch jobs, vertices in different regions may not run at the same time. If we put them in the same slot sharing group, running tasks may run slower with less managed memory, while man

Why setAllVerticesInSameSlotSharingGroupByDefault is set to false in batch mode

2020-09-15 Thread zheng faaron
Hi All, I find we set AllVerticesInSameSlotSharingGroupByDefault to false in Flink 1.10. This makes batch jobs request lots of containers. I'm not sure why we set it to false directly. I tried setting it to true and found the batch job can run correctly with a small number of containers. Why don't w

Fwd: I have a job with multiple Kafka sources. They all contain certain historical data.

2020-09-15 Thread hao kong
Hello guys, I have a job with multiple Kafka sources. They all contain certain historical data. If you use an event-time window, it will cause sources with less data to cover more sources through the watermark. I can think of a solution: implement a scheduler in the source phase, but it is quite

Re: Flink Table API and not recognizing s3 plugins

2020-09-15 Thread Dan Hill
Sweet, this was the issue. I got this to work by copying the s3 jar over to plugins for the client container. Thanks for all of the help! The Table API is sweet! On Mon, Sep 14, 2020 at 11:14 PM Dan Hill wrote: > Yes, the client runs in K8. It uses a different K8 config than the Helm > chart

Re: Flink alert after database lookUp

2020-09-15 Thread Arvid Heise
Hi Sunitha, dependency looks good. I'd probably bump the version to 1.1.0 though (version is off-cycle to Flink as of now to accelerate releases of this young feature). Best, Arvid On Tue, Sep 15, 2020 at 5:10 PM s_penakalap...@yahoo.com < s_penakalap...@yahoo.com> wrote: > Hi Arvid, > > Thank

Re: Flink alert after database lookUp

2020-09-15 Thread s_penakalap...@yahoo.com
Hi Arvid, Thank you!!! Will check the change data capture approach. Please confirm that including the dependency and adding a sourceFunction should help us achieve CDC. com.alibaba.ververica:flink-connector-postgre-cdc:1.0.0 wrote: Hi Sunitha, to listen to changes in your database a change-data-ca

Re: Emit event to kafka when finish sink

2020-09-15 Thread Dawid Wysakowicz
Hi, I am not sure if I understand your first solution, but it sounds rather complicated. I think implementing a custom operator could be a valid approach. You would have to make sure it is run with a parallelism of 1. You could additionally implement the BoundedOneInput interface and notify the exter

Re: How to schedule Flink Batch Job periodically or daily

2020-09-15 Thread s_penakalap...@yahoo.com
Hi Arvid, Thanks a lot. Will check the Airflow and cron job options. Regards, Sunitha. On Monday, September 14, 2020, 05:23:43 PM GMT+5:30, Arvid Heise wrote: Hi Sunitha, oozie is a valid approach, but I'd recommend evaluating Airflow first [1]. It's much better maintained and easier to us

Re: The rpc invocation size 13478509 exceeds the maximum akka framesize

2020-09-15 Thread Jake
Hi zheng, It seems the data is large. Resizing the Akka framesize will not work. You can increase the parallelism. Jake. > On Sep 15, 2020, at 5:58 PM, zheng faaron wrote: > > Hi Zhu, > > It's just a mistake in mail. It seems increasing akka.framesize does not work in > this scenario. > > B

I have a job with multiple Kafka sources. They all contain certain historical data.

2020-09-15 Thread hao kong
Hello, I have a job with multiple Kafka sources. They all contain certain historical data. If you use an event-time window, it will cause sources with less data to cover more sources through the watermark. Is there a solution?

Emit event to kafka when finish sink

2020-09-15 Thread Antonio Manzano
Hello guys, I would like to know if there is any possibility to emit an event when a sink has finished. To put it in context, I have a simple ETL (streaming bounded) that reads data from a database, maps, and inserts into another database. Once I finish inserting the data I want to issue an event

Re: Flink DynamoDB stream connector losing records

2020-09-15 Thread Cranmer, Danny
Hi Jiawei, I agree that the offset management mechanism uses the same code as Kinesis Stream Consumer and in theory should not lose exactly-once semantics. As Ying is alluding to, if your application is restarted and you have snapshotting disabled in AWS there is a chance that records can be lo

[ANNOUNCE] Weekly Community Update 2020/37

2020-09-15 Thread Konstantin Knauf
Dear community, happy to share a belated update for the past week. This time with the release of Flink 1.11.2, a couple of discussions and FLIPs on improving Flink's APIs and dropping some baggage, most notably Scala 2.11, a new unified sink API and a bit more. Flink Development == *

Re: Performance issue associated with managed RocksDB memory

2020-09-15 Thread Yu Li
Thanks for the follow up Juha, I've just assigned FLINK-19238 to you. Let's further track this on JIRA. Best Regards, Yu On Tue, 15 Sep 2020 at 15:04, Juha Mynttinen wrote: > Hey > > I created this one https://issues.apache.org/jira/browse/FLINK-19238. > > Regards, > Juha > ---

Re: Performance issue associated with managed RocksDB memory

2020-09-15 Thread Juha Mynttinen
Hey I created this one https://issues.apache.org/jira/browse/FLINK-19238. Regards, Juha From: Yun Tang Sent: Tuesday, September 15, 2020 8:06 AM To: Juha Mynttinen ; Stephan Ewen Cc: user@flink.apache.org Subject: Re: Performance issue associated with managed R