Re: Zookeeper connection loss causing checkpoint corruption

2020-09-21 Thread Timo Walther
Hi Arpith, is there a JIRA ticket for this issue already? If not, it would be great if you can report it. This sounds like a critical priority issue to me. Thanks, Timo On 22.09.20 06:25, Arpith P wrote: Hi Peter, I have recently had a similar issue where I could not load from the checkpoi

Re: Zookeeper connection loss causing checkpoint corruption

2020-09-21 Thread Arpith P
Hi Peter, I have recently had a similar issue where I could not load from the checkpoints path. I found that whenever a corrupt checkpoint happens the "_metadata" file will not be persisted, and I've a program which tracks if checkpoint location based on this strategy and updates DB with location

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-21 Thread Xintong Song
Thanks for the input, Brain. This looks like what we are looking for. The issue is fixed in 1.10.3, which also matches this problem occurred in 1.10.2. Maybe Claude can further confirm it. Thank you~ Xintong Song On Tue, Sep 22, 2020 at 10:57 AM Zhou, Brian wrote: > Hi Xintong and Claude,

RE: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-21 Thread Zhou, Brian
Hi Xintong and Claude, In our internal tests, we also encounter these two issues and we spent much time debugging them. There are two points I need to confirm if we share the same problem. 1. Your job is using default restart strategy, which is per-second restart. 2. Your CPU resource on

Re: Support for gRPC in Flink StateFun 2.x

2020-09-21 Thread Dalmo Cirne
Thank you for the quick reply, Igal. Our use case is the following: A stream of data from Kafka is fed into Flink where data transformations take place. After that we send that transformed data to an inference engine to score the relevance of each record. (Rough simplification.) Doing that usi

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-21 Thread Xintong Song
## Metaspace OOM As the error message already suggested, the metaspace OOM you encountered is likely caused by a class loading leak. I think you are on the right direction trying to look into the heap dump and find out where the leak comes from. IIUC, after removing the ZK folder, you are now able

Re: Flink multiple task managers setup

2020-09-21 Thread Yangze Guo
Hi, As the error message said, it could not find the flink-dist.jar in "/cygdrive/d/Apacheflink/dist/apache-flink-1.9.3/deps/lib". Where is your flink distribution and do you change the directory structure of it? Best, Yangze Guo On Mon, Sep 21, 2020 at 5:31 PM saksham sapra wrote: > > HI, > >

hourly counter

2020-09-21 Thread Lian Jiang
Hi, I have a window function with a window width of 1 min. I want to have an hourly counter which is reset every hour so it never overflows. There are multiple ways but none of them is straightforward: StatsDClient instance = new NonBlockingStatsDClientBuilder() int count = 0; void incr() { m

Re: Watermark advancement in late side output

2020-09-21 Thread orips
Great, thanks -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-21 Thread Lian Jiang
Thanks guys. Given Flink 1.12 is not ready (e.g. not available in Maven repo), I need to stick to 1.11. Dawid, For the code throwing "java.lang.Long cannot be cast to java.time.Instant", The avro schema has: union {null, timestamp_ms } eventTime = null; The avro pojo does have the logical type

Re: How to disconnect taskmanager via rest api?

2020-09-21 Thread Timo Walther
Hi Luan, this sound more of a new feature request to me. Maybe you can already open an issue for it. I will loop in Chesnay in CC if there is some possibility to achieve this already? Regards, Timo On 21.09.20 06:37, Luan Cooper wrote: Hi We're running flink standalone cluster on k8s whe

Re: Problem with zookeeper and flink config

2020-09-21 Thread Timo Walther
Hi Saksham, if I understand you correctly, you are running Zookeeper and Flink locally on your machine? Are you using Docker or is this a bare metal setup? The exception indicates that your paths contain `hdfs:` as URL scheme. Are you sure you want to use HDFS? If yes, you need to add an HDFS

Re: Watermark advancement in late side output

2020-09-21 Thread Timo Walther
Hi Ori, first of all, watermarks are sent to all side outputs (this is tested here [1]). Thus, operators in the side output branch of the pipeline will work similar to operators in the main branch. When calling `assignTimestampsAndWatermarks`, the inserted operator will erase incoming waterm

Zookeeper connection loss causing checkpoint corruption

2020-09-21 Thread Peter Westermann
I recently ran into an issue with our Flink cluster: A zookeeper service deploy caused a temporary connection loss and triggered a new jobmanager leader election. Leadership election was successful and our Flink job restarted from the last checkpoint. This checkpoint appears to have been taken w

Re: [DISCUSS] Drop Scala 2.11

2020-09-21 Thread Theo Diefenthal
We use a Cloudera 6.3 cluster in prod. I'd guess that it's still widely used in prod as those cloudera upgrades for major versions are planned long time ahead and take a significant amount of resources in big data lakes. On that 6.3. cluster, if I open spark-shell, I still see scala 2.11 in use

Re: App gets stuck in Created State

2020-09-21 Thread Zhu Zhu
Hi Arpith, All tasks in CREATED state indicates no task is scheduled yet. It is strange it a job gets stuck in this state. Is it possible that you share the job manager log so we can check what is happening there? Thanks, Zhu Arpith P 于2020年9月21日周一 下午3:52写道: > Hi, > > We have Flink 1.8.0 clust

Watermark advancement in late side output

2020-09-21 Thread Ori Popowski
Let's say I have an event-time stream with a window and a side output for late data, and in the side output of the late data, I further assign timestamps and do windowing - what is the watermark situation here? The main stream has its own watermark advancement but the side output has its own. Do t

Re: Flink Table SQL and writing nested Avro files

2020-09-21 Thread Dawid Wysakowicz
Hi Dan, I think the best what I can suggest is this: |SELECT || | |    ROW(left.field0, left.field1, left.field2, ...),| |    ROW(right.field0, right.field1, right.field2, ...)| |FROM ...| You will need to list all the fields manually, as SQL does not allow for asterisks in regular function c

Fwd: Flink multiple task managers setup

2020-09-21 Thread saksham sapra
HI, i installed cygdrive and tried to run start-cluster.sh where zookeeper is up and running and defined one job manager and one task manager, but getting this issue. $ start-cluster.sh start Starting HA cluster with 1 masters. -zTheFIND: Invalid switch system cannot find the file specified. [E

App gets stuck in Created State

2020-09-21 Thread Arpith P
Hi, We have Flink 1.8.0 cluster deployed in Hadoop distributed mode, I often see even though Hadoop has enough resources Flink sits in Created state. We have 4 operators using 15 parallelism, 1 operator using 40 & 2 operators using 10. At time of submission I'm passing taskmanager memory as 4Gb an

Re: Disable WAL in RocksDB recovery

2020-09-21 Thread Yu Li
Great, thanks for the follow up. Best Regards, Yu On Mon, 21 Sep 2020 at 15:04, Juha Mynttinen wrote: > Good, > > I opened this JIRA for the issue > https://issues.apache.org/jira/browse/FLINK-19303. The discussion can be > moved there. > > Regards, > Juha > -- > *F

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-21 Thread Aljoscha Krettek
Hi All, Avro was finally bumped in https://issues.apache.org/jira/browse/FLINK-18192. The implementers didn't see https://issues.apache.org/jira/browse/FLINK-12532, but it is also updated now. Best, Aljoscha On 21.09.20 08:04, Arvid Heise wrote: Hi Lian, we had a similar discussion on [

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-21 Thread Dawid Wysakowicz
Hey Arvid, Just a quick comment to Arvid's mail for now. It should be safe to update the Avro version even if we've been declaring dependency on Avro 1.8.2 by default. Moreover up until now we do not bundle any version of Avro in any of the uber jars we ship. It is true we used Avro version 1.8.2

Re: Disable WAL in RocksDB recovery

2020-09-21 Thread Juha Mynttinen
Good, I opened this JIRA for the issue https://issues.apache.org/jira/browse/FLINK-19303. The discussion can be moved there. Regards, Juha From: Yu Li Sent: Friday, September 18, 2020 3:58 PM To: Juha Mynttinen Cc: user@flink.apache.org Subject: Re: Disable W