Re: Streaming Job eventually begins failing during checkpointing

2020-04-15 Thread Yun Tang
Hi Stephen This is not related with RocksDB but with default on-heap operator state backend. From your exception stack trace, you have created too many operator states (more than 32767). How do you call context.getOperatorStateStore().getListState or context.getOperatorStateStore().getBroadcas

Re: Exception in thread "main" org.apache.flink.table.api.TableException: Group Window Aggregate: Retraction on windowed GroupBy Aggregate is not supported yet.

2020-04-15 Thread Benchao Li
Hi, In blink planner, if you set retention time, it means that you enabled late records handling in WindowOperator. It also changes the output of WindowOperator from append to retract. 刘建刚 于2020年4月16日周四 上午8:40写道: > No ,I do not use "fast-emit”. Another group by is combined with this SQL. > I us

Re: Flink Conf "yarn.flink-dist-jar" Question

2020-04-15 Thread Yang Wang
Hi All, thanks a lot for reviving this discussion. I think we could unify the FLINK-13938 and FLINK-14964 since they have the similar purpose, avoid unnecessary uploading and downloading jars in YARN deployment. The difference is FLINK-13938 aims to support the flink system lib directory only, whi

Re: Exception in thread "main" org.apache.flink.table.api.TableException: Group Window Aggregate: Retraction on windowed GroupBy Aggregate is not supported yet.

2020-04-15 Thread 刘建刚
Thank you. I will use flink planner first and have a look at the detail code. > 2020年4月16日 上午10:17,Benchao Li 写道: > > Hi, > > In blink planner, if you set retention time, it means that you enabled late > records handling in WindowOperator. > It also changes the output of WindowOperator from ap

RE: Flink Conf "yarn.flink-dist-jar" Question

2020-04-15 Thread Hailu, Andreas [Engineering]
Okay, I’ll continue to watch the JIRAs. Thanks for the update, Till. // ah From: Till Rohrmann Sent: Wednesday, April 15, 2020 10:51 AM To: Hailu, Andreas [Engineering] Cc: Yang Wang ; tison ; user@flink.apache.org Subject: Re: Flink Conf "yarn.flink-dist-jar" Question Hi Andreas, it looks a

Re: Exception in thread "main" org.apache.flink.table.api.TableException: Group Window Aggregate: Retraction on windowed GroupBy Aggregate is not supported yet.

2020-04-15 Thread 刘建刚
No ,I do not use "fast-emit”. Another group by is combined with this SQL. I use “tableConfig.setIdleStateRetentionTime()” to control idled state. If I delete “tableConfig.setIdleStateRetentionTime()” in blink, the error disappears. How can I resolve it? Thank you. > 2020年4月15日 下午8:11,Benchao Li

Re: [PROPOSAL] Contribute training materials to Apache Flink

2020-04-15 Thread David Anderson
Till, Yun, etal, Now that we've established the community's interest in engaging with this content, I've started a new thread on the dev list for discussion of the details. I've said a bit there already regarding ongoing maintenance, and CI for the exercises. Best, David On Wed, Apr 15, 2020 at

Streaming Job eventually begins failing during checkpointing

2020-04-15 Thread Stephen Patel
I've got a flink (1.8.0, emr-5.26) streaming job running on yarn. It's configured to use rocksdb, and checkpoint once a minute to hdfs. This job operates just fine for around 20 days, and then begins failing with this exception (it fails, restarts, and fails again, repeatedly): 2020-04-15 13:15:

Re: Question about Writing Incremental Graph Algorithms using Apache Flink Gelly

2020-04-15 Thread Kaan Sancak
Thanks that is working now! I have one last question. Goin one step further, I have changed vertex value type to be a POJO class. The structure is somewhat similar to this, class LocalStorage { Integer id; Long degree; Boolean active; List labels; Map nei

AvroParquetWriter issues writing to S3

2020-04-15 Thread Diogo Santos
Hi guys, I'm using AvroParquetWriter to write parquet files into S3 and when I setup the cluster (starting fresh instances jobmanager/taskmanager etc), the scheduled job starts executing without problems and could write the files into S3 but if the job is canceled and starts again the job throws t

Re: Processing Message after emitting to Sink

2020-04-15 Thread KristoffSC
My point was, that as far as I know, Sinks are "terminating" operators, that ends the stream like .collect in Java 8 stream API. The don't emit elements further and I cannot link then in a way: source - proces - sink - process - sink Sink function produces DataStreamSink which is used for emittin

Re: post-checkpoint watermark out of sync with event stream?

2020-04-15 Thread Aljoscha Krettek
Hi Cliff, On 14.04.20 19:29, Cliff Resnick wrote I'm wondering how this could be possible. The only explanation I can think of is: 4. on "endTime" timer key state is purged. 5 --- job fail --- 6. job restarted on 2.5 hour old Savepoint 7. watermark regresses (?) from "endTime" watermark. 8. a

Re: Question about Writing Incremental Graph Algorithms using Apache Flink Gelly

2020-04-15 Thread Till Rohrmann
Hi Kaan, I think what you are proposing is something like this: Graph graph = ... // get first batch Graph graphAfterFirstSG = graph.runScatterGatherIteration(); Graph secondBatch = ... // get second batch // Adjust the result of SG iteration with secondBatch Graph updatedGraph = graphAfterFi

Re: Flink Conf "yarn.flink-dist-jar" Question

2020-04-15 Thread Till Rohrmann
Hi Andreas, it looks as if FLINK-13938 and FLINK-14964 won't make it into the 1.10.1 release because the community is about to start the release process. Since FLINK-13938 is a new feature it will be shipped with a major release. There is still a bit of time until the 1.11 feature freeze and if Ya

Re: [PROPOSAL] Contribute training materials to Apache Flink

2020-04-15 Thread Jingsong Li
+1. It's very useful for Flink newcomers. Best, Jingsong Lee On Wed, Apr 15, 2020 at 10:23 PM Yun Tang wrote: > +1 for this idea. > > I think there would existed many details to discuss once community ready > to host the materials: > >1. How to judge whether a lab exercise should be added?

Re: [PROPOSAL] Contribute training materials to Apache Flink

2020-04-15 Thread Yun Tang
+1 for this idea. I think there would existed many details to discuss once community ready to host the materials: 1. How to judge whether a lab exercise should be added? There would be many user cases for streaming computation, I think we should need a outline for the knowledge map of Flink

Re: Quick survey on checkpointing performance

2020-04-15 Thread Yun Tang
Hi Robin First of all, did you get the state size from the web UI? If so, the state size is the incremental checkpoint size not the actual full size [1]. I assume you only have one RocksDB instance per slot, the incremental checkpoint size for each RocksDB instance is 2011MB, which is some how

Re: Processing Message after emitting to Sink

2020-04-15 Thread Timo Walther
Yes. But that's the problem of your use cases, right? If you need to wait for the sink to be completed, it is not a terminating operator anymore. Regards, Timo On 15.04.20 10:50, KristoffSC wrote: Thank you very much for your answer. I have a question regarding your first paragraph: " it req

RE: Flink Conf "yarn.flink-dist-jar" Question

2020-04-15 Thread Hailu, Andreas [Engineering]
Yang, Tison, Do we know when some solution for 13938 and 14964 will arrive? Do you think it will be in a 1.10.x version? // ah From: Hailu, Andreas [Engineering] Sent: Friday, March 20, 2020 9:19 AM To: 'Yang Wang' Cc: tison ; user@flink.apache.org Subject: RE: Flink Conf "yarn.flink-dist-jar"

Re: Registering UDAF in blink batch app

2020-04-15 Thread Jingsong Li
Hi Dmytro, For 1.11: Like Godfrey said, you can use "TableEnvironment#createFunction/createTemporarySystemFunction". And like Timo said, can support function with new type system. But for 1.10 and 1.9: A workaround way is: "tEnv.getCatalog(tEnv.getCurrentCatalog()).get().createFunction" You may n

Re: Exception in thread "main" org.apache.flink.table.api.TableException: Group Window Aggregate: Retraction on windowed GroupBy Aggregate is not supported yet.

2020-04-15 Thread Benchao Li
Hi, Did you set "fast-emit" for your query? If yes, the exception is by-design. Because emit will change the output of windowed aggregate from append to retract. There is an open issue about this[1]. [1] https://issues.apache.org/jira/browse/FLINK-16844 刘建刚 于2020年4月15日周三 下午7:07写道: > I am

Exception in thread "main" org.apache.flink.table.api.TableException: Group Window Aggregate: Retraction on windowed GroupBy Aggregate is not supported yet.

2020-04-15 Thread 刘建刚
I am using two sequence windows in SQL as following: SELECT TUMBLE_START(rowtime, interval '1' minute) AS windowStart, bitmapUnion(bmp) AS bmp FROM (SELECT TUMBLE_ROWTIME(eventTime, interval '1' minute) AS rowtime, bitmap(id) AS bmp FROM person GROUP BY TUMBLE(eventTi

Quick survey on checkpointing performance

2020-04-15 Thread Robin Cassan
Hi all, We are currently experiencing long checkpointing times on S3 and are wondering how abnormal it is compared to other loads and setups. Could some of you share a few stats in your running architecture so we can compare? Here are our stats: *Architecture*: 28 TM on Kubernetes, 4 slots per T

Re: Processing Message after emitting to Sink

2020-04-15 Thread KristoffSC
Thank you very much for your answer. I have a question regarding your first paragraph: " it requires that a sink participates in the pipeline. So it is not located as a "leaf" operator but location somewhere in the middle." Isn't Sink a terminating operator? So as far as I know Sinks cannot be in

Re: Objects with fields that are not serializable

2020-04-15 Thread Timo Walther
Hi Dominik, Flink does not use Java serialization logic for network communication. So objects must not implement `Serializable` for usage during runtime (DataStream). Only if those classes are member variables of a Function like MapFunction, they need to serializable to ship the function cod

Re: Processing Message after emitting to Sink

2020-04-15 Thread Timo Walther
Hi Kristoff, synchronization across operators is not easy to achieve. If one needs to wait until a sink has processed some element, it requires that a sink participates in the pipeline. So it is not located as a "leaf" operator but location somewhere in the middle. So your idea to call MQ di

Re: [PROPOSAL] Contribute training materials to Apache Flink

2020-04-15 Thread Till Rohrmann
Hi David, making the training materials available on flink.apache.org would increase the reach and improve its visibility. Since this is very helpful material for our users +1 for contributing the training material. If we decide not maintain different versions, then we might be able to highlight

Re: Flink On Yarn , ResourceManager is HA , if active ResourceManager changed,what is flink task status ?

2020-04-15 Thread Xintong Song
Normally, Yarn RM switch should not cause any problem to the running Flink instance. Unless the RM switch takes too long and Flink happens to request new containers during that time, it might lead to resource allocation timeout. Thank you~ Xintong Song On Wed, Apr 15, 2020 at 3:49 PM LakeShen

Flink On Yarn , ResourceManager is HA , if active ResourceManager changed,what is flink task status ?

2020-04-15 Thread LakeShen
Hi community, I have a question about flink on yarn ha , if active resourcemanager changed, what is the flink task staus. Is flink task running normally? Should I must restart my flink task to run? Thanks to your reply. Best, LakeShen

Re: flink java.util.concurrent.TimeoutException

2020-04-15 Thread Yangze Guo
日志上看是Taskmanager心跳超时了,如果tm还在,是不是网络问题呢?尝试把heartbeat.timeout调大一些试试? Best, Yangze Guo On Mon, Apr 13, 2020 at 10:40 AM 欧阳苗 wrote: > > job运行了两天就挂了,然后抛出如下异常,但是taskManager没有挂,其他的job还能正常在上面跑,请问这个问题是什么原因导致的,有什么好的解决办法吗 > > > 2020-04-13 06:20:31.379 ERROR 1 --- [ent-IO-thread-3] > org.apache.flink.runtim