Re: DataSet/DataStream of scala type class interface

2020-04-13 Thread Salva Alcántara
FYI, I have posted the same question (a bit more polished) in https://stackoverflow.com/questions/61193662/dataset-datastream-of-type-class-interface Also, you can find the code in this repo: https://github.com/salvalcantara/flink-events-and-polymorphism -- Sent from: http://apache-flink-use

[1.10.0] flink-dist source jar is empty

2020-04-13 Thread Steven Wu
We build and publish flink-dist locally. But the source jar turns out empty. Other source jars (like flink-core) are good. Anyone else experienced similar problem? Thanks, Steven

Re: Flink job didn't restart when a task failed

2020-04-13 Thread Zhu Zhu
Sorry for not following this ML earlier. I think the cause might be that the final state ('FAILED') update message to JM is lost. TaskExecutor will simply fail the task (which does not take effect in this case since the task is already FAILED) and will not update the task state again in this case.

Re: Setting different idleStateRetentionTime for different queries executed in the same TableEnvironment in Flink 1.10

2020-04-13 Thread Jiahui Jiang
Hey Godfrey, in some of the use cases our users have, they have a couple of complex join queries where the key domains key evolving - we definitely want some sort of state retention for those queries; but there are other where the key domain doesn't evolve overtime, but there isn't really a guar

Re: Question about EventTimeTrigger

2020-04-13 Thread Tzu-Li (Gordon) Tai
Hi, Could you briefly describe what you are trying to achieve? By definition, a GlobalWindow includes all data - the ending timestamp for these windows are therefore Long.MAX_VALUE. An event time trigger wouldn't make sense here, since that trigger would never fire (watermark can not pass the end

Re: Question about Writing Incremental Graph Algorithms using Apache Flink Gelly

2020-04-13 Thread Tzu-Li (Gordon) Tai
Hi, As you mentioned, Gelly Graph's are backed by Flink DataSets, and therefore work primarily on static graphs. I don't think it'll be possible to implement incremental algorithms described in your SO question. Have you tried looking at Stateful Functions, a recent new API added to Flink? It sup

Re: [Stateful Functions] Using Flink CEP

2020-04-13 Thread Tzu-Li (Gordon) Tai
Hi! It isn't possible to use Flink CEP within Stateful Functions. That could be an interesting primitive, to add CEP-based function constructs. Could your briefly describe what you are trying to achieve? On the other hand, there are plans to integrate Stateful Functions more closely with the Fli

Re: Setting different idleStateRetentionTime for different queries executed in the same TableEnvironment in Flink 1.10

2020-04-13 Thread godfrey he
Hi Jiahui, Query hint is a way for fine-grained configuration. just out of curiosity, is it a strong requirement that users need to config different IDLE_STATE_RETENTION_TIME for each operator? Best, Godfrey Jiahui Jiang 于2020年4月14日周二 上午2:07写道: > Also for some more context, we are building a

Re: New kafka producer on each checkpoint

2020-04-13 Thread Becket Qin
A slightly more common case that may cause the producer to be not reusable is when there is no data for long time, the producer won't send any request to the broker and the tansactional.id may also expire on the broker side. On Tue, Apr 14, 2020 at 8:44 AM Becket Qin wrote: > Hi Maxim, > > That

Re: New kafka producer on each checkpoint

2020-04-13 Thread Becket Qin
Hi Maxim, That is a good question. I don't see an obvious reason that we cannot reuse the producers. That said, there might be some corner cases where the KafkaProducer is not reusable. For example, if the checkpoint interval is long, the producer.id assigned by the TransactionCoordinator may have

Debug Slowness in Async Checkpointing

2020-04-13 Thread Lu Niu
Hi, Flink users We notice sometimes async checkpointing can be extremely slow, leading to checkpoint timeout. For example, For a state size around 2.5MB, it could take 7~12min in async checkpointing: [image: Screen Shot 2020-04-09 at 5.04.30 PM.png] Notice all the slowness comes from async check

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-13 Thread Lu Niu
Thank you both. Given the debug overhead, I might just try out presto s3 file system then. Besides that presto s3 file system doesn't support streaming sink, is there anything else I need to keep in mind? Thanks! Best Lu On Thu, Apr 9, 2020 at 12:29 AM Robert Metzger wrote: > Hey, > Others have

Re: Setting different idleStateRetentionTime for different queries executed in the same TableEnvironment in Flink 1.10

2020-04-13 Thread Jiahui Jiang
Also for some more context, we are building a framework to help users build their Flink pipeline with SQL. Our framework handles all the setup and configuration, so that users only need to write the SQL queries without having to have any Flink knowledge. One issue we encountered was, for some o

DataSet/DataStream of scala type class interface

2020-04-13 Thread Salva Alcántara
I am just experimenting with the usage of Scala type classes within Flink. I have defined the following type class interface: ```scala trait LikeEvent[T] { def timestamp(payload: T): Int } ``` Now, I want to consider a `DataSet` of `LikeEvent[_]` like this: ```scala // existing classes t

Re: [Stateful Functions] Using Flink CEP

2020-04-13 Thread Oytun Tez
I am leaning towards using Siddhi as a library, but I would really love to stick with Flink CEP, or at least the specific CEP mechanism that Flink CEP uses. Exploring the codebase of Flink CEP wasn't much promising on this end. -- [image: MotaWord] Oytun Tez M O T A W O R D | CTO & Co-Founder o

Unsubscribe

2020-04-13 Thread Simec, Nick
Unsubscribe This electronic correspondence, including any attachments, is intended solely for the use of the intended recipient(s) and may contain legally privileged, proprietary and/or confidential information. If you are not the intended recipient, please immediately notify the sender by repl

[Stateful Functions] Using Flink CEP

2020-04-13 Thread Oytun Tez
Hi there, I was wondering if I could somehow use CEP within a Function. Have you experimented with this before? Or, do you have any suggestions to do CEP within a Function? I am looking for a standalone library now. Oytun -- [image: MotaWord] Oytun Tez M O T A W O R D | CTO & Co-Founder oy.

Re: Flink incremental checkpointing - how long does data is kept in the share folder

2020-04-13 Thread Yun Tang
Hi Shachar I think you could refer to [1] to know the directory structure of checkpoints. The '_metadata' file contains all information of which checkpointed data file belongs, e.g. file paths under 'shared' folder. As I said before, you need to call Checkpoints#loadCheckpointMetadata to load

Re: New kafka producer on each checkpoint

2020-04-13 Thread Maxim Parkachov
Hi Yun, thanks for the answer. I did now increased checkpoint interval, but still I don't understand reason for creating producer and re-connecting to to kafka broker each time. According to documentation: Note: Semantic.EXACTLY_ONCE mode uses a fixed size pool of KafkaProducers per each FlinkKaf