Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-10-13 Thread Yun Gao
Hi Till, Very thanks for the feedbacks ! > 1) When restarting all tasks independent of the status at checkpoint time > (finished, running, scheduled), we might allocate more resources than we > actually need to run the remaining job. From a scheduling perspective it > would be easier if we alrea

Re: Rocksdb - Incremental vs full checkpoints

2020-10-13 Thread Yun Tang
Hi This difference of data size of incremental vs full checkpoint is due to the different implementations. The incremental checkpoint strategy upload binary sst files while full checkpoint strategy scans the DB and write all kv entries to external DFS. As your state size is really small (only 2

Re: why this pyflink code has no output?

2020-10-13 Thread Xingbo Huang
Hi, Which version of pyflink are you using? I think the api you are using is not the pyflink since flink 1.9. For detailed usage of pyflink, you can refer to doc[1] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/python/table_api_tutorial.html Best, Xingbo 大森林 于2020年10月13日周

Upgrade to Flink 1.11 in EMR 5.31 Command line interface

2020-10-13 Thread Vijayendra Yadav
Hi Team, I have upgraded to Flink 1.11 (EMR 5.31) from Flink 1.10 (5.30.1). I am facing following Error while running *flink streaming *Job from command line. run command like:*/usr/lib/flink/bin/flink run* *What dependency I might be missing or conflicting ?*

Re: Stateful function and large state applications

2020-10-13 Thread Tzu-Li (Gordon) Tai
Hi, The StateFun runtime is built directly on top of Apache Flink, so RocksDB as the state backend is supported as well as all the features for large state such as checkpointing and local task recovery. Cheers, Gordon On Wed, Oct 14, 2020 at 11:49 AM Lian Jiang wrote: > Hi, > > I am learning

Stateful function and large state applications

2020-10-13 Thread Lian Jiang
Hi, I am learning Stateful function and saw below: "In addition to the Apache Flink processes, a full deployment requires ZooKeeper (for master failover ) and bulk storage

Rocksdb - Incremental vs full checkpoints

2020-10-13 Thread sudranga
Hi, I have an event-window pipeline which handles a fixed number of messages per second for a fixed number of keys. When i have rocksdb as the state backend with incremental checkpoints, i see the delta checkpoint size constantly increase. Please see

Required context properties mismatch in connecting the flink with mysql database

2020-10-13 Thread xi sizhe
I am using flink latest (1.11.2) to work with a sample mysql database, which the database is working fine. Additionally, i have added the flink-connector-jdbc_2.11-1.11.2, mysql-connector-java-8.0.21.jar, postgresql-42.2.17.jar to the {FLINK}/lib Here is my code T_CONFIG = TableConfig() B_EXEC_E

Re: Externally load data into RocksDB backend

2020-10-13 Thread Daksh Talwar
I believe you're trying to bootstrap state for an operator. If yes, then it might be worthwhile to check out State Processor API

Re: NPE when checkpointing

2020-10-13 Thread Binh Nguyen Van
Hi, Sorry for the late reply. It took me quite a while to change the JDK version to reproduce the issue. I confirmed that if I upgrade to a newer JDK version (I tried with JDK 1.8.0_265) the issue doesn’t happen. Thank you for helping -Binh On Fri, Oct 9, 2020 at 11:36 AM Piotr Nowojski wrote:

Re: Is MapState tied to Operator Input & Output type?

2020-10-13 Thread Yun Tang
Hi Arpith I'm afraid that you're totally talking about the wrong thing in previous thread. The root cause is not restoring state from checkpoint but not access the state legally. Have you ever add keyBy before process your function as doc's note [1] said: "If you want to access keyed state and

Re: Externally load data into RocksDB backend

2020-10-13 Thread Akshay Aggarwal
Hi Arpith, You should look into the State Processor API [1], it can be used to bootstrap state, and also for reading and modifying existing state. [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html#writing-new-savepoints Thanks, Akshay Aggarwal On Tue,

Re: Flink Kafka offsets

2020-10-13 Thread Rex Fenley
Thanks for the explanation, this was all super helpful. On Tue, Oct 13, 2020 at 2:16 AM Dawid Wysakowicz wrote: > Hey Rex, > > I agree the documentation might be slightly misleading. To get the full > picture of that configuration I'd suggest having a look at the DataStream > Kafka connector pag

Re: Dynamic file name prefix - StreamingFileSink

2020-10-13 Thread Vijayendra Yadav
*Thanks Ravi. I got following Error:* [ERROR] DynamicOutputFileConfig.scala:21: error: method getPartPrefix overrides nothing [ERROR] override def getPartPrefix: String = if(partPrefixFunction == null) partPrefix else partPrefixFunction.apply() [ERROR]^ [ERROR] DynamicOutputFileCo

Externally load data into RocksDB backend

2020-10-13 Thread Arpith P
Hi, Is it possible to load data to RocksDB backend externally(outside Flink ) using common dbPath, so that it will be available to MapState inside ProcessFunction. I've external data available in JSON format which I want to load to RocksDb. One option is to make Stream from the JSON path and proc

Re: Is MapState tied to Operator Input & Output type?

2020-10-13 Thread Arpith P
Hi Yun, Neither state descriptor type or name changed. I did assign an ID as well but it didn't help me. What I'm trying to do is I have two stream A & B which I want to connect/process in C; I eventually want values from stream A to be saved in C's MapState. What I've tried is I used ConnectedStr

why this pyflink code has no output?

2020-10-13 Thread ??????
My code is: https://paste.ubuntu.com/p/KqpKwTw5zH/ My step is: $FLINK_HOME/bin/flink run -py /wordcount.py the sentence"data.output()"has no output. where am I wrong in above code? Thanks for your help

Re: Is MapState tied to Operator Input & Output type?

2020-10-13 Thread Yun Tang
Hi The type of map state is not directly related with input & output type, this is only related with how you define the state descriptor. * Have you ever changed the state descriptor after changing the type of input/output type? * Have you assigned the id [1] to the operator which using

Is MapState tied to Operator Input & Output type?

2020-10-13 Thread Arpith P
Hi, I’ve a *ProcessFunction* which initially was receiving input & output type of String (1) & inside *processElement* I was updating MapState. Now I have changed the Input & Output type to be Map, String (2), but if I restore from the last checkpoint folder MapState is coming in as empty. I’ve c

Re: TaskManager takes abnormally long time to register with JobManager on Kubernetes for Flink 1.11.0

2020-10-13 Thread Till Rohrmann
Hi Weike, could you try setting kubernetes.jobmanager.cpu: 4 in your flink-conf.yaml? I fear that a single CPU is too low for the JobManager component. Cheers, Till On Tue, Oct 13, 2020 at 11:33 AM Till Rohrmann wrote: > Hi Weike, > > thanks for posting the logs. I will take a look at them. My

Re: Additional options to S3 Filesystem: Interest?

2020-10-13 Thread Padarn Wilson
Great. Thanks. On Tue, Oct 13, 2020 at 4:29 PM Arvid Heise wrote: > Hi Padarn, > > I assigned the ticket to you, so you can start working on it. Here are > some contribution guidelines [1] in case it's your first contribution. > > Basically, you will need to open a PR which contains the ticket a

Re: Flink is failing for all Jobs if one job gets failed

2020-10-13 Thread Dawid Wysakowicz
Hi, As far as I understand it, it is not a Flink problem. It's your code that is failling to compile the code it gets. It's also quite hard to actually figure out how it is used from within Flink. Best, Dawid On 13/10/2020 10:42, saksham sapra wrote: > > I am working on flink local, i have crea

Re: TaskManager takes abnormally long time to register with JobManager on Kubernetes for Flink 1.11.0

2020-10-13 Thread Till Rohrmann
Hi Weike, thanks for posting the logs. I will take a look at them. My suspicion would be that there is some operation blocking the JobMaster's main thread which causes the registrations from the TMs to time out. Maybe the logs allow me to validate/falsify this suspicion. Cheers, Till On Mon, Oct

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-10-13 Thread Till Rohrmann
Thanks for starting this discussion Yun Gao, I have three comments/questions: 1) When restarting all tasks independent of the status at checkpoint time (finished, running, scheduled), we might allocate more resources than we actually need to run the remaining job. From a scheduling perspective it

Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-13 Thread David Anderson
I think the pertinent question is whether there are interesting cases where the BucketingSink is still a better choice. One case I'm not sure about is the situation described in docs for the StreamingFileSink under Important Note 2 [1]: ... upon normal termination of a job, the last in-progres

Re: Flink Kafka offsets

2020-10-13 Thread Dawid Wysakowicz
Hey Rex, I agree the documentation might be slightly misleading. To get the full picture of that configuration I'd suggest having a look at the DataStream Kafka connector page[1]. The Table connector is just a wrapper around the DataStream one. Let me also try to clarify it a bit more. In case of

Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-13 Thread Konstantin Knauf
Given that it has been deprecated for three releases now, I am +1 to dropping it. On Mon, Oct 12, 2020 at 9:38 PM Chesnay Schepler wrote: > Is there a way for us to change the module (in a reasonable way) that > would allow users to continue using it? > Is it an API problem, or one of semantics?

Flink is failing for all Jobs if one job gets failed

2020-10-13 Thread saksham sapra
I am working on flink local, i have created one task manager which pushes the request to flink. So if one job gets failed for some function error, then other jobs which were running correctly before the error came for one file , the new jobs fail automatically if configuration for that file is dif

Re: Additional options to S3 Filesystem: Interest?

2020-10-13 Thread Arvid Heise
Hi Padarn, I assigned the ticket to you, so you can start working on it. Here are some contribution guidelines [1] in case it's your first contribution. Basically, you will need to open a PR which contains the ticket and component. So the prefix should be "[FLINK-19589][s3]" (also for your commit

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2020-10-13 Thread Yun Gao
Hi Arvid, Very thanks for the comments! >>> 4) Yes, the interaction is not trivial and also I have not completely >>> thought it through. But in general, I'm currently at the point where I >>> think that we also need non-checkpoint related events in unaligned >>> checkpoints. So just keep that i

Re: Dynamic file name prefix - StreamingFileSink

2020-10-13 Thread Ravi Bhushan Ratnakar
Hi Vijayendra, OutputFileConfig provides a builder method to create immutable objects with given 'prefix' and 'suffix'. The parameter which you are passing to '*withPartPrefix*' will only be evaluated at the time of calling this method '*withPartPrefix*'. So if you want to achieve a dynamic 'prefi

Re: PyFlink 1.11.2 couldn’t configure [taskmanager.memory.task.off-heap.size] property when registering custom UDF function

2020-10-13 Thread Xingbo Huang
Hi, >From my point of view, pyflink-shell only provides an interactive tool. Below it, you can choose whether to run the job in minicluster(similar to python xx.py) or submit it to the cluster through flink run. For python xxx.py, it is reasonable to not load the config of flink-conf.yaml. What d

回复: why we need keyed state and operate state when we already have checkpoint?

2020-10-13 Thread 大森林
So state: store the result of some operator(such as keyby,map) Checkpoint: store the last result when the program is running OK. Am I right? Thanks for your help~! -- 原始邮件 -- 发件人: