Effect of renaming a primary key

2020-10-15 Thread Rex Fenley
Hello, When two column names collide in a join, the user is forced to change the name of one of the columns for the join to be valid. However, if those columns are primary keys such as "id", won't that therefore change the key used to reference the state in RockDB and in a checkpoint for the assoc

Re: Runtime Dependency Issues Upgrading to Flink 1.11.2 from 1.9.2

2020-10-15 Thread Leonard Xu
Hi, Chesnay > @Leonared I noticed you handled a similar case on the Chinese ML in July > , do you have > any insights? The case in Chinese ML is the user added jakarta.ws.rs-api-3.0.0-M1.jar to Flink/lib which lead the dependency

Re: TaskManager takes abnormally long time to register with JobManager on Kubernetes for Flink 1.11.0

2020-10-15 Thread Yang Wang
I am afraid the InetAddress cache could not take effect. Because Kubernetes only creates A and SRV records for Services. It doesn't generate pods' A records as you may expect. Refer here[1][2] for more information. So the DNS reverse lookup will always fail. IIRC, the default timeout is 5s. This co

Table SQL and updating rows

2020-10-15 Thread Dan Hill
*Context* I'm working on a user event logging system using Flink. I'm tracking some info that belongs to the user's current session (e.g. location, device, experiment info). I don't have a v1 requirement to support mutability but I want to plan for it. I think the most likely reason for mutatio

Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-15 Thread Kostas Kloudas
@Arvid Heise I also do not remember exactly what were all the problems. The fact that we added some more bulk formats to the streaming file sink definitely reduced the non-supported features. In addition, the latest discussion I found on the topic was [1] and the conclusion of that discussion seems

Re: Runtime Dependency Issues Upgrading to Flink 1.11.2 from 1.9.2

2020-10-15 Thread Chesnay Schepler
Can you provide us with the classpath (found at the top of the log file) for your 1.9.2/1.11.2 setups? (to see whether maybe something has changed in regards to classpath ordering) It would also be good to know what things were copied  from opt/ to lib/ or plugins/. (to see whether some packa

Re: Broadcasting control messages to a sink

2020-10-15 Thread Jaffe, Julian
Hey AJ, I’m not familiar with the stock parquet sink, but if it requires a schema on creation you won’t be able to change the output schema without restarting the job. I’m using a custom sink that can update the schema it uses. The problem I’m facing is how to communicate those updates in an ef

RE: Runtime Dependency Issues Upgrading to Flink 1.11.2 from 1.9.2

2020-10-15 Thread Hailu, Andreas
Hi Chesnay, no, we haven't changed our Hadoop version. The only changes were the update the 1.11.2 runtime dependencies listed earlier, as well as compiling with the flink-clients in some of our modules since we were relying on the transitive dependency. Our 1.9.2 jobs are still able to run jus

Re: what's the datasets used in flink sql document?

2020-10-15 Thread David Anderson
For a dockerized playground that includes a dataset, many working examples, and training slides, see [1]. [1] https://github.com/ververica/sql-training David On Thu, Oct 15, 2020 at 10:18 AM Piotr Nowojski wrote: > Hi, > > The examples in the documentation are not fully functional. They assume

akka.framesize configuration does not runtime execution

2020-10-15 Thread Yuval Itzchakov
Hi, Due to a very large savepoint metadata file (3GB +), I've set the akka.framesize that is being required to 5GB. I set this via flink.conf `akka.framesize` property. When trying to recover from the savepoint, the JM emits the following error message: "thread":"flink-akka.actor.default-dispatc

Re: TaskManager takes abnormally long time to register with JobManager on Kubernetes for Flink 1.11.0

2020-10-15 Thread Chesnay Schepler
The InetAddress caches the result of getCanonicalHostName(), so it is not a problem to call it twice. On 10/15/2020 1:57 PM, Till Rohrmann wrote: Hi Weike, thanks for getting back to us with your findings. Looking at the `TaskManagerLocation`, we are actually calling `InetAddress.getCanonica

Re: Correct way to handle RedisSink exception

2020-10-15 Thread Manas Kale
Hi all, Thank you for the help, I understand now. On Thu, Oct 15, 2020 at 5:28 PM 阮树斌 浙江大学 wrote: > hello, Manas Kale. > > From the log, it can be found that the exception was thrown on the > 'open()' method of the RedisSink class. You can inherit the RedisSink > class, then override the 'open()

Re:Correct way to handle RedisSink exception

2020-10-15 Thread 阮树斌 浙江大学
hello, Manas Kale. From the log, it can be found that the exception was thrown on the 'open()' method of the RedisSink class. You can inherit the RedisSink class, then override the 'open()' method, and handle the exception as you wish.Or no longer use Apache Bahir[1] Flink redis connector clas

Re: TaskManager takes abnormally long time to register with JobManager on Kubernetes for Flink 1.11.0

2020-10-15 Thread Till Rohrmann
Hi Weike, thanks for getting back to us with your findings. Looking at the `TaskManagerLocation`, we are actually calling `InetAddress.getCanonicalHostName` twice for every creation of a `TaskManagerLocation` instance. This does not look right. I think it should be fine to make the look up config

Re: StatsD metric name prefix change for task manager after upgrading to Flink 1.11

2020-10-15 Thread Chesnay Schepler
The TaskExecutor host being exposed is directly wired to what the RPC system for addresses, which may have changed due to (FLINK-15911; NAT support). If the problem is purely about the periods in the IP, then I would suggest to create a custom reporter that extends the StatsDReporter and over

Re: Correct way to handle RedisSink exception

2020-10-15 Thread Chesnay Schepler
You will have to create a custom version of the redis connector that ignores such exceptions. On 10/15/2020 1:27 PM, Manas Kale wrote: Hi, I have a streaming application that pushes output to a redis cluster sink. I am using the Apache Bahir[1] Flink redis connector for this. I want to handle

Re: Runtime Dependency Issues Upgrading to Flink 1.11.2 from 1.9.2

2020-10-15 Thread Chesnay Schepler
I'm not aware of any Flink module bundling this class. Note that this class is also bundled in jersey-core (which is also on your classpath), so it appears that there is a conflict between this jar and your shaded one. Have you changed the Hadoop version you are using or how you provide them to

Correct way to handle RedisSink exception

2020-10-15 Thread Manas Kale
Hi, I have a streaming application that pushes output to a redis cluster sink. I am using the Apache Bahir[1] Flink redis connector for this. I want to handle the case when the redis server is unavailable. I am following the same pattern as outlined by them in [1]: FlinkJedisPoolConfig conf = new

Re: Broadcasting control messages to a sink

2020-10-15 Thread aj
Hi Jaffe, I am also working on something similar type of a problem. I am receiving a set of events in Avro format on different topics. I want to consume these and write to s3 in parquet format. I have written a job that creates a different stream for each event and fetches its schema from the co

Re: TaskManager takes abnormally long time to register with JobManager on Kubernetes for Flink 1.11.0

2020-10-15 Thread DONG, Weike
Hi Till and community, By the way, initially I resolved the IPs several times but results returned rather quickly (less than 1ms, possibly due to DNS cache on the server), so I thought it might not be the DNS issue. However, after debugging and logging, it is found that the lookup time exhibited

Re: TaskManager takes abnormally long time to register with JobManager on Kubernetes for Flink 1.11.0

2020-10-15 Thread DONG, Weike
Hi Till and community, Increasing `kubernetes.jobmanager.cpu` in the configuration makes this issue alleviated but not disappeared. After adding DEBUG logs to the internals of *flink-runtime*, we have found the culprit is inetAddress.getCanonicalHostName() in *org.apache.flink.runtime.taskmanag

Re: what's the datasets used in flink sql document?

2020-10-15 Thread Piotr Nowojski
Hi, The examples in the documentation are not fully functional. They assume (like in this case), that you would have an already predefined table orders, with the required fields. As I mentioned before, there are working examples available and you can also read the documentation on how to register

Streaming File Sink cannot generate _SUCCESS tag files

2020-10-15 Thread highfei2011
Hi, everyone! Currently experiencing a problem with the bucketing policy sink to hdfs using BucketAssigner of Streaming File Sink after consuming Kafka data with FLink -1.11.2, the _SUCCESS tag file is not generated by default. I have added the following to the configuration val ha