Is there a way to avoid submit hive-udf's resources when we submit a job?

2020-09-18 Thread Husky Zeng
When we submit a job that uses Hive UDFs, the job depends on the UDFs' jars and configuration files. We have already stored the UDFs' jars and configuration files in the Hive metadata store, so we expect that Flink could get the HDFS paths of those files through the hive-connector, and fetch the files from HDFS by those paths

Flink Table SQL and writing nested Avro files

2020-09-18 Thread Dan Hill
Hi! I want to join two tables and write the results to Avro where the left and right rows are nested in the avro output. Is it possible to do this with the SQL interface? Thanks! - Dan CREATE TABLE `flat_avro` ( `left` ROW, `right` ROW ) WITH ( 'connector' = 'filesystem', 'path' =

On Flink job re-submission, observing error, "the rpc invocation size exceeds the maximum akka framesize"

2020-09-18 Thread shravan
Hi, This is a continuation of an already raised request (I had replied to the same thread but couldn't get any response yet, hence posting a new request): http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/The-rpc-invocation-size-exceeds-the-maximum-akka-framesize-when-the-job-was-

Re: Automatically restore from checkpoint

2020-09-18 Thread David Anderson
If your job crashes, Flink will automatically restart from the latest checkpoint, without any manual intervention. JobManager HA is only needed for automatic recovery after the failure of the Job Manager. You only need externalized checkpoints and "-s :checkpointPath" if you want to use checkpoint

valuestate(descriptor) using a custom caseclass

2020-09-18 Thread Martin Frank Hansen
Hi, I am trying to calculate some different metrics, using the state backend to check whether events have been seen before. I am using the ProcessWindowFunction, but nothing gets through; it is as if the .process-function is ignored. Is it not possible to store a custom case class as ValueState? Or do

Re: On Flink job re-submission, observing error, "the rpc invocation size exceeds the maximum akka framesize"

2020-09-18 Thread Chesnay Schepler
> how can we know the expected size for which it is failing? If you did not configure akka.framesize yourself then it is set to the documented default value. See the configuration documentation for the release you are using. > Does the operator state have any impact on the expected Akka frame

Re: Re: [ANNOUNCE] Apache Flink 1.11.2 released

2020-09-18 Thread Yun Gao
Great! Many thanks @ZhuZhu for driving this, and thanks to all who contributed to the release! Best, Yun --Original Mail -- Sender:Jingsong Li Send Date:Thu Sep 17 13:31:41 2020 Recipients:user-zh CC:dev , user , Apache Announce List Subject:Re: [ANNOUNCE] Apac

Re: Re: [ANNOUNCE] Apache Flink 1.11.2 released

2020-09-18 Thread Guowei Ma
Thanks Zhuzhu for driving the release!!! Best, Guowei On Fri, Sep 18, 2020 at 5:10 PM Yun Gao wrote: > Great! Many thanks @ZhuZhu for driving this, and thanks to all who contributed > to the release! > > Best, > Yun > > --Original Mail -- > *Sender:*Jingsong Li >

Error on deploying Flink docker image with Kubernetes (minikube) and automatically launch a stream WordCount job.

2020-09-18 Thread Felipe Gutierrez
Hi community, I am trying to deploy the default WordCount stream application on minikube using the official documentation at [1]. I am using minikube v1.13.0 on Ubuntu 18.04 and Kubernetes v1.19.0 on Docker 19.03.8. I could successfully start 1 job manager and 3 task managers using the yaml files f

Re: On Flink job re-submission, observing error, "the rpc invocation size exceeds the maximum akka framesize"

2020-09-18 Thread shravan
Thanks for the quick response. I might have wrongly phrased one of the questions. "> how can we know the expected size for which it is failing? If you did not configure akka.framesize yourself then it is set to the documented default value. See the configuration documentation for the release y

Re: On Flink job re-submission, observing error, "the rpc invocation size exceeds the maximum akka framesize"

2020-09-18 Thread Chesnay Schepler
If you use 1.10.0 or above, the framesize for which it failed is part of the exception message; see FLINK-14618. If you are using an older version, then I'm afraid there is no way to tell. On 9/18/2020 12:11 PM, shravan wrote: Thanks for the quick response. I might have wrongly phrased one of the

Re: valuestate(descriptor) using a custom caseclass

2020-09-18 Thread Dawid Wysakowicz
Hi Martin, I am not sure what the exact problem is. Is it that the ProcessFunction is not invoked, or is the problem with the values in your state? As for the question about the case class and ValueState: the best way to do it is to provide the TypeInformation explicitly. If you do not provide the TypeI

Re: On Flink job re-submission, observing error, "the rpc invocation size exceeds the maximum akka framesize"

2020-09-18 Thread shravan
Thanks again for the quick response. In that case, could you tell me what are the possible factors that warrant a framesize increase? I see the official documentation and it simply states "If Flink fails because messages exceed this limit, then you should increase it", which isn't very convincing.

Re: [DISCUSS] Deprecate and remove UnionList OperatorState

2020-09-18 Thread Aljoscha Krettek
On 14.09.20 02:20, Steven Wu wrote: Right now, we use UnionState to store the `nextCheckpointId` in the Iceberg sink use case, because we can't retrieve the checkpointId from the FunctionInitializationContext during the restore case. But we can move away from it if the restore context provides th

Re: On Flink job re-submission, observing error, "the rpc invocation size exceeds the maximum akka framesize"

2020-09-18 Thread Chesnay Schepler
There are quite a few reasons why the framesize could be exceeded. The most common one we see is due to the parallelism being so high that tasks can't be deployed in the first place. When a task is deployed the RPC payload also contains information about all downstream tasks this task sends dat

Re: valuestate(descriptor) using a custom caseclass

2020-09-18 Thread Martin Frank Hansen
Hi Dawid, Thanks for your reply, much appreciated. I tried using your implementation for TypeInformation, but still nothing gets through. There are no errors either, but it simply runs without sending data to the sink. I have checked that there is data in the input topic, and I have used the code

Re: valuestate(descriptor) using a custom caseclass

2020-09-18 Thread Martin Frank Hansen
Another note, the case class in hand has about 40 fields in it, is there a maximum limit for the number of fields? best regards Den fre. 18. sep. 2020 kl. 13.05 skrev Martin Frank Hansen < m...@berlingskemedia.dk>: > Hi Dawid, > > Thanks for your reply, much appreciated. > > I tried using your i

Re: Disable WAL in RocksDB recovery

2020-09-18 Thread Yu Li
Thanks for bringing this up Juha, and good catch. We actually disable WAL for routine writes by default when using RocksDB and have never encountered segmentation fault issues. However, according to the history in FLINK-8922, a segmentation fault occurs during restore if WAL is disabled, so I guess the root cause

Re: Automatically restore from checkpoint

2020-09-18 Thread Arpith P
Thanks David. In case of a manual restart, to get the checkpoint path programmatically I'm using the following code to retrieve the JobId and CheckpointID so I can pass them along while restarting with "-s", but it seems I'm missing something, as I'm getting an empty TimestampedFileSplit array. GlobFilePathFilter file
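
For the manual-restart case discussed in this thread, one possible way to find a path to pass to "-s" is to pick the newest chk-<n> directory under the job's checkpoint directory. The following is only a minimal sketch, not Flink API: it assumes retained checkpoints use the usual <state.checkpoints.dir>/<jobId>/chk-<n>/_metadata layout and that the directory is reachable through java.nio (a local or mounted path, not an hdfs:// URI). The names LatestCheckpointFinder and findLatest are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Flink): finds the newest restorable checkpoint of one job. */
final class LatestCheckpointFinder {

    // Assumption: checkpoint directories are named chk-<checkpointId>.
    private static final Pattern CHK_DIR = Pattern.compile("chk-(\\d+)");

    private LatestCheckpointFinder() {}

    /**
     * @param jobCheckpointDir the per-job directory, assumed to be <state.checkpoints.dir>/<jobId>
     * @return the chk-<n> directory with the highest id that contains a _metadata file, if any
     */
    static Optional<Path> findLatest(Path jobCheckpointDir) throws IOException {
        try (Stream<Path> children = Files.list(jobCheckpointDir)) {
            return children
                    .filter(Files::isDirectory)
                    // Ignore sibling directories that are not named chk-<n>.
                    .filter(dir -> CHK_DIR.matcher(dir.getFileName().toString()).matches())
                    // Skip directories without _metadata, since a restore needs that file.
                    .filter(dir -> Files.isRegularFile(dir.resolve("_metadata")))
                    // Compare ids numerically so that chk-10 ranks above chk-9.
                    .max(Comparator.comparingLong(LatestCheckpointFinder::checkpointId));
        }
    }

    private static long checkpointId(Path chkDir) {
        Matcher m = CHK_DIR.matcher(chkDir.getFileName().toString());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a chk-<n> directory: " + chkDir);
        }
        return Long.parseLong(m.group(1));
    }
}
```

The returned path could then be handed to the restart command as the argument of "-s". For checkpoints stored on HDFS the same idea would need a Hadoop or Flink FileSystem client instead of java.nio.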

metaspace out-of-memory & error while retrieving the leader gateway

2020-09-18 Thread Claude M
Hello, I upgraded from Flink 1.7.2 to 1.10.2. One of the jobs running on the task managers is periodically crashing w/ the following error: java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either the job requires a larger size of JV

Re: Maximum query and refresh rate for metrics from REST API

2020-09-18 Thread Piper Piper
Thank you, Chesnay! On Thu, Sep 17, 2020, 3:59 AM Chesnay Schepler wrote: > By default metrics are only updated every 10 seconds; this can be > controlled via > > https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/config.html#metrics-fetcher-update-interval > . > > On 9/17/2020 12:

Flink SQL - can I have multiple outputs per job?

2020-09-18 Thread Dan Hill
I have a few results that I want to produce.
- A join B
- A join B join C
- A join B join C join D
- A join B join C join D join E
When I use the DataSet API directly, I can execute all of these in the same job to reduce redundancy. When I use the SQL interface, it looks like separate jobs are cr