Re: Error on deploying Flink docker image with Kubernetes (minikube) and automatically launch a stream WordCount job.

2020-09-23 Thread Felipe Gutierrez
thanks Yang, I got to put it to work in the way that you said. https://github.com/felipegutierrez/explore-flink Best, Felipe -- -- Felipe Gutierrez -- skype: felipe.o.gutierrez -- https://felipeogutierrez.blogspot.com On Thu, Sep 24, 2020 at 6:59 AM Yang Wang wrote: > > Hi Felipe, > > Currently,

Re: How to disconnect taskmanager via rest api?

2020-09-23 Thread Yang Wang
I think this is an interesting feature, especially when deploying Flink standalone clusters on K8s. The TaskManager pods are started/stopped externally via kubectl or other tools. When we need to stop a TaskManager pod, even though the pod is deleted quickly, we have to wait for a timeout so that i

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-23 Thread Xintong Song
How many slots do you have on each task manager? Flink uses ChildFirstClassLoader for loading user codes, to avoid dependency conflicts between user codes and Flink's framework. Ideally, after a slot is freed and reassigned to a new job, the user class loaders of the previous job should be unloade

Re: Error on deploying Flink docker image with Kubernetes (minikube) and automatically launch a stream WordCount job.

2020-09-23 Thread Yang Wang
Hi Felipe, Currently, if you want to deploy a standalone job/application Flink cluster on K8s via yamls. You should have your own image with user jar baked located at /opt/flink/usrlib. It could not be specified via config option. Usually, you could add new layer on the official docker image to bu

Re: [DISCUSS] Move Hive document to "Table & SQL Connectors" from "Table API & SQL"

2020-09-23 Thread Jark Wu
+1 to move it there. On Thu, 24 Sep 2020 at 12:16, Jingsong Li wrote: > Hi devs and users: > > After the 1.11 release, I heard some voices recently: How can't Hive's > documents be found in the "Table & SQL Connectors". > > Actually, Hive's documents are in the "Table API & SQL". Since the "Tabl

Re: Adaptive load balancing

2020-09-23 Thread Zhijiang
Hi Krishnan, Thanks for discussing this interesting scenario! It makes me remind of a previous pending improvement of adaptive load balance for rebalance partitioner. Since the rebalance mode can emit the data to any nodes without precision consideration, then the data can be emitted based on

[DISCUSS] Move Hive document to "Table & SQL Connectors" from "Table API & SQL"

2020-09-23 Thread Jingsong Li
Hi devs and users: After the 1.11 release, I heard some voices recently: How can't Hive's documents be found in the "Table & SQL Connectors". Actually, Hive's documents are in the "Table API & SQL". Since the "Table & SQL Connectors" document was extracted separately, Hive is a little out of plac

Re: Support for gRPC in Flink StateFun 2.x

2020-09-23 Thread Dalmo Cirne
Thank you so much for creating the ticket, Igal. We are looking forward to being able to use it! And thank you for giving a little more context about how StateFun keeps a connection pool and tries to optimize for performance and throughput. With that said, gRPC is an architectural choice we hav

Reusing Flink SQL Client's Environment for Flink pipelines

2020-09-23 Thread Dan Hill
Has anyone tried to reused the Flink SQL Client's yaml Environment configuration for their production setups? It seems pr

Re: RichFunctions in Flink's Table / SQL API

2020-09-23 Thread Piyush Narang
Hi Timo, Thanks for getting back and filing the jira. I'll try to see if there's a way we can rework things to take advantage of the aggregate functions. -- Piyush On 9/23/20, 3:55 AM, "Timo Walther" wrote: Hi Piyush, unfortunately, UDFs have no direct access to Flink's state

Re: Unknown datum type java.time.Instant: 2020-09-15T07:00:00Z

2020-09-23 Thread Lian Jiang
Dawid, Thanks for the fix. I may wait for Flink 1.12 coming out at the end of Oct this year. Meanwhile, I may want to better understand the current solution at the beginning of this thread. My observations: 1. ProcessFunction with streamEnv.getConfig().enableObjectReuse() --> working 2. Process

Re: How to stop multiple Flink jobs of the same name from being created?

2020-09-23 Thread Yang Wang
Hi Dan, If you are using a K8s job to deploy the "INSERT INTO" SQL jobs into the existing Flink cluster, then you have to manage the lifecycle of these jobs by yourself. I think you could use flink command line or rest API to check the job status first. Best, Yang Dan Hill 于2020年9月23日周三 上午8:07写

Poor performance with large keys using RocksDB and MapState

2020-09-23 Thread ירון שני
Hello, I have a poor throughput issue, and I think I managed to reproduce it using the following code: val conf = new Configuration() conf.set(TaskManagerOptions.MANAGED_MEMORY_SIZE, MemorySize.ofMebiBytes(6 * 1000)) conf.set(TaskManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.ofMebiBytes(8 * 1000

[DISCUSS] ReplayableSourceStateBackend

2020-09-23 Thread Theo Diefenthal
Hi there, I just had the idea of a "ReplayableSourceStateBackend". I opened up a JIRA issue where I described the idea about it [1]. I would love to hear your feedback: Do you think it is possible to implement (I am not sure if a pipeline can be fully reconstructed from the source elements w

Re: Stateful Functions + ML model prediction

2020-09-23 Thread John Morrow
Thanks very much Igal - that sounds like a good solution! I'm new to StateFun so I'll have to dig into it a bit more, but this sounds like a good direction. Thanks again, John. From: Igal Shilman Sent: Wednesday 23 September 2020 09:06 To: John Morrow Cc: user

Fault tolerance: StickyAllocationAndLocalRecoveryTestJob

2020-09-23 Thread cokle
Hello members, I am new to the Apache Flink word and in the last month, I have been exploring the testing scenarios offered by Flink team and different books to learn Flink. Today I was trying to better understand this test that you can find it here:

Re: Flink Statefun Byte Ingress

2020-09-23 Thread Igal Shilman
Hi, For ingress, we don't look at the content at all, we put the bytes "as-is" into the Any's value field, and we set the typeUrl field with whatever was specified in the module.yaml. See here for example: https://github.com/apache/flink-statefun/blob/master/statefun-examples/statefun-python-k8s-

Re: Support for gRPC in Flink StateFun 2.x

2020-09-23 Thread Igal Shilman
Hi Dalmo, Thanks a lot for sharing this use case! If I understand the requirement correctly, you are mostly concerned with performance. In that case I've created an issue [1] to add a gRPC transport for StateFun, and I believe we would be able to implement it in the upcoming weeks. Just a side n

Re: Flink Statefun Byte Ingress

2020-09-23 Thread Timothy Bess
Hi Igal, Ah that definitely helps to know for Function -> Function invocations, but when doing Ingress via statefun how would that work? Is there a config I can set in the "module.yaml" to have it just pack arbitrary bytes into the Any? Thanks, Tim On Wed, Sep 23, 2020 at 7:01 AM Igal Shilman

Re: Stateful Functions + ML model prediction

2020-09-23 Thread Igal Shilman
Hi John, Thank you for sharing your interesting use case! Let me start from your second question: > Are stateful functions available to all Flink jobs within a cluster? Yes, the remote functions are some logic exposed behind an HTTP endpoint, and Flink would forward any message addressed to th

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-23 Thread Claude M
It was mentioned that this issue may be fixed in 1.10.3 but there is no 1.10.3 docker image here: https://hub.docker.com/_/flink On Wed, Sep 23, 2020 at 7:14 AM Claude M wrote: > In regards to the metaspace memory issue, I was able to get a heap dump > and the following is the output: > > Probl

Re: metaspace out-of-memory & error while retrieving the leader gateway

2020-09-23 Thread Claude M
In regards to the metaspace memory issue, I was able to get a heap dump and the following is the output: Problem Suspect 1 One instance of *"java.lang.ref.Finalizer"* loaded by *""* occupies *4,112,624 (11.67%)* bytes. The instance is referenced by *sun.misc.Cleaner @ 0xb5d6b520* , loaded by *""*

Ignoring invalid values in KafkaSerializationSchema

2020-09-23 Thread Yuval Itzchakov
Hi, I'm using a custom KafkaSerializationSchema to write records to Kafka using FlinkKafkaProducer. The objects written are Rows coming from Flink's SQL API. In some cases, when trying to convert the Row object to a byte[], serialization will fail due to malformed values. In such cases, I would l

Flink Statefun Byte Ingress

2020-09-23 Thread Igal Shilman
Hi Tim, You are correct, currently the argument to a remote function must be a Protobuf Any, however StateFun doesn't interpret the contents of that Any, and it would be passed as-is to the remote function. As you mentioned in your email you can interpret the bytes as the bytes of a JSON string.

Efficiently processing sparse events in a time windows

2020-09-23 Thread Steven Murdoch
Hello, I am trying to do something that seems like it should be quite simple but I haven’t found an efficient way to do this with Flink and I expect I’m missing something obvious here. The task is that I would like to process a sequence of events when a certain number appear within a keyed ev

Re: Better way to share large data across task managers

2020-09-23 Thread Dongwon Kim
Hi Kostas, Thanks for the input! BTW, I guess you assume that the broadcasting occurs just once for bootstrapping, huh? My job needs not only bootstrapping but also periodically fetching a new version of data from some external storage. Thanks, Dongwon > 2020. 9. 23. 오전 4:59, Kostas Kloudas 작

Re: How to disconnect taskmanager via rest api?

2020-09-23 Thread Luan Cooper
thanks I'll create a new issue for this feature on github On Mon, Sep 21, 2020 at 11:51 PM Timo Walther wrote: > Hi Luan, > > this sound more of a new feature request to me. Maybe you can already > open an issue for it. > > I will loop in Chesnay in CC if there is some possibility to achieve >

Re: RichFunctions in Flink's Table / SQL API

2020-09-23 Thread Timo Walther
Hi Piyush, unfortunately, UDFs have no direct access to Flink's state. Aggregate functions are the only type of functions that can be stateful at the moment. Aggregate functions store their state in an accumulator that is serialized/deserialized on access, but an accumulator field can be back

Flink stateful functions and Event Driven microservices

2020-09-23 Thread Mazen Ezzeddine
Hello, What are the differences between Flink stateful functions and Event driven microservices are they almost the same concept? Indeed I am aware that flink stateful functions provide out of the box functionalities like Exaclty once processing gurantees on Failure and recovery, stateful middle t