[PyFlink] register udf functions with different versions of the same library in the same job

2020-10-09 Thread Sharipov, Rinat
Hi mates! I've just read an amazing article about PyFlink and I'm absolutely delighted. I have some questions about UDF registration: it seems that it's possible to specify the list of libraries that…
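For context, dependencies for the UDFs of a job can be declared like this in PyFlink 1.11; a minimal sketch, assuming a hypothetical requirements.txt and module name:

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
table_env = StreamTableEnvironment.create(env)

# Ship a requirements file so UDF workers install these libraries
# (hypothetical file, e.g. containing "scikit-learn==0.23.2").
table_env.set_python_requirements("requirements.txt")

# Individual .py files can be attached as well (hypothetical module).
table_env.add_python_file("my_udfs.py")

Note that these dependencies are resolved once for the whole job, which is exactly what makes two versions of the same library inside one job problematic.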

[PyFlink] update udf functions on the fly

2020-10-10 Thread Sharipov, Rinat
Hi mates! I'm at the beginning of the road of building a recommendation pipeline on top of Flink. I'm going to register a list of Python UDF functions on job startup, where each UDF is an ML model. Over time, new model versions appear in the ML registry and I would like to update my UDF functions…
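A rough sketch of such startup registration in PyFlink 1.11, assuming an existing table_env; the registry lookup and the model's predict interface are assumptions, not from the thread:

from pyflink.table import DataTypes
from pyflink.table.udf import udf

def load_models_from_registry():
    # Placeholder for a real ML-registry lookup: name -> loaded model (assumption).
    return {}

def make_model_udf(model):
    # Wrap a loaded model in a scalar UDF; the model must be picklable,
    # since PyFlink serializes UDFs when shipping them to workers.
    def predict(features):
        return model.predict(features)
    return udf(predict, input_types=[DataTypes.STRING()], result_type=DataTypes.STRING())

for name, model in load_models_from_registry().items():
    table_env.register_function(name, make_model_udf(model))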

Re: [PyFlink] register udf functions with different versions of the same library in the same job

2020-10-11 Thread Sharipov, Rinat
…ry, or what would be even better, with different Python environments so they won't clash. For a PyFlink job, all nodes currently use the same Python environment path, so there is no way to make each UDF use a different Python execution environment. Maybe you need to use multiple…
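For reference, that single per-job environment can be shipped as an archive, as in the PyFlink dependency-management docs; a sketch with a hypothetical archive name:

# The archive is extracted on every worker; all UDFs in the job share this
# interpreter, which is why conflicting library versions can't coexist.
table_env.add_python_archive("venv.zip")
table_env.get_config().set_python_executable("venv.zip/venv/bin/python")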

Re: [PyFlink] update udf functions on the fly

2020-10-12 Thread Sharipov, Rinat
…tream API, the common way to simulate side inputs (which is what you need) is to use a broadcast. There is an example on SO [1]. [1] https://stackoverflow.com/questions/54667508/how-to-unit-test-broadcastprocessfunction-in-flink-when-processelement-depends-o …
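The SO example is for the Java DataStream API; broadcast state only reached PyFlink's DataStream API in later releases (1.16+). A sketch of the pattern as I understand that API, with illustrative names and the stream wiring left as a comment:

from pyflink.common import Types
from pyflink.datastream.functions import BroadcastProcessFunction
from pyflink.datastream.state import MapStateDescriptor

# Descriptor for the broadcast ("side input") state: rule id -> rule payload.
RULES_DESC = MapStateDescriptor("rules", Types.STRING(), Types.STRING())

class ApplyRules(BroadcastProcessFunction):
    def process_element(self, value, ctx):
        # Main stream: read-only access to the broadcast rules.
        rule = ctx.get_broadcast_state(RULES_DESC).get(value[0])
        yield (value, rule)

    def process_broadcast_element(self, value, ctx):
        # Control stream: each update is written into the broadcast state.
        ctx.get_broadcast_state(RULES_DESC).put(value[0], value[1])

# events.connect(rules.broadcast(RULES_DESC)).process(ApplyRules())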

Re: [PyFlink] update udf functions on the fly

2020-10-12 Thread Sharipov, Rinat
…rding to some custom strategy. It's the behavior of the UDF. Regards, Dian > On 12 Oct 2020, at 5:51 PM, Sharipov, Rinat wrote: > Hi Arvid, thx for your reply. We are already using the approach with control streams to propagate business rules through o…
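A sketch of what "reload according to some custom strategy" could look like inside a PyFlink ScalarFunction; the time-based trigger and the load_model helper are assumptions:

import time
from pyflink.table import DataTypes
from pyflink.table.udf import ScalarFunction, udf

def load_model():
    # Placeholder for fetching the current model version from the registry.
    return lambda x: x

class Predict(ScalarFunction):
    def open(self, function_context):
        self._model = load_model()
        self._loaded_at = time.time()

    def eval(self, x):
        # Hypothetical strategy: refresh the model every 10 minutes.
        if time.time() - self._loaded_at > 600:
            self._model = load_model()
            self._loaded_at = time.time()
        return self._model(x)

predict = udf(Predict(), input_types=[DataTypes.DOUBLE()], result_type=DataTypes.DOUBLE())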

PyFlink 1.11.2 couldn’t configure [taskmanager.memory.task.off-heap.size] property when registering custom UDF function

2020-10-12 Thread Sharipov, Rinat
Hi mates! I'm very new to PyFlink and am trying to register a custom UDF function using the Python API. Currently I face the same issue in both the server environment and my local IDE environment. When I try to execute the example below, I get the error message: "The configured Task Off-Heap Memory 0 bytes is less t…"

Re: PyFlink 1.11.2 couldn’t configure [taskmanager.memory.task.off-heap.size] property when registering custom UDF function

2020-10-12 Thread Sharipov, Rinat
…You can use the API to set the configuration: table_env.get_config().get_configuration().set_string("taskmanager.memory.task.off-heap.size", '80m'). The flink-conf.yaml way will only take effect when submitted through flink run, and the minicluster way (pyth…
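Putting the reply together into a minimal runnable form (Python UDF workers need task off-heap memory, hence the requirement):

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
table_env = StreamTableEnvironment.create(env)

# Set the off-heap size programmatically so it also takes effect for
# local / minicluster runs, not only for jobs submitted via `flink run`.
table_env.get_config().get_configuration().set_string(
    "taskmanager.memory.task.off-heap.size", "80m")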

PyFlink :: Bootstrap UDF function

2020-10-14 Thread Sharipov, Rinat
Hi mates! I keep moving in my research of the new features of PyFlink and I'm really excited about that functionality. My main goal is to understand how to integrate our ML registry, powered by MLflow, with PyFlink jobs, and what restrictions we have. I need to bootstrap the UDF function on its start…

Re: PyFlink :: Bootstrap UDF function

2020-10-14 Thread Sharipov, Rinat
…alarFunction, and you could also override them and perform the initialization work in the open method. Regards, Dian > On 15 Oct 2020, at 3:19 AM, Sharipov, Rinat wrote: > Hi mates! I keep moving in my research of the new features of PyFlink and I'…
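Since the question was about an MLflow-backed registry, the open() hook mentioned in the reply could be used roughly like this; the model URI and the predict call are assumptions:

import mlflow.pyfunc
from pyflink.table import DataTypes
from pyflink.table.udf import ScalarFunction, udf

class Recommend(ScalarFunction):
    def open(self, function_context):
        # One-time bootstrap per UDF instance: pull the model from MLflow.
        # "models:/recommender/Production" is a hypothetical registry URI.
        self._model = mlflow.pyfunc.load_model("models:/recommender/Production")

    def eval(self, features):
        # The exact predict() input shape depends on the model flavor.
        return self._model.predict(features)

recommend = udf(Recommend(), input_types=[DataTypes.STRING()], result_type=DataTypes.STRING())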

[Flink::Test] access registered accumulators via harness

2020-10-27 Thread Sharipov, Rinat
Hi mates! I guess that I'm doing something wrong, but I couldn't find a way to access registered accumulators and their values via the org.apache.flink.streaming.util.ProcessFunctionTestHarness wrapper that I'm using to test my functions. During my research of the code I found that requir…

Re: [Flink::Test] access registered accumulators via harness

2020-10-30 Thread Sharipov, Rinat
…// do the processing } } That way you can easily test your MyLogic class, including interactions with the counter, by passing a mock counter. Best, Dawid [1] https://ci.apache.org/projects/flink/flin…

“feedback loop” and checkpoints in iterative streams

2020-04-04 Thread Sharipov, Rinat
Hi mates, for some reason it's necessary to create a feedback loop in my streaming application. The best choice to implement it was an iterative stream, but at the moment of job implementation (Flink version 1.6.1) it wasn't checkpointed, so I decided to write this output into Kafka. As I see, t…

write into Parquet with a variable number of columns

2021-08-05 Thread Sharipov, Rinat
…leEnvironmentImpl.java:334) Maybe someone has tried this feature and can guess what's wrong with the current code and how to make it work. Anyway, I have a fallback: accumulate a batch of events, define the schema for them, and write to the file system manually, but I still hope that I can do this in a more elegant way. Thx for your advice and time! -- Best Regards, Sharipov Rinat, CleverDATA: make your data clever
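For what it's worth, the fallback described here maps nicely onto pyarrow, which infers a schema across a batch of dicts and fills missing keys with nulls; my sketch, not code from the thread:

import pyarrow as pa
import pyarrow.parquet as pq

# A batch of events with a variable set of columns; absent keys become nulls.
batch = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b", "score": 0.5},
]

table = pa.Table.from_pylist(batch)  # schema inferred across all rows
pq.write_table(table, "events.parquet")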