Re: Extracting state keys for a very large RocksDB savepoint

2021-03-14 Thread Tzu-Li (Gordon) Tai
Hi Andrey, Perhaps the functionality you described is worth adding to the State Processor API. Your observation on how the library currently works is correct; basically it tries to restore the state backends as is. In you current implementation, do you see it worthwhile to try to add this? Cheer

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-14 Thread Alexey Trenikhun
With 1.12.1 it happened quite often, with 1.12.2 not that match, I think I saw it once or twice for ~20 cancels, when it happened, job actually restarted on cancel, did not grab log at that time, but chances good that I will able to reproduce. Thanks, Alexey Fro

flink参数问题

2021-03-14 Thread lxk7...@163.com
大佬们,我现在flink的版本是flink 1.10,但是我通过-ynm 指定yarn上的任务名称不起作用,一直显示的是Flink per-job cluster lxk7...@163.com

Re: Evenly Spreading Out Source Tasks

2021-03-14 Thread Xintong Song
Hi Aeden, IIUC, the topic being read has 36 partitions means that your source task has a parallelism of 36. What's the parallelism of other tasks? Is the job taking use of all the 72 (18 TMs * 4 slots/TM) slots? I'm afraid currently there's no good way to guarantee subtasks of a task are spread o

Got “pyflink.util.exceptions.TableException: findAndCreateTableSource failed.” when running PyFlink example

2021-03-14 Thread Yik San Chan
(The question is cross-posted on StackOverflow https://stackoverflow.com/questions/66632765/got-pyflink-util-exceptions-tableexception-findandcreatetablesource-failed-w ) I am running below PyFlink program (copied from https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/python/table_a

Re: Python DataStream API Questions -- Java/Scala Interoperability?

2021-03-14 Thread Shuiqiang Chen
Hi Kevin, Sorry for the late reply. Actually, you are able to pass arguments to the constructor of the Java object when instancing in Python. Basic data types (char/boolean/int/long/float/double/string, etc) can be directory passed while complex types (array/list/map/POJO, etc) must be converted

Can I use PyFlink together with PyTorch/Tensorflow/PyTorch

2021-03-14 Thread Yik San Chan
Hi community, I am exploring PyFlink and I wonder if it is possible to use PyFlink together with all these ML libs that ML engineers normally use: PyTorch, Tensorflow, Scikit Learn, Xgboost, LightGBM, etc. According to this SO thread

Re: Kubernetes HA - attempting to restore from wrong (non-existing) savepoint

2021-03-14 Thread Yang Wang
If the HA related ConfigMaps still exists, then I am afraid the data located on the distributed storage should also exist. So I suggest to delete the HA related storage as well. Delete all the HA related data manually should help in your current situation. After then you could recover from the new

Re: Extracting state keys for a very large RocksDB savepoint

2021-03-14 Thread Andrey Bulgakov
If anyone is interested, I reliazed that State Processor API was not the right tool for this since it spends a lot of time rebuilding RocksDB tables and then a lot of memory trying to read from it. All I really needed was operator keys. So I used SavepointLoader.loadSavepointMetadata to get KeyGro

Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

2021-03-14 Thread Vishal Santoshi
All I can think is, that any update on a state key, which I do in my ProcessFunction, creates an update ( essentially an append on rocksdb ) which does render the previous value for the key, a tombstone , but that need not reflect on the count ( as double or triple counts ) atomically, thus the c

Re: Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

2021-03-14 Thread Vishal Santoshi
The reason I ask is that I have a "Process Window Function" on that Session Window and I keep key scoped Global State. I maintain a TTL on that state ( that is outside the Window state ) that is roughly the current WM + lateness. I would imagine that keys for that custom state are *roughly* eq

Re: Evenly Spreading Out Source Tasks

2021-03-14 Thread Chesnay Schepler
Is this a brand-new job, with the cluster having all 18 TMs at the time of submission? (or did you add more TMs while the job was running) On 3/12/2021 5:47 PM, Aeden Jameson wrote: Hi Matthias, Yes, all the task managers have the same hardware/memory configuration. Aeden On Fri, Mar 12, 202

Question about session_aggregate.merging-window-set.rocksdb_estimate-num-keys

2021-03-14 Thread Vishal Santoshi
Hey folks, Was looking at this very specific metric "session_aggregate.merging-window-set.rocksdb_estimate-num-keys". Does this metric also represent session windows ( it is a session window ) that have lateness on them ? In essence if the session window was closed but has a lateness of a f

Re: Questions with State Processor Api

2021-03-14 Thread Maminspapin
Please, someone help me to understand is State Processor Api a solve or not for a task. I want to add to state 'Events' some target actions of user and remove them if cancel action is received. Every X period I need to check this state if it's time to make some communication with user. If yes, so

Re: Handling Bounded Sources with KafkaSource

2021-03-14 Thread Maciej Obuchowski
Hey Rion, We solved this issue by using usual, unbounded streams, and using awaitility library to express conditions that would end the test - for example, having particular data in a table. IMO this type of testing has the advantage that you won't have divergent behavior from production as you h