Hello, for my master's thesis I am comparing ML frameworks on data streams.
What is the current status of FlinkML? Is distributed learning on multiple nodes possible, and if so, how? I played around with FlinkML a bit and built a simple pipeline for sentiment analysis on tweets. For this I used the Sentiment140 dataset, which contains 1.6 million tweets. Unfortunately, I can only use a small amount of data (about 30,000 samples) for training; with more, the TaskManager gets lost or crashes, even though I have allocated plenty of memory to it (the JVM heap size is set to 50 GB). Training should also work with more data, right?

Greetings