Hi Juan,

I think the recommendations in the Spark guide are quite good, and they are similar to what I would recommend for Flink as well. Depending on the workloads you want to run, you can certainly use Flink with less than 8 GB per machine. I think you can start Flink TaskManagers with 500 MB of heap space and they will still be able to process a few GB of data.
Anything above 2 GB is probably good enough for some initial experimentation (again, depending on your workloads, network, disk speed, etc.).

On Wed, Sep 2, 2015 at 2:30 PM, Kostas Tzoumas <ktzou...@apache.org> wrote:

> Hi Juan,
>
> Flink is quite nimble with hardware requirements; people have run it on
> old-ish laptops and also on the largest instances available from cloud
> providers. I will let others chime in with more details.
>
> I am not aware of something along the lines of the cheatsheet that you
> mention. If you actually try to do this, I would love to see it, and it
> might be useful to others as well. Spark and Flink use similar abstractions
> at the API level (i.e., parallel collections), so if you stay true to the
> functional paradigm and do not try to "abuse" the system by exploiting
> knowledge of its internals, things should be straightforward. These apply
> to the batch APIs; the streaming API in Flink follows a true streaming
> paradigm, where you get an unbounded stream of records and operators on
> these streams.
>
> Funny that you ask about a video for the DataStream slides. There is a
> Flink training happening as we speak, and a video is being recorded right
> now :-) Hopefully it will be made available soon.
>
> Best,
> Kostas
>
>
> On Wed, Sep 2, 2015 at 1:13 PM, Juan Rodríguez Hortalá <
> juan.rodriguez.hort...@gmail.com> wrote:
>
>> Answering my own question, I have found some nice training material at
>> http://dataartisans.github.io/flink-training.
>> There are even videos on YouTube for some of the slides:
>>
>> - http://dataartisans.github.io/flink-training/overview/intro.html
>>   https://www.youtube.com/watch?v=XgC6c4Wiqvs
>>
>> - http://dataartisans.github.io/flink-training/dataSetBasics/intro.html
>>   https://www.youtube.com/watch?v=0EARqW15dDk
>>
>> The third lecture
>> http://dataartisans.github.io/flink-training/dataSetAdvanced/intro.html
>> more or less corresponds to https://www.youtube.com/watch?v=1yWKZ26NQeU
>> but not exactly, and there are more lessons at
>> http://dataartisans.github.io/flink-training, for stream processing and
>> the Table API, for which I haven't found a video. Does anyone have pointers
>> to the missing videos?
>>
>> Greetings,
>>
>> Juan
>>
>> 2015-09-02 12:50 GMT+02:00 Juan Rodríguez Hortalá <
>> juan.rodriguez.hort...@gmail.com>:
>>
>>> Hi list,
>>>
>>> I'm new to Flink, and I find this project very interesting. I have
>>> experience with Apache Spark, and from what I've seen so far, Flink
>>> provides an API at a similar abstraction level but based on single-record
>>> processing instead of batch processing. I've read on Quora that Flink
>>> extends stream processing to batch processing, while Spark extends batch
>>> processing to streaming. Therefore I find Flink especially attractive for
>>> low-latency stream processing. Anyway, I would appreciate it if someone
>>> could give some indication of where I could find a list of hardware
>>> requirements for the slave nodes in a Flink cluster. Something along the
>>> lines of https://spark.apache.org/docs/latest/hardware-provisioning.html.
>>> Spark is known for having quite high minimal memory requirements (8GB RAM
>>> and 8 cores minimum), and I was wondering if that is also the case for
>>> Flink. Lower memory requirements would be very interesting for building
>>> small Flink clusters for educational purposes, or for small projects.
>>>
>>> Apart from that, I wonder if there is some blog post by the community
>>> about transitioning from Spark to Flink. I think it could be interesting,
>>> as there are some similarities in the APIs, but also deep differences in
>>> the underlying approaches. I was thinking of something like Breeze's
>>> cheatsheet comparing its matrix operations with those available in Matlab
>>> and Numpy
>>> https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet, or
>>> like http://rosettacode.org/wiki/Factorial. Just an idea anyway. Also,
>>> any pointer to some online course, book or training for Flink besides the
>>> official programming guides would be much appreciated.
>>>
>>> Thanks in advance for your help.
>>>
>>> Greetings,
>>>
>>> Juan