Re: Hardware requirements and learning resources

Jay Vyas Wed, 02 Sep 2015 06:01:52 -0700

Just running the main class is sufficient to reproduce it.


> On Sep 2, 2015, at 8:59 AM, Robert Metzger <[email protected]> wrote:
> 
> Hey Jay,
> 
> How can I reproduce the error?
> 
>> On Wed, Sep 2, 2015 at 2:56 PM, jay vyas <[email protected]> wrote:
>> We're also working on a BigPetStore implementation for Flink, which will 
>> help onboard Spark/MapReduce folks.
>> 
>> I have prototype code here that runs a simple job in memory; contributions 
>> are welcome. Right now it fails with a serialization error:
>> 
>> https://github.com/bigpetstore/bigpetstore-flink
>> 
>>> On Wed, Sep 2, 2015 at 8:50 AM, Robert Metzger <[email protected]> wrote:
>>> Hi Juan,
>>> 
>>> I think the recommendations in the Spark guide are quite good, and are 
>>> similar to what I would recommend for Flink as well. 
>>> Depending on the workloads you are interested in running, you can certainly 
>>> use Flink with less than 8 GB per machine. I think you can start Flink 
>>> TaskManagers with 500 MB of heap space and they'll still be able to process 
>>> a few GB of data.
>>> 
>>> Everything above 2 GB is probably good enough for some initial 
>>> experimentation (again depending on your workloads, network, disk speed 
>>> etc.)
>>> 
>>> 
>>>> On Wed, Sep 2, 2015 at 2:30 PM, Kostas Tzoumas <[email protected]> wrote:
>>>> Hi Juan,
>>>> 
>>>> Flink is quite nimble with hardware requirements; people have run it on 
>>>> old-ish laptops and also on the largest instances available from cloud 
>>>> providers. I will let others chime in with more details.
>>>> 
>>>> I am not aware of anything along the lines of the cheatsheet that you 
>>>> mention. If you actually try to do this, I would love to see it, and it 
>>>> might be useful to others as well. Spark and Flink use similar 
>>>> abstractions at the API level (i.e., parallel collections), so if you 
>>>> stay true to the functional paradigm and do not try to "abuse" the system 
>>>> by exploiting knowledge of its internals, things should be 
>>>> straightforward. This applies to the batch APIs; the streaming API in 
>>>> Flink follows a true streaming paradigm, where you get an unbounded 
>>>> stream of records and operators on these streams.
>>>> 
>>>> Funny that you ask about a video for the DataStream slides. There is a 
>>>> Flink training happening as we speak, and a video is being recorded right 
>>>> now :-) Hopefully it will be made available soon.
>>>> 
>>>> Best,
>>>> Kostas
>>>> 
>>>> 
>>>>> On Wed, Sep 2, 2015 at 1:13 PM, Juan Rodríguez Hortalá 
>>>>> <[email protected]> wrote:
>>>>> Answering my own question, I have found some nice training material at 
>>>>> http://dataartisans.github.io/flink-training. There are even videos on 
>>>>> YouTube for some of the slides:
>>>>> 
>>>>>   - http://dataartisans.github.io/flink-training/overview/intro.html
>>>>>     https://www.youtube.com/watch?v=XgC6c4Wiqvs
>>>>> 
>>>>>   - http://dataartisans.github.io/flink-training/dataSetBasics/intro.html
>>>>>     https://www.youtube.com/watch?v=0EARqW15dDk
>>>>> 
>>>>> The third lecture, 
>>>>> http://dataartisans.github.io/flink-training/dataSetAdvanced/intro.html, 
>>>>> more or less corresponds to https://www.youtube.com/watch?v=1yWKZ26NQeU 
>>>>> but not exactly, and there are more lessons at 
>>>>> http://dataartisans.github.io/flink-training, on stream processing and 
>>>>> the Table API, for which I haven't found a video. Does anyone have 
>>>>> pointers to the missing videos?
>>>>> 
>>>>> Greetings, 
>>>>> 
>>>>> Juan
>>>>> 
>>>>> 2015-09-02 12:50 GMT+02:00 Juan Rodríguez Hortalá 
>>>>> <[email protected]>:
>>>>>> Hi list, 
>>>>>> 
>>>>>> I'm new to Flink, and I find this project very interesting. I have 
>>>>>> experience with Apache Spark, and from what I've seen so far, Flink 
>>>>>> provides an API at a similar abstraction level but based on single 
>>>>>> record processing instead of batch processing. I've read on Quora that 
>>>>>> Flink extends stream processing to batch processing, while Spark extends 
>>>>>> batch processing to streaming. Therefore I find Flink especially 
>>>>>> attractive for low latency stream processing. Anyway, I would appreciate 
>>>>>> it if someone could tell me where I could find a list of 
>>>>>> hardware requirements for the slave nodes in a Flink cluster. Something 
>>>>>> along the lines of 
>>>>>> https://spark.apache.org/docs/latest/hardware-provisioning.html. Spark 
>>>>>> is known for having quite high minimal memory requirements (8GB RAM and 
>>>>>> 8 cores minimum), and I was wondering if it is also the case for Flink. 
>>>>>> Lower memory requirements would be very interesting for building small 
>>>>>> Flink clusters for educational purposes, or for small projects. 
>>>>>> 
>>>>>> Apart from that, I wonder if there is some blog post by the community 
>>>>>> about transitioning from Spark to Flink. I think it could be 
>>>>>> interesting, as there are some similarities in the APIs, but also deep 
>>>>>> differences in the underlying approaches. I was thinking of something 
>>>>>> like Breeze's cheatsheet comparing its matrix operations with those 
>>>>>> available in Matlab and Numpy 
>>>>>> https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet, or 
>>>>>> like http://rosettacode.org/wiki/Factorial. Just an idea anyway. Also, 
>>>>>> any pointer to some online course, book or training for Flink besides 
>>>>>> the official programming guides would be much appreciated.
>>>>>> 
>>>>>> Thanks in advance for your help.
>>>>>> 
>>>>>> Greetings, 
>>>>>> 
>>>>>> Juan
>> 
>> 
>> 
>> -- 
>> jay vyas
>
