Hey Jay, how can I reproduce the error?
On Wed, Sep 2, 2015 at 2:56 PM, jay vyas <jayunit100.apa...@gmail.com> wrote:

> We're also working on a BigPetStore implementation for Flink, which will
> help onboard Spark/MapReduce folks.
>
> I have prototypical code here that runs a simple job in memory;
> contributions are welcome. Right now there is a serialization error:
> https://github.com/bigpetstore/bigpetstore-flink
>
> On Wed, Sep 2, 2015 at 8:50 AM, Robert Metzger <rmetz...@apache.org>
> wrote:
>
>> Hi Juan,
>>
>> I think the recommendations in the Spark guide are quite good, and are
>> similar to what I would recommend for Flink as well.
>> Depending on the workloads you are interested in running, you can
>> certainly use Flink with less than 8 GB per machine. I think you can
>> start Flink TaskManagers with 500 MB of heap space and they'll still be
>> able to process some GB of data.
>>
>> Everything above 2 GB is probably good enough for some initial
>> experimentation (again, depending on your workloads, network, disk
>> speed, etc.).
>>
>> On Wed, Sep 2, 2015 at 2:30 PM, Kostas Tzoumas <ktzou...@apache.org>
>> wrote:
>>
>>> Hi Juan,
>>>
>>> Flink is quite nimble with hardware requirements; people have run it on
>>> old-ish laptops and also on the largest instances available from cloud
>>> providers. I will let others chime in with more details.
>>>
>>> I am not aware of anything along the lines of the cheatsheet you
>>> mention. If you actually try to write one, I would love to see it, and
>>> it might be useful to others as well. Both systems use similar
>>> abstractions at the API level (i.e., parallel collections), so if you
>>> stay true to the functional paradigm and don't try to "abuse" the
>>> system by exploiting knowledge of its internals, things should be
>>> straightforward. This applies to the batch APIs; the streaming API in
>>> Flink follows a true streaming paradigm, where you get an unbounded
>>> stream of records and operators on these streams.
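[For anyone who wants to try Robert's low-memory suggestion: a minimal flink-conf.yaml sketch is below. The key names match the Flink configuration documentation of this era (`jobmanager.heap.mb`, `taskmanager.heap.mb`, `taskmanager.numberOfTaskSlots`); the exact values are only an illustrative starting point for a small experimental cluster, not a tested recommendation.]

```
# flink-conf.yaml — small-cluster sketch (values are illustrative)

# JVM heap for the JobManager (coordinator); modest needs for small jobs
jobmanager.heap.mb: 256

# JVM heap per TaskManager; ~500 MB can still process a few GB of data
taskmanager.heap.mb: 512

# Parallel task slots per TaskManager; match roughly to available cores
taskmanager.numberOfTaskSlots: 2
```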
>>> Funny that you ask about a video for the DataStream slides. There is a
>>> Flink training happening as we speak, and a video is being recorded
>>> right now :-) Hopefully it will be made available soon.
>>>
>>> Best,
>>> Kostas
>>>
>>> On Wed, Sep 2, 2015 at 1:13 PM, Juan Rodríguez Hortalá <
>>> juan.rodriguez.hort...@gmail.com> wrote:
>>>
>>>> Answering myself: I have found some nice training material at
>>>> http://dataartisans.github.io/flink-training. There are even videos on
>>>> YouTube for some of the slides:
>>>>
>>>> - http://dataartisans.github.io/flink-training/overview/intro.html
>>>>   https://www.youtube.com/watch?v=XgC6c4Wiqvs
>>>>
>>>> - http://dataartisans.github.io/flink-training/dataSetBasics/intro.html
>>>>   https://www.youtube.com/watch?v=0EARqW15dDk
>>>>
>>>> The third lecture,
>>>> http://dataartisans.github.io/flink-training/dataSetAdvanced/intro.html,
>>>> more or less corresponds to https://www.youtube.com/watch?v=1yWKZ26NQeU,
>>>> but not exactly, and there are more lessons at
>>>> http://dataartisans.github.io/flink-training, for stream processing
>>>> and the Table API, for which I haven't found a video. Does anyone have
>>>> pointers to the missing videos?
>>>>
>>>> Greetings,
>>>>
>>>> Juan
>>>>
>>>> 2015-09-02 12:50 GMT+02:00 Juan Rodríguez Hortalá <
>>>> juan.rodriguez.hort...@gmail.com>:
>>>>
>>>>> Hi list,
>>>>>
>>>>> I'm new to Flink, and I find this project very interesting. I have
>>>>> experience with Apache Spark, and from what I've seen so far I find
>>>>> that Flink provides an API at a similar abstraction level, but based
>>>>> on single-record processing instead of batch processing. I've read on
>>>>> Quora that Flink extends stream processing to batch processing, while
>>>>> Spark extends batch processing to streaming. Therefore I find Flink
>>>>> especially attractive for low-latency stream processing.
>>>>> Anyway, I would appreciate it if someone could give some indication
>>>>> of where I could find a list of hardware requirements for the slave
>>>>> nodes in a Flink cluster, something along the lines of
>>>>> https://spark.apache.org/docs/latest/hardware-provisioning.html.
>>>>> Spark is known for having quite high minimal memory requirements
>>>>> (8 GB RAM and 8 cores minimum), and I was wondering if that is also
>>>>> the case for Flink. Lower memory requirements would be very
>>>>> interesting for building small Flink clusters for educational
>>>>> purposes, or for small projects.
>>>>>
>>>>> Apart from that, I wonder if there is some blog post by the community
>>>>> about transitioning from Spark to Flink. I think it could be
>>>>> interesting, as there are some similarities in the APIs, but also
>>>>> deep differences in the underlying approaches. I was thinking of
>>>>> something like Breeze's cheatsheet comparing its matrix operations
>>>>> with those available in Matlab and NumPy,
>>>>> https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet,
>>>>> or like http://rosettacode.org/wiki/Factorial. Just an idea, anyway.
>>>>> Also, any pointer to an online course, book, or training for Flink
>>>>> besides the official programming guides would be much appreciated.
>>>>>
>>>>> Thanks in advance for your help.
>>>>>
>>>>> Greetings,
>>>>>
>>>>> Juan
>
> --
> jay vyas