Hey Jay, how can I reproduce the error?
On Wed, Sep 2, 2015 at 2:56 PM, jay vyas <jayunit100.apa...@gmail.com> wrote:

> We're also working on a BigPetStore implementation for Flink, which will
> help onboard Spark/MapReduce folks.
>
> I have prototypical code here that runs a simple job in memory;
> contributions are welcome. Right now there is a serialization error:
> https://github.com/bigpetstore/bigpetstore-flink
>
> On Wed, Sep 2, 2015 at 8:50 AM, Robert Metzger <rmetz...@apache.org>
> wrote:
>
>> Hi Juan,
>>
>> I think the recommendations in the Spark guide are quite good, and are
>> similar to what I would recommend for Flink as well.
>> Depending on the workloads you are interested in running, you can
>> certainly use Flink with less than 8 GB per machine. I think you can
>> start Flink TaskManagers with 500 MB of heap space and they'll still be
>> able to process some GB of data.
>>
>> Everything above 2 GB is probably good enough for some initial
>> experimentation (again, depending on your workloads, network, disk
>> speed, etc.).
>>
>> On Wed, Sep 2, 2015 at 2:30 PM, Kostas Tzoumas <ktzou...@apache.org>
>> wrote:
>>
>>> Hi Juan,
>>>
>>> Flink is quite nimble with hardware requirements; people have run it on
>>> old-ish laptops and also on the largest instances available from cloud
>>> providers. I will let others chime in with more details.
>>>
>>> I am not aware of anything along the lines of the cheatsheet you
>>> mention. If you actually try to write one, I would love to see it, and
>>> it might be useful to others as well. Both systems use similar
>>> abstractions at the API level (i.e., parallel collections), so if you
>>> stay true to the functional paradigm and don't try to "abuse" the
>>> system by exploiting knowledge of its internals, things should be
>>> straightforward. This applies to the batch APIs; the streaming API in
>>> Flink follows a true streaming paradigm, where you get an unbounded
>>> stream of records and operators on these streams.
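[For anyone who wants to try Robert's low-memory suggestion: a minimal flink-conf.yaml sketch is below. The key names match the Flink configuration documentation of this era (`jobmanager.heap.mb`, `taskmanager.heap.mb`, `taskmanager.numberOfTaskSlots`); the exact values are only an illustrative starting point for a small experimental cluster, not a tested recommendation.]

```
# flink-conf.yaml — small-cluster sketch (values are illustrative)

# JVM heap for the JobManager (coordinator); modest needs for small jobs
jobmanager.heap.mb: 256

# JVM heap per TaskManager; ~500 MB can still process a few GB of data
taskmanager.heap.mb: 512

# Parallel task slots per TaskManager; match roughly to available cores
taskmanager.numberOfTaskSlots: 2
```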
>>> Funny that you ask about a video for the DataStream slides. There is a
>>> Flink training happening as we speak, and a video is being recorded
>>> right now :-) Hopefully it will be made available soon.
>>>
>>> Best,
>>> Kostas
>>>
>>> On Wed, Sep 2, 2015 at 1:13 PM, Juan Rodríguez Hortalá <
>>> juan.rodriguez.hort...@gmail.com> wrote:
>>>
>>>> Answering myself: I have found some nice training material at
>>>> http://dataartisans.github.io/flink-training. There are even videos on
>>>> YouTube for some of the slides:
>>>>
>>>> - http://dataartisans.github.io/flink-training/overview/intro.html
>>>>   https://www.youtube.com/watch?v=XgC6c4Wiqvs
>>>>
>>>> - http://dataartisans.github.io/flink-training/dataSetBasics/intro.html
>>>>   https://www.youtube.com/watch?v=0EARqW15dDk
>>>>
>>>> The third lecture,
>>>> http://dataartisans.github.io/flink-training/dataSetAdvanced/intro.html,
>>>> more or less corresponds to https://www.youtube.com/watch?v=1yWKZ26NQeU,
>>>> but not exactly, and there are more lessons at
>>>> http://dataartisans.github.io/flink-training, for stream processing
>>>> and the Table API, for which I haven't found a video. Does anyone have
>>>> pointers to the missing videos?
>>>>
>>>> Greetings,
>>>>
>>>> Juan
>>>>
>>>> 2015-09-02 12:50 GMT+02:00 Juan Rodríguez Hortalá <
>>>> juan.rodriguez.hort...@gmail.com>:
>>>>
>>>>> Hi list,
>>>>>
>>>>> I'm new to Flink, and I find this project very interesting. I have
>>>>> experience with Apache Spark, and from what I've seen so far I find
>>>>> that Flink provides an API at a similar abstraction level, but based
>>>>> on single-record processing instead of batch processing. I've read on
>>>>> Quora that Flink extends stream processing to batch processing, while
>>>>> Spark extends batch processing to streaming. Therefore I find Flink
>>>>> especially attractive for low-latency stream processing.
>>>>> Anyway, I would appreciate it if someone could give some indication
>>>>> of where I could find a list of hardware requirements for the slave
>>>>> nodes in a Flink cluster, something along the lines of
>>>>> https://spark.apache.org/docs/latest/hardware-provisioning.html.
>>>>> Spark is known for having quite high minimal memory requirements
>>>>> (8 GB RAM and 8 cores minimum), and I was wondering if that is also
>>>>> the case for Flink. Lower memory requirements would be very
>>>>> interesting for building small Flink clusters for educational
>>>>> purposes, or for small projects.
>>>>>
>>>>> Apart from that, I wonder if there is some blog post by the community
>>>>> about transitioning from Spark to Flink. I think it could be
>>>>> interesting, as there are some similarities in the APIs, but also
>>>>> deep differences in the underlying approaches. I was thinking of
>>>>> something like Breeze's cheatsheet comparing its matrix operations
>>>>> with those available in Matlab and NumPy,
>>>>> https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet,
>>>>> or like http://rosettacode.org/wiki/Factorial. Just an idea, anyway.
>>>>> Also, any pointer to an online course, book, or training for Flink
>>>>> besides the official programming guides would be much appreciated.
>>>>>
>>>>> Thanks in advance for your help.
>>>>>
>>>>> Greetings,
>>>>>
>>>>> Juan
>
> --
> jay vyas