@Jay: I've looked into your code, but I was not able to reproduce the issue. I'll start a new discussion thread on the user@flink list for the Flink-BigPetStore discussion. I don't want to take over Juan's hardware-requirements discussion ;)
On Wed, Sep 2, 2015 at 3:01 PM, Jay Vyas <jayunit100.apa...@gmail.com> wrote:

> Just running the main class is sufficient
>
> On Sep 2, 2015, at 8:59 AM, Robert Metzger <rmetz...@apache.org> wrote:
>
> Hey jay,
>
> How can I reproduce the error?
>
> On Wed, Sep 2, 2015 at 2:56 PM, jay vyas <jayunit100.apa...@gmail.com> wrote:
>
>> We're also working on a bigpetstore implementation of flink which will
>> help onboard spark/mapreduce folks.
>>
>> I have prototypical code here that runs a simple job in memory;
>> contributions welcome. Right now there is a serialization error:
>> https://github.com/bigpetstore/bigpetstore-flink
>>
>> On Wed, Sep 2, 2015 at 8:50 AM, Robert Metzger <rmetz...@apache.org> wrote:
>>
>>> Hi Juan,
>>>
>>> I think the recommendations in the Spark guide are quite good, and are
>>> similar to what I would recommend for Flink as well.
>>> Depending on the workloads you are interested in running, you can certainly
>>> use Flink with less than 8 GB per machine. I think you can start Flink
>>> TaskManagers with 500 MB of heap space and they'll still be able to
>>> process some GB of data.
>>>
>>> Anything above 2 GB is probably good enough for some initial
>>> experimentation (again, depending on your workloads, network, disk
>>> speed, etc.).
>>>
>>> On Wed, Sep 2, 2015 at 2:30 PM, Kostas Tzoumas <ktzou...@apache.org> wrote:
>>>
>>>> Hi Juan,
>>>>
>>>> Flink is quite nimble with hardware requirements; people have run it on
>>>> old-ish laptops and also on the largest instances available from cloud
>>>> providers. I will let others chime in with more details.
>>>>
>>>> I am not aware of anything along the lines of the cheatsheet you
>>>> mention. If you actually try to do this, I would love to see it, and it
>>>> might be useful to others as well.
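[As a rough illustration of the low-memory setup Robert describes: a small-cluster `conf/flink-conf.yaml` might look like the sketch below. The key names are taken from the Flink 0.9.x-era configuration reference, and the values are illustrative starting points, not recommendations; check the configuration page for your version.]

```yaml
# Sketch of a minimal flink-conf.yaml for a small/educational cluster.
# Values are illustrative; tune for your workload and hardware.
jobmanager.heap.mb: 256
taskmanager.heap.mb: 500
taskmanager.numberOfTaskSlots: 2
parallelism.default: 2
```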
>>>> Both use similar abstractions at the API level (i.e., parallel
>>>> collections), so if you stay true to the functional paradigm and do not
>>>> try to "abuse" the system by exploiting knowledge of its internals,
>>>> things should be straightforward. This applies to the batch APIs; the
>>>> streaming API in Flink follows a true streaming paradigm, where you get
>>>> an unbounded stream of records and operators on these streams.
>>>>
>>>> Funny that you ask about a video for the DataStream slides. There is a
>>>> Flink training happening as we speak, and a video is being recorded
>>>> right now :-) Hopefully it will be made available soon.
>>>>
>>>> Best,
>>>> Kostas
>>>>
>>>> On Wed, Sep 2, 2015 at 1:13 PM, Juan Rodríguez Hortalá
>>>> <juan.rodriguez.hort...@gmail.com> wrote:
>>>>
>>>>> Answering to myself, I have found some nice training material at
>>>>> http://dataartisans.github.io/flink-training. There are even videos
>>>>> on YouTube for some of the slides:
>>>>>
>>>>> - http://dataartisans.github.io/flink-training/overview/intro.html
>>>>>   https://www.youtube.com/watch?v=XgC6c4Wiqvs
>>>>>
>>>>> - http://dataartisans.github.io/flink-training/dataSetBasics/intro.html
>>>>>   https://www.youtube.com/watch?v=0EARqW15dDk
>>>>>
>>>>> The third lecture,
>>>>> http://dataartisans.github.io/flink-training/dataSetAdvanced/intro.html,
>>>>> more or less corresponds to
>>>>> https://www.youtube.com/watch?v=1yWKZ26NQeU, but not exactly, and
>>>>> there are more lessons at http://dataartisans.github.io/flink-training,
>>>>> for stream processing and the Table API, for which I haven't found a
>>>>> video. Does anyone have pointers to the missing videos?
>>>>>
>>>>> Greetings,
>>>>>
>>>>> Juan
>>>>>
>>>>> 2015-09-02 12:50 GMT+02:00 Juan Rodríguez Hortalá
>>>>> <juan.rodriguez.hort...@gmail.com>:
>>>>>
>>>>>> Hi list,
>>>>>>
>>>>>> I'm new to Flink, and I find this project very interesting.
>>>>>> I have experience with Apache Spark, and from what I've seen so far,
>>>>>> Flink provides an API at a similar abstraction level, but based on
>>>>>> single-record processing instead of batch processing. I've read on
>>>>>> Quora that Flink extends stream processing to batch processing, while
>>>>>> Spark extends batch processing to streaming. Therefore I find Flink
>>>>>> especially attractive for low-latency stream processing. Anyway, I
>>>>>> would appreciate it if someone could point me to a list of hardware
>>>>>> requirements for the slave nodes in a Flink cluster, something along
>>>>>> the lines of
>>>>>> https://spark.apache.org/docs/latest/hardware-provisioning.html.
>>>>>> Spark is known for having quite high minimal memory requirements
>>>>>> (8 GB RAM and 8 cores minimum), and I was wondering if that is also
>>>>>> the case for Flink. Lower memory requirements would be very
>>>>>> interesting for building small Flink clusters for educational
>>>>>> purposes, or for small projects.
>>>>>>
>>>>>> Apart from that, I wonder if there is some blog post by the community
>>>>>> about transitioning from Spark to Flink. I think it could be
>>>>>> interesting, as there are some similarities in the APIs, but also
>>>>>> deep differences in the underlying approaches. I was thinking of
>>>>>> something like Breeze's cheatsheet comparing its matrix operations
>>>>>> with those available in Matlab and NumPy,
>>>>>> https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet,
>>>>>> or like http://rosettacode.org/wiki/Factorial. Just an idea anyway.
>>>>>> Also, any pointer to some online course, book, or training for Flink
>>>>>> besides the official programming guides would be much appreciated.
>>>>>>
>>>>>> Thanks in advance for your help.
>>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> Juan
>>
>> --
>> jay vyas
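[On the cheatsheet idea: what such a mapping would mostly capture is the shared functional, parallel-collections style that Kostas mentions. As a rough sketch of that common shape, here is a word count written with plain `java.util.stream` standing in for a Flink `DataSet` or Spark `RDD`, so it runs without either framework; the class and method names are illustrative only, and in the real APIs the stream would be replaced by the distributed collection type while the map/flatMap/group/count pipeline keeps the same form.]

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Word count in the map/group/count style shared by the Flink DataSet
    // and Spark RDD batch APIs; java.util.stream stands in for the
    // distributed collection so the sketch is self-contained.
    public static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                // split each line into words (flatMap step)
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // group by the word itself and count occurrences
                .collect(Collectors.groupingBy(Function.identity(),
                                               Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts =
                wordCount(Arrays.asList("to be or not", "to be"));
        System.out.println(counts.get("to"));  // 2
        System.out.println(counts.get("be"));  // 2
        System.out.println(counts.get("or"));  // 1
    }
}
```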