I have to agree with you, Michael, on the resources and availability in the US versus everywhere else. It is a fact.
Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 17 April 2016 at 22:59, Michael Malak <michaelma...@yahoo.com> wrote:

> As with all history, "what if"s are not scientifically testable
> hypotheses, but my speculation is that it comes down to the energy (VCs,
> startups, big Internet companies, universities) within Silicon Valley as
> contrasted with Germany.
>
> ------------------------------
> From: Mich Talebzadeh <mich.talebza...@gmail.com>
> To: Michael Malak <michaelma...@yahoo.com>; "user @spark" <user@spark.apache.org>
> Sent: Sunday, April 17, 2016 3:55 PM
> Subject: Re: Apache Flink
>
> Assuming that Spark and Flink are contemporaries, what are the reasons
> that Flink has not been adopted widely? (This may sound obvious and/or
> prejudged.) I mean, Spark has surged in popularity in the past year, if I
> am correct.
>
> On 17 April 2016 at 22:49, Michael Malak <michaelma...@yahoo.com> wrote:
>
> In terms of publication date, a paper on Nephele was published in 2009,
> prior to the 2010 USENIX paper on Spark. Nephele is the execution engine of
> Stratosphere, which became Flink.
>
> ------------------------------
> From: Mark Hamstra <m...@clearstorydata.com>
> To: Mich Talebzadeh <mich.talebza...@gmail.com>
> Cc: Corey Nolet <cjno...@gmail.com>; "user @spark" <user@spark.apache.org>
> Sent: Sunday, April 17, 2016 3:30 PM
> Subject: Re: Apache Flink
>
> To be fair, the Stratosphere project from which Flink springs was started
> as a collaborative university research project in Germany at about the
> same time that Spark was first released as open source, so they are near
> contemporaries rather than Flink having been started only well after Spark
> was an established and widely used Apache project.
>
> On Sun, Apr 17, 2016 at 2:25 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Also, it always amazes me why there are so many tangential projects in the
> Big Data space. Wouldn't it be easier if effort were spent on adding to
> Spark's functionality rather than creating a new product like Flink?
>
> On 17 April 2016 at 21:08, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Thanks, Corey, for the useful info.
>
> I have used Sybase Aleri and StreamBase as commercial CEP engines.
> However, there does not seem to be anything close to these products in
> the Hadoop ecosystem. So I guess there is nothing there?
>
> Regards.
>
> On 17 April 2016 at 20:43, Corey Nolet <cjno...@gmail.com> wrote:
>
> I have not been intrigued at all by the microbatching concept in Spark.
> I am used to CEP in real stream-processing environments like InfoSphere
> Streams and Storm, where the granularity of processing is at the level of
> each individual tuple and processing units (workers) can react immediately
> to events as they are received and processed. The closest Spark Streaming
> comes to this concept is the notion of "state" that can be updated via the
> updateStateByKey() functions, which can only run once per microbatch.
> Looking at the expected design changes to Spark Streaming in Spark 2.0.0,
> it also does not look like tuple-at-a-time processing is on the radar for
> Spark, though I have seen articles stating that more effort is going to go
> into the Spark SQL layer in Spark Streaming, which may make it more
> reminiscent of Esper.
>
> For these reasons, I have not even tried to implement CEP in Spark. I feel
> it's a waste of time without immediate tuple-at-a-time processing. Without
> it, Spark avoids the whole problem of "back pressure" (though keep in mind,
> it is still very possible to overload the Spark Streaming layer with stages
> that will continue to pile up and never get worked off), but it loses the
> granular control that you get in CEP environments by allowing the rules and
> processors to react to the receipt of each tuple, right away.
>
> A while back, I did attempt to implement an InfoSphere Streams-like API [1]
> on top of Apache Storm as an example of what such a design may look like.
> It looks like Storm is going to be replaced in the not-so-distant future by
> Twitter's new design called Heron. IIRC, Heron does not have an open-source
> implementation as of yet.
>
> [1] https://github.com/calrissian/flowmix
>
> On Sun, Apr 17, 2016 at 3:11 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi Corey,
>
> Can you please point me to docs on using Spark for CEP? Do we have a set
> of CEP libraries somewhere?
> I am keen on getting hold of adaptor libraries for Spark, something like
> below.
>
> Thanks
>
> On 17 April 2016 at 16:07, Corey Nolet <cjno...@gmail.com> wrote:
>
> One thing I've noticed about Flink in following the project is that it
> has established, in a few cases, some novel ideas and improvements over
> Spark. The problem with it, however, is that both the development team and
> the community around it are very small, and many of those novel
> improvements have been rolled directly into Spark in subsequent versions.
> I was considering changing over my architecture to Flink at one point to
> get better, more real-time CEP streaming support, but in the end I decided
> to stick with Spark and just watch Flink continue to pressure it into
> improvement.
>
> On Sun, Apr 17, 2016 at 11:03 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
> I never found much info that Flink was actually designed to be fault
> tolerant. If fault tolerance is more of a bolt-on/add-on/afterthought,
> then that doesn't bode well for large-scale data processing. Spark was
> designed with fault tolerance in mind from the beginning.
>
> On Sun, Apr 17, 2016 at 9:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi,
>
> I read the benchmark published by Yahoo. Obviously they already use Storm
> and are inevitably very familiar with that tool. To start with, although
> these benchmarks were somewhat interesting IMO, they read as an assurance
> that the tool chosen for their platform is still the best choice. So,
> inevitably, the benchmarks and the tests were designed primarily to
> support their approach.
>
> In general, any benchmark that is not done through the TPC council or a
> similar body is questionable.
>
> Their argument is that because Spark handles data streaming in micro
> batches, it inevitably introduces built-in latency by design. In
> contrast, both Storm and Flink do not (at face value) have this issue.
>
> In addition, as we already know, Spark has far more capabilities compared
> to Flink (I know nothing about Storm). So really it boils down to the
> business SLA when choosing which tool to deploy for your use case. IMO,
> Spark's micro-batching approach is probably OK for 99% of use cases. If we
> had built-in libraries for CEP in Spark (I am searching for them), I would
> not bother with Flink.
>
> HTH
>
> On 17 April 2016 at 12:47, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote:
>
> You probably read this benchmark at Yahoo. Any comments from Spark?
>
> https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
>
> On 17 Apr 2016, at 12:41, andy petrella <andy.petre...@gmail.com> wrote:
>
> Just adding one thing to the mix: `that the latency for streaming data is
> eliminated` is insane :-D
>
> On Sun, Apr 17, 2016 at 12:19 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> It seems that Flink argues that the latency for streaming data is
> eliminated, whereas with Spark RDDs there is this latency.
>
> I noticed that Flink does not support an interactive shell like the Spark
> shell, where you can add jars to do Kafka testing. The advice was to add
> the streaming Kafka jar file to CLASSPATH, but that does not work.
>
> Most Flink documentation is also rather sparse, with the usual word count
> example, which is not exactly what you want.
>
> Anyway, I will have a look at it further. I have a Spark Scala streaming
> Kafka program that works fine in Spark, and I want to recode it in Scala
> for Flink with Kafka, but I am having difficulty importing and testing
> libraries.
>
> Cheers
>
> On 17 April 2016 at 02:41, Ascot Moss <ascot.m...@gmail.com> wrote:
>
> I compared both last month; it seems to me that Flink's ML library
> (FlinkML) is not yet ready.
>
> On Sun, Apr 17, 2016 at 12:23 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Thanks, Ted. I was wondering if someone is using both :)
>
> On 16 April 2016 at 17:08, Ted Yu <yuzhih...@gmail.com> wrote:
>
> Looks like this question is more relevant on the Flink mailing list :-)
>
> On Sat, Apr 16, 2016 at 8:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi,
>
> Has anyone used Apache Flink instead of Spark, by any chance?
>
> I am interested in its set of libraries for Complex Event Processing.
>
> Frankly, I don't know if it offers far more than Spark offers.
>
> Thanks
>
> --
> andy
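Corey's point in the thread, that state kept with updateStateByKey() only changes once per microbatch while a tuple-at-a-time CEP engine can fire a rule the moment an event arrives, can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Spark, Storm or Flink code: the event stream, the "second failure for a key" rule, the 2-second batch interval and the function names are all hypothetical. The `update_func` only mimics the shape of an updateStateByKey() update function (new values for a key plus previous state, returning new state).

```python
from collections import defaultdict

# Hypothetical event stream: (arrival time in seconds, key, event kind).
EVENTS = [
    (0.1, "user1", "fail"),
    (0.4, "user2", "ok"),
    (0.7, "user1", "fail"),
    (2.5, "user2", "fail"),
    (3.1, "user2", "fail"),
]

THRESHOLD = 2  # hypothetical rule: alert on a key's 2nd "fail" event


def tuple_at_a_time(events):
    """Check the rule as each event arrives, so an alert fires at the
    arrival time of the event that completes the pattern."""
    fails = defaultdict(int)
    alerts = []
    for t, key, kind in events:
        if kind == "fail":
            fails[key] += 1
            if fails[key] == THRESHOLD:
                alerts.append((key, t))
    return alerts


def update_func(new_values, last_state):
    """Same shape as an updateStateByKey() update function:
    (this batch's values for a key, previous state or None) -> new state."""
    return (last_state or 0) + sum(1 for v in new_values if v == "fail")


def micro_batch(events, interval):
    """Bucket events into batches of `interval` seconds. State is updated,
    and the rule checked, only when a batch closes. Simplification: unlike
    Spark, the update function is only called for keys seen in the batch."""
    batches = defaultdict(lambda: defaultdict(list))
    for t, key, kind in events:
        batches[int(t // interval)][key].append(kind)
    state, alerts = {}, []
    for b in sorted(batches):
        batch_end = (b + 1) * interval
        for key, values in batches[b].items():
            old = state.get(key, 0)
            state[key] = update_func(values, state.get(key))
            if old < THRESHOLD <= state[key]:
                alerts.append((key, batch_end))
    return alerts


print(tuple_at_a_time(EVENTS))            # [('user1', 0.7), ('user2', 3.1)]
print(micro_batch(EVENTS, interval=2.0))  # [('user1', 2.0), ('user2', 4.0)]
```

Both approaches detect the same two patterns, but in the micro-batch version the alerts only fire at the batch boundaries (2.0 s and 4.0 s) rather than at 0.7 s and 3.1 s. Shrinking the interval narrows that gap but does not remove it for events that arrive mid-batch.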
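The argument Mich summarises from the Yahoo benchmark (micro-batches add built-in latency by design) and Corey's warning that stages can pile up and never get worked off can both be put into rough numbers. The sketch below is a toy model in plain Python, not a measurement of Spark: it assumes each micro-batch closes at a fixed interval, then takes a fixed processing time, and that batches run strictly one after another. The function names and the numbers are hypothetical.

```python
import math


def record_latency(arrival, interval, processing):
    """Toy model: a record waits for its micro-batch to close at the next
    multiple of `interval`, then for `processing` seconds. Assumes batches
    keep up (processing <= interval), so there is no queueing delay."""
    closes_at = (math.floor(arrival / interval) + 1) * interval
    return closes_at + processing - arrival


def scheduling_delays(num_batches, interval, processing):
    """Delay between each batch closing and its processing starting, if
    batches run one at a time, in order, each taking `processing` seconds."""
    delays = []
    free_at = 0.0  # time at which the previous batch finished
    for i in range(num_batches):
        closes_at = (i + 1) * interval  # batch i covers [i*interval, (i+1)*interval)
        start = max(closes_at, free_at)
        delays.append(start - closes_at)
        free_at = start + processing
    return delays


# 1 s batches, 0.5 s processing: a record arriving at t = 0.25 s.
print(record_latency(0.25, interval=1.0, processing=0.5))  # 1.25
# Processing keeps up with the interval: no delay accumulates.
print(scheduling_delays(4, interval=1.0, processing=0.5))  # [0.0, 0.0, 0.0, 0.0]
# Processing (1.5 s) exceeds the interval (1 s): the delay grows every batch.
print(scheduling_delays(4, interval=1.0, processing=1.5))  # [0.0, 0.5, 1.0, 1.5]
```

In this model a record waits, on average, about half a batch interval (for evenly spread arrivals) plus the processing time, and that wait for the batch to close is the part a tuple-at-a-time engine does not incur. Once processing time exceeds the interval, the delay grows with every batch, which is the pile-up described in the thread.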