Hi List,

I've been following several projects with considerable interest over the past few years, and I keep wondering why they aren't moving towards support by mainstream Spark distributions, and why they aren't mentioned more frequently when it comes to enterprise adoption of Spark.
The list of such "middleware" components that I've come across -- each of which sounded like it would largely succeed the classic Spark APIs, at least in its respective domain -- is the following [in order of appearance]:

* Spark JobServer - This was one of the first examples I came across. It was quite exciting at the time, but I've not heard much of it since. I assume the focus was on stabilizing the code base.

* Oryx2 - This was more focused on a particular problem, and looked to be a very nice framework for deploying real-time analytics -- but again, no real traction. In fact, I've heard of PoCs being done by/for Cloudera to demo Lambda architectures with Spark, and even there it was not showcased.

* Livy - Although Livy still appears to live, I'm not really seeing the progress I anticipated after first hearing about it at the 2015 Spark Summit Europe. Maybe it's because the documentation isn't quite there yet, maybe it's because features are missing -- either way, from my last look at it, it isn't quite enterprise-ready, even though it offers a feature set that should be driving enterprise adoption.

* Mist - I just discovered it today, thinking "great, ANOTHER middleware", which prompted this post. It looks quite fully featured, but can it succeed? On the plus side, it's linked to a small, focused business; on the down side, it's linked to a small, focused business. Positive, since that drives development along nicely; negative, since it inhibits adoption in the enterprise space.

Now, with that said - why did these products not gain bigger traction? Is it because Spark isn't quite ready yet? Is it because of a missed marketing opportunity?

And on another note: should Spark integrate such a wrapper "by default"? It would be a step further on from the SparkSQL Thrift interface, towards offering not just programming APIs but service APIs (a sketch of what this looks like in practice is in the P.S. below). Considering that there are so many different interpretations of how this should be solved, bundling the effort into a default implementation could be beneficial. On the other hand, feature creep of this magnitude probably isn't desirable.

I'd hope to hear some community opinions, in particular from developers/users of these or other similar projects. If I overlooked your similar project: please pitch it -- I think this part of the ecosystem is shaping up to be quite exciting.

Also, I'm looking at this with my enterprise glasses on: fine-grained user authorization and authentication features are very important, as are consistency and resiliency features. Since long-running interactive Spark jobs are still a mixed bag stability-wise, this layer of middleware should provide a necessary buffer between crashes of the driver program and the serving of results. Ecosystem support is also a must - why aren't there Tableau connectors for (some of) these APIs? [Because they're too obscure...]

A closing note: this could of course just be the open-source/enterprise chicken-and-egg issue: open-source projects without large-scale vendor support aren't interesting to the enterprise, and enterprise features aren't interesting to the non-enterprise developer. And worse, I wonder how many in-house custom solutions/extensions of these projects exist in the wild, because enterprise developers usually aren't allowed to share code back into open-source projects.

Thanks for putting up with this post this far,

Best
Rick
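P.S. For concreteness, here is roughly what the "service API" interaction looks like with Livy's REST interface -- a minimal sketch in Python, where the host/port and the submitted Scala snippet are made up for illustration, and error handling and authentication are left out (which is of course exactly the part that still needs to mature):

    import json
    import time
    import requests

    LIVY = "http://livy-host:8998"          # assumed endpoint, adjust to taste
    HEADERS = {"Content-Type": "application/json"}

    # 1. Ask Livy to start a long-running interactive Spark session;
    #    the driver lives behind the service, not in our process.
    resp = requests.post(LIVY + "/sessions",
                         data=json.dumps({"kind": "spark"}),
                         headers=HEADERS)
    session_url = "{}/sessions/{}".format(LIVY, resp.json()["id"])

    # 2. Poll until the remote driver is ready to accept statements.
    while requests.get(session_url, headers=HEADERS).json()["state"] != "idle":
        time.sleep(1)

    # 3. Submit a statement over HTTP and fetch its result the same way.
    resp = requests.post(session_url + "/statements",
                         data=json.dumps({"code": "sc.parallelize(1 to 100).sum()"}),
                         headers=HEADERS)
    statement_url = "{}/statements/{}".format(session_url, resp.json()["id"])

    while True:
        statement = requests.get(statement_url, headers=HEADERS).json()
        if statement["state"] == "available":
            print(statement["output"]["data"]["text/plain"])
            break
        time.sleep(1)

The point being: the client never holds a SparkContext, it only speaks HTTP -- which is precisely what would let a middleware layer buffer driver crashes and enforce per-user authorization, as argued above.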