Hi Nick, Any specific reason for choosing Scalatra rather than Play/Spray (now that they are being integrated)?
Sean, would you be interested in a Play and Akka clustering based module in Oryx 2, to see how it compares against the servlets? I am interested in understanding the scalability.

Thanks.
Deb

On Sat, Oct 18, 2014 at 11:22 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

> We've built a model server internally, based on Scalatra and Akka
> Clustering. Our use case is more geared towards serving possibly thousands
> of smaller models.
>
> It's actually very basic, just reads models from S3 as strings (!!) (uses
> HDFS FileSystem so can read from local, HDFS, S3) and uses Breeze for
> linear algebra. (Technically it is also not dependent on Spark, it could be
> reading models generated by any computation layer).
>
> It's designed to allow scaling via cluster sharding, by adding nodes (but
> could also support a load-balanced approach). Not using persistent actors
> as doing a model reload on node failure is not a disaster as we have
> multiple levels of fallback.
>
> Currently it is a bit specific to our setup (and only focused on
> recommendation models for now), but could with some work be made generic.
> I'm certainly considering if we can find the time to make it a releasable
> project.
>
> One major difference to Oryx is that it only handles the model loading and
> vector computations, not the filtering-related and other things that come
> as part of a recommender system (that is done elsewhere in our system). It
> also does not handle the ingesting of data at all.
>
> On Sun, Oct 19, 2014 at 7:10 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> Yes, that is exactly what the next 2.x version does. Still in progress but
>> the recommender app and framework are code-complete. It is not even
>> specific to MLlib and could plug in other model build functions.
>>
>> The current 1.x version will not use MLlib. Neither uses Play but is
>> intended to scale just by adding web servers however you usually do.
>>
>> See graphflow too.
>> On Oct 18, 2014 5:06 PM, "Rajiv Abraham" <rajiv.abra...@gmail.com> wrote:
>>
>> > Oryx 2 seems to be geared for Spark
>> >
>> > https://github.com/OryxProject/oryx
>> >
>> > 2014-10-18 11:46 GMT-04:00 Debasish Das <debasish.da...@gmail.com>:
>> >
>> > > Hi,
>> > >
>> > > Is someone working on a project on integrating Oryx model serving layer
>> > > with Spark ? Models will be built using either Streaming data / Batch data
>> > > in HDFS and cross validated with mllib APIs but the model serving layer
>> > > will give API endpoints like Oryx
>> > > and read the models may be from hdfs/impala/SparkSQL
>> > >
>> > > One of the requirement is that the API layer should be scalable and
>> > > elastic...as requests grow we should be able to add more nodes...using play
>> > > and akka clustering module...
>> > >
>> > > If there is a ongoing project on github please point to it...
>> > >
>> > > Is there a plan of adding model serving and experimentation layer to mllib ?
>> > >
>> > > Thanks.
>> > > Deb
>> >
>> > --
>> > Take care,
>> > Rajiv