Re: Oryx + Spark mllib

Debasish Das Sun, 19 Oct 2014 08:30:06 -0700

Hi Nick,

Is there any specific reason for choosing Scalatra rather than Play/Spray (now
that they are being integrated)?


Sean,

Would you be interested in a Play and Akka Clustering based module in Oryx 2,
to see how it compares against the servlets? I am interested in understanding
the scalability.

Thanks.
Deb

On Sat, Oct 18, 2014 at 11:22 PM, Nick Pentreath <nick.pentre...@gmail.com>
wrote:

> We've built a model server internally, based on Scalatra and Akka
> Clustering. Our use case is more geared towards serving possibly thousands
> of smaller models.
>
> It's actually very basic: it just reads models from S3 as strings (!!) (it
> uses the Hadoop FileSystem API, so it can read from local disk, HDFS, or S3)
> and uses Breeze for linear algebra. (Technically it is also not dependent on
> Spark; it could read models generated by any computation layer.)
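The thread does not say what text format the stored models use. As a minimal sketch, assuming a hypothetical format with one line per user or item, `<id>` then a tab then comma-separated factor values, loading a model "as strings" and scoring a user/item pair with a dot product (the kind of operation Breeze handles on the JVM) could look like this. `parse_factors` and `dot` are illustrative names, not part of any real project:

```python
from typing import Dict, List


# Hypothetical model format (an assumption, not described in the thread):
# one entity per line, "<id>\t<f1>,<f2>,...,<fk>"
def parse_factors(text: str) -> Dict[str, List[float]]:
    """Parse a model stored as plain text into a map of id -> factor vector."""
    factors: Dict[str, List[float]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entity_id, values = line.split("\t", 1)
        factors[entity_id] = [float(v) for v in values.split(",")]
    return factors


def dot(u: List[float], v: List[float]) -> float:
    """Score for a user/item pair in a matrix factorization model."""
    if len(u) != len(v):
        raise ValueError("factor vectors must have the same rank")
    return sum(a * b for a, b in zip(u, v))


user_model = parse_factors("u1\t0.5,1.0\nu2\t1.0,0.0\n")
item_model = parse_factors("i1\t2.0,1.0\ni2\t0.0,3.0\n")
print(dot(user_model["u1"], item_model["i2"]))  # 3.0
```

In the setup described above, the string would come through the Hadoop FileSystem API, so the same parsing code would apply whether the file lives on local disk, HDFS, or S3.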
>
> It's designed to scale via cluster sharding, by adding nodes (but it could
> also support a load-balanced approach). We're not using persistent actors,
> since reloading a model on node failure is not a disaster; we have
> multiple levels of fallback.
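With cluster sharding, each model ID maps to a shard and shards are spread over nodes, so adding a node moves only some shards onto it instead of requiring every node to hold every model. The sketch below imitates that idea in plain Python; it is not the Akka Cluster Sharding API. The fixed shard count, CRC32 as the stable hash, and the "move from busiest to idlest" rebalancing are simplifying assumptions, and all names are hypothetical. Models are loaded lazily, so a node that takes over a shard after a failure just reloads from storage, which is why persistent actors are not strictly needed:

```python
import zlib
from typing import Callable, Dict, List

NUM_SHARDS = 10  # assumption: a fixed shard count chosen up front


def shard_for(model_id: str) -> int:
    """Stable shard id for a model; CRC32 gives the same answer on every node."""
    return zlib.crc32(model_id.encode("utf-8")) % NUM_SHARDS


def allocate(nodes: List[str]) -> Dict[int, str]:
    """Initial round-robin assignment of shards to nodes (simplified)."""
    return {shard: nodes[shard % len(nodes)] for shard in range(NUM_SHARDS)}


def rebalance(alloc: Dict[int, str], nodes: List[str]) -> Dict[int, str]:
    """After adding nodes, move shards from the busiest to the idlest node
    until loads differ by at most one. Assumes no node was removed."""
    new_alloc = dict(alloc)
    while True:
        counts = {n: 0 for n in nodes}
        for node in new_alloc.values():
            counts[node] += 1
        busiest = max(nodes, key=lambda n: counts[n])
        idlest = min(nodes, key=lambda n: counts[n])
        if counts[busiest] - counts[idlest] <= 1:
            return new_alloc
        shard = min(s for s, n in new_alloc.items() if n == busiest)
        new_alloc[shard] = idlest


class ModelNode:
    """Holds models only for shards it owns; loads each one on first use,
    so taking over a shard after a failure simply triggers a reload."""

    def __init__(self, load: Callable[[str], str]):
        self.load = load  # hypothetical loader, e.g. reading a model file
        self.cache: Dict[str, str] = {}

    def get(self, model_id: str) -> str:
        if model_id not in self.cache:
            self.cache[model_id] = self.load(model_id)
        return self.cache[model_id]


before = allocate(["node-a", "node-b"])
after = rebalance(before, ["node-a", "node-b", "node-c"])
moved = [s for s in range(NUM_SHARDS) if before[s] != after[s]]
print(moved)  # [0, 1, 2]
owner = after[shard_for("model-42")]  # node that should serve "model-42"
```

Only three of the ten shards move when the third node joins; a naive `hash % number_of_nodes` scheme would reshuffle most of them.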
>
> Currently it is a bit specific to our setup (and only focused on
> recommendation models for now), but could with some work be made generic.
> I'm certainly considering if we can find the time to make it a releasable
> project.
>
> One major difference to Oryx is that it only handles the model loading and
> vector computations, not the filtering-related and other things that come
> as part of a recommender system (that is done elsewhere in our system). It
> also does not handle the ingesting of data at all.
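Under the split described above, the model server only produces raw scores, and anything like removing already-seen items or applying business rules happens downstream. A minimal sketch of that "vector computations only" step, assuming an in-memory map of item factor vectors and a hypothetical `top_k` helper, is a brute-force dot product followed by a top-k selection:

```python
import heapq
from typing import Dict, List, Tuple


def top_k(user_vec: List[float], items: Dict[str, List[float]],
          k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring items by dot product, with no filtering."""
    scored = (
        (item_id, sum(u * v for u, v in zip(user_vec, vec)))
        for item_id, vec in items.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


items = {"i1": [2.0, 1.0], "i2": [0.0, 3.0], "i3": [1.0, 1.0]}
print(top_k([0.5, 1.0], items, 2))  # [('i2', 3.0), ('i1', 2.0)]
```

A downstream service could then drop items the user has already seen, which keeps the model server generic across recommenders.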
>
> On Sun, Oct 19, 2014 at 7:10 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> Yes, that is exactly what the next 2.x version does. Still in progress but
>> the recommender app and framework are code-complete. It is not even
>> specific to MLlib and could plug in other model build functions.
>>
>> The current 1.x version will not use MLlib. Neither uses Play; both are
>> intended to scale just by adding web servers however you usually do.
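Scaling "just by adding web servers" is a different approach from sharding: assuming each server holds a full copy of the model, any replica can answer any request, so an ordinary load balancer in front of them is enough. The class below is a hypothetical stand-in for whatever balancer you already run, shown with simple round-robin routing:

```python
from typing import List


class RoundRobinBalancer:
    """Hypothetical stand-in for an external load balancer in front of
    identical replicas that each hold the whole model."""

    def __init__(self, replicas: List[str]):
        self.replicas = list(replicas)
        self.next_index = 0

    def add_replica(self, replica: str) -> None:
        """Scaling out: no data moves, the new replica just joins the rotation."""
        self.replicas.append(replica)

    def route(self) -> str:
        """Pick the next replica in turn."""
        replica = self.replicas[self.next_index % len(self.replicas)]
        self.next_index += 1
        return replica


lb = RoundRobinBalancer(["web-1", "web-2"])
print([lb.route() for _ in range(3)])  # ['web-1', 'web-2', 'web-1']
lb.add_replica("web-3")
print([lb.route() for _ in range(3)])  # ['web-1', 'web-2', 'web-3']
```

The trade-off is memory: every replica must fit the whole model, which suits one large model better than thousands of small ones.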
>>
>> See graphflow too.
>> On Oct 18, 2014 5:06 PM, "Rajiv Abraham" <rajiv.abra...@gmail.com> wrote:
>>
>> > Oryx 2 seems to be geared for Spark
>> >
>> > https://github.com/OryxProject/oryx
>> >
>> > 2014-10-18 11:46 GMT-04:00 Debasish Das <debasish.da...@gmail.com>:
>> >
>> > > Hi,
>> > >
>> > > Is someone working on a project integrating the Oryx model serving layer
>> > > with Spark? Models will be built using either streaming data or batch
>> > > data in HDFS and cross-validated with MLlib APIs, but the model serving
>> > > layer will expose API endpoints like Oryx's and read the models, perhaps
>> > > from HDFS/Impala/SparkSQL.
>> > >
>> > > One of the requirements is that the API layer should be scalable and
>> > > elastic: as requests grow, we should be able to add more nodes, using
>> > > Play and the Akka Clustering module.
>> > >
>> > > If there is an ongoing project on GitHub, please point to it.
>> > >
>> > > Is there a plan to add a model serving and experimentation layer to
>> > > MLlib?
>> > >
>> > > Thanks.
>> > > Deb
>> > >
>> >
>> >
>> >
>> > --
>> > Take care,
>> > Rajiv
>> >
>>
>
>
