If we expose an API to access the raw models inside a PipelineModel, can't we call predict directly on them from an API? Is there an open task to expose the model out of PipelineModel so that predict can be called on it? ...there is no dependency on a Spark context in an ml model...

On Feb 4, 2017 9:11 AM, "Aseem Bansal" <asmbans...@gmail.com> wrote:
>
> - In Spark 2.0 there is a class called PipelineModel. I know that the
>   title says Pipeline, but it is actually talking about a PipelineModel
>   trained via a Pipeline.
> - Why PipelineModel instead of Pipeline? Because usually there is a
>   series of steps that needs to be done when doing ML, which warrants an
>   ordered sequence of operations. Read the new spark ml docs or one of
>   the Databricks blogs related to Spark pipelines. If you have used
>   Python's sklearn library, the concept is inspired from there.
> - "once model is deserialized as ml model from the store of choice
>   within ms" - The timing of loading the model was not what I was
>   referring to when I was talking about timing.
> - "it can be used on incoming features to score through spark.ml.Model
>   predict API" - The predict API is in the old mllib package, not the
>   new ml package.
> - "why r we using dataframe and not the ML model directly from API" -
>   Because, as of now, the new ml package does not have the direct API.
>
> On Sat, Feb 4, 2017 at 10:24 PM, Debasish Das <debasish.da...@gmail.com>
> wrote:
>
>> I am not sure why I would use a pipeline to do scoring... The idea is
>> to build a model, use the model ser/deser feature to put it in the row
>> or column store of choice, and provide API access to the model... We
>> support these primitives in github.com/Verizon/trapezium... The API has
>> access to a Spark context in local or distributed mode... Once the
>> model is deserialized as an ml model from the store of choice within
>> ms, it can be used on incoming features to score through the
>> spark.ml.Model predict API... I am not clear on the 2200x speedup...
>> Why are we using a DataFrame and not the ML model directly from the
>> API?
>>
>> On Feb 4, 2017 7:52 AM, "Aseem Bansal" <asmbans...@gmail.com> wrote:
>>
>>> Does this support Java 7?
>>> What is your timezone in case someone wanted to talk?
>>>
>>> On Fri, Feb 3, 2017 at 10:23 PM, Hollin Wilkins <hol...@combust.ml>
>>> wrote:
>>>
>>>> Hey Aseem,
>>>>
>>>> We have built pipelines that execute several string indexers, one-hot
>>>> encoders, scaling, and a random forest or linear regression at the
>>>> end. Execution time for the linear regression was on the order of 11
>>>> microseconds, a bit longer for random forest. If your pipeline is
>>>> simple, this can be further optimized to around 2-3 microseconds by
>>>> using row-based transformations. The pipeline operated on roughly 12
>>>> input features, and by the time all the processing was done, we had
>>>> somewhere around 1000 features going into the linear regression after
>>>> one-hot encoding and everything else.
>>>>
>>>> Hope this helps,
>>>> Hollin
>>>>
>>>> On Fri, Feb 3, 2017 at 4:05 AM, Aseem Bansal <asmbans...@gmail.com>
>>>> wrote:
>>>>
>>>>> Does this support Java 7?
>>>>>
>>>>> On Fri, Feb 3, 2017 at 5:30 PM, Aseem Bansal <asmbans...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Is computational time for predictions on the order of a few
>>>>>> milliseconds (< 10 ms) like the old mllib library?
>>>>>>
>>>>>> On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins <hol...@combust.ml>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey everyone,
>>>>>>>
>>>>>>> Some of you may have seen Mikhail and me talk at Spark/Hadoop
>>>>>>> Summits about MLeap and how you can use it to build production
>>>>>>> services from your Spark-trained ML pipelines. MLeap is an
>>>>>>> open-source technology that allows data scientists and engineers
>>>>>>> to deploy Spark-trained ML pipelines and models to a scoring
>>>>>>> engine instantly. The MLeap execution engine has no dependencies
>>>>>>> on a Spark context, and the serialization format is entirely based
>>>>>>> on Protobuf 3 and JSON.
>>>>>>>
>>>>>>> The recent 0.5.0 release provides serialization and inference
>>>>>>> support for close to 100% of Spark transformers (we don't yet
>>>>>>> support ALS and LDA).
>>>>>>>
>>>>>>> MLeap is open source; take a look at our GitHub page:
>>>>>>> https://github.com/combust/mleap
>>>>>>>
>>>>>>> Or join the conversation on Gitter:
>>>>>>> https://gitter.im/combust/mleap
>>>>>>>
>>>>>>> We have a set of documentation to help get you started here:
>>>>>>> http://mleap-docs.combust.ml/
>>>>>>>
>>>>>>> We even have a set of demos for training ML pipelines and linear,
>>>>>>> logistic, and random forest models:
>>>>>>> https://github.com/combust/mleap-demo
>>>>>>>
>>>>>>> Check out our latest MLeap-serving Docker image, which allows you
>>>>>>> to expose a REST interface to your Spark ML pipeline models:
>>>>>>> http://mleap-docs.combust.ml/mleap-serving/
>>>>>>>
>>>>>>> Several companies are using MLeap in production and even more are
>>>>>>> currently evaluating it. Take a look and tell us what you think!
>>>>>>> We hope to talk with you soon and welcome feedback/suggestions!
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Hollin and Mikhail
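
[Editorial note: for readers curious what a scoring request to the mleap-serving REST interface mentioned above roughly looks like, MLeap transports data as a JSON "leap frame" carrying a schema and rows. The field names, types, and values below are illustrative only, not the authoritative format; consult http://mleap-docs.combust.ml/mleap-serving/ for the exact request shape.]

```json
{
  "schema": {
    "fields": [
      {"name": "category", "type": "string"},
      {"name": "amount", "type": "double"}
    ]
  },
  "rows": [["a", 1.5]]
}
```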
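
[Editorial note: Hollin's figure of ~12 raw inputs becoming ~1000 features after one-hot encoding is easy to sanity-check: the encoded width is roughly the sum of the categorical cardinalities plus the numeric columns. The cardinalities below are invented for illustration; only the arithmetic is the point.]

```python
# Rough sanity check of the feature blow-up described in the thread:
# ~12 raw input columns expanding to ~1000 model features after one-hot
# encoding. Cardinalities are hypothetical.
categorical_cardinalities = {
    "city": 700,     # hypothetical distinct values per column
    "country": 200,
    "device": 50,
    "browser": 30,
}
numeric_columns = 8  # numeric inputs stay one column wide each

raw_width = len(categorical_cardinalities) + numeric_columns

# Spark's OneHotEncoder drops the last category by default, so each
# categorical column becomes (cardinality - 1) binary columns.
encoded_width = sum(c - 1 for c in categorical_cardinalities.values()) + numeric_columns

print(raw_width, encoded_width)  # 12 raw columns -> 984 encoded features
```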