I understand your point.

Is there something like a bridge? Is it possible to convert a model
trained using Dataset<Row> (i.e. the distributed one) into one that works on
local Vectors? In Spark 1.6 the mllib package had everything based on
Vectors, and as per my understanding that should be faster. But many Spark
blogs say that Spark is moving towards the ml package and that the mllib
package will be phased out. So how can someone train on huge data and then
use the model on a row-by-row basis?
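
One workaround I am sketching (not sure if this is the intended way, so
please correct me) is to read the learned pi and theta off the fitted ml
NaiveBayesModel, if I am reading the API right that they are exposed, and
do the multinomial scoring on a local Vector myself, skipping the Dataset
entirely. Roughly, assuming a multinomial model already fit as nbModel:

import org.apache.spark.ml.classification.NaiveBayesModel
import org.apache.spark.ml.linalg.{Vector, Vectors}

def predictLocally(model: NaiveBayesModel, features: Vector): Double = {
  // Multinomial log-score per class: theta * x + pi
  val scores = model.theta.multiply(features).toArray
    .zip(model.pi.toArray)
    .map { case (logLikelihood, logPrior) => logLikelihood + logPrior }
  // Predicted label = index of the largest score (ml uses 0-based label indices)
  scores.indices.maxBy(scores(_)).toDouble
}

// e.g. predictLocally(nbModel, Vectors.dense(0.0, 1.0, 3.0))

Would something like this be safe, or is there a supported bridge back to
the Vector-based mllib models?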

Thanks for your inputs.

On Thu, Sep 1, 2016 at 6:15 PM, Sean Owen <so...@cloudera.com> wrote:

> If you're trying to score a single example by way of an RDD or
> Dataset, then no it will never be that fast. It's a whole distributed
> operation, and while you might manage low latency for one job at a
> time, consider what will happen when hundreds of them are running at
> once. It's just huge overkill for scoring a single example (but
> pretty fine for higher-latency, high-throughput batch operations).
>
> However if you're scoring a Vector locally I can't imagine it's that
> slow. It does some linear algebra but it's not that complicated. Even
> something unoptimized should be fast.
>
> On Thu, Sep 1, 2016 at 1:37 PM, Aseem Bansal <asmbans...@gmail.com> wrote:
> > Hi
> >
> > Currently trying to use NaiveBayes to make predictions, but the
> > predictions are taking on the order of a few seconds. I tried the other
> > model examples shipped with Spark, but they also took a minimum of 500 ms
> > when I used the Scala API.
> >
> > Has anyone used Spark ML to do predictions for a single row in under 20 ms?
> >
> > I am not doing premature optimization. The use case is that we are doing
> > real-time predictions and we need results within 20 ms, 30 ms at most.
> > This is a hard limit for our use case.
>
