Re: SPARK MLLib - How to tie back Model.predict output to original data?

janardhan shetty Thu, 18 Aug 2016 06:41:21 -0700

There is a spark-ts package developed by Sandy, which has an RDD version.
I am not sure about the DataFrame roadmap.


http://sryza.github.io/spark-timeseries/0.3.0/index.html

On Aug 18, 2016 12:42 AM, "ayan guha" <guha.a...@gmail.com> wrote:

> Thanks a lot. I resolved it using a UDF.
>
> Question: does Spark support any time series models? Is there a roadmap
> showing roughly when a feature will be available?
>
> On 18 Aug 2016 16:46, "Yanbo Liang" <yblia...@gmail.com> wrote:
>
>> If you want to tie them to other data, I think the best way is to use a
>> DataFrame join, provided that both sides share an identity column.
>>
>> Thanks
>> Yanbo
>>
>> 2016-08-16 20:39 GMT-07:00 ayan guha <guha.a...@gmail.com>:
>>
>>> Hi
>>>
>>> Thank you for your reply. Yes, I can get the prediction and the original
>>> features together. My question is how to tie them back to the other parts
>>> of the data, which were not in the LabeledPoint (LP).
>>>
>>> For example, I have a bunch of other dimensions which are not part of
>>> features or label.
>>>
>>> Sorry if this is a stupid question.
>>>
>>> On Wed, Aug 17, 2016 at 12:57 PM, Yanbo Liang <yblia...@gmail.com>
>>> wrote:
>>>
>>>> MLlib keeps the original dataset during transformation; it just
>>>> appends new columns to the existing DataFrame. That is, you can get both
>>>> the prediction value and the original features from the output DataFrame
>>>> of model.transform.
>>>>
>>>> Thanks
>>>> Yanbo
>>>>
>>>> 2016-08-16 17:48 GMT-07:00 ayan guha <guha.a...@gmail.com>:
>>>>
>>>>> Hi
>>>>>
>>>>> I have a dataset as follows:
>>>>>
>>>>> DF:
>>>>> amount:float
>>>>> date_read:date
>>>>> meter_number:string
>>>>>
>>>>> I am trying to predict the future amount based on the past 3 weeks of
>>>>> consumption (and a heap of weather data related to the date).
>>>>>
>>>>> My LabeledPoint looks like:
>>>>>
>>>>> label (populated from DF.amount)
>>>>> features (populated from a bunch of other stuff)
>>>>>
>>>>> Model.predict output:
>>>>> label
>>>>> prediction
>>>>>
>>>>> Now I am trying to tie this prediction value back to meter_number
>>>>> and date_read from the original DF.
>>>>>
>>>>> One way is to assume that the order of records in DF and in the
>>>>> Model.predict output will be exactly the same, and zip the two RDDs.
>>>>> Is there any other (possibly better) solution?
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Ayan Guha
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>
>>
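
For reference, here is a minimal, Spark-free sketch in plain Python of the join-on-identity-column idea suggested in the thread. The row_id key, the sample rows, and the build_features and predict helpers are all made up for illustration; none of them come from the original poster's code. In Spark the same step would be a DataFrame join on the shared key column.

```python
# Plain-Python illustration (no Spark): carry an identity key with every
# prediction so it can be joined back to columns that were never features.

# Hypothetical rows shaped like the DF described in the thread.
rows = [
    {"row_id": 0, "meter_number": "M1", "date_read": "2016-08-01", "amount": 10.0},
    {"row_id": 1, "meter_number": "M2", "date_read": "2016-08-01", "amount": 20.0},
    {"row_id": 2, "meter_number": "M1", "date_read": "2016-08-02", "amount": 12.0},
]


def build_features(row):
    # Placeholder feature engineering; the real features are not in the thread.
    return [row["amount"]]


def predict(features):
    # Stand-in for Model.predict: a dummy model that doubles the first feature.
    return 2.0 * features[0]


# Score each row, keeping (row_id, prediction) pairs. The list is reversed on
# purpose to mimic output whose order does not match the input.
scored = [(row["row_id"], predict(build_features(row))) for row in rows]
scored.reverse()

# "Join" on row_id: look up each prediction by key, not by position.
prediction_by_id = dict(scored)
joined = [dict(row, prediction=prediction_by_id[row["row_id"]]) for row in rows]

for r in joined:
    print(r["meter_number"], r["date_read"], r["prediction"])
```

Because the lookup goes through row_id, the result does not depend on the scored records arriving in the same order as the input, which is the assumption the zip approach has to make.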
