Re: SPARK MLLib - How to tie back Model.predict output to original data?

Yanbo Liang Wed, 17 Aug 2016 23:46:45 -0700

If you want to tie the predictions back to other data, I think the best way
is a DataFrame join, provided that both sides share an identity column.


Thanks
Yanbo

2016-08-16 20:39 GMT-07:00 ayan guha <guha.a...@gmail.com>:

> Hi
>
> Thank you for your reply. Yes, I can get prediction and original features
> together. My question is how to tie them back to other parts of the data
> that were not in the LabeledPoint (LP).
>
> For example, I have a bunch of other dimensions which are not part of
> features or label.
>
> Sorry if this is a stupid question.
>
> On Wed, Aug 17, 2016 at 12:57 PM, Yanbo Liang <yblia...@gmail.com> wrote:
>
>> MLlib keeps the original dataset during transformation; it just appends
>> new columns to the existing DataFrame. That is, you can get both the
>> prediction value and the original features from the output DataFrame of
>> model.transform.
>>
>> Thanks
>> Yanbo
>>
>> 2016-08-16 17:48 GMT-07:00 ayan guha <guha.a...@gmail.com>:
>>
>>> Hi
>>>
>>> I have a dataset as follows:
>>>
>>> DF:
>>> amount:float
>>> date_read:date
>>> meter_number:string
>>>
>>> I am trying to predict the future amount based on the past 3 weeks'
>>> consumption (and heaps of weather data related to the date).
>>>
>>> My LabeledPoint looks like:
>>>
>>> label (populated from DF.amount)
>>> features (populated from a bunch of other stuff)
>>>
>>> Model.predict output:
>>> label
>>> prediction
>>>
>>> Now, I am trying to tie this prediction value back to meter_number and
>>> date_read from the original DF.
>>>
>>> One way is to assume that the order of records in DF and in the
>>> Model.predict output will be exactly the same and zip the two RDDs. But
>>> is there any other (possibly better) solution?
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>
