Thanks a lot. I resolved it using a UDF.

Questions: does Spark support any time-series models? Is there a roadmap that shows roughly when a feature will be available?

On 18 Aug 2016 16:46, "Yanbo Liang" <yblia...@gmail.com> wrote:
> If you want to tie them to other data, I think the best way is to use a
> DataFrame join, on condition that they share an identity column.
>
> Thanks
> Yanbo
>
> 2016-08-16 20:39 GMT-07:00 ayan guha <guha.a...@gmail.com>:
>
>> Hi
>>
>> Thank you for your reply. Yes, I can get the prediction and the original
>> features together. My question is how to tie them back to other parts of
>> the data, which were not in the LabeledPoint.
>>
>> For example, I have a bunch of other dimensions which are not part of the
>> features or the label.
>>
>> Sorry if this is a stupid question.
>>
>> On Wed, Aug 17, 2016 at 12:57 PM, Yanbo Liang <yblia...@gmail.com> wrote:
>>
>>> MLlib keeps the original dataset during transformation; it just appends
>>> new columns to the existing DataFrame. That is, you can get both the
>>> prediction value and the original features from the output DataFrame of
>>> model.transform.
>>>
>>> Thanks
>>> Yanbo
>>>
>>> 2016-08-16 17:48 GMT-07:00 ayan guha <guha.a...@gmail.com>:
>>>
>>>> Hi
>>>>
>>>> I have a dataset as follows:
>>>>
>>>> DF:
>>>> amount:float
>>>> date_read:date
>>>> meter_number:string
>>>>
>>>> I am trying to predict the future amount based on the past 3 weeks of
>>>> consumption (and a heap of weather data related to the date).
>>>>
>>>> My LabeledPoint looks like:
>>>>
>>>> label (populated from DF.amount)
>>>> features (populated from a bunch of other stuff)
>>>>
>>>> Model.predict output:
>>>> label
>>>> prediction
>>>>
>>>> Now, how do I tie this prediction value back to meter_number and
>>>> date_read from the original DF?
>>>>
>>>> One way is to assume that the order of records in DF and in the
>>>> Model.predict output is exactly the same, and zip the two RDDs. But is
>>>> there any other (possibly better) solution?
>>>>
>>>> --
>>>> Best Regards,
>>>> Ayan Guha
>>
>> --
>> Best Regards,
>> Ayan Guha
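For anyone finding this thread later, the difference between zipping by position and joining on an identity column can be illustrated without a cluster. The sketch below is plain Python (standard library only), not the Spark API. It assumes (meter_number, date_read) uniquely identifies a reading; the `region` and `temp_c` columns and the toy `predict` function are hypothetical stand-ins for the extra dimensions, a weather feature, and the trained model. The scored output is deliberately returned in a different order to mimic a shuffle.

```python
from datetime import date

# Hypothetical data. The identity key (meter_number, date_read) is an assumption;
# "region" stands in for dimensions that are neither features nor label, and
# "temp_c" stands in for one weather feature.
readings = [
    {"meter_number": "M1", "date_read": date(2016, 8, 1), "amount": 10.0, "temp_c": 20.0, "region": "north"},
    {"meter_number": "M2", "date_read": date(2016, 8, 1), "amount": 20.0, "temp_c": 24.0, "region": "south"},
    {"meter_number": "M1", "date_read": date(2016, 8, 8), "amount": 12.0, "temp_c": 30.0, "region": "north"},
]


def predict(temp_c):
    """Toy stand-in for a trained model (hypothetical)."""
    return 0.5 * temp_c


def key(row):
    """Identity columns shared by the original data and the scored output."""
    return (row["meter_number"], row["date_read"])


# Scored output that keeps only the key plus the prediction, returned in
# reverse order to mimic a shuffle between the two datasets.
scored = [
    {"meter_number": r["meter_number"], "date_read": r["date_read"], "prediction": predict(r["temp_c"])}
    for r in reversed(readings)
]

# Zipping by position pairs rows that belong to different readings once order differs.
zip_is_aligned = all(key(a) == key(b) for a, b in zip(readings, scored))


def join_on_key(left, right):
    """Inner join on the identity key; assumes the key is unique on the right side."""
    prediction_by_key = {key(r): r["prediction"] for r in right}
    return [dict(row, prediction=prediction_by_key[key(row)])
            for row in left if key(row) in prediction_by_key]


# Every original column (including "region") survives, with "prediction" appended.
joined = join_on_key(readings, scored)

if __name__ == "__main__":
    print("zip aligned:", zip_is_aligned)
    for row in joined:
        print(row["meter_number"], row["date_read"], row["region"], row["prediction"])
```

In Spark itself, the equivalent is a DataFrame join on the same key columns (for example `df.join(predictions, ["meter_number", "date_read"])` in PySpark), as Yanbo suggests. Alternatively, if the model is applied with model.transform to a DataFrame that still contains meter_number, date_read and the other dimensions, those columns come through unchanged and no join is needed.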