Hi Guru,

First, transform your page names with OneHotEncoder (https://spark.apache.org/docs/latest/ml-features.html#onehotencoder), then do the same for the months.
You will end up with something like this (the first three values are the page name, the other three the month):

    (0,0,1,0,0,1)

Then you have your label, which is the value you want to predict. In the end you will have a LabeledPoint such as (10000 -> (0,0,1,0,0,1)), which represents (10000 -> (PageA, UV_NOV)).

After that, try a regression tree with

    val model = DecisionTree.trainRegressor(trainingData, categoricalFeaturesInfo, impurity, maxDepth, maxBins)

Regards
Jorge

> On 01/02/2016, at 12:29, diplomatic Guru <diplomaticg...@gmail.com> wrote:
>
> Any suggestions please?
>
>
> On 29 January 2016 at 22:31, diplomatic Guru <diplomaticg...@gmail.com> wrote:
> Hello guys,
>
> I'm trying to understand how I could predict next month's page views based on
> the previous access pattern.
>
> For example, I've collected statistics on page views:
>
> e.g.
> Page,UniqueView
> -------------------------
> pageA,10000
> pageB,999
> ...
> pageZ,200
>
> I aggregate the statistics monthly.
>
> I've prepared a file containing the last 3 months, like this:
>
> e.g.
> Page,UV_NOV,UV_DEC,UV_JAN
> ---------------------------------------------------
> pageA,10000,9989,11000
> pageB,999,500,700
> ...
> pageZ,200,50,34
>
>
> Based on the above information, I want to predict the next month (FEB).
>
> Which algorithm do you think would suit best? I think linear regression is the
> safe bet. However, I'm struggling to prepare this data for LR ML, especially
> how to prepare the X,Y relationship.
>
> The Y is easy (unique visitors), but I'm not sure about the X (it should be
> Page, right?). However, how do I plot those three months of data?
>
> Could you give me an example based on the above example data?
>
>
>
> Page,UV_NOV,UV_DEC,UV_JAN
> ---------------------------------------------------
> 1,10000,9989,11000
> 2,999,500,700
> ...
> 26,200,50,34
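
To make the encoding described in the reply concrete, here is a minimal sketch in plain Scala (no Spark needed) that turns the three example pages and three months from the thread into (label, features) rows. The object name PageViewEncoding, the helper oneHot, and the slot order (pageA, pageB, pageZ, then UV_NOV, UV_DEC, UV_JAN) are assumptions made for illustration; the (0,0,1,0,0,1) example in the reply uses a different, equally arbitrary order. Spark's OneHotEncoder would do this step for you on a DataFrame.

```scala
object PageViewEncoding {
  // Month columns and monthly unique views, copied from the example in the thread.
  val months: Vector[String] = Vector("UV_NOV", "UV_DEC", "UV_JAN")
  val wide: Vector[(String, Vector[Double])] = Vector(
    ("pageA", Vector(10000.0, 9989.0, 11000.0)),
    ("pageB", Vector(999.0, 500.0, 700.0)),
    ("pageZ", Vector(200.0, 50.0, 34.0))
  )
  val pages: Vector[String] = wide.map(_._1)

  // Hypothetical helper: 1.0 at the position of `value` in `vocabulary`, 0.0 elsewhere.
  def oneHot(value: String, vocabulary: Vector[String]): Vector[Double] =
    vocabulary.map(v => if (v == value) 1.0 else 0.0)

  // One row per (page, month): the label is the unique-view count,
  // the features are the page one-hot followed by the month one-hot.
  val rows: Vector[(Double, Vector[Double])] =
    for {
      (page, views) <- wide
      (month, uv)   <- months.zip(views)
    } yield (uv, oneHot(page, pages) ++ oneHot(month, months))

  def main(args: Array[String]): Unit =
    rows.foreach { case (label, features) =>
      println(s"$label -> ${features.mkString("(", ",", ")")}")
    }
}
```

With this slot order the first row is 10000.0 -> (1.0,0.0,0.0,1.0,0.0,0.0), i.e. (pageA, UV_NOV). With Spark MLlib on the classpath, each row could then be wrapped as LabeledPoint(label, Vectors.dense(features.toArray)) and the resulting RDD passed as trainingData to DecisionTree.trainRegressor.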