Re: Machine learning question (using spark) - removing redundant factors while doing clustering

2016-08-10 Thread Rohit Chaddha
> PCA will not 'improve' clustering per se but can make it faster. > You may want to specify what you are actually trying to optimize. > > > On Tue, Aug 9, 2016, 03:23 Rohit Chaddha > wrote: > >> I would rather have fewer features to make better inferences on t

Re: Machine learning question (using spark) - removing redundant factors while doing clustering

2016-08-08 Thread Rohit Chaddha
more dominant in your classification, you can then run your > model again with the smaller set of features. > The two approaches are quite different; what I'm suggesting involves > training (supervised learning) in the context of a target function, with > SVD you are doing unsupervis

Re: Machine learning question (using spark) - removing redundant factors while doing clustering

2016-08-08 Thread Rohit Chaddha
>> Great question Rohit. I am in my early days of ML as well and it would >> be great if we get some ideas on this from other experts in this group. >> >> I know we can reduce dimensions by using PCA, but I think that does not >> allow us

Machine learning question (using spark) - removing redundant factors while doing clustering

2016-08-08 Thread Rohit Chaddha
I have a data-set where each data-point has 112 factors. I want to remove the factors which are not relevant, say reduce from 112 to 20 factors, and then do clustering of the data-points using these 20 factors. How do I do this, and how do I figure out which of the 20 factors are useful fo
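As a minimal sketch of one approach discussed later in the thread: spark.ml's PCA can project the 112-dimensional feature vectors down to 20 components before KMeans. Note that PCA produces linear combinations of the original factors, not a subset of them; selecting original factors would need a supervised selector instead. Column names, the input path, and k=5 clusters are all assumptions, not from the thread.

```java
// Sketch (assumptions noted in lead-in): PCA to 20 dimensions, then KMeans.
import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.ml.feature.PCA;
import org.apache.spark.ml.feature.PCAModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReduceThenCluster {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("pca-kmeans").master("local[*]").getOrCreate();

        // Assumes a Dataset<Row> with a 112-dimensional "features" vector column.
        Dataset<Row> data = spark.read().format("libsvm").load("data.libsvm"); // hypothetical path

        PCAModel pca = new PCA()
                .setInputCol("features")
                .setOutputCol("pcaFeatures")
                .setK(20)                 // keep 20 principal components
                .fit(data);

        Dataset<Row> reduced = pca.transform(data);

        KMeansModel clusters = new KMeans()
                .setFeaturesCol("pcaFeatures")
                .setK(5)                  // cluster count is an assumption
                .fit(reduced);

        clusters.transform(reduced).show();
        spark.stop();
    }
}
```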

Calling KmeansModel predict method

2016-08-03 Thread Rohit Chaddha
The predict method takes a Vector object. I am unable to figure out how to make this Spark Vector object for getting predictions from my model. Does anyone have some code in Java for this? Thanks Rohit
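A hedged sketch of the answer: the spark.mllib `Vectors` factory builds the `Vector` that `KMeansModel.predict` expects. The feature values below are made up for illustration.

```java
// Building an org.apache.spark.mllib.linalg.Vector in Java for
// KMeansModel.predict (spark.mllib API). Values are illustrative.
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;

public class VectorExample {
    public static void main(String[] args) {
        // Dense vector from raw doubles
        Vector dense = Vectors.dense(1.0, 0.5, 3.2);

        // Sparse vector: size 4, non-zeros at indices 1 and 3
        Vector sparse = Vectors.sparse(4, new int[]{1, 3}, new double[]{0.5, 3.2});

        // With a trained model (hypothetical variable):
        // int cluster = kmeansModel.predict(dense);
        System.out.println(dense);
    }
}
```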

build error - failing test- Error while building spark 2.0 trunk from github

2016-07-31 Thread Rohit Chaddha
--- T E S T S --- Running org.apache.spark.api.java.OptionalSuite Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.052 sec - in org.apache.spark.api.java.OptionalSuite Running o

calling dataset.show on a custom object - displays toString() value as first column and blank for rest

2016-07-31 Thread Rohit Chaddha
I have a custom object called A and a corresponding Dataset. When I call the datasetA.show() method I get the following: +++-+-+---+ |id|da|like|values|uid| +++-+-+---+ |A.toString()...| |A.toString().
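One plausible cause, sketched below as an assumption rather than a confirmed diagnosis: show() collapses the object into a single column when the Dataset was built with a binary encoder (e.g. Encoders.kryo) or when the class is not a proper Java bean. Using Encoders.bean with getter/setter properties yields one column per field. The class and field names here only mirror the thread.

```java
// Sketch: Dataset<A> with a bean encoder so show() renders one column per field.
import java.io.Serializable;
import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class BeanEncoderExample {
    // A must be a Java bean: public no-arg constructor plus getters/setters.
    public static class A implements Serializable {
        private String id;
        private String uid;
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public String getUid() { return uid; }
        public void setUid(String uid) { this.uid = uid; }
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("bean-encoder").master("local[*]").getOrCreate();

        A a = new A();
        a.setId("1");
        a.setUid("u1");

        // Encoders.bean gives a columnar schema; Encoders.kryo would instead
        // serialize the whole object into one binary column (the symptom above).
        Dataset<A> ds = spark.createDataset(Arrays.asList(a), Encoders.bean(A.class));
        ds.show();  // one column per bean property
        spark.stop();
    }
}
```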

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
After looking at the comments, I am not sure what the proposed fix is? On Fri, Jul 29, 2016 at 12:47 AM, Sean Owen wrote: > Ah, right. This wasn't actually resolved. Yeah your input on 15899 > would be welcome. See if the proposed fix helps. > > On Thu, Jul 28, 2016 at 11:52

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
On Fri, Jul 29, 2016 at 12:06 AM, Rohit Chaddha wrote: > I am simply trying to do > session.read().json("file:///C:/data/a.json"); > > in 2.0.0-preview it was working fine with > sqlContext.read().json("C:/data/a.json"); > > > -Rohit > > On Fri, J

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
certainly be an absolute > URI with an absolute path. What exactly is your input value for this > property? > > On Thu, Jul 28, 2016 at 11:28 AM, Rohit Chaddha > wrote: > > Hello Sean, > > > > I have tried both file:/ and file:/// > > But it does not work an

Re: Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
Hello Sean, I have tried both file:/ and file:/// But it does not work and gives the same error. -Rohit On Thu, Jul 28, 2016 at 11:51 PM, Sean Owen wrote: > IIRC that was fixed, in that this is actually an invalid URI. Use > file:/C:/... I think. > > On Thu, Jul 28, 2016 at 10:

Re: ClassTag variable in broadcast in spark 2.0 ? how to use

2016-07-28 Thread Rohit Chaddha
My bad. Please ignore this question. I accidentally reverted to sparkContext, causing the issue. On Thu, Jul 28, 2016 at 11:36 PM, Rohit Chaddha wrote: > In spark 2.0 there is an additional parameter of type ClassTag in the > broadcast method of the sparkContext > > What is this vari

ClassTag variable in broadcast in spark 2.0 ? how to use

2016-07-28 Thread Rohit Chaddha
In Spark 2.0 there is an additional parameter of type ClassTag in the broadcast method of the SparkContext. What is this variable and how do I broadcast now? Here is my existing code with 2.0.0-preview: Broadcast<...> b = jsc.broadcast(u.collectAsMap()); What changes need to be done in 2.0 for this
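As the follow-up message notes, the ClassTag only appears if you call the Scala SparkContext directly. A sketch of both routes (the map contents and variable names are made up):

```java
// Sketch: broadcasting from Java in Spark 2.0. JavaSparkContext.broadcast
// needs no ClassTag; the Scala SparkContext.broadcast takes one explicitly.
import java.util.Collections;
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;

public class BroadcastExample {
    public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext(
                new SparkConf().setAppName("bc").setMaster("local[*]"));

        Map<String, Integer> m = Collections.singletonMap("a", 1);

        // Option 1: JavaSparkContext supplies the ClassTag for you.
        Broadcast<Map<String, Integer>> b1 = jsc.broadcast(m);

        // Option 2: the underlying Scala SparkContext with an explicit ClassTag.
        ClassTag<Map<String, Integer>> tag = ClassTag$.MODULE$.apply(Map.class);
        Broadcast<Map<String, Integer>> b2 = jsc.sc().broadcast(m, tag);

        System.out.println(b1.value().get("a"));
        jsc.stop();
    }
}
```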

Spark 2.0 -- spark warehouse relative path in absolute URI error

2016-07-28 Thread Rohit Chaddha
I upgraded from 2.0.0-preview to 2.0.0 and started getting the following error: Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:C:/ibm/spark-warehouse Any ideas how to fix this? -Rohit
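A sketch of the workaround discussed in the replies (tracked as issue 15899): set spark.sql.warehouse.dir explicitly to a well-formed file URI on Windows. The exact paths below reuse those from the thread; whether this resolves every variant of the error is not confirmed in the archive.

```java
// Workaround sketch: "file:C:/..." is a relative path in an absolute URI,
// which triggers URISyntaxException; "file:///C:/..." is well-formed.
import org.apache.spark.sql.SparkSession;

public class WarehouseDirFix {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("warehouse-fix")
                .master("local[*]")
                .config("spark.sql.warehouse.dir", "file:///C:/ibm/spark-warehouse")
                .getOrCreate();

        spark.read().json("file:///C:/data/a.json").show(); // path from the thread
        spark.stop();
    }
}
```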

Is RowMatrix missing in org.apache.spark.ml package?

2016-07-26 Thread Rohit Chaddha
It is present in MLlib but I don't seem to find it in the ml package. Any suggestions please? -Rohit

Re: Spark ml.ALS question -- RegressionEvaluator.evaluate giving ~1.5 output for same train and predict data

2016-07-25 Thread Rohit Chaddha
Hi Krishna, Great, I had no idea about this. I tried your suggestion of using na.drop() and got an RMSE = 1.5794048211812495. Any suggestions on how this can be reduced and the model improved? Regards, Rohit On Mon, Jul 25, 2016 at 4:12 AM, Krishna Sankar wrote: > Thanks Nick. I also ran into t

Re: Spark ml.ALS question -- RegressionEvaluator.evaluate giving ~1.5 output for same train and predict data

2016-07-24 Thread Rohit Chaddha
Great, thanks to both of you. I was struggling with this issue as well. -Rohit On Mon, Jul 25, 2016 at 4:12 AM, Krishna Sankar wrote: > Thanks Nick. I also ran into this issue. > VG, One workaround is to drop the NaN from predictions (df.na.drop()) and > then use the dataset for the evaluator. In
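The workaround quoted above can be sketched in Java as follows. Column names assume the spark.ml ALS defaults ("rating", "prediction"); the helper name is mine, not from the thread.

```java
// Sketch of the NaN workaround: unseen users/items yield NaN predictions,
// which poison the RMSE, so drop those rows before evaluating.
import org.apache.spark.ml.evaluation.RegressionEvaluator;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class EvaluateWithoutNaN {
    public static double rmse(Dataset<Row> predictions) {
        // Drop rows whose "prediction" column is null/NaN.
        Dataset<Row> clean = predictions.na().drop(new String[]{"prediction"});
        return new RegressionEvaluator()
                .setMetricName("rmse")
                .setLabelCol("rating")
                .setPredictionCol("prediction")
                .evaluate(clean);
    }
}
```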