Re: running lda in spark throws exception

2016-01-14 Thread Li Li
I got it. I mistakenly thought that each line is a wordid list. On Fri, Jan 15, 2016 at 3:24 AM, Bryan Cutler wrote: > What I mean is the input to LDA.run() is a RDD[(Long, Vector)] and the > Vector is a vector of counts of each term and should be the same size as the > vocabulary (so if the voca

Re: [discuss] dropping Hadoop 2.2 and 2.3 support in Spark 2.0?

2016-01-14 Thread Reynold Xin
Thanks for chiming in. Note that an organization's agility in Spark upgrades can be very different from Hadoop upgrades. For many orgs, Hadoop is responsible for cluster resource scheduling (YARN) and data storage (HDFS). These two are notoriously difficult to upgrade. It is all or nothing for a clu

Re: Random Forest FeatureImportance throwing NullPointerException

2016-01-14 Thread Bryan Cutler
If you are able to just train the RandomForestClassificationModel from ML directly instead of training the old model and converting, then that would be the way to go. On Thu, Jan 14, 2016 at 2:21 PM, wrote: > Thanks so much Bryan for your response. Is there any workaround? > > > > *From:* Bryan

Re: Random Forest FeatureImportance throwing NullPointerException

2016-01-14 Thread Bryan Cutler
Hi Rachana, I got the same exception. It occurs because computing the feature importance depends on impurity stats, which are not calculated by the old RandomForestModel in MLlib. Feel free to create a JIRA for this if you think it is necessary, otherwise I believe this problem will be eventually s

Re: Spark 1.6.0 and HDP 2.2 - problem

2016-01-14 Thread Steve Loughran
> On 13 Jan 2016, at 10:15, Maciej Bryński wrote: > > Steve, > Thank you for the answer. > How does Hortonworks deal with this problem internally? Looks like this problem is very brittle to directory loading: if all of a single Jackson version's set of classes is loaded first, then, irrespective o

Re: [discuss] dropping Hadoop 2.2 and 2.3 support in Spark 2.0?

2016-01-14 Thread Steve Loughran
> On 14 Jan 2016, at 09:28, Steve Loughran wrote: > > 2.6.x is still having active releases, likely through 2016. It'll be the only > Hadoop version where problems Spark encounters would get fixed. Correction: minimum Hadoop version. Any problem reported against older versions will probably

Re: running lda in spark throws exception

2016-01-14 Thread Bryan Cutler
What I mean is the input to LDA.run() is a RDD[(Long, Vector)] and the Vector is a vector of counts of each term and should be the same size as the vocabulary (so if the vocabulary, or dictionary has 10 words, each vector should have a size of 10). This probably means that there will be some eleme
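The point about vector size can be illustrated with a minimal sketch in plain Python, assuming each document starts out as a list of word ids in the range 0 to vocabulary size minus one. The helper names `to_count_vector` and `build_corpus` are hypothetical and not part of Spark; they only show the shape of the (document id, term-count vector) pairs described above.

```python
from collections import Counter


def to_count_vector(word_ids, vocab_size):
    """Turn one document's list of word ids into a dense term-count vector.

    The result always has length vocab_size; terms that do not occur in the
    document get a count of 0.0.
    """
    counts = Counter(word_ids)
    out_of_range = [w for w in counts if not 0 <= w < vocab_size]
    if out_of_range:
        raise ValueError("word ids outside the vocabulary: %r" % sorted(out_of_range))
    return [float(counts.get(i, 0)) for i in range(vocab_size)]


def build_corpus(docs, vocab_size):
    """Pair each document with a numeric id, giving (doc_id, count_vector) tuples."""
    return [(doc_id, to_count_vector(ids, vocab_size))
            for doc_id, ids in enumerate(docs)]


# Hypothetical toy data: 3 documents over a 5-word vocabulary.
docs = [[0, 2, 2], [4], [1, 1, 3, 1]]
corpus = build_corpus(docs, vocab_size=5)
for doc_id, vector in corpus:
    print(doc_id, vector)
```

In a Spark job, each count list would presumably be wrapped in an MLlib vector (for example with `Vectors.dense` from `pyspark.mllib.linalg`) and the pairs distributed as an RDD before being handed to LDA; that step is left out here so the sketch runs without a cluster.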

Re: [discuss] dropping Hadoop 2.2 and 2.3 support in Spark 2.0?

2016-01-14 Thread Steve Loughran
> On 14 Jan 2016, at 02:17, Sean Owen wrote: > > I personally support this. I'd suggest drawing the line at Hadoop > 2.6, but that's minor. More info: > > Hadoop 2.7: April 2015 > Hadoop 2.6: Nov 2014 > Hadoop 2.5: Aug 2014 > Hadoop 2.4: April 2014 > Hadoop 2.3: Feb 2014 > Hadoop 2.2: Oct 201

RE: Random Forest FeatureImportance throwing NullPointerException

2016-01-14 Thread Rachana Srivastava
Tried using the 1.6 version of Spark, which takes numberOfFeatures as the fifth argument in the API, but I am still getting featureImportance as null. RandomForestClassifier rfc = getRandomForestClassifier( numTrees, maxBinSize, maxTreeDepth, seed, impurity); RandomForestClassificationModel rfm = RandomFores

Re: Usage of SparkContext within a Web container

2016-01-14 Thread Eugene Morozov
Praveen, Zeppelin uses Spark's REPL. I'm currently writing an app that is a web service, which is going to run Spark jobs. So, at the init stage I just create a JavaSparkContext and then use it for all user requests. The web service is stateless. The issue with stateless is that it's possible to run s

Re: [discuss] dropping Hadoop 2.2 and 2.3 support in Spark 2.0?

2016-01-14 Thread Sean Owen
I personally support this. I'd suggest drawing the line at Hadoop 2.6, but that's minor. More info:
Hadoop 2.7: April 2015
Hadoop 2.6: Nov 2014
Hadoop 2.5: Aug 2014
Hadoop 2.4: April 2014
Hadoop 2.3: Feb 2014
Hadoop 2.2: Oct 2013
CDH 5.0/5.1 = Hadoop 2.3 + backports
CDH 5.2/5.3 = Hadoop 2.5 + b