Dear Spark Users,

I was wondering about the following.

I have a trained cross-validator model, model: CrossValidatorModel,
and I want to predict a score for features: RDD[Features].

Right now I have to convert features to a DataFrame and then perform
the predictions as follows:
"""
val sqlContext = new SQLContext(fea
.setInputCol(*"category"*)
.setOutputCol(*"label"*)
*val *pipeline = *new *Pipeline().setStages(*Array*(indexer, rf))
*val *model: PipelineModel = pipeline.fit(trainingData)
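The scoring step then looks roughly like this (a sketch; it assumes Features is a case class, so createDataFrame can infer the schema, and cvModel is an illustrative name for the trained CrossValidatorModel above):

// CrossValidatorModel only exposes transform on a DataFrame,
// so the RDD[Features] has to be converted first.
val featuresDF = sqlContext.createDataFrame(features)
val scored = cvModel.transform(featuresDF)  // adds a "prediction" column
scored.select("prediction").show()

Is there a way to score the RDD directly, without this round trip?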
Thanks,
Pengcheng
Looking into the work folder of the problematic application, it seems that
the application keeps creating executors, and the error log of the worker is
as below:

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException:
Unknown exception in doAs
        at org.apache.hadoop.security.UserGroupInformation.doAs(...)

This happens when I submit two applications at the same time. One of the two
applications will run to end, but the other will always stay in the running
state and never exit or release resources.

Does anyone meet the same issue?

The Spark version I am using is Spark 1.1.1.
Best Regards,
Pengcheng
1. But this could take effect globally, and my pipeline involves lots of
other operations which I do not want to set a limit on. Is there a better
way to achieve this?

Thanks!
Pengcheng
persisted `newRdd`. My concern is that, if the RDD is not evaluated and
persisted at the moment persist() is called, I need to move the places where
persist()/unpersist() are called to make the job more efficient.

Thanks,
Pengcheng
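A minimal sketch of the behavior in question, with illustrative names: persist() is lazy and only records the storage level; the cache is actually populated by the first action that evaluates the RDD.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("persist-demo").setMaster("local[*]"))

val newRdd = sc.parallelize(1 to 1000000).map(_ * 2)
newRdd.persist(StorageLevel.MEMORY_ONLY)  // lazy: only marks newRdd for caching

val n = newRdd.count()  // first action: evaluates the lineage and fills the cache
val s = newRdd.sum()    // this job reuses the cached partitions

newRdd.unpersist()      // drop the cache only after the last job that reuses newRdd

So moving the persist() call earlier or later changes nothing by itself; what matters is where it sits relative to the first action, and that unpersist() runs only after the last reuse.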
For example, suppose RDD[Seq[Int]] = [[1,2,3], [2,4,5], [1,2], [7,8,9]]; the
result should be [[1,2,3,4,5], [7,8,9]], i.e. Seqs that share an element are
merged into one.

Since the RDD[Seq[Int]] is very large, I cannot do it in the driver program.
Is it possible to get it done using distributed groupBy/map/reduce, etc.?

Thanks in advance,
Pengcheng
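One way this could be done distributedly (a sketch, not from the original thread): treat it as a connected-components problem, where every Int is a vertex, the elements of each Seq are chained by edges, and GraphX merges the overlapping Seqs. All names are illustrative:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD

val sc = new SparkContext(new SparkConf().setAppName("merge-sets").setMaster("local[*]"))

val data: RDD[Seq[Int]] = sc.parallelize(Seq(Seq(1,2,3), Seq(2,4,5), Seq(1,2), Seq(7,8,9)))

// Chain the elements of each Seq so each Seq forms one connected path;
// a singleton Seq gets a self-loop so its element is not dropped.
val edges: RDD[Edge[Int]] = data.flatMap { s =>
  val pairs = if (s.size == 1) Seq((s.head, s.head)) else s.zip(s.tail)
  pairs.map { case (a, b) => Edge(a.toLong, b.toLong, 0) }
}

// connectedComponents labels every vertex with the smallest vertex id in its component.
val components = Graph.fromEdges(edges, defaultValue = 0).connectedComponents().vertices

val merged: RDD[Seq[Int]] = components
  .map { case (vertex, component) => (component, vertex.toInt) }
  .groupByKey()
  .map { case (_, vs) => vs.toSeq.distinct.sorted }

merged.collect().foreach(println)  // prints List(1, 2, 3, 4, 5) and List(7, 8, 9)

Note that the final groupByKey assumes each merged component fits in one executor's memory; for very large components the grouping step would need a different representation.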