I dumped the trees in the random forest model, and occasionally saw a leaf
node with strange stats:
- pred=1.00 prob=0.80 imp=-1.00 gain=-17976931348623157000
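For context, here is a rough sketch of the kind of tree walk that produces stats like the above. It assumes the MLlib 1.2-era tree model API (Node exposes predict, impurity, and an Option[InformationGainStats]); the helper names are mine, not Spark's.

import org.apache.spark.mllib.tree.model.{Node, RandomForestModel}

// Hypothetical helper: walk each tree and print per-node predict, probability,
// impurity, and gain -- the same fields that show up in the dump above.
def dumpNode(n: Node, indent: String = ""): Unit = {
  val gain = n.stats.map(_.gain).getOrElse(Double.NaN)
  println(s"${indent}id=${n.id} pred=${n.predict.predict} prob=${n.predict.prob} " +
    s"imp=${n.impurity} gain=$gain isLeaf=${n.isLeaf}")
  n.leftNode.foreach(child => dumpNode(child, indent + "  "))
  n.rightNode.foreach(child => dumpNode(child, indent + "  "))
}

def dumpForest(model: RandomForestModel): Unit =
  model.trees.zipWithIndex.foreach { case (tree, i) =>
    println(s"Tree $i:")
    dumpNode(tree.topNode)
  }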
I am running LogisticRegressionWithLBFGS. I got these lines on my console:
2015-03-12 17:38:03,897 ERROR breeze.optimize.StrongWolfeLineSearch |
Encountered bad values in function evaluation. Decreasing step size to 0.5
2015-03-12 17:38:03,967 ERROR breeze.optimize.StrongWolfeLineSearch |
Encountered bad values in function evaluation. Decreasing step size to ...
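One commonly suggested workaround (not confirmed to be the issue here) is to standardize the features before training. A minimal sketch with MLlib's StandardScaler and LogisticRegressionWithLBFGS; the input path and data are placeholders:

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils

// Sketch only: scale features to unit variance, then train. Feature scaling is a
// common suggestion when the line search reports bad values; it may not be the
// cause in this particular case.
val raw = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt") // placeholder path
val scaler = new StandardScaler(withMean = false, withStd = true).fit(raw.map(_.features))
val scaled = raw.map(p => LabeledPoint(p.label, scaler.transform(p.features))).cache()

val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(scaled)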
When I run Spark 1.2.1, I noticed this display that wasn't in previous
releases:
[Stage 12:=>   (6 + 1) / 16]
[Stage 12:>    (8 + 1) / 16]
[Stage 12:==>
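If anyone wants to turn it off: this is the console progress bar added in 1.2, and as far as I know it is controlled by the spark.ui.showConsoleProgress property. A sketch for a standalone app (property name assumed, not taken from this thread):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: disable the console progress bar via SparkConf before the context
// is created (this does not apply to an already-running spark-shell).
val conf = new SparkConf()
  .setAppName("no-progress-bar")
  .set("spark.ui.showConsoleProgress", "false")
val sc = new SparkContext(conf)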
I wonder what algorithm is used to implement sortByKey? I assume it is some
O(n*log(n)) sort parallelized across a number of nodes, right?
Then, what size of data would make it worthwhile to use sortByKey on multiple
processors rather than the standard Scala sort functions on a single
processor (consider
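As far as I know, sortByKey samples the keys to build a RangePartitioner, shuffles records into key ranges, and then sorts each partition locally. A toy sketch contrasting it with a plain local sort (the data is a placeholder):

val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))

// Distributed: range-partition by key, then sort within each partition.
val sortedRdd = pairs.sortByKey(ascending = true, numPartitions = 4)

// Local: pull everything to the driver and use the Scala collections sort.
val sortedLocal = pairs.collect().sortBy(_._1)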
I didn't know this restriction. Thank you.
My code seemed to deadlock when I tried to do this:
object MoreRdd extends Serializable {
  def apply(i: Int) = {
    val rdd2 = sc.parallelize(0 to 10)
    rdd2.map(j => i*10 + j).collect
  }
}
val rdd1 = sc.parallelize(0 to 10)
val y = rdd1.map(i => MoreRdd(i)).
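For what it's worth, the same computation can be written without touching the SparkContext inside another RDD's closure; a rough sketch of one alternative:

// Sketch: generate the inner range locally in the closure instead of as a
// nested RDD, so no SparkContext is used inside a transformation.
val rdd1 = sc.parallelize(0 to 10)
val y = rdd1.flatMap(i => (0 to 10).map(j => i * 10 + j)).collect()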
I need some advice regarding how data are stored in an RDD. I have millions
of records, called "Measures". They are bucketed by keys of String type.
I wonder whether I should store them as RDD[(String, Measure)] or
RDD[(String, Iterable[Measure])], and why?
Data in each bucket are not related mo
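A sketch of the two representations, in case it helps frame the trade-off; Measure below is just a placeholder type. The flat pair form works directly with reduceByKey/aggregateByKey, while the grouped form is essentially what groupByKey produces:

import org.apache.spark.rdd.RDD

case class Measure(value: Double)   // placeholder type

val flat: RDD[(String, Measure)] =
  sc.parallelize(Seq("a" -> Measure(1.0), "a" -> Measure(2.0), "b" -> Measure(3.0)))

// Grouped form: one (key, bucket) pair per key, i.e. what groupByKey yields.
val grouped: RDD[(String, Iterable[Measure])] = flat.groupByKey()

// With the flat form, per-bucket aggregates can be computed without
// materializing whole buckets on one executor.
val sums = flat.mapValues(_.value).reduceByKey(_ + _)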
It would be nice if an RDD that was massaged by OrderedRDDFunctions could know
its "neighbors".
One way is to call zipWithIndex on the RDD and use the index as a key, adding
or subtracting 1 to line up the previous or next element. Then use cogroup or
join to bind them together.
val idx = input.zipWithIndex                    // (element, index)
val previous = idx.map(x => (x._2 + 1, x._1))   // under key k: element at k-1
val current  = idx.map(x => (x._2, x._1))       // under key k: element at k
val next     = idx.map(x => (x._2 - 1, x._1))   // under key k: element at k+1
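To finish the thought (my sketch, not part of the original reply): a three-way cogroup binds them, and the endpoints simply get an empty Iterable on the missing side:

// Sketch: for each position, pair the element with its optional neighbors.
val withNeighbors = current.cogroup(previous, next).flatMap {
  case (pos, (cur, prev, nxt)) =>
    cur.headOption.map(c => (pos, (prev.headOption, c, nxt.headOption)))
}.sortByKey()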
I have an RDD containing elements sorted in a certain order. I would like to
map over the elements knowing the values of their respective previous and
next elements.
With a regular List, I used to do this ("input" is a List below):
// The first of the previous measures and the last of the next meas
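For the local-List version, the usual trick is a sliding window; a small sketch (my example, not the original code):

// Sketch: sliding(3) yields (previous, current, next) windows over a local List.
val input = List(1, 2, 3, 4, 5)
val triples = input.sliding(3).map { case List(prev, cur, next) =>
  (prev, cur, next)
}.toList
// triples == List((1,2,3), (2,3,4), (3,4,5)); the two endpoints have no full window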
They don't work in the new 1.0.1 either.
I tried installing the latest Spark 1.0.1 and SparkR couldn't find the master
either. I restarted with Spark 0.9.1 and SparkR was able to find the master.
So something seems to have changed as of Spark 1.0.0.
I restarted the Spark master with spark-0.9.1 and SparkR was able to
communicate with the master. I am using the latest SparkR pkg-e1f95b6. Maybe
it has a problem communicating with Spark 1.0.0?
Andrew,
Thanks for replying. I did the following and the result was still the same:
1. Added "spark.home /root/spark-1.0.0" to the local conf/spark-defaults.conf,
   where "/root" was the place in the cluster where I put Spark.
2. Ran "bin/spark-shell --master spark://sjc1-eng-float01.carrieriq.co
I have a cluster running. I was able to run Spark Shell and submit programs.
But when I tried to use SparkR, I got these errors:
wifi-orcus:sparkR cwang$ MASTER=spark://wifi-orcus.dhcp.carrieriq.com:7077 sparkR
R version 3.1.0 (2014-04-10) -- "Spring Dance"
Copyright (C) 2014 The R Foundation for Statistical Computing
Not sure that was what I wanted. I tried to run the Spark shell on a machine
other than the master and got the same error. The "192" was supposed to be a
simple shell script change that alters SPARK_HOME before submitting jobs.
Too bad it isn't there anymore.
The build described in the pull request (
The link:
https://github.com/apache/incubator-spark/pull/192
is no longer available. Could someone attach the solution or point me to
another location? Thanks.
(I am using 1.0.0)
C.J.