I dumped the trees in the random forest model, and occasionally saw a leaf
node with strange stats:
- pred=1.00 prob=0.80 imp=-1.00 gain=-17976931348623157000
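For context, here is a rough sketch of the kind of tree walk that produces stats like the above. It assumes the MLlib 1.2-era tree model API (Node exposes predict, impurity, and an Option[InformationGainStats]); the helper names are mine, not Spark's.

import org.apache.spark.mllib.tree.model.{Node, RandomForestModel}

// Hypothetical helper: walk each tree and print per-node predict, probability,
// impurity, and gain -- the same fields that show up in the dump above.
def dumpNode(n: Node, indent: String = ""): Unit = {
  val gain = n.stats.map(_.gain).getOrElse(Double.NaN)
  println(s"${indent}id=${n.id} pred=${n.predict.predict} prob=${n.predict.prob} " +
    s"imp=${n.impurity} gain=$gain isLeaf=${n.isLeaf}")
  n.leftNode.foreach(child => dumpNode(child, indent + "  "))
  n.rightNode.foreach(child => dumpNode(child, indent + "  "))
}

def dumpForest(model: RandomForestModel): Unit =
  model.trees.zipWithIndex.foreach { case (tree, i) =>
    println(s"Tree $i:")
    dumpNode(tree.topNode)
  }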
I am running LogisticRegressionWithLBFGS. I got these lines on my console:
2015-03-12 17:38:03,897 ERROR breeze.optimize.StrongWolfeLineSearch |
Encountered bad values in function evaluation. Decreasing step size to 0.5
2015-03-12 17:38:03,967 ERROR breeze.optimize.StrongWolfeLineSearch |
Encountered bad values in function evaluation. Decreasing step size to ...
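One commonly suggested workaround (not confirmed to be the issue here) is to standardize the features before training. A minimal sketch with MLlib's StandardScaler and LogisticRegressionWithLBFGS; the input path and data are placeholders:

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils

// Sketch only: scale features to unit variance, then train. Feature scaling is a
// common suggestion when the line search reports bad values; it may not be the
// cause in this particular case.
val raw = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt") // placeholder path
val scaler = new StandardScaler(withMean = false, withStd = true).fit(raw.map(_.features))
val scaled = raw.map(p => LabeledPoint(p.label, scaler.transform(p.features))).cache()

val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(scaled)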
When I run Spark 1.2.1, I noticed this display that wasn't in previous
releases:
[Stage 12:=>   (6 + 1) / 16]
[Stage 12:>    (8 + 1) / 16]
[Stage 12:==>
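If anyone wants to turn it off: this is the console progress bar added in 1.2, and as far as I know it is controlled by the spark.ui.showConsoleProgress property. A sketch for a standalone app (property name assumed, not taken from this thread):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: disable the console progress bar via SparkConf before the context
// is created (this does not apply to an already-running spark-shell).
val conf = new SparkConf()
  .setAppName("no-progress-bar")
  .set("spark.ui.showConsoleProgress", "false")
val sc = new SparkContext(conf)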
I wonder what algorithm is used to implement sortByKey? I assume it is some
O(n*log(n)) sort parallelized across a number of nodes, right?
Then, what size of data would make it worthwhile to use sortByKey on multiple
processors rather than the standard Scala sort functions on a single
processor (consider
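As far as I know, sortByKey samples the keys to build a RangePartitioner, shuffles records into key ranges, and then sorts each partition locally. A toy sketch contrasting it with a plain local sort (the data is a placeholder):

val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))

// Distributed: range-partition by key, then sort within each partition.
val sortedRdd = pairs.sortByKey(ascending = true, numPartitions = 4)

// Local: pull everything to the driver and use the Scala collections sort.
val sortedLocal = pairs.collect().sortBy(_._1)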
I didn't know this restriction. Thank you.
My code seemed to deadlock when I tried to do this:
object MoreRdd extends Serializable {
  def apply(i: Int) = {
    val rdd2 = sc.parallelize(0 to 10)
    rdd2.map(j => i*10 + j).collect
  }
}
val rdd1 = sc.parallelize(0 to 10)
val y = rdd1.map(i => MoreRdd(i)).
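For what it's worth, the same computation can be written without touching the SparkContext inside another RDD's closure; a rough sketch of one alternative:

// Sketch: generate the inner range locally in the closure instead of as a
// nested RDD, so no SparkContext is used inside a transformation.
val rdd1 = sc.parallelize(0 to 10)
val y = rdd1.flatMap(i => (0 to 10).map(j => i * 10 + j)).collect()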
I need some advice regarding how data are stored in an RDD. I have millions
of records, called "Measures". They are bucketed by keys of String type.
I wonder whether I should store them as RDD[(String, Measure)] or
RDD[(String, Iterable[Measure])], and why?
Data in each bucket are not related mo
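A sketch of the two representations, in case it helps frame the trade-off; Measure below is just a placeholder type. The flat pair form works directly with reduceByKey/aggregateByKey, while the grouped form is essentially what groupByKey produces:

import org.apache.spark.rdd.RDD

case class Measure(value: Double)   // placeholder type

val flat: RDD[(String, Measure)] =
  sc.parallelize(Seq("a" -> Measure(1.0), "a" -> Measure(2.0), "b" -> Measure(3.0)))

// Grouped form: one (key, bucket) pair per key, i.e. what groupByKey yields.
val grouped: RDD[(String, Iterable[Measure])] = flat.groupByKey()

// With the flat form, per-bucket aggregates can be computed without
// materializing whole buckets on one executor.
val sums = flat.mapValues(_.value).reduceByKey(_ + _)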
It would be nice if an RDD that was massaged by OrderedRDDFunctions could know
its "neighbors".
One way is to call zipWithIndex on the RDD and use the index as a key, adding
or subtracting 1 to line up the previous or next element. Then use cogroup or
join to bind them together.
val idx = input.zipWithIndex                    // (element, index)
val previous = idx.map(x => (x._2 + 1, x._1))   // under key k: element at k-1
val current  = idx.map(x => (x._2, x._1))       // under key k: element at k
val next     = idx.map(x => (x._2 - 1, x._1))   // under key k: element at k+1
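To finish the thought (my sketch, not part of the original reply): a three-way cogroup binds them, and the endpoints simply get an empty Iterable on the missing side:

// Sketch: for each position, pair the element with its optional neighbors.
val withNeighbors = current.cogroup(previous, next).flatMap {
  case (pos, (cur, prev, nxt)) =>
    cur.headOption.map(c => (pos, (prev.headOption, c, nxt.headOption)))
}.sortByKey()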
I have an RDD containing elements sorted in a certain order. I would like to
map over the elements knowing the values of their respective previous and
next elements.
With a regular List, I used to do this ("input" is a List below):
// The first of the previous measures and the last of the next meas
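For the local-List version, the usual trick is a sliding window; a small sketch (my example, not the original code):

// Sketch: sliding(3) yields (previous, current, next) windows over a local List.
val input = List(1, 2, 3, 4, 5)
val triples = input.sliding(3).map { case List(prev, cur, next) =>
  (prev, cur, next)
}.toList
// triples == List((1,2,3), (2,3,4), (3,4,5)); the two endpoints have no full window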
They don't work in the new 1.0.1 either.
I tried installing the latest Spark 1.0.1 and SparkR couldn't find the master
either. I restarted with Spark 0.9.1 and SparkR was able to find the master.
So something seems to have changed as of Spark 1.0.0.
I restarted the Spark master with spark-0.9.1 and SparkR was able to
communicate with the master. I am using the latest SparkR pkg-e1f95b6. Maybe
it has a problem communicating with Spark 1.0.0?
Andrew,
Thanks for replying. I did the following and the result was still the same:
1. Added "spark.home /root/spark-1.0.0" to the local conf/spark-defaults.conf,
   where "/root" was the place in the cluster where I put Spark.
2. Ran "bin/spark-shell --master spark://sjc1-eng-float01.carrieriq.co
I have a cluster running. I was able to run Spark Shell and submit programs.
But when I tried to use SparkR, I got these errors:
wifi-orcus:sparkR cwang$ MASTER=spark://wifi-orcus.dhcp.carrieriq.com:7077 sparkR
R version 3.1.0 (2014-04-10) -- "Spring Dance"
Copyright (C) 2014 The R Foundation for Statistical Computing
Not sure that was what I wanted. I tried to run the Spark shell on a machine
other than the master and got the same error. The "192" was supposed to be a
simple shell script change that alters SPARK_HOME before submitting jobs.
Too bad it isn't there anymore.
The build described in the pull request (
The link:
https://github.com/apache/incubator-spark/pull/192
is no longer available. Could someone attach the solution or point me to
another location? Thanks.
(I am using 1.0.0)
C.J.