Hi,
To connect to Spark from a remote location and submit jobs, you can try
Spark Job Server. It has been open sourced.
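
As a rough sketch of what remote submission could look like: Spark Job Server exposes a REST interface, so a job can be started with an HTTP POST. The snippet below only builds the request URL and does not send anything. The host, port 8090, app name and class path are placeholder assumptions modelled on the project's README examples, not values from this thread.

```python
from urllib.parse import urlencode


def job_submit_url(host, port, app_name, class_path):
    """Build the URL for a POST that starts a job on a Spark Job Server instance.

    All four arguments are illustrative placeholders (assumptions).
    """
    query = urlencode({"appName": app_name, "classPath": class_path})
    return f"http://{host}:{port}/jobs?{query}"


# Hypothetical values; replace with your own server and uploaded application.
url = job_submit_url("localhost", 8090, "test", "spark.jobserver.WordCountExample")
print(url)
```

The jar containing the job class would need to be uploaded to the server first, and the job's configuration goes in the POST body.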
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Integration-Patterns-tp26354p26357.html
Sent from the Apache Spark User List mail
Hi,
If your features are numeric, try scaling them before feeding them to Spark's
Logistic Regression. That might improve the rate (%) you are getting.
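
To make "feature scaling" concrete, here is a minimal plain-Python sketch of standardization (zero mean, unit variance per column). The function name and the sample data are made up for illustration. In a real Spark job you would more likely use MLlib's StandardScaler, which serves the same purpose.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit variance (population std)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / n)
        for c, m in zip(cols, means)
    ]
    # Constant columns have std 0; map them to 0.0 instead of dividing by zero.
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical data: two numeric features on very different scales.
features = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
scaled = standardize(features)
print(scaled)
```

After scaling, both columns are on the same scale, which generally helps gradient-based optimizers converge.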
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Mllib-Logistic-Regression-performance-relative-to-Mahout-tp26346p26358.html
Hi,
DStreams (Discretized Streams) are made up of multiple RDDs.
You can unpersist each RDD by accessing the individual RDDs using foreachRDD:
dstreamrdd.foreachRDD { rdd =>
  rdd.unpersist()
}
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-unpersist-a-DStream-in-Sp
Hi vkutsenko,
Can you try setting the number of partitions on the input labeled RDD, like:
data = MLUtils.loadLibSVMFile(jsc.sc(),
"s3://somebucket/somekey/plaintext_libsvm_file").toJavaRDD().repartition(5);
Here I used 5, since you have 5 cores.
Also for further benchmark and performance tuning:
h
Hi,
I guess the double values are the number of visits rather than a visit flag
(a count should obviously be more useful than a 1/0 visit flag).
This is based on the assumption that, in matrix factorisation, ratings
trained with implicit feedback should not be binary, as that gives poor feature
values. In tur
Hi,
1. The main difference between SparkR and R is that SparkR can handle
big data.
Yes, you can use other core libraries inside SparkR (but not algorithms like
lm(), glm(), kmeans()).
2. Yes, core R libraries will not be distributed. You can use functions from
these libraries which are applicable for mapper ki
Hi,
In the first RDD transformation (e.g. reading from a file with
sc.textFile("path", partitions)), the partition count you specify carries over
to the further transformations and actions derived from this RDD, unless you
repartition it.
In a few places, repartitioning your RDD will give an added advantage.
Repartition is usually done during act