How did you start the notebook?
Thanks & Regards,
Meethu M
On Wednesday, 12 November 2014 6:50 AM, "Laird, Benjamin"
<benjamin.la...@capitalone.com> wrote:
I've been experimenting with the ISpark extension to IScala
(https://github.com/tribbloid/ISpark)
Objects created in the REPL are not being loaded correctly on worker nodes,
leading to a ClassNotFoundException. This does work correctly in spark-shell.
I was curious if anyone has used ISpark an
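A minimal sketch of the pattern that triggers this (class and file names are made up; in spark-shell the same thing runs fine, but under ISpark the workers fail with ClassNotFoundException when deserializing the closure):

// Defined interactively in the notebook/REPL session
case class Click(userId: String, url: String)

// The map closure references Click, so the REPL-generated class bytes
// must be available on the worker nodes for deserialization to succeed
val clicks = sc.textFile("clicks.tsv").map { line =>
  val Array(user, url) = line.split("\t")
  Click(user, url)
}
clicks.count()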
Something like this works and is how I create an RDD of specific records:

import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.avro.mapred.AvroKey
import org.apache.hadoop.io.NullWritable

val avroRdd = sc.newAPIHadoopFile("twitter.avro",
  classOf[AvroKeyInputFormat[twitter_schema]], classOf[AvroKey[twitter_schema]],
  classOf[NullWritable], conf)

(From https://github.com/julianpeeters/avro-scala-macro-annotat
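As a follow-up, a minimal sketch (assuming the same generated twitter_schema class as above) of pulling the typed records out of the resulting key/value pairs:

// Each element is (AvroKey[twitter_schema], NullWritable); keep just the datum
val tweets = avroRdd.map { case (key, _) => key.datum() }
tweets.take(5).foreach(println)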
Thanks Akhil and Sean.
All three workers are doing the work and tasks stall simultaneously on all
three. I think Sean hit on my issue. I've been under the impression that each
application has one executor process per worker machine (not per core per
machine). Is that incorrect? If an executor i
Hi all,
I'm doing some testing on a small dataset (HadoopRDD, 2GB, ~10M records) with
a cluster of 3 nodes.
Simple calculations like count take approximately 5s when using the default
value of spark.executor.memory (512MB). When I scale this up to 2GB, several
tasks take 1m or more (while most still
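For reference, a minimal sketch of bumping the per-executor heap via SparkConf (the app name is made up; 2g is simply the value discussed above):

import org.apache.spark.{SparkConf, SparkContext}

// spark.executor.memory sets the heap of each executor JVM (default 512m)
val conf = new SparkConf()
  .setAppName("count-test")
  .set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)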
nd (I've done it for Cascading/Scalding).
>
>-----
>Chris
>
>
>
>From: Laird, Benjamin [benjamin.la...@capitalone.com]
>Sent: Tuesday, July 29, 2014 8:00 AM
>To: user@spark.apache.org; u...@spark.incubator.apache.org
>Subject: Avro Schema + GenericRecor
Hi all,
I can read Avro files into Spark with HadoopRDD and submit the schema in
the jobConf, but with the guidance I've seen so far I'm left with an Avro
GenericRecord of untyped Java objects. How do I actually use the
schema to have the types inferred?
Example:
scala> AvroJob.setInputSc
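A minimal sketch of the generic-record read path being described (the schema file and the "text" field are hypothetical); every field comes back as java.lang.Object and needs a manual cast, which is the problem:

import java.io.File
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroJob, AvroWrapper}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.{FileInputFormat, JobConf}

val jobConf = new JobConf(sc.hadoopConfiguration)
AvroJob.setInputSchema(jobConf, new Schema.Parser().parse(new File("twitter.avsc")))
FileInputFormat.setInputPaths(jobConf, "twitter.avro")

val genericRdd = sc.hadoopRDD(jobConf,
  classOf[AvroInputFormat[GenericRecord]],
  classOf[AvroWrapper[GenericRecord]],
  classOf[NullWritable])

// GenericRecord.get returns Object, so each field must be cast by hand
val texts = genericRdd.map { case (wrapper, _) =>
  wrapper.datum().get("text").toString
}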
Hi all -
I’m using pySpark/MLlib ALS for user/item clustering and would like to directly
access the user/product RDDs (called userFeatures/productFeatures in class
MatrixFactorizationModel in mllib/recommendation/MatrixFactorizationModel.scala).
This doesn’t seem too complex, but it doesn’t seem l
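On the Scala side those RDDs are public fields of the model; a minimal sketch (rank and iteration values are arbitrary, and the ratings RDD is assumed to exist already):

import org.apache.spark.mllib.recommendation.{ALS, Rating}

// ratings: an existing RDD[Rating(user, product, rating)]
val model = ALS.train(ratings, 10, 20)

// Both are RDD[(Int, Array[Double])]: id -> latent factor vector
val userFeatures = model.userFeatures
val productFeatures = model.productFeatures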
Good clarification Sean. Diana, I was also referring to this example when
setting up some of my bigger ALS runs. I don't think this particular example is
very helpful, as it is creating the initial matrix locally in memory before
parallelizing in Spark. So (unless I'm misunderstanding), it is an ok ex
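For contrast, a minimal sketch of building the ratings RDD straight from a distributed text file rather than from a local in-memory matrix (the path and "user,item,rating" format are made up):

import org.apache.spark.mllib.recommendation.Rating

// The data stays distributed from the start instead of being materialized on the driver
val ratings = sc.textFile("hdfs:///data/ratings.csv").map { line =>
  val Array(user, item, rating) = line.split(",")
  Rating(user.toInt, item.toInt, rating.toDouble)
}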
Joe,
Do you have your SPARK_HOME variable set correctly in the spark-env.sh script?
I was getting that error when I was first setting up my cluster; it turned out I
had to make some changes in the spark-env script to get things working
correctly.
Ben
-----Original Message-----
From: Joe L [mailt
Hello all -
I'm running the ALS/Collaborative Filtering code through pySpark on Spark 0.9.0.
(http://spark.apache.org/docs/0.9.0/mllib-guide.html#using-mllib-in-python)
My data file has about 27M tuples (User, Item, Rating). ALS.train(ratings, 1, 30)
runs on my 3-node cluster (24 cores, 60GB RAM)