Re: Error when run Spark on mesos

2014-04-02 Thread panfei
After upgrading to 0.9.1, everything goes well now. Thanks for the reply. 2014-04-03 13:47 GMT+08:00 andy petrella : > Hello, > It's indeed due to a known bug, but using another IP for the driver won't > be enough (other problems will pop up). > An easy solution would be to switch to 0.9.1 ...

Re: Error when run Spark on mesos

2014-04-02 Thread andy petrella
Hello, It's indeed due to a known bug, but using another IP for the driver won't be enough (other problems will pop up). An easy solution would be to switch to 0.9.1 ... see http://apache-spark-user-list.1001560.n3.nabble.com/ActorNotFound-problem-for-mesos-driver-td3636.html Hth Andy On 3 Apr 20

Example of creating expressions for SchemaRDD methods

2014-04-02 Thread All In A Days Work
For various SchemaRDD functions like select, where, orderBy, groupBy, etc., I would like to create expression objects and pass these to the methods for execution. Can someone show some examples of how to create expressions for a case class and execute them? E.g., how to create expressions for select, orde
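For reference, a hedged sketch of the Symbol-based expression DSL that Spark SQL on master exposed around this time; the case class, field names, and data below are illustrative assumptions, not from the thread:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Case class describing the schema; defined at the top level so reflection can see it.
case class Person(name: String, age: Int)

object SchemaRDDExpressionExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "SchemaRDDExpressionExample")
    val sqlContext = new SQLContext(sc)
    import sqlContext._  // brings in createSchemaRDD and the implicit Symbol-to-expression conversions

    val people = sc.parallelize(Seq(Person("Ann", 30), Person("Bob", 17)))

    // Symbols such as 'age and 'name become attribute expressions that can be passed
    // to where/select (and similarly orderBy/groupBy) on a SchemaRDD.
    val adults = people.where('age >= 18).select('name)
    adults.collect().foreach(println)

    sc.stop()
  }
}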

Re: Spark streaming kafka _output_

2014-04-02 Thread Benjamin Black
Please, no. On Wed, Apr 2, 2014 at 9:47 PM, Tathagata Das wrote: > If anybody is interested in doing this, may I suggest taking a look at > Twitter's Storehaus. It presents an abstract interface for pushing data to > many different backends, including Kafka, MongoDB, HBase, etc. Integrating >

Re: Spark streaming kafka _output_

2014-04-02 Thread Tathagata Das
If anybody is interested in doing this, may I suggest taking a look at Twitter's Storehaus. It presents an abstract interface for pushing data to many different backends, including Kafka, MongoDB, HBase, etc. Integrating DStream.foreachRDD with Storehaus may be a very useful thing to do. So
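For reference, a hedged sketch of the DStream.foreachRDD pattern being discussed: each batch is pushed to an external sink from the workers. MyStoreClient is a hypothetical stand-in for whatever store (Kafka producer, Storehaus store, etc.) you would actually use; only foreachRDD and foreachPartition are Spark API here.

import org.apache.spark.streaming.dstream.DStream

object StreamingSinkSketch {
  // Hypothetical external-store client; a real one would wrap a producer or store handle.
  class MyStoreClient {
    def put(record: String): Unit = ()  // no-op placeholder
    def close(): Unit = ()
  }

  def writeOut(lines: DStream[String]): Unit = {
    lines.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        val client = new MyStoreClient()  // one client per partition, not one per record
        partition.foreach(client.put)
        client.close()
      }
    }
  }
}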

Re: Error when run Spark on mesos

2014-04-02 Thread Ian Ferreira
I think this is related to a known issue (regression) in 0.9.0. Try using an explicit IP other than loopback. Sent from a mobile device > On Apr 2, 2014, at 8:53 PM, "panfei" wrote: > > any advice ? > > > 2014-04-03 11:35 GMT+08:00 felix : >> I deployed Mesos and tested it using the example/tes

Re: Spark streaming kafka _output_

2014-04-02 Thread Soren Macbeth
Anybody? Seems like a reasonable thing to be able to do, no? On Fri, Mar 21, 2014 at 3:58 PM, Benjamin Black wrote: > Howdy, folks! > > Anybody out there have a working Kafka _output_ for Spark Streaming? > Perhaps one that doesn't involve instantiating a new producer for every > batch? > > Th

Re: Error when run Spark on mesos

2014-04-02 Thread panfei
Any advice? 2014-04-03 11:35 GMT+08:00 felix : > I deployed Mesos and tested it using the example/test-framework script; > Mesos seems OK, but when running Spark on the Mesos cluster, the Mesos slave > nodes report the following exception. Can anyone help me fix this? > Thanks in advance: 14/

Error when run Spark on mesos

2014-04-02 Thread felix
I deployed Mesos and tested it using the example/test-framework script; Mesos seems OK, but when running Spark on the Mesos cluster, the Mesos slave nodes report the following exception. Can anyone help me fix this? Thanks in advance: 14/04/03 11:24:39 INFO Slf4jLogger: Slf4jLogger started 14/04/03

Submitting to yarn cluster

2014-04-02 Thread Ron Gonzalez
Hi, I have a small program but I cannot seem to make it connect to the right properties of the cluster. I have SPARK_YARN_APP_JAR, SPARK_JAR and SPARK_HOME set properly. If I run this Scala file, I am seeing that it never uses the yarn.resourcemanager.address property that I set o

Re: Shark Direct insert into table value (?)

2014-04-02 Thread qingyang li
For now, it does not support direct insert. 2014-04-03 10:52 GMT+08:00 abhietc31 : > Hi, > I'm trying to run the script in Shark (0.8.1) " insert into emp (id,name) > values (212,"Abhi") " but it doesn't work. > I urgently need direct insert as it is a show stopper. > > I know that we can do " inser

Shark Direct insert into table value (?)

2014-04-02 Thread abhietc31
Hi, I'm trying to run the script in Shark (0.8.1) " insert into emp (id,name) values (212,"Abhi") " but it doesn't work. I urgently need direct insert as it is a show stopper. I know that we can do " insert into emp select * from xyz". The requirement here is direct insert. Has anyone tried it? Or is t

Spark RDD to Shark table IN MEMORY conversion

2014-04-02 Thread abhietc31
Hi, We are applying business logic to an incoming data stream using Spark Streaming. I want to point a Shark table at the data coming from Spark Streaming. Instead of storing the Spark Streaming output to HDFS or another area, is there a way I can directly point a Shark in-memory table to take data from Spark Str

Re: How to ask questions on Spark usage?

2014-04-02 Thread Andrew Or
Yes, please do. :) On Wed, Apr 2, 2014 at 7:36 PM, weida xu wrote: > Hi, > > Shall I send my questions to this Email address? > > Sorry for bothering, and thanks a lot! >

How to ask questions on Spark usage?

2014-04-02 Thread weida xu
Hi, Shall I send my questions to this email address? Sorry to bother you, and thanks a lot!

Re: Optimal Server Design for Spark

2014-04-02 Thread Debasish Das
Hi Matei, How can I run multiple Spark workers per node? I am running an 8-core, 10-node cluster but I do have 8 more cores on each node, so having 2 workers per node will definitely help my use case. Thanks. Deb On Wed, Apr 2, 2014 at 3:58 PM, Matei Zaharia wrote: > Hey Steve, > > This config
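For reference, a hedged spark-env.sh sketch of the standalone-mode settings that control how many workers run on each node; the values are placeholders, not a recommendation:

# conf/spark-env.sh (standalone mode) -- illustrative values only
export SPARK_WORKER_INSTANCES=2   # run two worker daemons on each node
export SPARK_WORKER_CORES=8       # cores each worker may hand out to executors
export SPARK_WORKER_MEMORY=24g    # memory each worker may hand out to executors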

Re: Optimal Server Design for Spark

2014-04-02 Thread Mayur Rustagi
I would suggest starting with cloud hosting if you can; depending on your use case, memory requirements may vary a lot. Regards Mayur On Apr 2, 2014 3:59 PM, "Matei Zaharia" wrote: > Hey Steve, > > This configuration sounds pretty good. The one thing I would consider is > having more disks, for tw

Re: Status of MLI?

2014-04-02 Thread Evan R. Sparks
Targeting 0.9.0 should work out of the box (just a change to the build.sbt) - I'll push some changes I've been sitting on to the public repo in the next couple of days. On Wed, Apr 2, 2014 at 4:05 AM, Krakna H wrote: > Thanks for the update Evan! In terms of using MLI, I see that the Github > c

Re: Efficient way to aggregate event data at daily/weekly/monthly level

2014-04-02 Thread Nicholas Chammas
Watch out when loading data from gzipped files. Spark cannot parallelize the load of a gzipped file, and if you do not explicitly repartition the RDD created from such a file, everything you do on that RDD will run on a single core. On Wed, Apr 2, 2014 at 8:22 PM, K Koh wrote: > Hi, > > I want
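A hedged sketch of the workaround being described: the gzipped file comes in as a single partition, so repartition it before doing any heavy work. The path and partition count are placeholders.

import org.apache.spark.SparkContext

object GzipRepartitionExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[4]", "GzipRepartitionExample")
    val raw = sc.textFile("hdfs:///data/events/2014/04/02/dat.gz") // one partition: gzip is not splittable
    val events = raw.repartition(16)                               // spread the records across the cluster
    println(events.count())
    sc.stop()
  }
}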

Re: Is there a way to get the current progress of the job?

2014-04-02 Thread Andrew Or
Hi Philip, In the upcoming release of Spark 1.0 there will be a feature that provides for exactly what you describe: capturing the information displayed on the UI in JSON. More details will be provided in the documentation, but for now, anything before 0.9.1 can only go through JobLogger.scala, wh

Re: Is there a way to get the current progress of the job?

2014-04-02 Thread Patrick Wendell
Hey Phillip, Right now there is no mechanism for this. You have to go in through the low-level listener interface. We could consider exposing the JobProgressListener directly - I think it's been factored nicely so it's fairly decoupled from the UI. The concern is this is a semi-internal piece of
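To illustrate the listener route mentioned here, a hedged sketch of a SparkListener that counts finished tasks; the names reflect the 0.9/1.0-era org.apache.spark.scheduler package and this is illustrative rather than a drop-in progress API:

import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class TaskCountListener extends SparkListener {
  val tasksEnded = new AtomicInteger(0)
  // Called by the scheduler each time a task finishes (successfully or not).
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    tasksEnded.incrementAndGet()
  }
}

// Usage, assuming an existing SparkContext sc:
//   val listener = new TaskCountListener
//   sc.addSparkListener(listener)
//   ... run jobs ...
//   println("tasks finished so far: " + listener.tasksEnded.get)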

Re: Resilient nature of RDD

2014-04-02 Thread Patrick Wendell
The driver stores the meta-data associated with the partition, but the re-computation will occur on an executor. So if several partitions are lost, e.g. due to a few machines failing, the re-computation can be striped across the cluster making it fast. On Wed, Apr 2, 2014 at 11:27 AM, David Thoma

Efficient way to aggregate event data at daily/weekly/monthly level

2014-04-02 Thread K Koh
Hi, I want to aggregate (time-stamped) event data at the daily, weekly, and monthly level, stored in a directory in data//mm/dd/dat.gz format. For example: Each dat.gz file contains tuples in (datetime, id, value) format. I can perform the aggregation as follows: but this code doesn't seem to be
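A hedged sketch of one way to do the daily roll-up described here with plain RDD operations; the directory layout, date format, and field order are assumptions for illustration:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // pair-RDD implicits (pre-1.3 style)

object DailyAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[4]", "DailyAggregation")

    // Assumed line format: "2014-04-02T13:45:00,user-42,3.5" (datetime, id, value)
    val events = sc.textFile("hdfs:///data/2014/04/*/dat.gz")

    val dailyTotals = events
      .map(_.split(","))
      .map(fields => (fields(0).substring(0, 10), fields(2).toDouble)) // key by yyyy-MM-dd
      .reduceByKey(_ + _)

    dailyTotals.collect().foreach { case (day, total) => println(day + "\t" + total) }
    sc.stop()
  }
}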

Regarding Sparkcontext object

2014-04-02 Thread yh18190
Hi, Is it always necessary that the SparkContext object be created in the main method of a class? Can we create the "sc" object in another class and use it by passing the object into a function? Please clarify.
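A hedged sketch answering the question above: the SparkContext does not have to be created in main(); it can be built in one object and passed to other code as a plain argument. The names below are illustrative.

import org.apache.spark.SparkContext

object ContextHolder {
  def createContext(): SparkContext =
    new SparkContext("local[2]", "PassedAroundContext")
}

object WordStats {
  // Takes the SparkContext as a parameter instead of creating it here.
  def longWordCount(sc: SparkContext, path: String): Long =
    sc.textFile(path).flatMap(_.split("\\s+")).filter(_.length > 3).count()
}

object Main {
  def main(args: Array[String]): Unit = {
    val sc = ContextHolder.createContext()
    println(WordStats.longWordCount(sc, args(0)))
    sc.stop()
  }
}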

Re: Optimal Server Design for Spark

2014-04-02 Thread Matei Zaharia
Hey Steve, This configuration sounds pretty good. The one thing I would consider is having more disks, for two reasons — Spark uses the disks for large shuffles and out-of-core operations, and often it’s better to run HDFS or your storage system on the same nodes. But whether this is valuable w

Re: Cannot Access Web UI

2014-04-02 Thread Nicholas Chammas
Cool. That means the UI is being served up correctly. Now you'll need to find what's keeping you from accessing it from another machine... On Wed, Apr 2, 2014 at 6:34 PM, yxzhao wrote: > Thanks for your help. "lynx localhost:8080" works.

Measure the Total Network I/O, Cpu and Memory Consumed by Spark Job

2014-04-02 Thread yxzhao
Hi All, I am interested in measuring the total network I/O, CPU, and memory consumed by a Spark job. I tried to find the related information in the logs and Web UI, but there does not seem to be sufficient information there. Could anyone give me any suggestions? Thanks very much in advance.

Re: Is there a way to get the current progress of the job?

2014-04-02 Thread Philip Ogren
What I'd like is a way to capture the information provided on the stages page (i.e. cluster:4040/stages via IndexPage). Looking through the Spark code, it doesn't seem like it is possible to directly query for specific facts such as how many tasks have succeeded or how many total tasks there a

Re: Cannot Access Web UI

2014-04-02 Thread yxzhao
Thanks for your help. "lynx localhost:8080" works.

Re: PySpark RDD.partitionBy() requires an RDD of tuples

2014-04-02 Thread Mark Hamstra
Will be in 1.0.0. On Wed, Apr 2, 2014 at 3:22 PM, Nicholas Chammas wrote: > Ah, now I see what Aaron was referring to. So I'm guessing we will get > this in the next release or two. Thank you. > > > > On Wed, Apr 2, 2014 at 6:09 PM, Mark Hamstra wrote: > >> There is a repartition method in pyspa

Re: PySpark RDD.partitionBy() requires an RDD of tuples

2014-04-02 Thread Nicholas Chammas
Ah, now I see what Aaron was referring to. So I'm guessing we will get this in the next release or two. Thank you. On Wed, Apr 2, 2014 at 6:09 PM, Mark Hamstra wrote: > There is a repartition method in pyspark master: > https://github.com/apache/spark/blob/master/python/pyspark/rdd.py#L1128 > >

Re: Spark output compression on HDFS

2014-04-02 Thread Nicholas Chammas
Thanks for pointing that out. On Wed, Apr 2, 2014 at 6:11 PM, Mark Hamstra wrote: > First, you shouldn't be using spark.incubator.apache.org anymore, just > spark.apache.org. Second, saveAsSequenceFile doesn't appear to exist in > the Python API at this point. > > > On Wed, Apr 2, 2014 at 3:00

Re: Spark output compression on HDFS

2014-04-02 Thread Mark Hamstra
First, you shouldn't be using spark.incubator.apache.org anymore, just spark.apache.org. Second, saveAsSequenceFile doesn't appear to exist in the Python API at this point. On Wed, Apr 2, 2014 at 3:00 PM, Nicholas Chammas wrote: > Is this a > Scala-only

Re: PySpark RDD.partitionBy() requires an RDD of tuples

2014-04-02 Thread Mark Hamstra
There is a repartition method in pyspark master: https://github.com/apache/spark/blob/master/python/pyspark/rdd.py#L1128 On Wed, Apr 2, 2014 at 2:44 PM, Nicholas Chammas wrote: > Update: I'm now using this ghetto function to partition the RDD I get back > when I call textFile() on a gzipped fil

Re: Spark output compression on HDFS

2014-04-02 Thread Nicholas Chammas
Is this a Scala-only feature? On Wed, Apr 2, 2014 at 5:55 PM, Patrick Wendell wrote: > For textFile I believe we overload it and let you set a codec directly: > > > https://github.com/apache/spa

Optimal Server Design for Spark

2014-04-02 Thread Stephen Watt
Hi Folks, I'm looking to buy some gear to run Spark. I'm quite well versed in Hadoop server design, but there does not seem to be much Spark-related collateral around infrastructure guidelines (or at least I haven't been able to find any). My current thinking for server design is something along

Re: Spark output compression on HDFS

2014-04-02 Thread Patrick Wendell
For textFile I believe we overload it and let you set a codec directly: https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/FileSuite.scala#L59 For saveAsSequenceFile yep, I think Mark is right, you need an option. On Wed, Apr 2, 2014 at 12:36 PM, Mark Hamstra wrote
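For what it's worth, a hedged sketch of the codec overload referenced in that test suite: the variant of saveAsTextFile that takes a compression codec class. The paths here are placeholders.

import org.apache.spark.SparkContext
import org.apache.hadoop.io.compress.GzipCodec

object CompressedTextOutput {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "CompressedTextOutput")
    val data = sc.parallelize(1 to 1000).map(_.toString)
    // Writes gzip-compressed part files under the output directory.
    data.saveAsTextFile("hdfs:///tmp/compressed-text-out", classOf[GzipCodec])
    sc.stop()
  }
}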

Re: Need suggestions

2014-04-02 Thread andy petrella
Sorry, perhaps I was not clear. Anyway, could you try making the path in the *List* an absolute one, e.g. List("/home/yh/src/pj/spark-stuffs/target/scala-2.10/simple-project_2.10-1.0.jar")? To provide a relative path, you first need to figure out your CWD, so you can do (to be really s
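Putting the suggestion together, a hedged sketch of a SparkContext created with an absolute jar path so the workers can fetch the application classes; the master URL and SPARK_HOME value are placeholders modelled on this thread:

import org.apache.spark.SparkContext

object ClusterApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      "spark://spark-master-001:7077",
      "Simple App",
      "/opt/spark",  // SPARK_HOME on the worker nodes (placeholder)
      List("/home/yh/src/pj/spark-stuffs/target/scala-2.10/simple-project_2.10-1.0.jar"))
    println(sc.textFile("/etc/hosts").count())
    sc.stop()
  }
}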

Re: PySpark RDD.partitionBy() requires an RDD of tuples

2014-04-02 Thread Nicholas Chammas
Update: I'm now using this ghetto function to partition the RDD I get back when I call textFile() on a gzipped file:

# Python 2.6
def partitionRDD(rdd, numPartitions):
    counter = {'a': 0}
    def count_up(x):
        counter['a'] += 1
        return counter['a']
    return (rdd.keyBy(count_up)

Re: Need suggestions

2014-04-02 Thread yh18190
Hi, Here is the SparkContext setup. Do I need to pass any more extra jars to the slaves separately, or is this enough? I am able to see the created jar in my target directory: val sc = new SparkContext("spark://spark-master-001:7077", "Simple App", utilclass.spark_home, List("target/sc

Re: Need suggestions

2014-04-02 Thread andy petrella
I cannot access your repo, however my gut feeling is that *"target/scala-2.10/simple-project_2.10-1.0.jar"* is not enough (say, that your cwd is not the folder containing *target*). I'd say that it'd be easier to put an absolute path... My2c - On Wed, Apr 2, 2014 at 11:07 PM, yh18190 wrote

Re: Need suggestions

2014-04-02 Thread yh18190
It's working under local mode, but not under cluster mode with 4 slaves.

Re: Need suggestions

2014-04-02 Thread yh18190
Hi, Thanks for the response. Could you please look into my repo? Utils is the class in question; I cannot paste the entire code, that's why. I have another class from which I would be calling the Utils class for object creation. package main.scala import org.apache.spark.SparkContext import org.apache.spark.S

Re: Need suggestions

2014-04-02 Thread andy petrella
TL;DR: Your classes are missing on the workers; pass the jar containing the class main.scala.Utils to the SparkContext. Longer: I'm missing some information, like how the SparkContext is configured, but my best guess is that you didn't provide the jars (addJars on SparkConf, or use the SC's constructor

Need suggestions

2014-04-02 Thread yh18190
Hi Guys, Currently I am facing this issue and am not able to find the errors. Here is the sbt file:

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.3"

resolvers += "bintray/meetup" at "http://dl.bintray.com/meetup/maven"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

r

Re: Spark output compression on HDFS

2014-04-02 Thread Mark Hamstra
http://www.scala-lang.org/api/2.10.3/index.html#scala.Option The signature is 'def saveAsSequenceFile(path: String, codec: Option[Class[_ <: CompressionCodec]] = None)', but you are providing a Class, not an Option[Class]. Try counts.saveAsSequenceFile(output, Some(classOf[org.apache.hadoop.io.co
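A hedged completion of the same idea: saveAsSequenceFile takes an Option[Class[_ <: CompressionCodec]], so the codec class has to be wrapped in Some(...). The output path and codec choice below are placeholders.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // pair-RDD and Writable-conversion implicits (pre-1.3 style)
import org.apache.hadoop.io.compress.GzipCodec

object CompressedSequenceFileOutput {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "CompressedSequenceFileOutput")
    val counts = sc.parallelize(Seq(("spark", 3), ("mesos", 2)))
    // Some(...) satisfies the Option[Class[_ <: CompressionCodec]] parameter.
    counts.saveAsSequenceFile("hdfs:///tmp/seq-out", Some(classOf[GzipCodec]))
    sc.stop()
  }
}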

Re: Spark output compression on HDFS

2014-04-02 Thread Nicholas Chammas
I'm also interested in this. On Wed, Apr 2, 2014 at 3:18 PM, Kostiantyn Kudriavtsev < kudryavtsev.konstan...@gmail.com> wrote: > Hi there, > > > I've started using Spark recently and evaluating possible use cases in our > company. > > I'm trying to save RDD as compressed Sequence file. I'm able

Spark output compression on HDFS

2014-04-02 Thread Kostiantyn Kudriavtsev
Hi there, I've started using Spark recently and am evaluating possible use cases in our company. I'm trying to save an RDD as a compressed SequenceFile. I'm able to save a non-compressed file by calling: counts.saveAsSequenceFile(output), where counts is my RDD (IntWritable, Text). However, I didn't

Re: CDH5 Spark on EC2

2014-04-02 Thread Denny Lee
Thanks Mayur - I thought I had done those configurations but perhaps I'm pointing to the wrong master IP. > On Apr 2, 2014, at 9:34 AM, Mayur Rustagi wrote: > > The cluster is not running. You need to add MASTER environment variable & > point to your master IP to connect with it. > Also if

Print line in JavaNetworkWordCount

2014-04-02 Thread Eduardo Costa Alfaia
Hi Guys, I would like to print the content of each line in:

JavaDStream lines = ssc.socketTextStream(args[1], Integer.parseInt(args[2]));
JavaDStream words = lines.flatMap(new FlatMapFunction() {
  @Override
  public Iterable call(String x) {
    return Lists.newArrayList(x.

Resilient nature of RDD

2014-04-02 Thread David Thomas
Can someone explain how an RDD is resilient? If one of the partitions is lost, who is responsible for recreating that partition - is it the driver program?

Re: java.lang.NoClassDefFoundError: scala/tools/nsc/transform/UnCurry$UnCurryTransformer...

2014-04-02 Thread Manu Suryavansh
It says that it could be due to an incompatible version of Scala. Are you using the latest version of Scala? I just built Spark 0.9.0 yesterday; I installed the latest versions of Scala and sbt, and I didn't use this option - "SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true" - I just did sbt/sbt assembly. Afte

Re: CDH5 Spark on EC2

2014-04-02 Thread Mayur Rustagi
The cluster is not running. You need to add the MASTER environment variable and point it at your master IP to connect. Also, if you are running in distributed mode, the workers should be registered. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi
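As a hedged example of what that looks like for the standalone shell (the master address is a placeholder, and the script path assumes a plain Spark layout rather than the CDH packaging):

export MASTER=spark://172.31.0.10:7077   # address of the standalone master (placeholder)
./bin/spark-shell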

Re: spark-streaming

2014-04-02 Thread Nathan Kronenfeld
We were using graph.zeroTime to figure out which files were relevant to the DStream. It is difficult for us to see how one would make a custom DStream without access to the graph in general, though. More egregious is the disparity between the privacy and documentation of clearMetadata and add

Re: ActorNotFound problem for mesos driver

2014-04-02 Thread andy petrella
np ;-) On Wed, Apr 2, 2014 at 5:50 PM, Leon Zhang wrote: > Aha, thank you for your kind reply. > > Upgrading to 0.9.1 is a good choice. :) > > > On Wed, Apr 2, 2014 at 11:35 PM, andy petrella wrote: > >> Heya, >> >> Yep this is a problem in the Mesos scheduler implementation that has been >> fi

Re: ActorNotFound problem for mesos driver

2014-04-02 Thread Leon Zhang
Aha, thank you for your kind reply. Upgrading to 0.9.1 is a good choice. :) On Wed, Apr 2, 2014 at 11:35 PM, andy petrella wrote: > Heya, > > Yep this is a problem in the Mesos scheduler implementation that has been > fixed after 0.9.0 (https://spark-project.atlassian.net/browse/SPARK-1052=> >

Re: ActorNotFound problem for mesos driver

2014-04-02 Thread andy petrella
Heya, Yep, this is a problem in the Mesos scheduler implementation that has been fixed after 0.9.0 (https://spark-project.atlassian.net/browse/SPARK-1052 => MesosSchedulerBackend). So there are several options, like applying the patch or upgrading to 0.9.1 :-/ Cheers, Andy On Wed, Apr 2, 2014 at 5:30 PM, Le

ActorNotFound problem for mesos driver

2014-04-02 Thread Leon Zhang
Hi, Spark Devs: I am encountering a problem which shows the error message "akka.actor.ActorNotFound" on our Mesos mini-cluster.

mesos: 0.17.0
spark: spark-0.9.0-incubating

spark-env.sh:
#!/usr/bin/env bash
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=hdfs://192.168.

Re: possible bug in Spark's ALS implementation...

2014-04-02 Thread Sean Owen
It should be kept in mind that different implementations are rarely strictly better, and that what works well on one type of data might not on another. It also bears keeping in mind that several of these differences just amount to different amounts of regularization, which need not be a difference.

Re: Status of MLI?

2014-04-02 Thread Krakna H
Thanks for the update Evan! In terms of using MLI, I see that the Github code is linked to Spark 0.8; will it not work with 0.9 (which is what I have set up) or higher versions? On Wed, Apr 2, 2014 at 1:44 AM, Evan R. Sparks [via Apache Spark User List] wrote: > Hi there, > > MLlib is the first

java.lang.NoClassDefFoundError: scala/tools/nsc/transform/UnCurry$UnCurryTransformer...

2014-04-02 Thread Francis . Hu
Hi All, I'm stuck on a NoClassDefFoundError. Any help would be appreciated. I downloaded the Spark 0.9.0 source and then ran this command to build it: SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly. There were no errors during the build of Spark. After that I ran the spark-shell for

Re: Protobuf 2.5 Mesos

2014-04-02 Thread Bharath Bhushan
Ian, I also faced a similar issue and the discussion is ongoing here: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassNotFoundException-spark-on-mesos-td3510.html Are you facing a ClassNotFoundException too? On 02/04/14 2:21 am, Ian Ferreira wrote: From what I can tell I ne

Re: java.lang.ClassNotFoundException - spark on mesos

2014-04-02 Thread Bharath Bhushan
I tried several things in order to get the 1.0.0 git tree to work with Mesos. All my efforts failed. I could run Spark 0.9.0 on Mesos but not Spark 1.0.0. Please suggest any other things I can try. 1. Changed project/SparkBuild.scala to use Mesos 0.17.0 and then ran make_distribution.sh. 2. Tried build

Re: Calling Spahk enthusiasts in Boston

2014-04-02 Thread andy petrella
Count me in! On Wed, Apr 2, 2014 at 10:32 AM, Pierre Borckmans < pierre.borckm...@realimpactanalytics.com> wrote: > There’s at least one in Johannesburg, I confirm that ;) > > We are actually both in Brussels(Belgium) and Jo’burg… > > We would love to host a meetup in Brussels, but we don’t know

Re: Calling Spahk enthusiasts in Boston

2014-04-02 Thread Pierre Borckmans
There’s at least one in Johannesburg, I confirm that ;) We are actually both in Brussels(Belgium) and Jo’burg… We would love to host a meetup in Brussels, but we don’t know if there is anyone interested here… Any Belgian Spark users out there? Pierre Borckmans RealImpact Analytics | Brussel

CDH5 Spark on EC2

2014-04-02 Thread Denny Lee
I’ve been able to get CDH5 up and running on EC2 and according to Cloudera Manager, Spark is running healthy. But when I try to run spark-shell, I eventually get the error: 14/04/02 07:18:18 INFO client.AppClient$ClientActor: Connecting to master  spark://ip-172-xxx-xxx-xxx:7077... 14/04/02 07:1

Re: How to index each map operation????

2014-04-02 Thread yh18190
Hi Therry, Thanks for the above responses. I implemented it using a RangePartitioner; we need to use one of the custom partitioners in order to perform this task. Normally you can't maintain a counter, because count operations should be performed on each partitioned block of data...

Re: possible bug in Spark's ALS implementation...

2014-04-02 Thread Nick Pentreath
The heuristic is to multiply by the number of ratings (i.e., the number of training examples for that user/item combo). While the heuristic is used in CF settings, it is actually just penalizing, as Sean said, "complex" models more, where we have more data/connections between objects. I would say this c

Re: Using ProtoBuf 2.5 for messages with Spark Streaming

2014-04-02 Thread Vipul Pandey
I downloaded 0.9.0 fresh and ran the mvn command - the assembly jar thus generated also has both the shaded and the real versions of the protobuf classes: Vipuls-MacBook-Pro-3:spark-0.9.0-incubating vipul$ jar -ftv ./assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.0.0-cdh4.2.1.jar | g

Re: possible bug in Spark's ALS implementation...

2014-04-02 Thread Debasish Das
I think multiplying by ratings is a heuristic that worked on rating-related problems like the Netflix dataset or other ratings datasets, but the scope of NMF is much broader than that. @Sean please correct me in case you don't agree... Definitely it's good to add all the rating-dataset-related

Re: possible bug in Spark's ALS implementation...

2014-04-02 Thread Michael Allman
Hi Nick, I don't have my spark clone in front of me, but OTOH the major differences are/were: 1. Oryx multiplies lambda by alpha. 2. Oryx uses a different matrix inverse algorithm. It maintains a certain symmetry which the Spark algo does not, however I don't think this difference has a real