Re: Apache Ignite vs Apache Spark

2015-02-26 Thread Ognen Duzlevski
as well under the Ignite project. -----Original Message----- From: Jay Vyas [mailto:jayunit100.apa...@gmail.com] Sent: Thursday, February 26, 2015 3:40 PM To: Sean Owen Cc: Ognen Duzlevski; user@spark.apache.org Subject: Re: Apache Ignite vs Apache Spark

Apache Ignite vs Apache Spark

2015-02-26 Thread Ognen Duzlevski
Can someone with experience briefly share or summarize the differences between Ignite and Spark? Are they complementary? Totally unrelated? Overlapping? Ignite seems to have reached version 1.0; I had never heard of it until a few days ago and, given what is advertised, it sounds pretty interestin

Re: Perf Prediction

2015-02-21 Thread Ognen Duzlevski
On Sat, Feb 21, 2015 at 8:54 AM, Deep Pradhan wrote: > No, I am talking about some work parallel to prediction works that are > done on GPUs. Like say, given the data for smaller number of nodes in a > Spark cluster, the prediction needs to be done about the time that the > application would take

Re: Spark or MR, Scala or Java?

2014-11-23 Thread Ognen Duzlevski
On Sun, Nov 23, 2014 at 1:03 PM, Ashish Rangole wrote: > Java or Scala : I knew Java already yet I learnt Scala when I came across > Spark. As others have said, you can get started with a little bit of Scala > and learn more as you progress. Once you have started using Scala for a few > weeks you

Re: Submitting Python Applications from Remote to Master

2014-11-15 Thread Ognen Duzlevski
Ashic, Thanks for your email. Two things: 1. I think a whole lot of data scientists and other people would love it if they could just fire off jobs from their laptops. It is, in my opinion, a common desired use case. 2. Did anyone actually get the Ooyala job server to work? I asked that questio

Re: Running spark-shell (or queries) over the network (not from master)

2014-09-08 Thread Ognen Duzlevski
in mind there is a non-trivial amount of traffic between the driver and cluster. It's not something I would do by default, running the driver so remotely. With enough ports open it should work though. On Sun, Sep 7, 2014 at 7:05 PM, Ognen Duzlevski wrote: Horacio, Thanks, I have not tried

Re: Running spark-shell (or queries) over the network (not from master)

2014-09-07 Thread Ognen Duzlevski
Horacio, Thanks, I have not tried that, however, I am not after security right now - I am just wondering why something so obvious won't work ;) Ognen On 9/7/2014 12:38 PM, Horacio G. de Oro wrote: Have you tryied with ssh? It will be much secure (only 1 port open), and you'll be able to run

Re: Running spark-shell (or queries) over the network (not from master)

2014-09-07 Thread Ognen Duzlevski
Have you actually tested this? I have two instances, one is standalone master and the other one just has spark installed, same versions of spark (1.0.0). The security group on the master allows all (0-65535) TCP and UDP traffic from the other machine and the other machine allows all TCP/UDP

Fwd: DELIVERY FAILURE: Error transferring to QCMBSJ601.HERMES.SI.SOCGEN; Maximum hop count exceeded. Message probably in a routing loop.

2014-09-07 Thread Ognen Duzlevski
I keep getting the reply below every time I send a message to the Spark user list. Can this person be taken off the list by the powers that be? Thanks! Ognen Forwarded Message Subject: DELIVERY FAILURE: Error transferring to QCMBSJ601.HERMES.SI.SOCGEN; Maximum hop count exceeded. M

Re: Adding quota to the ephemeral hdfs on a standalone spark cluster on ec2

2014-09-07 Thread Ognen Duzlevski
On 9/7/2014 7:27 AM, Tomer Benyamini wrote: 2. What should I do to increase the quota? Should I bring down the existing slaves and upgrade to ones with more storage? Is there a way to add disks to existing slaves? I'm using the default m1.large slaves set up using the spark-ec2 script. Take a l

Re: Running spark-shell (or queries) over the network (not from master)

2014-09-05 Thread Ognen Duzlevski
Ah. So there is some kind of a "back and forth" going on. Thanks! Ognen On 9/5/2014 5:34 PM, qihong wrote: Since you are using your home computer, so it's probably not reachable by EC2 from internet. You can try to set "spark.driver.host" to your WAN ip, "spark.driver.port" to a fixed port in S
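A hedged sketch of what the advice above amounts to in practice (the master address, WAN IP, and port are placeholders, not values from the thread; the chosen port must be opened/forwarded so the cluster can reach the driver):

```shell
# Run from the home machine. The driver binds to the given host/port,
# and the EC2 executors connect back to it -- hence the "back and forth".
spark-shell --master spark://ec2-master:7077 \
  --conf spark.driver.host=203.0.113.10 \
  --conf spark.driver.port=51000
```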

Re: Running spark-shell (or queries) over the network (not from master)

2014-09-05 Thread Ognen Duzlevski
That is the command I ran and it still times out. Besides 7077, is there any other port that needs to be open? Thanks! Ognen On 9/5/2014 4:10 PM, qihong wrote: the command should be "spark-shell --master spark://:7077". -- View this message in context: http://apache-spark-user-list.1001560.n3

Re: Running spark-shell (or queries) over the network (not from master)

2014-09-05 Thread Ognen Duzlevski
On 9/5/2014 3:27 PM, anthonyjschu...@gmail.com wrote: I think that should be possible. Make sure spark is installed on your local machine and is the same version as on the cluster. It is the same version, I can telnet to master:7077 but when I run the spark-shell it times out. --

Running spark-shell (or queries) over the network (not from master)

2014-09-05 Thread Ognen Duzlevski
Is this possible? If i have a cluster set up on EC2 and I want to run spark-shell --master :7077 from my home computer - is this possible at all or am I wasting my time ;)? I am seeing a connection timeout when I try it. Thanks! Ognen --

Re: count vs countByValue in for/yield

2014-07-16 Thread Ognen Duzlevski
Hello all, Can anyone offer any insight on the below? Both are "legal" Spark but the first one works, the latter one does not. They both work on a local machine but in a standalone cluster the one with countByValue fails. Thanks! Ognen On 7/15/14, 2:23 PM, Ognen Duzlevski wrote:

count vs countByValue in for/yield

2014-07-15 Thread Ognen Duzlevski
Hello, I am curious about something:

val result = for {
  (dt, evrdd) <- evrdds
  val ct = evrdd.count
} yield (dt -> ct)

works.

val result = for {
  (dt, evrdd) <- evrdds
  val ct = evrdd.countByValue
} yield (dt -> ct)

does not work. I get: 14/07/15 16:46:33 WARN TaskSetMa
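A sketch of the difference between the two variants, assuming `evrdds` is a collection of `(String, RDD[String])` pairs built elsewhere (the names are taken from the snippet above; everything else is illustrative):

```scala
// `count` is an action returning a plain Long, so the yielded pairs
// are cheap (dt, Long) tuples.
val counts = for ((dt, evrdd) <- evrdds) yield dt -> evrdd.count

// `countByValue` returns a full Map[String, Long] that must be
// collected back to the driver for every RDD -- a much heavier action,
// and the one where the reported failure surfaced on a real cluster.
val histograms = for ((dt, evrdd) <- evrdds) yield dt -> evrdd.countByValue
```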

Re: Problem reading in LZO compressed files

2014-07-14 Thread Ognen Duzlevski
shell, I don’t have any more pointers for you. :( ​ On Sun, Jul 13, 2014 at 12:57 PM, Ognen Duzlevski mailto:ognen.duzlev...@gmail.com>> wrote: Nicholas, Thanks! How do I make spark assemble against a local version of Hadoop? I have 2.4.1 running on a test cluster and

Re: Problem reading in LZO compressed files

2014-07-13 Thread Ognen Duzlevski
.hadoop.io.LongWritable], classOf[org.apache.hadoop.io.Text]) | On a side note, here’s a related JIRA issue: SPARK-2394: Make it easier to read LZO-compressed files from EC2 clusters <https://issues.apache.org/jira/browse/SPARK-2394> Nick ​ On Sun, Jul 13, 2014 at 10:49 AM, Ognen Duzlevski
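Pieced together from the truncated call above, a hedged sketch of reading LZO-compressed text through the new Hadoop API (the input-format class comes from the hadoop-lzo package, which must be on the classpath; the path is a placeholder):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import com.hadoop.mapreduce.LzoTextInputFormat  // from hadoop-lzo (assumption)

// Keys are byte offsets, values are the text lines; keep only the text.
val lines = sc.newAPIHadoopFile(
    "hdfs://namenode:8020/data/ngrams.lzo",
    classOf[LzoTextInputFormat],
    classOf[LongWritable],
    classOf[Text])
  .map { case (_, text) => text.toString }
```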

Problem reading in LZO compressed files

2014-07-13 Thread Ognen Duzlevski
Hello, I have been trying to play with the Google ngram dataset provided by Amazon in form of LZO compressed files. I am having trouble understanding what is going on ;). I have added the compression jar and native library to the underlying Hadoop/HDFS installation, restarted the name node a

Re: Running Spark alongside Hadoop

2014-06-20 Thread Ognen Duzlevski
I only ran HDFS on the same nodes as Spark and that worked out great performance and robustness wise. However, I did not run Hadoop itself to do any computations/jobs on the same nodes. My expectation is that if you actually ran both at the same time with your configuration, the performance wou

Re: Announcing Spark 1.0.0

2014-05-30 Thread Ognen Duzlevski
How exciting! Congratulations! :-) Ognen On 5/30/14, 5:12 AM, Patrick Wendell wrote: I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces. Spark 1.0.0 is Spark's

Re: Spark with SSL?

2014-04-08 Thread Ognen Duzlevski
Ideally, you just run it in Amazon's VPC or whatever other providers' equivalent is. In that case, running things over SSL would be overkill. On 4/8/14, 3:31 PM, Andrew Ash wrote: Not that I know of, but it would be great if that was supported. The way I typically handle security now is to p

Calling Spark enthusiasts in Austin, TX

2014-03-31 Thread Ognen Duzlevski
In the spirit of everything being bigger and better in TX ;) => if anyone is in Austin and interested in meeting up over Spark - contact me! There seems to be a Spark meetup group in Austin that has never met and my initial email to organize the first gathering was never acknowledged. Ognen On

Re: Do all classes involving RDD operation need to be registered?

2014-03-28 Thread Ognen Duzlevski
There is also this quote from the Tuning guide (http://spark.incubator.apache.org/docs/latest/tuning.html): " Finally, if you don't register your classes, Kryo will still work, but it will have to store the full class name with each object, which is wasteful." It implies that you don't really
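For reference, the Spark 0.9-era registration pattern the tuning guide describes looks roughly like this (`MyRecord` and `MyKey` are placeholder classes, not from the thread):

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Registering classes lets Kryo write a small ID instead of the full
// class name with every serialized object.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyRecord])
    kryo.register(classOf[MyKey])
  }
}

// Before creating the SparkContext:
// System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// System.setProperty("spark.kryo.registrator", "MyRegistrator")
```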

Re: GC overhead limit exceeded

2014-03-27 Thread Ognen Duzlevski
Look at the tuning guide on Spark's webpage for strategies to cope with this. I have run into quite a few memory issues like these, some are resolved by changing the StorageLevel strategy and employing things like Kryo, some are solved by specifying the number of tasks to break down a given ope

Re: Announcing Spark SQL

2014-03-26 Thread Ognen Duzlevski
Wow! Ognen On 3/26/14, 4:58 PM, Michael Armbrust wrote: Hey Everyone, This already went out to the dev list, but I wanted to put a pointer here as well to a new feature we are pretty excited about for Spark 1.0. http://databricks.com/blog/2014/03/26/Spark-SQL-manipulating-structured-data-usi

Re: java.lang.ClassNotFoundException

2014-03-26 Thread Ognen Duzlevski
0/value-spark_2.10-1.0.jar On Wed, Mar 26, 2014 at 3:34 PM, Ognen Duzlevski mailto:og...@plainvanillagames.com>> wrote: Have you looked at the individual nodes logs? Can you post a bit more of the exception's output? On 3/26/14, 8:42 AM, Jaonary

Re: java.lang.ClassNotFoundException

2014-03-26 Thread Ognen Duzlevski
Have you looked at the individual nodes logs? Can you post a bit more of the exception's output? On 3/26/14, 8:42 AM, Jaonary Rabarisoa wrote: Hi all, I got java.lang.ClassNotFoundException even with "addJar" called. The jar file is present in each node. I use the version of spark from gith

Re: Writing RDDs to HDFS

2014-03-25 Thread Ognen Duzlevski
.cores.max _and_ spark.executor.memory. Just curious if I did something wrong. On Mon, Mar 24, 2014 at 7:48 PM, Ognen Duzlevski wrote: Just so I can close this thread (in case anyone else runs into this stuff) - I did sleep through the basics of Spark ;). The answer on why my job is in waiting stat

Re: Writing RDDs to HDFS

2014-03-24 Thread Ognen Duzlevski
part-1 etc.) (Presumably it does this because it allows each partition to be saved on the local disk, to minimize network traffic. It's how Hadoop works, too.) On Mon, Mar 24, 2014 at 5:00 PM, Ognen Duzlevski mailto:og...@nengoiksvelzud.com>> wrote: Is someRDD.save
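A minimal sketch of the behavior described above (paths are placeholders): `saveAsTextFile` writes a directory of part-files, one per partition, rather than a single file.

```scala
val r = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8))

// Creates a *directory* named out.txt containing part-00000, part-00001, ...
r.saveAsTextFile("hdfs://namenode:8020/path/out.txt")

// To force a single part-file, reduce to one partition first
// (reasonable only for small data, since it funnels everything
// through a single task):
r.coalesce(1).saveAsTextFile("hdfs://namenode:8020/path/single")
```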

Re: Writing RDDs to HDFS

2014-03-24 Thread Ognen Duzlevski
the local disk, to minimize network traffic. It's how Hadoop works, too.) On Mon, Mar 24, 2014 at 5:00 PM, Ognen Duzlevski mailto:og...@nengoiksvelzud.com>> wrote: Is someRDD.saveAsTextFile("hdfs://ip:port/path/final_filename.txt") supposed to work? Meaning, c

Re: Writing RDDs to HDFS

2014-03-24 Thread Ognen Duzlevski
expectation would be that I can submit multiple jobs at the same time and there would be some kind of a fair strategy to run them in turn. What Spark (basics) have I slept through? :) Thanks! Ognen On 3/24/14, 4:00 PM, Ognen Duzlevski wrote: Is someRDD.saveAsTextFile("hdfs://ip:port

Writing RDDs to HDFS

2014-03-24 Thread Ognen Duzlevski
Is someRDD.saveAsTextFile("hdfs://ip:port/path/final_filename.txt") supposed to work? Meaning, can I save files to the HDFS fs this way? I tried:

val r = sc.parallelize(List(1,2,3,4,5,6,7,8))
r.saveAsTextFile("hdfs://ip:port/path/file.txt")

and it is just hanging. At the same time on my HDFS i

Re: quick start guide: building a standalone scala program

2014-03-24 Thread Ognen Duzlevski
ch-.jar: No such file or directory /usr/lib/spark/sbt/sbt: line 33: sbt/sbt-launch-.jar: No such file or directory Our attempt to download sbt locally to sbt/sbt-launch-.jar failed. Please install sbt manually from http://www.scala-sbt.org/ On Mon, Mar 24, 2014 at 4:25 PM, Ognen Duzlevski

Re: quick start guide: building a standalone scala program

2014-03-24 Thread Ognen Duzlevski
sbt". Did you download and install sbt separately? In following the Quick Start guide, that was not stated as a requirement, and I'm trying to run through the guide word for word. Diana On Mon, Mar 24, 2014 at 4:12 PM, Ognen Duzlevski mailto:og...@plainvanillagames.com>> wrote

Re: quick start guide: building a standalone scala program

2014-03-24 Thread Ognen Duzlevski
Diana, Anywhere on the filesystem you have read/write access (you need not be in your spark home directory):

mkdir myproject
cd myproject
mkdir project
mkdir target
mkdir -p src/main/scala
cp $mypath/$mymysource.scala src/main/scala/
cp $mypath/myproject.sbt .

Make sure that myproject.sbt has
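For completeness, a minimal myproject.sbt matching that layout, modeled on the quick-start guide of the era (the project name and Spark version are assumptions; adjust to your installation):

```scala
name := "MyProject"

version := "1.0"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
```

With this in place, `sbt package` from the project root builds the jar under target/.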

Re: No space left on device exception

2014-03-24 Thread Ognen Duzlevski
the rest of the slaves+master) and increasing. Ognen On 3/24/14, 7:00 AM, Ognen Duzlevski wrote: Patrick, correct. I have a 16 node cluster. On 14 machines out of 16, the inode usage was about 50%. On two of the slaves, one had inode usage of 96% and on the other it was 100%. When i went into

Re: No space left on device exception

2014-03-24 Thread Ognen Duzlevski
correct? If so, that's good to know because it's definitely counter intuitive. On Sun, Mar 23, 2014 at 8:36 PM, Ognen Duzlevski wrote: I would love to work on this (and other) stuff if I can bother someone with questions offline or on a dev mailing list. Ognen On 3/23/14, 10:04

Re: No space left on device exception

2014-03-23 Thread Ognen Duzlevski
issue which is not on our current roadmap for state cleanup (cleaning up data which was not fully cleaned up from a crashed process). On Sun, Mar 23, 2014 at 7:57 PM, Ognen Duzlevski mailto:og...@plainvanillagames.com>> wrote: Bleh, strike that, one of my slaves was at 100% inode

Re: No space left on device exception

2014-03-23 Thread Ognen Duzlevski
(and sorry for the noise)! Ognen On 3/23/14, 9:52 PM, Ognen Duzlevski wrote: Aaron, thanks for replying. I am very much puzzled as to what is going on. A job that used to run on the same cluster is failing with this mysterious message about not having enough disk space when in fact I can see

Re: No space left on device exception

2014-03-23 Thread Ognen Duzlevski
Unless each partition is particularly small. You might look at the actual executors' logs, as it's possible that this error was caused by an earlier exception, such as "too many open files". On Sun, Mar 23, 2014 at 4:46 PM, Ognen Duzlevski mailto:og...@plainvanillagames.com&g

Re: No space left on device exception

2014-03-23 Thread Ognen Duzlevski
On 3/23/14, 5:49 PM, Matei Zaharia wrote: You can set spark.local.dir to put this data somewhere other than /tmp if /tmp is full. Actually it’s recommended to have multiple local disks and set it to a comma-separated list of directories, one per disk. Matei, does the number of tasks/partitions i
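What the quoted recommendation looks like in practice on a Spark 0.9-era standalone cluster (mount points are illustrative):

```shell
# In conf/spark-env.sh on each worker: one shuffle/spill directory per
# physical disk, comma-separated, instead of the default /tmp.
export SPARK_JAVA_OPTS="-Dspark.local.dir=/mnt/disk1/spark,/mnt/disk2/spark"
```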

Re: No space left on device exception

2014-03-23 Thread Ognen Duzlevski
On 3/23/14, 5:35 PM, Aaron Davidson wrote: On some systems, /tmp/ is an in-memory tmpfs file system, with its own size limit. It's possible that this limit has been exceeded. You might try running the "df" command to check the free space of "/tmp" or root if tmp isn't listed. 3 GB also seems

No space left on device exception

2014-03-23 Thread Ognen Duzlevski
Hello, I have a weird error showing up when I run a job on my Spark cluster. The version of spark is 0.9 and I have 3+ GB free on the disk when this error shows up. Any ideas what I should be looking for? [error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task 167.0:3 failed

Parallelizing job execution

2014-03-21 Thread Ognen Duzlevski
Hello, I have a task that runs on a week's worth of data (let's say) and produces a Set of tuples such as Set[(String,Long)] (essentially output of countByValue.toMap) I want to produce 4 sets, one each for a different week and run an intersection of the 4 sets. I have the sequential appro
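One way to run the four weekly jobs concurrently rather than sequentially is to submit each action from its own future, letting the scheduler interleave them; a hedged sketch, where `weeks` and `computeWeek` are placeholders for whatever produces the four weekly RDDs:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Each future triggers one countByValue action; Set[(String, Long)]
// matches the shape described above (countByValue.toMap as tuples).
val weeklySets: Seq[Future[Set[(String, Long)]]] =
  weeks.map(w => Future { computeWeek(w).countByValue().toSet })

// Wait for all four, then intersect them pairwise.
val sets = Await.result(Future.sequence(weeklySets), 1.hour)
val common = sets.reduce(_ intersect _)
```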

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-03-18 Thread Ognen Duzlevski
On 3/18/14, 4:49 AM, dmpou...@gmail.com wrote: On Sunday, 2 March 2014 19:19:49 UTC+2, Aureliano Buendia wrote: Is there a reason for spark using the older akka? On Sun, Mar 2, 2014 at 1:53 PM, 1esha wrote: The problem is in akka remote. It contains files compiled with 2.4.*. When you r

Re: parsing json within rdd's filter()

2014-03-13 Thread Ognen Duzlevski
OK, problem solved. Interesting thing - I separated the jsonMatches function below and put it in as a method to a separate file/object. Once done that way, it all serializes and works. Ognen On 3/13/14, 11:52 AM, Ognen Duzlevski wrote: I even tried this: def jsonMatches(line:String
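The pattern that resolved this thread, sketched out: hoist the predicate into a standalone object so the closure passed to filter() does not capture an enclosing non-serializable class. The json4s usage is reconstructed from the snippets below and may differ from the original code:

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

// Top-level object: referencing its method from a closure does not
// drag an outer class into the serialized task.
object JsonFilters extends Serializable {
  def jsonMatches(line: String, event: String): Boolean = {
    val jline = parse(line)          // parse one json-formatted line
    val je = jline \ "event"
    je != JNothing && je.values.toString == event
  }
}

// rdd.filter(line => JsonFilters.jsonMatches(line, "signup"))
```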

Re: parsing json within rdd's filter()

2014-03-13 Thread Ognen Duzlevski
I even tried this: def jsonMatches(line:String):Boolean = true It is still failing with the same error. Ognen On 3/13/14, 11:45 AM, Ognen Duzlevski wrote: I must be really dense! :) Here is the most simplified version of the code, I removed a bunch of stuff and hard-coded the "event

Re: parsing json within rdd's filter()

2014-03-13 Thread Ognen Duzlevski
the non-serializable bits and you'll get the exception you're seeing now. — p...@mult.ifario.us <mailto:p...@mult.ifario.us> | Multifarious, Inc. | http://mult.ifario.us/ On Thu, Mar 13, 2014 at 9:20 AM, Ognen Duzlevski mailto:og...@plainvanillagames.com>> wrote: Hmm.

Re: parsing json within rdd's filter()

2014-03-13 Thread Ognen Duzlevski
races. — p...@mult.ifario.us <mailto:p...@mult.ifario.us> | Multifarious, Inc. | http://mult.ifario.us/ On Thu, Mar 13, 2014 at 8:04 AM, Ognen Duzlevski mailto:og...@nengoiksvelzud.com>> wrote: Hello, Is there anything special about calling functions that parse json

parsing json within rdd's filter()

2014-03-13 Thread Ognen Duzlevski
Hello, Is there anything special about calling functions that parse json lines from filter? I have code that looks like this:

def jsonMatches(line:String):Boolean = {
  // take a line in json format
  val jline = parse(line)
  val je = jline \ "event"
  if (je != JNothing && je.values.toString == user

Re: Sharing SparkContext

2014-03-10 Thread Ognen Duzlevski
Are you using it with HDFS? What version of Hadoop? 1.0.4? Ognen On 3/10/14, 8:49 PM, abhinav chowdary wrote: for any one who is interested to know about job server from Ooyala.. we started using it recently and been working great so far.. On Feb 25, 2014 9:23 PM, "Ognen Duzl

Re: Room for rent in Aptos

2014-03-10 Thread Ognen Duzlevski
Probably unintentional :) Ognen P.S. I have a house for rent avail nah, just kidding! :) On 3/10/14, 1:54 PM, Muttineni, Vinay wrote: Why's this here? *From:*vaquar khan [mailto:vaquar.k...@gmail.com] *Sent:* Monday, March 10, 2014 11:43 AM *To:* user@spark.apache.org *Subject:* Re: Room

Re: Running actions in loops

2014-03-07 Thread Ognen Duzlevski
https://twitter.com/mayur_rustagi> On Thu, Mar 6, 2014 at 9:50 PM, Ognen Duzlevski mailto:og...@plainvanillagames.com>> wrote: It looks like the problem is in the filter task - is there anything special about filter()? I have removed the filter line from the loops just to see if

Re: [BLOG] Spark on Cassandra w/ Calliope

2014-03-07 Thread Ognen Duzlevski
Nice, thanks :) Ognen On 3/7/14, 2:48 PM, Brian O'Neill wrote: FWIW - I posted some notes to help people get started quickly with Spark on C*. http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html (tnx again to Rohit and team for all of their help) -brian -- Brian ONei

Re: Can anyone offer any insight at all?

2014-03-07 Thread Ognen Duzlevski
Mayur Rustagi wrote: the issue was with print? printing on worker? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi <https://twitter.com/mayur_rustagi> On Fri, Mar 7, 2014 at 10:43 AM, Ognen Duzlevski mailto:og...@plainvanillagames.com>> wrote: Stri

Re: Can anyone offer any insight at all?

2014-03-07 Thread Ognen Duzlevski
Strike that. Figured it out. Don't you just hate it when you fire off an email and you figure it out as it is being sent? ;) Ognen On 3/7/14, 12:41 PM, Ognen Duzlevski wrote: What is wrong with this code? A condensed set of this code works in the spark-shell. It does not work when dep

Can anyone offer any insight at all?

2014-03-07 Thread Ognen Duzlevski
What is wrong with this code? A condensed set of this code works in the spark-shell. It does not work when deployed via a jar.

def calcSimpleRetention(start:String, end:String, event1:String, event2:String): List[Double] = {
  val spd = new PipelineDate(start)
  val epd = new PipelineDate(en

Re: Running actions in loops

2014-03-06 Thread Ognen Duzlevski
It looks like the problem is in the filter task - is there anything special about filter()? I have removed the filter line from the loops just to see if things will work and they do. Anyone has any ideas? Thanks! Ognen On 3/6/14, 9:39 PM, Ognen Duzlevski wrote: Hello, What is the general

Running actions in loops

2014-03-06 Thread Ognen Duzlevski
Hello, What is the general approach people take when trying to do analysis across multiple large files where the data to be extracted from a successive file depends on the data extracted from a previous file or set of files? For example: I have the following: a group of HDFS files each 20+GB

Re: Spark Worker crashing and Master not seeing recovered worker

2014-03-05 Thread Ognen Duzlevski
Rob, I have seen this too. I have 16 nodes in my spark cluster and for some reason (after app failures) one of the workers will go offline. I will ssh to the machine in question and find that the java process is running but for some reason the master is not noticing this. I have not had the t

Re: Actors and sparkcontext actions

2014-03-04 Thread Ognen Duzlevski
Deb, On 3/4/14, 9:02 AM, Debasish Das wrote: Hi Ognen, Any particular reason of choosing scalatra over options like play or spray ? Is scalatra much better in serving apis or is it due to similarity with ruby's sinatra ? Did you try the other options and then pick scalatra ? Not really. I

Re: Actors and sparkcontext actions

2014-03-04 Thread Ognen Duzlevski
def count(){ println(rdd.count) //do the counting } } Thanks and Regards, Suraj Sheth -Original Message- From: Ognen Duzlevski [mailto:og...@nengoiksvelzud.com] Sent: 27 February 2014 01:09 To: u...@spark.incubator.apache.org Subject: Actors and sparkcontext actions Can

Re: Missing Spark URL after staring the master

2014-03-03 Thread Ognen Duzlevski
set of scripts to automate it all... Ognen On 3/3/14, 3:02 PM, Ognen Duzlevski wrote: I have a Standalone spark cluster running in an Amazon VPC that I set up by hand. All I did was provision the machines from a common AMI image (my underlying distribution is Ubuntu), I created a "sparkuser

Re: Missing Spark URL after staring the master

2014-03-03 Thread Ognen Duzlevski
I have a Standalone spark cluster running in an Amazon VPC that I set up by hand. All I did was provision the machines from a common AMI image (my underlying distribution is Ubuntu), I created a "sparkuser" on each machine and I have a /home/sparkuser/spark folder where I downladed spark. I did

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-02-28 Thread Ognen Duzlevski
A stupid question, by the way, you did compile Spark with Hadoop 2.2.0 support? Ognen On 2/28/14, 10:51 AM, Prasad wrote: Hi I am getting the protobuf error while reading HDFS file using spark 0.9.0 -- i am running on hadoop 2.2.0 . When i look thru, i find that i have both 2.4.1 and 2.5 a

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-02-28 Thread Ognen Duzlevski
I run a 2.2.0 based HDFS cluster and I use Spark-0.9.0 without any problems to read the files. Ognen On 2/28/14, 10:51 AM, Prasad wrote: Hi I am getting the protobuf error while reading HDFS file using spark 0.9.0 -- i am running on hadoop 2.2.0 . When i look thru, i find that i have both

Scalatra servlet with actors and SparkContext

2014-02-27 Thread Ognen Duzlevski
I spent a week trying to figure this out and I think I finally did. My write up is here: http://corripio.blogspot.com/2014/02/scalatra-with-actors-command-spark-at.html I am sure for most of you this is basic - sorry for wasting bandwidth if this is the case. I am a Scala/Spark noob so for me t

Actors and sparkcontext actions

2014-02-26 Thread Ognen Duzlevski
Can someone point me to a simple, short code example of creating a basic Actor that gets a context and runs an operation such as .textFile.count? I am trying to figure out how to create just a basic actor that gets a message like this: case class Msg(filename:String, ctx: SparkContext) and th
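A minimal sketch of the actor being asked about, using the `Msg` case class from the message above (untested; assumes an Akka version contemporary with Spark 0.9, and that the SparkContext is created elsewhere and handed in):

```scala
import akka.actor.{Actor, ActorSystem, Props}
import org.apache.spark.SparkContext

case class Msg(filename: String, ctx: SparkContext)

// On receiving a Msg, run a count over the file and reply with it.
class CountActor extends Actor {
  def receive = {
    case Msg(filename, ctx) =>
      val n = ctx.textFile(filename).count()
      sender ! n
  }
}

// val system = ActorSystem("demo")
// val counter = system.actorOf(Props[CountActor])
// counter ! Msg("hdfs://namenode:8020/data/file.txt", sc)
```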

Re: Sharing SparkContext

2014-02-25 Thread Ognen Duzlevski
m> https://twitter.com/mayur_rustagi On Tue, Feb 25, 2014 at 10:24 AM, Ognen Duzlevski mailto:og...@nengoiksvelzud.com>> wrote: Doesn't the fair scheduler solve this? Ognen On 2/25/14, 12:08 PM, abhinav chowdary wrote: Sorry for not being clear earlier how do y

Re: Sharing SparkContext

2014-02-25 Thread Ognen Duzlevski
On 2/25/14, 12:24 PM, Mayur Rustagi wrote: So there is no way to share context currently, 1. you can try jobserver by Ooyala but I haven't used it & frankly nobody has shared feedback on it. One of the major show stoppers for me is that when compiled with Hadoop 2.2.0 - Ooyala standalone serve

Re: Sharing SparkContext

2014-02-25 Thread Ognen Duzlevski
Doesn't the fair scheduler solve this? Ognen On 2/25/14, 12:08 PM, abhinav chowdary wrote: Sorry for not being clear earlier how do you want to pass the operations to the spark context? this is partly what i am looking for . How to access the active spark context and possible ways to pass opera