as well under the Ignite
> project.
>
>
> -----Original Message-----
> From: Jay Vyas [mailto:jayunit100.apa...@gmail.com]
> Sent: Thursday, February 26, 2015 3:40 PM
> To: Sean Owen
> Cc: Ognen Duzlevski; user@spark.apache.org
> Subject: Re: Apache Ignite vs Apache Spark
>
Can someone with experience briefly share or summarize the differences
between Ignite and Spark? Are they complementary? Totally unrelated?
Overlapping? It seems like Ignite has reached version 1.0; I had never heard
of it until a few days ago, and given what is advertised, it sounds pretty
interesting.
On Sat, Feb 21, 2015 at 8:54 AM, Deep Pradhan
wrote:
> No, I am talking about some work parallel to the prediction work that is
> done on GPUs. Say, given the data for a smaller number of nodes in a
> Spark cluster, a prediction needs to be made about the time that the
> application would take
On Sun, Nov 23, 2014 at 1:03 PM, Ashish Rangole wrote:
> Java or Scala: I knew Java already, yet I learnt Scala when I came across
> Spark. As others have said, you can get started with a little bit of Scala
> and learn more as you progress. Once you have started using Scala for a few
> weeks you
Ashic,
Thanks for your email.
Two things:
1. I think a whole lot of data scientists and other people would love
it if they could just fire off jobs from their laptops. It is, in my
opinion, a common desired use case.
2. Did anyone actually get the Ooyala job server to work? I asked that
question
in mind there is a non-trivial amount of traffic between the
driver and cluster. It's not something I would do by default, running
the driver so remotely. With enough ports open it should work though.
On Sun, Sep 7, 2014 at 7:05 PM, Ognen Duzlevski
wrote:
Horacio,
Thanks, I have not tried that, however, I am not after security right
now - I am just wondering why something so obvious won't work ;)
Ognen
On 9/7/2014 12:38 PM, Horacio G. de Oro wrote:
Have you tried with ssh? It will be much more secure (only 1 port open),
and you'll be able to run
Have you actually tested this?
I have two instances: one is a standalone master and the other one just
has Spark installed, same version of Spark (1.0.0).
The security group on the master allows all (0-65535) TCP and UDP
traffic from the other machine and the other machine allows all TCP/UDP
I keep getting the below reply every time I send a message to the Spark user
list. Can this person be taken off the list by the powers that be?
Thanks!
Ognen
Forwarded Message
Subject: DELIVERY FAILURE: Error transferring to
QCMBSJ601.HERMES.SI.SOCGEN; Maximum hop count exceeded. M
On 9/7/2014 7:27 AM, Tomer Benyamini wrote:
2. What should I do to increase the quota? Should I bring down the
existing slaves and upgrade to ones with more storage? Is there a way
to add disks to existing slaves? I'm using the default m1.large slaves
set up using the spark-ec2 script.
Take a l
Ah. So there is some kind of a "back and forth" going on. Thanks!
Ognen
On 9/5/2014 5:34 PM, qihong wrote:
Since you are using your home computer, it's probably not reachable by EC2
from the internet.
You can try to set "spark.driver.host" to your WAN ip, "spark.driver.port"
to a fixed port in S
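A minimal sketch of the settings being described, not taken from the thread; the address and port values below are placeholders:
  // Hedged sketch: substitute your own publicly reachable (WAN) address
  // and a port that is open in your firewall / EC2 security group.
  val conf = new org.apache.spark.SparkConf()
    .set("spark.driver.host", "203.0.113.10")
    .set("spark.driver.port", "51000")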
That is the command I ran and it still times out. Besides 7077, is there
any other port that needs to be open?
Thanks!
Ognen
On 9/5/2014 4:10 PM, qihong wrote:
the command should be "spark-shell --master spark://:7077".
On 9/5/2014 3:27 PM, anthonyjschu...@gmail.com wrote:
I think that should be possible. Make sure spark is installed on your local
machine and is the same version as on the cluster.
It is the same version; I can telnet to master:7077, but when I run the
spark-shell it times out.
--
Is this possible? If I have a cluster set up on EC2 and I want to run
spark-shell --master :7077 from my home computer -
is this possible at all, or am I wasting my time ;)? I am seeing a
connection timeout when I try it.
Thanks!
Ognen
--
Hello all,
Can anyone offer any insight on the below?
Both are "legal" Spark but the first one works, the latter one does not.
They both work on a local machine but in a standalone cluster the one
with countByValue fails.
Thanks!
Ognen
On 7/15/14, 2:23 PM, Ognen Duzlevski wrote:
Hello,
I am curious about something:
val result = for {
  (dt, evrdd) <- evrdds
  val ct = evrdd.count
} yield (dt -> ct)
works.
val result = for {
  (dt, evrdd) <- evrdds
  val ct = evrdd.countByValue
} yield (dt -> ct)
does not work. I get:
14/07/15 16:46:33 WARN TaskSetMa
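As a hedged aside (not from the thread): countByValue is an action that collects its counts to the driver as a Map, so an equivalent that keeps the counting distributed until the final collect can be sketched as:
  // In Spark of this era, `import org.apache.spark.SparkContext._`
  // is needed to get reduceByKey/collectAsMap on pair RDDs.
  val result = for {
    (dt, evrdd) <- evrdds
  } yield dt -> evrdd.map(v => (v, 1L)).reduceByKey(_ + _).collectAsMap()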
shell, I
don’t have any more pointers for you. :(
On Sun, Jul 13, 2014 at 12:57 PM, Ognen Duzlevski
<ognen.duzlev...@gmail.com> wrote:
Nicholas,
Thanks!
How do I make spark assemble against a local version of Hadoop?
I have 2.4.1 running on a test cluster and
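(Not an answer from the thread, but for reference: with the sbt build of that era the Hadoop version was selected via an environment variable, roughly SPARK_HADOOP_VERSION=2.4.1 sbt/sbt assembly; treat the exact variable name as an assumption to check against the build docs for your Spark version.)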
classOf[org.apache.hadoop.io.LongWritable],
classOf[org.apache.hadoop.io.Text])
On a side note, here’s a related JIRA issue: SPARK-2394: Make it
easier to read LZO-compressed files from EC2 clusters
<https://issues.apache.org/jira/browse/SPARK-2394>
Nick
On Sun, Jul 13, 2014 at 10:49 AM, Ognen Duzlevski
Hello,
I have been trying to play with the Google ngram dataset provided by
Amazon in form of LZO compressed files.
I am having trouble understanding what is going on ;). I have added the
compression jar and native library to the underlying Hadoop/HDFS
installation, restarted the name node a
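For reference, the read call pieced together in the reply above looks roughly like the following sketch (the input-format class comes from the hadoop-lzo package; the path is a placeholder):
  // Hedged sketch: requires the hadoop-lzo jar and native library on the cluster.
  val lzoRDD = sc.newAPIHadoopFile(
    "hdfs://namenode:9000/ngrams/*.lzo",                  // placeholder path
    classOf[com.hadoop.mapreduce.LzoTextInputFormat],     // from hadoop-lzo
    classOf[org.apache.hadoop.io.LongWritable],
    classOf[org.apache.hadoop.io.Text])
  val lines = lzoRDD.map(_._2.toString)                   // drop the byte offsets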
I only ran HDFS on the same nodes as Spark and that worked out great,
performance- and robustness-wise. However, I did not run Hadoop itself to
do any computations/jobs on the same nodes. My expectation is that if
you actually ran both at the same time with your configuration, the
performance wou
How exciting! Congratulations! :-)
Ognen
On 5/30/14, 5:12 AM, Patrick Wendell wrote:
I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0
is a milestone release as the first in the 1.0 line of releases,
providing API stability for Spark's core interfaces.
Spark 1.0.0 is Spark's
Ideally, you just run it in Amazon's VPC or whatever the other providers'
equivalent is. In this case running things over SSL would be overkill.
On 4/8/14, 3:31 PM, Andrew Ash wrote:
Not that I know of, but it would be great if that was supported. The
way I typically handle security now is to p
In the spirit of everything being bigger and better in TX ;) => if
anyone is in Austin and interested in meeting up over Spark - contact
me! There seems to be a Spark meetup group in Austin that has never met
and my initial email to organize the first gathering was never acknowledged.
Ognen
On
There is also this quote from the Tuning guide
(http://spark.incubator.apache.org/docs/latest/tuning.html):
" Finally, if you don't register your classes, Kryo will still work, but
it will have to store the full class name with each object, which is
wasteful."
It implies that you don't really
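Not from the thread, but a minimal registration sketch for the Spark of this era (the registrator and the registered class are placeholders):
  import com.esotericsoftware.kryo.Kryo
  import org.apache.spark.SparkConf
  import org.apache.spark.serializer.KryoRegistrator

  // Hypothetical types standing in for your own classes.
  case class MyClass(id: Int, name: String)

  class MyRegistrator extends KryoRegistrator {
    override def registerClasses(kryo: Kryo) {
      kryo.register(classOf[MyClass])
    }
  }

  val conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrator", classOf[MyRegistrator].getName)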
Look at the tuning guide on Spark's webpage for strategies to cope with
this.
I have run into quite a few memory issues like these, some are resolved
by changing the StorageLevel strategy and employing things like Kryo,
some are solved by specifying the number of tasks to break down a given
ope
Wow!
Ognen
On 3/26/14, 4:58 PM, Michael Armbrust wrote:
Hey Everyone,
This already went out to the dev list, but I wanted to put a pointer
here as well to a new feature we are pretty excited about for Spark 1.0.
http://databricks.com/blog/2014/03/26/Spark-SQL-manipulating-structured-data-usi
0/value-spark_2.10-1.0.jar
On Wed, Mar 26, 2014 at 3:34 PM, Ognen Duzlevski
<og...@plainvanillagames.com> wrote:
Have you looked at the individual nodes logs? Can you post a
bit more of the exception's output?
On 3/26/14, 8:42 AM, Jaonary
Have you looked at the individual nodes logs? Can you post a bit more of
the exception's output?
On 3/26/14, 8:42 AM, Jaonary Rabarisoa wrote:
Hi all,
I got java.lang.ClassNotFoundException even with "addJar" called. The
jar file is present in each node.
I use the version of spark from gith
spark.cores.max
_and_ spark.executor.memory. Just curious if I did something wrong.
On Mon, Mar 24, 2014 at 7:48 PM, Ognen Duzlevski
wrote:
Just so I can close this thread (in case anyone else runs into this stuff) -
I did sleep through the basics of Spark ;). The answer on why my job is in
waiting stat
part-1 etc.)
(Presumably it does this because it allows each partition to be saved
on the local disk, to minimize network traffic. It's how Hadoop
works, too.)
On Mon, Mar 24, 2014 at 5:00 PM, Ognen Duzlevski
<og...@nengoiksvelzud.com> wrote:
Is
someRDD.save
the local disk, to minimize network traffic. It's how Hadoop
works, too.)
On Mon, Mar 24, 2014 at 5:00 PM, Ognen Duzlevski
<og...@nengoiksvelzud.com> wrote:
Is
someRDD.saveAsTextFile("hdfs://ip:port/path/final_filename.txt")
supposed to work? Meaning, c
expectation would be that I can submit multiple jobs at the same time
and there would be some kind of a fair strategy to run them in turn.
What Spark basics have I slept through? :)
Thanks!
Ognen
On 3/24/14, 4:00 PM, Ognen Duzlevski wrote:
Is someRDD.saveAsTextFile("hdfs://ip:port
Is someRDD.saveAsTextFile("hdfs://ip:port/path/final_filename.txt")
supposed to work? Meaning, can I save files to the HDFS fs this way?
I tried:
val r = sc.parallelize(List(1,2,3,4,5,6,7,8))
r.saveAsTextFile("hdfs://ip:port/path/file.txt")
and it is just hanging. At the same time on my HDFS i
ch-.jar: No such file or
directory
/usr/lib/spark/sbt/sbt: line 33: sbt/sbt-launch-.jar: No such file or
directory
Our attempt to download sbt locally to sbt/sbt-launch-.jar failed.
Please install sbt manually from http://www.scala-sbt.org/
On Mon, Mar 24, 2014 at 4:25 PM, Ognen Duzlevski
bt".
Did you download and install sbt separately? In following the Quick
Start guide, that was not stated as a requirement, and I'm trying to
run through the guide word for word.
Diana
On Mon, Mar 24, 2014 at 4:12 PM, Ognen Duzlevski
<og...@plainvanillagames.com> wrote:
Diana,
Anywhere on the filesystem you have read/write access (you need not be
in your spark home directory):
mkdir myproject
cd myproject
mkdir project
mkdir target
mkdir -p src/main/scala
cp $mypath/$mymysource.scala src/main/scala/
cp $mypath/myproject.sbt .
Make sure that myproject.sbt has
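A minimal myproject.sbt along those lines; the version numbers are illustrative values for the Spark 0.9.x era discussed here, not taken from the thread:

  name := "myproject"

  version := "0.1"

  scalaVersion := "2.10.3"

  libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"

  resolvers += "Akka Repository" at "http://repo.akka.io/releases/"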
the rest of the slaves+master) and increasing.
Ognen
On 3/24/14, 7:00 AM, Ognen Duzlevski wrote:
Patrick, correct. I have a 16 node cluster. On 14 machines out of 16,
the inode usage was about 50%. On two of the slaves, one had inode
usage of 96% and on the other it was 100%. When i went into
correct? If so, that's good to know because it's definitely
counterintuitive.
On Sun, Mar 23, 2014 at 8:36 PM, Ognen Duzlevski
wrote:
I would love to work on this (and other) stuff if I can bother someone with
questions offline or on a dev mailing list.
Ognen
On 3/23/14, 10:04
issue which is not
on our current roadmap for state cleanup (cleaning up data which was
not fully cleaned up from a crashed process).
On Sun, Mar 23, 2014 at 7:57 PM, Ognen Duzlevski
<og...@plainvanillagames.com> wrote:
Bleh, strike that, one of my slaves was at 100% inode
(and sorry for the noise)!
Ognen
On 3/23/14, 9:52 PM, Ognen Duzlevski wrote:
Aaron, thanks for replying. I am very much puzzled as to what is going
on. A job that used to run on the same cluster is failing with this
mysterious message about not having enough disk space when in fact I
can see
unless each partition is particularly small.
You might look at the actual executors' logs, as it's possible that
this error was caused by an earlier exception, such as "too many open
files".
On Sun, Mar 23, 2014 at 4:46 PM, Ognen Duzlevski
<og...@plainvanillagames.com>
On 3/23/14, 5:49 PM, Matei Zaharia wrote:
You can set spark.local.dir to put this data somewhere other than /tmp
if /tmp is full. Actually it's recommended to have multiple local
disks and set it to a comma-separated list of directories, one per disk.
Matei, does the number of tasks/partitions i
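A sketch of that setting in the configuration style of this Spark version; the directory paths are placeholders, one per physical disk:
  // Set before the SparkContext is created; paths are placeholders.
  System.setProperty("spark.local.dir", "/mnt/disk1/spark,/mnt/disk2/spark")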
On 3/23/14, 5:35 PM, Aaron Davidson wrote:
On some systems, /tmp/ is an in-memory tmpfs file system, with its own
size limit. It's possible that this limit has been exceeded. You might
try running the "df" command to check to free space of "/tmp" or root
if tmp isn't listed.
3 GB also seems
Hello,
I have a weird error showing up when I run a job on my Spark cluster.
The version of spark is 0.9 and I have 3+ GB free on the disk when this
error shows up. Any ideas what I should be looking for?
[error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task
167.0:3 failed
Hello,
I have a task that runs on a week's worth of data (let's say) and
produces a Set of tuples such as Set[(String,Long)] (essentially output
of countByValue.toMap)
I want to produce 4 sets, one each for a different week and run an
intersection of the 4 sets.
I have the sequential appro
On 3/18/14, 4:49 AM, dmpou...@gmail.com wrote:
On Sunday, 2 March 2014 19:19:49 UTC+2, Aureliano Buendia wrote:
Is there a reason for spark using the older akka?
On Sun, Mar 2, 2014 at 1:53 PM, 1esha wrote:
The problem is in akka remote. It contains files compiled with 2.4.*. When
you r
OK, problem solved.
Interesting thing - I separated the jsonMatches function below and put
it in as a method to a separate file/object. Once done that way, it all
serializes and works.
Ognen
On 3/13/14, 11:52 AM, Ognen Duzlevski wrote:
I even tried this:
def jsonMatches(line:String
I even tried this:
def jsonMatches(line:String):Boolean = true
It is still failing with the same error.
Ognen
On 3/13/14, 11:45 AM, Ognen Duzlevski wrote:
I must be really dense! :)
Here is the most simplified version of the code, I removed a bunch of
stuff and hard-coded the "event
the non-serializable bits and you'll get the
exception you're seeing now.
—
p...@mult.ifario.us | Multifarious, Inc. |
http://mult.ifario.us/
On Thu, Mar 13, 2014 at 9:20 AM, Ognen Duzlevski
<og...@plainvanillagames.com> wrote:
Hmm.
races.
—
p...@mult.ifario.us | Multifarious, Inc. |
http://mult.ifario.us/
On Thu, Mar 13, 2014 at 8:04 AM, Ognen Duzlevski
<og...@nengoiksvelzud.com> wrote:
Hello,
Is there anything special about calling functions that parse json
Hello,
Is there anything special about calling functions that parse json lines
from filter?
I have code that looks like this:
def jsonMatches(line: String): Boolean = {
  // takes a line in JSON format
  val jline = parse(line)
  val je = jline \ "event"
  if (je != JNothing && je.values.toString == user
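A self-contained sketch of such a predicate, assuming the json4s parser (the "event" field name comes from the snippet above; the value being matched is a placeholder):
  import org.json4s._
  import org.json4s.jackson.JsonMethods.parse

  def jsonMatches(line: String, wanted: String): Boolean = {
    val jline = parse(line)        // parse one JSON-formatted line
    val je = jline \ "event"       // pull out the "event" field
    je != JNothing && je.values.toString == wanted
  }

  // usage inside a filter:
  // rdd.filter(line => jsonMatches(line, "login"))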
Are you using it with HDFS? What version of Hadoop? 1.0.4?
Ognen
On 3/10/14, 8:49 PM, abhinav chowdary wrote:
for anyone who is interested to know about the job server from Ooyala..
we started using it recently and it has been working great so far..
On Feb 25, 2014 9:23 PM, "Ognen Duzl
Probably unintentional :)
Ognen
P.S. I have a house for rent avail nah, just kidding! :)
On 3/10/14, 1:54 PM, Muttineni, Vinay wrote:
Why's this here?
*From:*vaquar khan [mailto:vaquar.k...@gmail.com]
*Sent:* Monday, March 10, 2014 11:43 AM
*To:* user@spark.apache.org
*Subject:* Re: Room
https://twitter.com/mayur_rustagi
On Thu, Mar 6, 2014 at 9:50 PM, Ognen Duzlevski
<og...@plainvanillagames.com> wrote:
It looks like the problem is in the filter task - is there
anything special about filter()?
I have removed the filter line from the loops just to see if
Nice, thanks :)
Ognen
On 3/7/14, 2:48 PM, Brian O'Neill wrote:
FWIW - I posted some notes to help people get started quickly with
Spark on C*.
http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html
(tnx again to Rohit and team for all of their help)
-brian
--
Brian ONei
Mayur Rustagi wrote:
the issue was with print?
printing on worker?
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>
On Fri, Mar 7, 2014 at 10:43 AM, Ognen Duzlevski
<og...@plainvanillagames.com> wrote:
Stri
Strike that. Figured it out. Don't you just hate it when you fire off an
email and you figure it out as it is being sent? ;)
Ognen
On 3/7/14, 12:41 PM, Ognen Duzlevski wrote:
What is wrong with this code?
A condensed set of this code works in the spark-shell.
It does not work when dep
What is wrong with this code?
A condensed set of this code works in the spark-shell.
It does not work when deployed via a jar.
def
calcSimpleRetention(start:String,end:String,event1:String,event2:String):List[Double]
= {
val spd = new PipelineDate(start)
val epd = new PipelineDate(en
It looks like the problem is in the filter task - is there anything
special about filter()?
I have removed the filter line from the loops just to see if things will
work and they do.
Anyone has any ideas?
Thanks!
Ognen
On 3/6/14, 9:39 PM, Ognen Duzlevski wrote:
Hello,
What is the general
Hello,
What is the general approach people take when trying to do analysis
across multiple large files where the data to be extracted from a
successive file depends on the data extracted from a previous file or
set of files?
For example:
I have the following: a group of HDFS files each 20+GB
Rob,
I have seen this too. I have 16 nodes in my spark cluster and for some
reason (after app failures) one of the workers will go offline. I will
ssh to the machine in question and find that the java process is running
but for some reason the master is not noticing this. I have not had the
t
Deb,
On 3/4/14, 9:02 AM, Debasish Das wrote:
Hi Ognen,
Any particular reason for choosing Scalatra over options like Play or
Spray?
Is Scalatra much better at serving APIs, or is it due to its similarity
with Ruby's Sinatra?
Did you try the other options and then pick Scalatra?
Not really. I
def count() {
  println(rdd.count)
  // do the counting
}
}
Thanks and Regards,
Suraj Sheth
-----Original Message-----
From: Ognen Duzlevski [mailto:og...@nengoiksvelzud.com]
Sent: 27 February 2014 01:09
To: u...@spark.incubator.apache.org
Subject: Actors and sparkcontext actions
Can
set of scripts to automate it all...
Ognen
On 3/3/14, 3:02 PM, Ognen Duzlevski wrote:
I have a Standalone spark cluster running in an Amazon VPC that I set
up by hand. All I did was provision the machines from a common AMI
image (my underlying distribution is Ubuntu), I created a "sparkuser
I have a Standalone spark cluster running in an Amazon VPC that I set up
by hand. All I did was provision the machines from a common AMI image
(my underlying distribution is Ubuntu), I created a "sparkuser" on each
machine and I have a /home/sparkuser/spark folder where I downloaded
Spark. I did
A stupid question, by the way, you did compile Spark with Hadoop 2.2.0
support?
Ognen
On 2/28/14, 10:51 AM, Prasad wrote:
Hi
I am getting the protobuf error while reading an HDFS file using Spark
0.9.0 -- I am running on Hadoop 2.2.0.
When I look through, I find that I have both 2.4.1 and 2.5 a
I run a 2.2.0 based HDFS cluster and I use Spark-0.9.0 without any
problems to read the files.
Ognen
On 2/28/14, 10:51 AM, Prasad wrote:
Hi
I am getting the protobuf error while reading an HDFS file using Spark
0.9.0 -- I am running on Hadoop 2.2.0.
When I look through, I find that I have both
I spent a week trying to figure this out and I think I finally did.
My write up is here:
http://corripio.blogspot.com/2014/02/scalatra-with-actors-command-spark-at.html
I am sure for most of you this is basic - sorry for wasting bandwidth if
this is the case. I am a Scala/Spark noob so for me t
Can someone point me to a simple, short code example of creating a basic
Actor that gets a context and runs an operation such as .textFile.count?
I am trying to figure out how to create just a basic actor that gets a
message like this:
case class Msg(filename:String, ctx: SparkContext)
and th
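A minimal sketch of such an actor, assuming Akka; the message class mirrors the one in the question, and the file path in the usage comment is a placeholder:
  import akka.actor.{Actor, ActorSystem, Props}
  import org.apache.spark.SparkContext

  case class Msg(filename: String, ctx: SparkContext)

  class CountActor extends Actor {
    def receive = {
      case Msg(filename, ctx) =>
        val n = ctx.textFile(filename).count()   // run the Spark action
        println(s"$filename has $n lines")
    }
  }

  // usage sketch:
  // val system = ActorSystem("demo")
  // val counter = system.actorOf(Props[CountActor], "counter")
  // counter ! Msg("hdfs://namenode:9000/data/file.txt", sc)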
https://twitter.com/mayur_rustagi
On Tue, Feb 25, 2014 at 10:24 AM, Ognen Duzlevski
<og...@nengoiksvelzud.com> wrote:
Doesn't the fair scheduler solve this?
Ognen
On 2/25/14, 12:08 PM, abhinav chowdary wrote:
Sorry for not being clear earlier
how do y
On 2/25/14, 12:24 PM, Mayur Rustagi wrote:
So there is no way to share context currently,
1. you can try the jobserver by Ooyala but I haven't used it & frankly
nobody has shared feedback on it.
One of the major show stoppers for me is that when compiled with Hadoop
2.2.0 - Ooyala standalone serve
Doesn't the fair scheduler solve this?
Ognen
On 2/25/14, 12:08 PM, abhinav chowdary wrote:
Sorry for not being clear earlier
how do you want to pass the operations to the spark context?
this is partly what I am looking for. How to access the active Spark
context and possible ways to pass opera