Ok Thanks!
On Mon, May 19, 2014 at 10:09 PM, Matei Zaharia wrote:
> This is the patch for it: https://github.com/apache/spark/pull/50/. It
> might be possible to backport it to 0.8.
>
> Matei
>
> On May 19, 2014, at 2:04 AM, Sai Prasanna wrote:
>
> Matei, I am using 0.8
lso be
> in 0.9.0).
>
> Matei
>
> On May 19, 2014, at 12:41 AM, Sai Prasanna
> wrote:
>
> > Hi all,
> >
> > When I set the persistence level to DISK_ONLY, Spark still tries to use
> > memory and caches.
> > Any reason?
> > Do I need to override some parameter elsewhere?
> >
> > Thanks !
>
>
Hi all,
When I set the persistence level to DISK_ONLY, Spark still tries to use memory
and caches.
Any reason?
Do I need to override some parameter elsewhere?
Thanks!
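For reference, a minimal sketch of asking for disk-only persistence in the shell (the RDD name and path are illustrative); note that even with DISK_ONLY, executors still use some heap while computing and serializing partitions:

  import org.apache.spark.storage.StorageLevel

  val data = sc.textFile("hdfs:///path/to/input")   // hypothetical path
  data.persist(StorageLevel.DISK_ONLY)              // store persisted partitions on disk only
  data.count()                                      // first action materializes the persisted blocks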
Hi Everyone,
I think everyone is pretty busy; the response time in this group has increased
slightly. Anyway, this is a pretty silly problem, but I could not get past it.
I have a file in my local FS, but when I try to create an RDD out of it, the
task fails and a file-not-found exception is thrown at th
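A likely cause (an assumption here, since the message is cut off): with a non-local master, a plain local path has to exist on every worker node, not just on the driver. A sketch with a hypothetical path:

  // file:// refers to each worker's local filesystem, so the file must be
  // present at the same path on every node (or be copied into HDFS instead).
  val lines = sc.textFile("file:///home/user/data.txt")
  lines.count()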
Hi,
Is there any lower bound on the size of an RDD to optimally utilize Spark's
in-memory framework?
Say creating an RDD for a very small data set of some 64 MB is not as efficient
as one of some 256 MB; then the application can be tuned accordingly.
So is there a soft lower bound related to hadoop-blo
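As far as I know there is no hard lower bound; for small inputs the number of partitions usually matters more than the raw size. A sketch (path and split count are illustrative):

  // Ask for more input splits so even a small file is spread across all cores.
  val small = sc.textFile("hdfs:///path/to/small-input", 8)
  small.cache()
  small.count()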
Hi,
Can we override the default file-replication factor while using
saveAsTextFile() to HDFS?
My default replication factor is >1, but the intermediate files that I want to
put in HDFS while running a Spark query need not be replicated, so is there a way?
Thanks!
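One approach that should work (hedged, not verified on 0.8): HDFS replication is a client-side, per-file setting, so lowering dfs.replication on the Hadoop configuration the job writes with only affects that job's output. A sketch, assuming a Spark version that exposes sc.hadoopConfiguration; the value and path are illustrative:

  sc.hadoopConfiguration.set("dfs.replication", "1")     // applies to files this job writes
  rdd.saveAsTextFile("hdfs:///tmp/intermediate-output")  // hypothetical path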
Hi All,
I wanted to launch Spark on YARN interactively, in yarn-client mode.
With the default settings of yarn-site.xml and spark-env.sh, I followed the
given link:
http://spark.apache.org/docs/0.8.1/running-on-yarn.html
I get the correct pi value when I run without launching the shell.
When I launch t
, create an RDD out of it operate *
Is there any way out??
Thanks in advance!
On Fri, May 9, 2014 at 12:18 AM, Sai Prasanna wrote:
> Hi Everyone,
>
> I think all are pretty busy, the response time in this group has slightly
> increased.
>
> But anyways, this is a pretty silly
I executed the following commands to launch a Spark app in yarn-client
mode. I have Hadoop 2.3.0, Spark 0.8.1 and Scala 2.9.3:
SPARK_HADOOP_VERSION=2.3.0 SPARK_YARN=true sbt/sbt assembly
SPARK_YARN_MODE=true \
SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.3.0.jar
Hi, any suggestions for the following issue??
I have a replication factor of 3 in my HDFS.
I ran my experiments with 3 datanodes. Now I have just added another node to it
with no data on it.
When I run, Spark launches non-local tasks on it and the time taken is more
than what it took for the 3-node cluster.
He
Hi All,
I have a replication factor of 3 in my HDFS.
I ran my experiments with 3 datanodes. Now I have just added another node to it
with no data on it.
When I run, Spark launches non-local tasks on it and the time taken is more
than what it took for the 3-node cluster.
Here delayed scheduling fails, I think, b
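If the slowdown really comes from non-local tasks, one knob to try (a hedged sketch; the value is illustrative and in milliseconds) is spark.locality.wait, which controls how long the scheduler waits for a data-local slot before falling back to other nodes. On 0.8.x this is set as a Java system property before the SparkContext is created:

  System.setProperty("spark.locality.wait", "10000")
  val sc = new SparkContext("spark://master:7077", "LocalityTest")   // hypothetical master URL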
Hi All,
I want to store a CSV text file in Parquet format in HDFS and then do some
processing in Spark.
Somehow my search for a way to do this was futile; more help was available
for Parquet with Impala.
Any guidance here? Thanks!!
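Spark 0.8/0.9 has no built-in Parquet support; with a newer release that includes Spark SQL, a sketch along these lines should work (the paths, options and the 2.x SparkSession API are assumptions, not something from this thread):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("CsvToParquet").getOrCreate()
  val df = spark.read
    .option("header", "true")                      // first line holds column names
    .option("inferSchema", "true")
    .csv("hdfs:///data/input.csv")                 // hypothetical input path
  df.write.parquet("hdfs:///data/input_parquet")   // hypothetical output path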
> ...lastOption is used instead of last to deal with an empty file.
>
>
> On Thu, Apr 24, 2014 at 7:38 PM, Sai Prasanna wrote:
>
>> Hi All, finally I wrote the following code, which I feel does it optimally,
>> if not in the most optimal way.
>> Using file pointers, seeking
=new String(bytes); /*bdd contains the last line*/
On Thu, Apr 24, 2014 at 11:42 AM, Sai Prasanna wrote:
> Thanks Guys !
>
>
> On Thu, Apr 24, 2014 at 11:29 AM, Sourav Chandra <
> sourav.chan...@livestream.com> wrote:
>
>> Also, the same thing can be done using rdd.to
>>> ...except for the last element in your iterator. This should leave one
>>> element, which is your last element.
>>>
>>> Frank Austin Nothaft
>>> fnoth...@berkeley.edu
>>> fnoth...@eecs.berkeley.edu
>>> 202-340-0466
>>>
>>> On
> For example:
>
> RDD.take(RDD.count()).last
>
>
> On Thu, Apr 24, 2014 at 10:28 AM, Sai Prasanna wrote:
>
>> Adnan, but RDD.take(RDD.count()) returns all the elements of the RDD.
>>
>> I want only to access the last element.
>>
>>
>> On Thu, Apr 2
Adnan, but RDD.take(RDD.count()) returns all the elements of the RDD.
I only want to access the last element.
On Thu, Apr 24, 2014 at 10:33 AM, Sai Prasanna wrote:
> Oh ya, Thanks Adnan.
>
>
> On Thu, Apr 24, 2014 at 10:30 AM, Adnan Yaqoob wrote:
>
>> You c
Oh ya, Thanks Adnan.
On Thu, Apr 24, 2014 at 10:30 AM, Adnan Yaqoob wrote:
> You can use following code:
>
> RDD.take(RDD.count())
>
>
> On Thu, Apr 24, 2014 at 9:51 AM, Sai Prasanna wrote:
>
>> Hi All, Some help !
>> RDD.first or RDD.take(1) gives th
Hi All, some help!
RDD.first or RDD.take(1) gives the first item; is there a straightforward
way to access the last element in a similar way?
I couldn't find a tail/last method for RDD!!
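Summarizing the approaches discussed above in this thread as a sketch (assuming an RDD[String] named rdd): rather than take(count()), keep only the last element of each partition and then take the last of those; lastOption guards against an empty input.

  val lastElement: Option[String] = rdd
    .mapPartitions { it =>
      var last: Option[String] = None
      while (it.hasNext) last = Some(it.next())   // keep only this partition's last element
      last.iterator
    }
    .collect()      // collect() returns the elements in partition order
    .lastOption     // None when the RDD is empty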
Hi All,
I want to access a particular column of a DB table stored in CSV format
and perform some aggregate queries over it. I wrote the following query in
Scala as a first step.
*var add=(x:String)=>x.split("\\s+")(2).toInt*
*var result=List[Int]()*
*input.split("\\n").foreach(x=>result::=add(x
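A sketch of the same idea expressed with RDD operations instead of a local list (the path is hypothetical; the separator and column index are taken from the snippet above):

  val input = sc.textFile("hdfs:///path/to/table.csv")
  val col = input.map(line => line.split("\\s+")(2).toInt)   // third field, as in add() above
  val total = col.reduce(_ + _)                              // aggregate on the cluster, not the driver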
Hi All,
In the interactive shell the Spark context remains the same. So if I run a
query multiple times, will the RDDs created by previous runs be reused in the
subsequent runs and not recomputed until I exit and restart the shell?
Or is there a way to force a reuse/recompute in the presen
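For what it is worth, a sketch of how reuse works in the shell (names and path are illustrative): the RDD's lineage is kept across commands, but its data is recomputed on every action unless it is cached explicitly.

  val data = sc.textFile("hdfs:///path/to/data")
  data.cache()    // mark it for in-memory reuse
  data.count()    // first action: reads the input and fills the cache
  data.count()    // later actions reuse the cached blocks instead of re-reading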
http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
>
> Thanks,
> Rahul Singhal
>
> From: Sai Prasanna
> Reply-To: "user@spark.apache.org"
> Date: Monday 7 April 2014 6:56 PM
> To: "user@spark.apache.org"
> Subje
Hi All,
I wanted to get Spark on YARN up and running.
I did "*SPARK_HADOOP_VERSION=2.3.0 SPARK_YARN=true ./sbt/sbt assembly*"
Then I ran
"*SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.3.0.jar
SPARK_YARN_APP_JAR=examples/target/scala-2.9.3/spark-examples_2.9.3-0.8.1
Hi All,
I have a five-node Spark cluster: master, s1, s2, s3, s4.
I have passwordless SSH to all slaves from the master and vice versa.
But on one machine, s2, what happens is that after 2-3 minutes of my
connection from the master to the slave, the write pipe is broken. So if I try
to connect again from the master I ge
Oh sorry, that was a mistake; the default level is MEMORY_ONLY!!
My doubt was: between two different experiments, do the RDDs cached in
memory need to be unpersisted???
Or does it not matter?
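A minimal sketch of clearing a cached RDD between experiments (the name is illustrative); otherwise Spark simply evicts old blocks in LRU order when memory runs low:

  data.persist()     // MEMORY_ONLY by default
  data.count()       // experiment 1
  data.unpersist()   // drop the cached blocks before the next experiment; lineage is kept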
ing? I am guessing it is MEMORY_ONLY. In
> large datasets, MEMORY_AND_DISK or MEMORY_AND_DISK_SER work better.
>
> You can call unpersist on an RDD to remove it from Cache though.
>
>
> On Thu, Mar 27, 2014 at 11:57 AM, Sai Prasanna wrote:
>
>> No, I am running on 0.8.1.
e a few memory issues like these, some are resolved
>> by changing the StorageLevel strategy and employing things like Kryo, some
>> are solved by specifying the number of tasks to break down a given
>> operation into etc.
>>
>> Ognen
>>
>>
>> On 3/27/14, 10:
Can someone throw some light on it??
--
*Sai Prasanna. AN*
*II M.Tech (CS), SSSIHL*
*Entire water in the ocean can never sink a ship, Unless it gets inside. All
the pressures of life can never hurt you, Unless you let them in.*
Thanks Chen, it's a bit clearer now and it's up and running...
1) In the WebUI, only the memory used per node is given. Though I can find it
in the logs, is there a port over which I can monitor memory usage, GC
overhead, and RDD creation in the UI?
master URL
>
> if the later case, also yes, you can observe the distributed task in the
> Spark UI
>
> --
> Nan Zhu
>
> On Wednesday, March 26, 2014 at 8:54 AM, Sai Prasanna wrote:
>
> Is it possible to run across cluster using Spark Interactive Shell ?
>
> To
similar ???
--
*Sai Prasanna. AN*
*II M.Tech (CS), SSSIHL*
*Entire water in the ocean can never sink a ship, Unless it gets inside. All
the pressures of life can never hurt you, Unless you let them in.*
Hi All,
Does the number of worker threads bear any relationship to setting executor
memory?
I have 16 GB RAM with an 8-core processor. I had set SPARK_MEM to 12g
and was running locally with the default 1 thread.
So does this mean there can be at most one executor per node scheduled at
any point of tim
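As a sketch of the threading side (the master string and app name are illustrative): in local mode the number of worker threads is set in the master URL, independently of SPARK_MEM.

  val sc = new SparkContext("local[8]", "LocalThreads")   // 8 worker threads instead of the default 1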
> If you want to have 8 GB executors then, yes, only two can run on each
> 16 GB node. (In fact, you should also keep a significant amount of memory
> free for the OS to use for buffer caching and such.)
> An executor may use many cores, though, so this shouldn't be an issue.
>
>
> On Mo
EAMON" - its DAEMON. Thanks Latin.
> On Mar 24, 2014 7:25 AM, "Sai Prasanna" wrote:
>
>> Hi All !! I am getting the following error in interactive spark-shell
>> [0.8.1]
>>
>>
>> *org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed mo
-env.sh
export SPARK_DEAMON_MEMORY=8g
export SPARK_WORKER_MEMORY=8g
export SPARK_DEAMON_JAVA_OPTS="-Xms8g -Xmx8g"
export SPARK_JAVA_OPTS="-Xms8g -Xmx8g"
export HADOOP_HEAPSIZE=4000
Any suggestions ??
--
*Sai Prasanna. AN*
*II M.Tech (CS), SSSIHL*
node
> if not & data in hdfs is not critical
> hadoop namenode -format
> & restart hdfs
>
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Tue, Mar 18, 20
Connection.setupConnection(Client.java:434)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1206)
at org.apache.hadoop.ipc.Client.call(Client.java:10
Solved... but I don't know what the difference is...
Just running ./spark-shell fixes it all... but I don't know why!!
On Mon, Mar 17, 2014 at 1:32 PM, Sai Prasanna wrote:
> Hi everyone !!
>
> I installed scala 2.9.3, spark 0.8.1, oracle java 7...
>
> I launched master and logged on to
need to set a timeout somewhere???
Thank you!!
--
*Sai Prasanna. AN*
*II M.Tech (CS), SSSIHL*