Hi,
I had to set up a cron job for cleanup in $SPARK_HOME/work and in
$SPARK_LOCAL_DIRS.
Here are the cron lines. Unfortunately they are for *nix machines; I guess
you will have to adapt them significantly for Windows.
12 * * * * find $SPARK_HOME/work -cmin +1440 -prune -exec rm -rf {} \+
32 * * * * find $SPARK_LOCAL_DIRS -cmin +1440 -prune -exec rm -rf {} \+
Hi Gerard,
thanks for the hint with the Singleton object. Seems very interesting.
However, when my singleton object (e.g. handle to my DB) is supposed to
have a member variable that is non-serializable, I will again have a
problem, won't I? At least I always run into issues where Python tries to
pic
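(One pattern that sidesteps serializing the handle at all, sketched here in Scala since the thread mixes languages: keep the non-serializable handle out of the closure and create it lazily once per executor JVM. DbClient and its connect/write methods are hypothetical stand-ins for whatever driver is in use.)
object DbHolder {
  // Hypothetical, non-serializable driver handle; because it lives in an object
  // and is lazy, it is created once per executor JVM and never shipped with a closure.
  lazy val client: DbClient = DbClient.connect("db-host:9042")
}
// rdd: the RDD of records produced by the surrounding job
rdd.foreachPartition { records =>
  val client = DbHolder.client   // resolved locally on the executor, not serialized
  records.foreach(r => client.write(r))
}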
Depends... The heartbeat errors you received happen due to GC pressure (probably
due to full GCs). If you increase the memory too much, the GCs may be less
frequent, but the full GCs may take longer. Try increasing the following
confs:
spark.executor.heartbeatInterval
spark.core.connection.ack.wait.timeout
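For example, a minimal sketch (Scala) of bumping those when building the context; the values are illustrative and the units/defaults vary across Spark versions:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("gc-heavy-job")
  .set("spark.executor.heartbeatInterval", "60000")        // illustrative: 60s, in ms here
  .set("spark.core.connection.ack.wait.timeout", "600")    // illustrative: 600 seconds
val sc = new SparkContext(conf)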
I will increase memory for the job... that will also fix it, right?
On Apr 10, 2015 12:43 PM, "Reza Zadeh" wrote:
> You should pull in this PR: https://github.com/apache/spark/pull/5364
> It should resolve that. It is in master.
> Best,
> Reza
>
> On Fri, Apr 10, 2015 at 8:32 AM, Debasish Das
> w
Hi All, I am running the code below. Before calling foreach I did 3
transformations using mapToPair. In my application there are 16 executors, but
no executor is running anything.
rddWithscore.foreach(new
VoidFunction>>() {
@Override
public void call(Tuple2> t)
throws Exception {
Entry maxEntry = null;
for
Hello,
Are there any restrictions on column names? I tried to use ".", but
sqlContext.sql cannot find the column. I would guess that "." is tricky since
it affects accessing StructType fields, but are there any other restrictions on
column names?
scala> case class A(a: Int)
defined class A
scala> sqlCont
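One thing that may help (a sketch, assuming the usual behaviour that Spark SQL parses "." as a struct-field accessor; backtick escaping works at least with HiveContext and more recent versions):
val df = sqlContext.createDataFrame(Seq((1, 2))).toDF("a.b", "c")
df.registerTempTable("t")
sqlContext.sql("SELECT `a.b` FROM t").show()   // backticks keep the dot from being read as struct access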
Sean,
I do agree about the "inside out" parallelization, but my curiosity is
mostly about what kind of performance I can expect by piping out to R.
I'm playing with Twitter's new Anomaly Detection library, btw; this could be
a solution if I can get the calls to R to stand up to the massive data
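For reference, the piping itself is usually just RDD.pipe; a minimal sketch, where anomaly.R and the input path are hypothetical and the script reads lines on stdin and writes one result per line on stdout:
val metrics = sc.textFile("hdfs:///data/metrics.csv")   // hypothetical input path
// Each partition is streamed through an external R process via stdin/stdout.
val flagged = metrics.pipe("Rscript anomaly.R")
flagged.take(10).foreach(println)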
Coalesce tries to reduce the number of partitions to a smaller number of
partitions without moving the data around (as much as possible). Since
most of the received data is on a few machines (those running receivers),
coalesce just makes bigger merged partitions on those.
Without coalesce
Machine 1:
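In code, the trade-off being described looks roughly like this (a sketch; partition counts are illustrative):
val rdd = sc.parallelize(1 to 1000000, 100)
// coalesce: merges co-located partitions into fewer, larger ones; no shuffle,
// so skew across machines is preserved.
val merged = rdd.coalesce(4)
// repartition: full shuffle that spreads data evenly, at the cost of moving it.
val rebalanced = rdd.repartition(4)   // same as coalesce(4, shuffle = true)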
Does anybody have an answer for this?
Thanks
Ningjun
From: Wang, Ningjun (LNG-NPV)
Sent: Thursday, April 02, 2015 12:14 PM
To: user@spark.apache.org
Subject: Is the disk space in SPARK_LOCAL_DIRS cleaned up?
I set SPARK_LOCAL_DIRS to C:\temp\spark-temp. When RDDs are shuffled, spark
writes
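(For standalone clusters there are also built-in cleanup knobs, sketched below. As far as I know they only sweep the worker's per-application work directories, so I'm not certain they cover SPARK_LOCAL_DIRS; the values are illustrative.)
# conf/spark-env.sh on each worker (Spark standalone)
SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=1800 \
  -Dspark.worker.cleanup.appDataTtl=86400"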
Hi,
Suppose I have a command and I pass the --files arg as below:
bin/spark-submit --class com.test.HelloWorld --master yarn-cluster
--num-executors 8 --driver-memory 512m --executor-memory 2048m
--executor-cores 4 --queue public --files $HOME/myfile.txt --name
test_1 ~/test_code-1.0-SNAPSHOT
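If the question (cut off above) is how to read that file from the job, one common route is SparkFiles; a sketch in Scala:
import org.apache.spark.SparkFiles
import scala.io.Source

// Files shipped with --files are localized for each executor;
// SparkFiles.get resolves the local path by file name.
val localPath = SparkFiles.get("myfile.txt")
val lines = Source.fromFile(localPath).getLines().toList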
Hi Marcelo,
I am not including Spark's classes. When I used the userClasspathFirst
flag, I started getting those errors.
Been there, done that. Removing guava classes was one of the first things I
tried.
I saw your replies to a similar problem from Sept.
http://apache-spark-developers-list.10
Hello,
The DataFrame documentation always uses $"columnX" to refer to a column,
but I cannot find much information about it. Maybe I have missed something.
Can anyone point me to the doc about the "$", if there is any?
Thanks.
Justin
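For what it's worth, the $ comes from the implicits that spark-shell style code usually imports; a sketch of the equivalent ways to reference a column (assuming the Spark 1.3-era API):
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.col

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._          // provides the $"..." syntax (StringToColumn)

val df = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "name")
df.select($"name")                     // refers to the same column as...
df.select(df("name"))                  // ...this, and...
df.select(col("name"))                 // ...this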
Hi,
I'm reading data stored in S3 and aggregating and storing it in Cassandra
using a spark job.
When I run the job with approx 3Mil records (about 3-4 GB of data) stored
in text files, I get the following error:
(11529/14925)15/04/10 19:32:43 INFO TaskSetManager: Starting task 11609.0
in stage
You should pull in this PR: https://github.com/apache/spark/pull/5364
It should resolve that. It is in master.
Best,
Reza
On Fri, Apr 10, 2015 at 8:32 AM, Debasish Das
wrote:
> Hi,
>
> I am benchmarking row vs col similarity flow on 60M x 10M matrices...
>
> Details are in this JIRA:
>
> https:/