Hi,
I have this in my spark-defaults.conf (the same happens with hdfs):
spark.eventLog.enabled true
spark.eventLog.dir file:/tmp/spark-events
spark.history.fs.logDirectory file:/tmp/spark-events
While the app is running, there is a “.inprogress” directory. However, when the
job complet
Forgot to mention that this is in standalone mode.
Is my configuration wrong?
Thanks,
Liming
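A hedged aside on the likely cause: the event log is written with a “.inprogress” suffix, and the suffix is only dropped when the application shuts down cleanly, i.e. when sc.stop() is called. If the driver exits or is killed without stopping the context, the log keeps the “.inprogress” suffix and the history server will not list the application. A minimal sketch of a consistent setup, assuming the directory already exists:

    # spark-defaults.conf (sketch; the path is an assumption)
    spark.eventLog.enabled           true
    spark.eventLog.dir               file:/tmp/spark-events
    spark.history.fs.logDirectory    file:/tmp/spark-events

Note that with a file: URI every driver writes to its own local filesystem, so the history server only sees applications whose drivers ran on its host; a shared hdfs: path avoids that.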
On 15 Jun, 2015, at 11:26 pm, Tsai Li Ming wrote:
> Hi,
>
> I have this in my spark-defaults.conf (same for hdfs):
> spark.eventLog.enabled true
> spark.eventLog.dir
Hi,
I downloaded the source from the Downloads page and ran the make-distribution.sh
script.
# ./make-distribution.sh --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests
clean package
The script has “set -x” at the beginning.
++ /tmp/a/spark-1.4.0/build/mvn help:evaluate -Dexpression=project.ve
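The “++” lines are just bash execution tracing: “set -x” makes bash print each command, prefixed with one “+” per evaluation level, before running it, so this output is expected and not an error. A tiny sketch of the same effect, under that assumption:

    #!/bin/bash
    # "set -x" echoes every command before executing it; commands inside a
    # command substitution get an extra "+" in the prefix
    set -x
    VERSION=$(echo 1.4.0)
    echo "building $VERSION"

Running this prints “++ echo 1.4.0” and “+ echo 'building 1.4.0'” on stderr before the actual output.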
Hi,
I can’t seem to find any documentation on this feature in 1.4.0?
Regards,
Liming
Hi,
I found out that the instructions for OpenBLAS have been changed by the author
of netlib-java in:
https://github.com/apache/spark/pull/4448, since Spark 1.3.0.
In that PR, I asked whether there’s still a need to compile OpenBLAS with
USE_THREAD=0, and also about Intel MKL.
Is it still applica
I’m getting the same issue on Spark 1.2.0. Despite having set
“spark.core.connection.ack.wait.timeout” in spark-defaults.conf and verifying it
in the Environment tab of the job UI (port 4040), I still get the “no heartbeat
in 60 seconds” error.
spark.core.connection.ack.wait.timeout=3600
15/01/22 07:29:
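A hedged guess at what is going on: spark.core.connection.ack.wait.timeout only controls how long a connection waits for an ack on a sent message, and as far as I can tell it is not behind this error. In standalone mode the “no heartbeat in 60 seconds” message comes from the master expiring a worker, which is governed by spark.worker.timeout (default 60 seconds), so that is the knob worth raising:

    # spark-defaults.conf sketch; spark.worker.timeout is the standalone
    # master's worker-expiry interval, an assumption about this error's source
    spark.worker.timeout    600

If the missed heartbeats are really caused by long GC pauses, raising the timeout only hides the symptom; checking the executor GC logs would confirm.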
I have been using a logstash alternative, fluentd, to ingest the data into HDFS.
I had to configure fluentd not to append the data, so that Spark Streaming will
be able to pick up the new logs.
-Liming
On 2 Feb, 2015, at 6:05 am, NORD SC wrote:
> Hi,
>
> I plan to have logstash send log even
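For anyone else reading, the reason append mode breaks this: Spark Streaming’s file source only processes files that newly appear in the monitored directory; bytes appended to an existing file are never re-read. A minimal sketch of the Spark side from the shell, with an assumed HDFS ingest path:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // textFileStream only picks up files whose creation becomes visible during
    // a batch interval, which is why the collector must write complete new
    // files (or move finished files in) instead of appending in place
    val ssc = new StreamingContext(sc, Seconds(30))
    val lines = ssc.textFileStream("hdfs:///logs/incoming")  // path is an assumption
    lines.count().print()
    ssc.start()
    ssc.awaitTermination()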
Hi,
In standalone mode, how can we check that data locality is working as expected
when tasks are assigned?
Thanks!
On 23 Jul, 2014, at 12:49 am, Sandy Ryza wrote:
> On standalone there is still special handling for assigning tasks within
> executors. There just isn't special handling for w
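One way to sanity-check locality from the shell, as a sketch: ask the RDD which hosts it prefers for each partition, then compare against the Locality Level column (PROCESS_LOCAL / NODE_LOCAL / RACK_LOCAL / ANY) on the stage page of the UI:

    // spark-shell sketch; the hdfs path is an assumption
    val rdd = sc.textFile("hdfs:///data/input.txt")
    rdd.partitions.foreach { p =>
      // for HDFS-backed RDDs this lists the datanodes holding each block; a
      // partition with an empty preference list shows up as PROCESS_LOCAL in
      // the UI, which is misleading (see the file:// observation below)
      println("partition " + p.index + ": " + rdd.preferredLocations(p).mkString(", "))
    }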
Another observation I had was when reading over the local filesystem with “file://”:
it was reported as PROCESS_LOCAL, which was confusing.
Regards,
Liming
On 13 Sep, 2014, at 3:12 am, Nicholas Chammas wrote:
> Andrew,
>
> This email was pretty helpful. I feel like this stuff should be summarized in
>
Hi,
This is on version 1.1.0.
I did a simple test of the MEMORY_AND_DISK storage level:
> val file =
>   sc.textFile("file:///path/to/file.txt").persist(StorageLevel.MEMORY_AND_DISK)
> file.count()
The file is 1.5GB and there is only 1 worker. I have requested 1GB of
worker memory per node:
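Since a 1.5GB file cannot fit in a 1GB worker, the remainder should spill to disk, and the split can be checked from the shell as well as from the Storage tab. A sketch:

    import org.apache.spark.storage.StorageLevel

    // persist, materialize, then ask the context how much of the RDD ended up
    // in memory versus on disk (getRDDStorageInfo is a developer API)
    val file = sc.textFile("file:///path/to/file.txt")
                 .persist(StorageLevel.MEMORY_AND_DISK)
    file.count()
    sc.getRDDStorageInfo.foreach { info =>
      println(info.name + ": mem=" + info.memSize + " bytes, disk=" + info.diskSize + " bytes")
    }

Keep in mind that MEMORY_AND_DISK caches deserialized Java objects, which are usually several times larger than the raw file, so more may spill than the file size alone suggests.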
Hi,
I have the classic word count example:
> file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_ +
> _).collect()
From the Job UI, I can only see 2 stages: 0-collect and 1-map.
What happened to the ShuffledRDD from reduceByKey? And both the flatMap and map
operations are collapsed i
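Nothing is lost; the UI simply names each stage after the last operation in it. flatMap, map, and the map-side combine of reduceByKey are pipelined into the stage labelled “map”; the shuffle that reduceByKey introduces starts the second stage, which ends in collect. The ShuffledRDD is still visible in the lineage, as a quick check shows:

    // sketch: toDebugString exposes the ShuffledRDD even though the UI only
    // names stages after their final operation ("map" and "collect")
    val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    println(counts.toDebugString)
    // prints a lineage roughly like:
    //   ShuffledRDD[4] at reduceByKey ...
    //     MappedRDD[3] at map ...
    //       FlatMappedRDD[2] at flatMap ...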
Hi,
I'm confused about -Dspark.local.dir and SPARK_WORKER_DIR (--work-dir).
What's the difference?
I have set -Dspark.local.dir on all my worker nodes, but I'm still seeing
directories being created in /tmp when the job is running.
I have also tried setting -Dspark.local.dir when I run the
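My understanding of the two settings, hedged: SPARK_WORKER_DIR (--work-dir) is where the standalone worker creates per-application work directories holding the app’s jars and stdout/stderr, while spark.local.dir is scratch space for shuffle files and spilled data. They are independent, so setting one never relocates the other. A worker-side sketch covering both, with assumed paths:

    # conf/spark-env.sh on a worker node (0.9-era sketch)
    SPARK_WORKER_DIR=/data/spark/work                        # per-app jars, stdout/stderr
    SPARK_JAVA_OPTS="-Dspark.local.dir=/data/spark/scratch"  # shuffle/spill scratch

Directories still appearing in /tmp usually mean some JVM in the chain, often the driver, never saw the spark.local.dir setting.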
>> spark.local.dir can and should be set both on the executors and on the
>> driver (if the driver broadcasts variables, the files will be stored in this
>> directory)
Do you mean the worker nodes?
I don’t think they are Jetty connectors, and the directories are empty:
/tmp/spark-3e330cdc-7540-4313-
Hi,
A couple of questions here:
0. I modified SparkLR.scala to change N (# of data points) and D (# of
dimensions), and ran it with:
# bin/run-example -Dspark.executor.memory=40g org.apache.spark.examples.SparkLR
local[23] 500
And here’s the process table:
/net/home/ltsai/jdk1.7.0_51/bin/jav
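One hedged note before the questions: with a local[23] master everything runs inside a single JVM, so spark.executor.memory has no separate executor process to size; what matters is the heap of that one process, which in 0.9-era Spark is set through SPARK_MEM:

    # sketch: size the single local-mode JVM directly; SPARK_MEM is the
    # 0.9-era knob, an assumption for this setup
    SPARK_MEM=40g bin/run-example org.apache.spark.examples.SparkLR local[23] 500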
Hi,
Each of my worker nodes has its own unique spark.local.dir.
However, when I run spark-shell, the shuffle writes are always written to /tmp,
despite spark.local.dir being set when the worker node was started.
Specifying spark.local.dir in the driver program seems to override
the executors’ setting? Is there
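Matei’s reply further down the thread explains the override: a spark.local.dir set on the driver side is passed to the executors for that job and wins over whatever the workers were started with. So the sketch is to leave it unset in the driver and give each worker its own value:

    # conf/spark-env.sh on worker A (paths are assumptions)
    SPARK_JAVA_OPTS="-Dspark.local.dir=/disk1/spark-local"

    # conf/spark-env.sh on worker B
    SPARK_JAVA_OPTS="-Dspark.local.dir=/disk2/spark-local"

and then launch spark-shell without -Dspark.local.dir in any of its options.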
Hi,
At the reduceByKey stage, it takes a few minutes before the tasks start
working.
I have set -Dspark.default.parallelism=127, i.e. the number of cores minus one (n-1).
CPU/network/IO are idle across all nodes while this is happening,
and there is nothing notable in the master log file. From the spark-shell:
14/03/23 1
> Xiangrui
>
> On Sun, Mar 23, 2014 at 3:15 AM, Tsai Li Ming wrote:
>> Hi,
>>
>> At the reduceByKey stage, it takes a few minutes before the tasks start
>> working.
>>
>> I have set -Dspark.default.parallelism=127, i.e. the number of cores minus one (n-1).
>>
>> CPU/Net
On Sun, Mar 23, 2014 at 11:53 PM, Tsai Li Ming wrote:
>> Hi,
>>
>> This is on a 4 nodes cluster each with 32 cores/256GB Ram.
>>
>> (0.9.0) is deployed in standalone mode.
>>
>> Each worker is configured with 192GB. Spark executor memory is also 192GB.
>>
> the initialization stage. If your data is sparse, the latest change to
> KMeans will help with the speed, depending on how sparse your data is.
> -Xiangrui
>
> On Mon, Mar 24, 2014 at 12:44 AM, Tsai Li Ming wrote:
>> Thanks, let me try with a smaller K.
>>
>>
Can anyone help?
How can I configure a different spark.local.dir for each executor?
On 23 Mar, 2014, at 12:11 am, Tsai Li Ming wrote:
> Hi,
>
> Each of my worker nodes has its own unique spark.local.dir.
>
> However, when I run spark-shell, the shuffle writes are always wri
conf/spark-env.sh on those workers.
>
> Matei
>
> On Mar 27, 2014, at 9:04 PM, Tsai Li Ming wrote:
>
>> Can anyone help?
>>
>> How can I configure a different spark.local.dir for each executor?
>>
>>
>> On 23 Mar, 2014, at 12:11 am, Tsai L
Hi,
My worker nodes have more memory than the host I’m submitting my driver
program from, but it seems that SPARK_MEM also sets the -Xmx of the spark shell?
$ SPARK_MEM=100g MASTER=spark://XXX:7077 bin/spark-shell
Java HotSpot(TM) 64-Bit Server VM warning: INFO:
os::commit_memory(0x7f
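That matches how the 0.9-era scripts behave: SPARK_MEM sizes every JVM they launch, including the local shell, which is why a 100g -Xmx is being requested on the submitting host. A hedged workaround sketch is to set only the executor memory property and leave SPARK_MEM unset:

    # sketch: request 100g for executors only, not the local shell JVM
    # (0.9-era mechanism; spark.executor.memory is the standard property)
    SPARK_JAVA_OPTS="-Dspark.executor.memory=100g" MASTER=spark://XXX:7077 bin/spark-shell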
rsion of Spark.
>
>
> On Thu, Mar 27, 2014 at 10:48 PM, Tsai Li Ming wrote:
> Hi,
>
> My worker nodes have more memory than the host I’m submitting my driver
> program from, but it seems that SPARK_MEM also sets the -Xmx of the spark
> shell?
>
> $ SPARK_MEM=
> As far as I can tell, spark.local.dir should *not*
> be set there, so workers should get it from their spark-env.sh. It’s true
> that if you set spark.local.dir in the driver it would pass that on to the
> workers for that job.
>
> Matei
>
> On Mar 27, 2014, at 9:57 PM, Tsai Li Ming
I’m interested in obtaining the data set too.
Thanks!
On 27 Mar, 2014, at 9:45 pm, Niko Stahl wrote:
> Hello,
>
> I would like to run the WikipediaPageRank example, but the Wikipedia dump XML
> files are no longer available on Freebase. Does anyone know an alternative
> source for the data?
>
Hi,
Is the Hadoop code used to compute the logistic regression hyperplane
available?
I’m looking at the examples page:
http://spark.apache.org/examples.html,
where there is the 110s vs 0.9s Hadoop vs Spark comparison.
Thanks!
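For context, the Spark side of that comparison is essentially the iterative gradient descent below, paraphrased from the old SparkLR example (DataPoint, parsePoint, and the input path are assumptions). The Hadoop figure is large because each iteration is a full MapReduce job that re-reads the input from disk, while Spark keeps the points cached in memory:

    import org.apache.spark.util.Vector  // 0.9-era vector class

    case class DataPoint(x: Vector, y: Double)

    // assumed input format: label followed by D feature values per line
    def parsePoint(line: String): DataPoint = {
      val nums = line.split(' ').map(_.toDouble)
      DataPoint(Vector(nums.tail), nums.head)
    }

    val D = 10
    val ITERATIONS = 5
    val rand = new scala.util.Random(42)

    val points = sc.textFile("hdfs:///data/lr.txt").map(parsePoint).cache()
    var w = Vector(D, _ => 2 * rand.nextDouble - 1)  // random initial hyperplane
    for (i <- 1 to ITERATIONS) {
      // one full pass over the cached points per iteration
      val gradient = points.map { p =>
        (1 / (1 + math.exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
      }.reduce(_ + _)
      w -= gradient
    }
    println("Final hyperplane w: " + w)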
--
> Web: http://alpinenow.com/
>
>
> On Mon, Mar 31, 2014 at 11:38 PM, Tsai Li Ming wrote:
> Hi,
>
> Is the code available for Hadoop to calculate the Logistic Regression
> hyperplane?
>
> I’m looking at the Examples:
> http: