Spark on YARN can use History Server by setting the configuration
spark.yarn.historyServer.address. But I can't find a similar config for
Mesos. Is the History Server supported by Spark on Mesos? Thanks.
Kelvin
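For reference, the YARN setting mentioned above is an application-side config; a
minimal sketch, with the host and port as placeholders:

    import org.apache.spark.SparkConf

    // Sketch: point completed applications at a running history server.
    // "historyhost:18080" is a placeholder, not a real address.
    val conf = new SparkConf()
      .set("spark.yarn.historyServer.address", "historyhost:18080")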
Joe, I also use S3 and gzip. So far, I/O has not been a problem. In my case,
the operation is SQLContext.jsonFile(), and I can see from Ganglia that the
whole cluster is CPU bound (99% saturated). I have 160 cores, and I can see
the network sustains about 150 Mbit/s.
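Roughly, the job looks like this (the path is a placeholder; a sketch only,
assuming an existing SparkContext sc):

    import org.apache.spark.sql.SQLContext

    // Sketch: read gzipped JSON from S3. gzip is not splittable, so each .gz
    // file becomes one task; with many files the cluster ends up CPU bound on
    // JSON parsing rather than on network I/O.
    val sqlContext = new SQLContext(sc)
    val events = sqlContext.jsonFile("s3n://my-bucket/logs/*.json.gz") // placeholder path
    events.registerTempTable("events")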
Kelvin
On Wed, Feb 4, 2015 at 10
I guess you can set the parameters below to clean up the directories:
spark.worker.cleanup.enabled
spark.worker.cleanup.interval
spark.worker.cleanup.appDataTtl
They are described here:
http://spark.apache.org/docs/1.2.0/spark-standalone.html
Kelvin
On Sun, Feb 8, 2015 at 5:15 PM, ey-chih chow wrote:
Maybe try "local:" URIs, described under the heading "Advanced Dependency
Management" here:
https://spark.apache.org/docs/1.1.0/submitting-applications.html
It seems this is what you want. Hope this helps.
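For example (a sketch; the jar path is a placeholder, and spark.jars is the
config that --jars populates):

    import org.apache.spark.SparkConf

    // Sketch: a "local:/" URI means the jar already exists at that path on
    // every node, so Spark does not ship or copy it. The path is a placeholder.
    val conf = new SparkConf()
      .set("spark.jars", "local:/opt/libs/my-deps.jar")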
Kelvin
On Sun, Feb 8, 2015 at 9:13 PM, ey-chih chow wrote:
> Is there any way we can disable Spark
Since the stack trace shows Kryo is being used, maybe you could also try
increasing spark.kryoserializer.buffer.max.mb. Hope this helps.
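For example (the buffer size in MB is illustrative; a sketch only):

    import org.apache.spark.SparkConf

    // Sketch: raise the maximum Kryo serialization buffer.
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryoserializer.buffer.max.mb", "256")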
Kelvin
On Tue, Feb 10, 2015 at 1:26 AM, Akhil Das
wrote:
> You could try increasing the driver memory. Also, can you be more specific
> about the data volume?
Hi Su,
Out of the box, no. But I know people integrate it with Spark Streaming to
do real-time visualization. It will take some work, though.
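As a rough sketch of that kind of integration (the socket source and the
pushToDashboard sink are placeholders, not a specific library; sc is an
existing SparkContext):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Sketch: compute a small aggregate per micro-batch and push it to an
    // external dashboard for near real-time display.
    val ssc = new StreamingContext(sc, Seconds(5))
    val counts = ssc.socketTextStream("localhost", 9999).count()
    counts.foreachRDD { rdd =>
      rdd.collect().foreach(c => pushToDashboard(c)) // hypothetical sink
    }
    ssc.start()
    ssc.awaitTermination()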
Kelvin
On Mon, Feb 9, 2015 at 5:04 PM, Su She wrote:
> Hello Everyone,
>
> I was reading this blog post:
> http://homes.esat.kuleuven.be/~bioiuser/blog/
I had a similar use case before. I found:
1. textFile() produced one partition per file, which can result in many
partitions. I found that calling coalesce() without shuffle helped.
2. If you use persist(), count() will do the I/O and put the result into the
cache; transformations later do their computation out of the cache.
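Something along these lines (the path and partition count are placeholders; a
sketch assuming an existing SparkContext sc):

    // Sketch: many small files -> many partitions; coalesce() without shuffle
    // merges them cheaply. persist() + count() forces the I/O once and fills
    // the cache, so later transformations read from the cache.
    val raw = sc.textFile("s3n://my-bucket/input/*")   // placeholder path
    val fewer = raw.coalesce(160, shuffle = false)     // target count is illustrative
    fewer.persist()
    fewer.count()                                      // materializes the cache
    val parsed = fewer.map(line => line.split("\t"))   // served from the cache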
Hi Mohammed,
Did you use --jars to specify your JDBC driver when you submitted your job?
Take a look at this link:
http://spark.apache.org/docs/1.2.0/submitting-applications.html
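For example (the jar path is a placeholder; spark.jars is the property behind
--jars):

    import org.apache.spark.SparkConf

    // Sketch: the JDBC driver jar must reach both the driver and executor
    // classpaths; spark.jars (what --jars sets) is one way to get it there.
    val conf = new SparkConf()
      .set("spark.jars", "/path/to/jdbc-driver.jar") // placeholder path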
Hope this helps!
Kelvin
On Thu, Feb 19, 2015 at 7:24 PM, Mohammed Guller
wrote:
> Hi –
>
> I am trying to use Bone
Hi,
Currently, there is only one executor per worker. There is a JIRA ticket to
relax this:
https://issues.apache.org/jira/browse/SPARK-1706
But if you want to use more cores, maybe you can try increasing
SPARK_WORKER_INSTANCES, which increases the number of workers per machine.
Take a look here:
h
Hi Sandy,
I am also doing memory tuning on YARN. Just want to confirm, is it correct
to say:
spark.executor.memory - spark.yarn.executor.memoryOverhead = the memory I
can actually use in my JVM application?
If it is not, what is the correct relationship? Any other variables or
config parameters I should take into account?
> 0.2), and the rest is for basic
> Spark bookkeeping and anything the user does inside UDFs.
>
> -Sandy
>
>
>
> On Fri, Feb 20, 2015 at 11:44 AM, Kelvin Chu <2dot7kel...@gmail.com>
> wrote:
>
>> Hi Sandy,
>>
>> I am also doing memory
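To make the arithmetic concrete (illustrative numbers only, assuming the Spark
1.2-era defaults spark.storage.memoryFraction = 0.6 and
spark.shuffle.memoryFraction = 0.2):

    // Sketch: how the pieces relate on YARN, with illustrative values.
    val executorMemoryMb = 4096                        // spark.executor.memory
    val memoryOverheadMb = 384                         // spark.yarn.executor.memoryOverhead
    val containerRequestMb = executorMemoryMb + memoryOverheadMb // YARN allocates 4480 MB

    // Inside the 4096 MB heap:
    val storageMb = (executorMemoryMb * 0.6).toInt     // ~2457 MB for cached RDDs
    val shuffleMb = (executorMemoryMb * 0.2).toInt     // ~819 MB for shuffle
    val userMb = executorMemoryMb - storageMb - shuffleMb // ~820 MB for bookkeeping and UDFs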
Hi Joe, you might increase spark.yarn.executor.memoryOverhead to see if it
fixes the problem. Please take a look at this report:
https://issues.apache.org/jira/browse/SPARK-4996
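For example (the value, in MB, is illustrative):

    import org.apache.spark.SparkConf

    // Sketch: give each executor extra off-heap headroom on YARN.
    val conf = new SparkConf()
      .set("spark.yarn.executor.memoryOverhead", "1024")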
Hope this helps.
On Tue, Feb 24, 2015 at 2:05 PM, Yiannis Gkoufas
wrote:
> No problem, Joe. There you go
> https://is
Hi Darin, you might increase spark.yarn.executor.memoryOverhead to see if
it fixes the problem. Please take a look at this report:
https://issues.apache.org/jira/browse/SPARK-4996
On Fri, Feb 27, 2015 at 12:38 AM, Arush Kharbanda <
ar...@sigmoidanalytics.com> wrote:
> Can you share what error you
Hi, I used union() before and yes, it can be slow sometimes. I _guess_ your
variable 'data' is a Scala collection and compute() returns an RDD. Right?
If yes, I tried the approach below to operate on one RDD only during the
whole computation (yes, I also saw that too many RDDs hurt performance).
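A minimal sketch of the idea, assuming 'data' is a Scala Seq and that the
per-element work can be written as an ordinary function (computeLocally below
is a hypothetical, driver-free variant of compute):

    // Sketch: build one RDD from the driver-side collection and do the work
    // inside it, instead of creating many small RDDs and union()-ing them.
    val oneRdd = sc.parallelize(data)                    // a single RDD
    val results = oneRdd.flatMap(x => computeLocally(x)) // hypothetical function
    results.count()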
Cha
Hi Andy,
It sounds great! Quick questions: I have been using IPython + PySpark. I
crunch the data with PySpark and then visualize it with Python libraries like
matplotlib and basemap. Could I still use these Python libraries in the Scala
Notebook? If not, what are the suggested approaches for visualization?