I've discovered that one of the anomalies I encountered was due to an
(embarrassing? humorous?) user error. See the user list thread "Failed
RC-10 yarn-cluster job for FS closed error when cleaning up staging
directory" for my discussion. With the user error corrected, the FS
closed exception
Hi DB,
I found it a little hard to implement the solution I mentioned:
> Do not send the primary jar and secondary jars to the executors'
> distributed cache. Instead, add them to "spark.jars" in SparkSubmit
> and serve them via HTTP by calling sc.addJar in SparkContext.
If you look at Application
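For readers following along, a minimal sketch of the sc.addJar approach being
discussed (the jar paths below are placeholders, not from any actual patch):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("AddJarSketch")
    val sc = new SparkContext(conf)

    // Instead of shipping the jars through the YARN distributed cache,
    // register them with the driver; executors then fetch them from the
    // driver's file server when tasks run.
    sc.addJar("/path/to/primary.jar")     // placeholder path
    sc.addJar("/path/to/secondary.jar")   // placeholder path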
Hi,
I would like to make some contributions towards MLlib. I have a few concerns
regarding the same.
1. Is there any reason for implementing the algorithms supported by MLlib in
Scala?
2. Will you accept contributions written in Python or Java?
Thanks,
Meethu M
Sure. Should I create a Jira as well?
I saw there's already a broader ticket regarding the ambiguous use of
SPARK_HOME [1] (cc: Patrick as owner of that ticket).
I don't know if it would be more relevant to remove the use of SPARK_HOME
when using Mesos and have the assembly as the only way forward
I retested several different cases...
1. FS closed exception shows up ONLY in RC-10, not in Spark 0.9.1, with
both Hadoop 2.2 and 2.3.
2. SPARK-1898 has no effect for my use cases.
3. The failure to report that the underlying application is "RUNNING"
and that it has succeeded is due ONLY to my
Hi Kevin,
On Thu, May 22, 2014 at 9:49 AM, Kevin Markey wrote:
> The FS closed exception only affects the cleanup of the staging directory,
> not the final success or failure. I've not yet tested the effect of
> changing my application's initialization, use, or closing of FileSystem.
Without go
Hi Meethu,
Thanks for asking! Scala is the native language in Spark. Implementing
algorithms in Scala can utilize the full power of Spark Core. Also,
Scala's syntax is very concise. Implementing ML algorithms using
different languages would increase the maintenance cost. However,
there are still m
The FileSystem cache is something that has caused a lot of pain over the
years. Unfortunately we (in Hadoop core) can't change the way it works now
because there are too many users depending on the current behavior.
Basically, the idea is that when you request a FileSystem with certain
options wi
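To make the caching behavior concrete, here is a rough sketch (the HDFS URI is
just an example; the config key shown is the standard per-scheme switch for
disabling the cache):

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem

    val conf = new Configuration()

    // FileSystem.get() caches instances keyed by (scheme, authority, user),
    // so both calls below normally return the very same object.
    val fs1 = FileSystem.get(new URI("hdfs://namenode:8020/"), conf)
    val fs2 = FileSystem.get(new URI("hdfs://namenode:8020/"), conf)
    assert(fs1 eq fs2)

    // Two ways to opt out of the shared cache:
    conf.setBoolean("fs.hdfs.impl.disable.cache", true)  // per-scheme switch
    val privateFs = FileSystem.newInstance(new URI("hdfs://namenode:8020/"), conf)  // always uncached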
Fixing the immediate issue of requiring SPARK_HOME to be set when it's not
actually used is a separate ticket in my mind from a larger cleanup of what
SPARK_HOME means across the cluster.
I think you should file a new ticket for just this particular issue.
On Thu, May 22, 2014 at 11:03 AM, Gerar
In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly,
and we just recently resumed using it in 1.0 (and in 0.9.2) when this issue
was fixed: https://issues.apache.org/jira/browse/SPARK-1676
Prior to this fix, each Spark task created and cached its own FileSystems
due to a bug
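For anyone hitting the "Filesystem closed" error discussed in this thread, the
failure mode looks roughly like this (sketch only; the staging path is a
placeholder):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val conf = new Configuration()

    // Application code grabs the shared, cached FileSystem and closes it...
    val appFs = FileSystem.get(conf)
    appFs.close()

    // ...but Spark/YARN later looks up the same cached instance, e.g. to
    // clean up the staging directory, and every call now throws
    // java.io.IOException: Filesystem closed.
    val sparkFs = FileSystem.get(conf)
    sparkFs.delete(new Path("/tmp/staging"), true)  // fails: same closed instance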
Thank you, all! This is quite helpful.
We have been arguing about how to handle this issue across a growing
application. Unfortunately, the Hadoop FileSystem javadoc should say
all this but doesn't!
Kevin
On 05/22/2014 01:48 PM, Aaron Davidson wrote:
In Spark 0.9.0 and 0.9.1, we stopped using t
Hey all,
On further testing, I came across a bug that breaks execution of
pyspark scripts on YARN.
https://issues.apache.org/jira/browse/SPARK-1900
This is a blocker and worth cutting a new RC.
We also found a fix for a known issue that prevents additional jar
files from being specified through spark-
On Thu, May 22, 2014 at 12:48 PM, Aaron Davidson wrote:
> In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly,
> and we just recently resumed using it in 1.0 (and in 0.9.2) when this issue
> was fixed: https://issues.apache.org/jira/browse/SPARK-1676
>
Interesting...
> Pr
Looks like SPARK-1900 is a blocker for YARN, and we might as well add
SPARK-1870 while we're at it.
TD or Patrick, could you kindly send out an email with [CANCEL] prefixed in
the subject for the RC10 vote, to help people follow the active VOTE
threads? The VOTE emails are getting a bit hard to follow.
- Henry
O
Right! Doing that.
TD
On Thu, May 22, 2014 at 3:07 PM, Henry Saputra wrote:
> Looks like SPARK-1900 is a blocker for YARN, and we might as well add
> SPARK-1870 while we're at it.
>
> TD or Patrick, could you kindly send out an email with [CANCEL] prefixed in
> the subject for the RC10 vote, to help people follow
Hey all,
We are canceling the vote on RC10 because of a blocker bug in PySpark on YARN.
https://issues.apache.org/jira/browse/SPARK-1900
Thanks everyone for testing! We will post RC11 soon.
TD
ack
On Thu, May 22, 2014 at 9:26 PM, Andrew Ash wrote:
> Fixing the immediate issue of requiring SPARK_HOME to be set when it's not
> actually used is a separate ticket in my mind from a larger cleanup of what
> SPARK_HOME means across the cluster.
>
> I think you should file a new ticket for j
Hi,
I am trying to apply an inner join in Shark using 64MB and 27MB files. I am
able to run the following queries on Mesos:
- "SELECT * FROM geoLocation1 "
- """ SELECT * FROM geoLocation1 WHERE country = '"US"' """
But while trying inner join as
"SELECT * FROM geoLocation1 g1 INNER
Hi Prabeesh,
Do an export _JAVA_OPTIONS="-Xmx10g" before starting Shark. Also, you can
do a ps aux | grep shark to see how much memory it has been allocated;
most likely it is 512MB, in which case increase the limit.
Thanks
Best Regards
On Fri, May 23, 2014 at 10:22 AM, prabeesh k wrote: