Hey Cody,
In terms of Spark 1.1.1 - we wouldn't change a default value in a point
release. Making this the default is slotted for 1.2.0:
https://issues.apache.org/jira/browse/SPARK-3280
- Patrick
On Mon, Sep 22, 2014 at 9:08 AM, Cody Koeninger wrote:
> Unfortunately we were somewhat rushed to
After commit 8856c3d8 switched from gzip to snappy as the default Parquet
compression codec, I'm seeing the following when trying to read Parquet
files saved using the new default (same schema and roughly the same size as
files that were previously working):
java.lang.OutOfMemoryError: Direct buffer memory
FYI I filed SPARK-3647 to track the fix (some people internally have
bumped into this also).
On Mon, Sep 22, 2014 at 1:28 PM, Cody Koeninger wrote:
> We've worked around it for the meantime by excluding guava from transitive
> dependencies in the job assembly and specifying the same version of gu
We've worked around it for the meantime by excluding Guava from transitive
dependencies in the job assembly and specifying the same Guava 14 version
that Spark is using. Obviously things break whenever a Guava 15 / 16
feature is used at runtime, so a long-term solution is needed.
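For what it's worth, the exclusion amounts to something like this in sbt (the library coordinates are placeholders, not our actual dependencies; Maven exclusions work the same way):

```scala
// build.sbt sketch -- coordinates are hypothetical:
// drop Guava pulled in transitively, then pin the Guava 14 that Spark 1.1 uses.
libraryDependencies ++= Seq(
  ("com.example" %% "our-job-dependency" % "1.0")
    .exclude("com.google.guava", "guava"),
  "com.google.guava" % "guava" % "14.0.1"
)
```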
On Mon, Sep 2
Hmmm, a quick look at the code indicates this should work for
executors, but not for the driver... (maybe this deserves a bug filed,
if there isn't one already?)
If it's feasible for you, you could remove the Optional.class file
from the Spark assembly you're using.
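Concretely, that can be done with `zip -d` on the assembly jar. The snippet below demonstrates it on a scratch archive (the jar name and contents are made up for the demo; on a real deployment you'd run the same `zip -d` against your actual spark-assembly jar, whose name varies per build):

```shell
# Build a scratch "assembly" containing the class plus one other entry.
mkdir -p com/google/common/base
touch com/google/common/base/Optional.class README.txt
zip -q demo-assembly.jar README.txt com/google/common/base/Optional.class

# Delete the Optional.class entry from the archive in place:
zip -q -d demo-assembly.jar 'com/google/common/base/Optional.class'

# Verify it's gone:
unzip -l demo-assembly.jar
```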
On Mon, Sep 22, 2014 at
We're using Mesos; is there a reasonable expectation that
spark.files.userClassPathFirst will actually work?
On Mon, Sep 22, 2014 at 1:42 PM, Marcelo Vanzin wrote:
> Hi Cody,
>
> I'm still writing a test to make sure I understood exactly what's
> going on here, but from looking at the stack trac
Hi Cody,
There are currently no concrete plans for adding buckets to Spark SQL, but
that's mostly due to lack of resources / demand for this feature. Adding
full support is probably a fair amount of work, since you'd have to make
changes throughout parsing/optimization/execution. That said, there
Hi Cody,
I'm still writing a test to make sure I understood exactly what's
going on here, but from looking at the stack trace, it seems like the
newer Guava library is picking up the "Optional" class from the Spark
assembly.
Could you try one of the options that put the user's classpath before
th
I see, thanks for pointing this out.
--
Nan Zhu
On Monday, September 22, 2014 at 12:08 PM, Sandy Ryza wrote:
> MapReduce counters do not count duplications. In MapReduce, if a task needs
> to be re-run, the value of the counter from the second task overwrites the
> value from the first t
Hi:
I noticed that the scalatest-maven-plugin sets the SPARK_CLASSPATH environment
variable for testing. But in SparkConf.scala, this is deprecated in Spark
1.0+.
So what is this variable for? Should we just remove it?
--
Ye Xianjin
Hi Marcelo,
I'm interested to hear the approach to be taken. Shading Guava itself seems
extreme, but it might make sense.
Gary
On Sat, Sep 20, 2014 at 9:38 PM, Marcelo Vanzin wrote:
> Hmm, looks like the hack to maintain backwards compatibility in the
> Java API didn't work that well. I'll take
Unfortunately we were somewhat rushed to get things working again and did
not keep the exact stacktraces, but one of the issues we saw was similar to
that reported in
https://issues.apache.org/jira/browse/SPARK-3032
We also saw FAILED_TO_UNCOMPRESS errors from snappy when reading the
shuffle file
MapReduce counters do not double-count. In MapReduce, if a task
needs to be re-run, the value of the counter from the second task
overwrites the value from the first task.
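A toy model of that "overwrite, don't add" semantics (all names invented for illustration): each report carries a task id and an attempt number, and only the latest attempt per task contributes, so a re-run replaces the earlier value rather than adding to it.

```scala
// (taskId, attemptNumber, counterValue); task0 was speculatively re-run
// and reported the same value twice.
val reports = Seq(("task0", 1, 10L), ("task1", 1, 5L), ("task0", 2, 10L))

// Keep only the highest attempt per task, then sum:
val total = reports.groupBy(_._1)
  .map { case (_, rs) => rs.maxBy(_._2)._3 }
  .sum  // 15, not 25 -- the re-run overwrote, it didn't add
```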
-Sandy
On Mon, Sep 22, 2014 at 4:55 AM, Nan Zhu wrote:
> If you think it as necessary to fix, I would like to resub
Thanks for the heads up, Cody. Any indication of what was going wrong?
On Mon, Sep 22, 2014 at 7:16 AM, Cody Koeninger wrote:
> Just as a heads up, we deployed 471e6a3a of master (in order to get some
> sql fixes), and were seeing jobs fail until we set
>
> spark.shuffle.manager=HASH
>
> I'd be
Just as a heads up, we deployed 471e6a3a of master (in order to get some
sql fixes), and were seeing jobs fail until we set
spark.shuffle.manager=HASH
I'd be reluctant to change the default to sort for the 1.1.1 release
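For anyone hitting the same failures, the workaround amounts to the following (1.1-era SparkConf API; the same setting can go in spark-defaults.conf or be passed with --conf on spark-submit):

```scala
import org.apache.spark.SparkConf

// Fall back from the new sort-based shuffle to the hash-based one.
val conf = new SparkConf()
  .set("spark.shuffle.manager", "HASH")
```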
If you think it's necessary to fix, I would like to resubmit that PR (it seems
to have some conflicts with the current DAGScheduler).
My suggestion is to make it an option on accumulators: e.g. some algorithms
use an accumulator for result calculation and need a deterministic
accumulator,
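A rough sketch of what such a deterministic option could mean (purely illustrative, not an existing Spark API): keep one value per partition and let a re-run of a partition replace its previous contribution instead of adding to it.

```scala
// Hypothetical model: one value per partition id, merged by overwrite.
var perPartition = Map.empty[Int, Long]

def update(partition: Int, value: Long): Unit =
  perPartition += (partition -> value)  // a re-run overwrites, it doesn't add

update(0, 10L); update(1, 5L)
update(0, 10L)                          // partition 0 re-ran
val total = perPartition.values.sum     // still 15
```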
I have submitted a defect in JIRA for this:
https://issues.apache.org/jira/browse/SPARK-3638 and have submitted a PR (
https://github.com/apache/spark/pull/2489) that temporarily fixes the
issue. Users would have to build Spark with kinesis-asl to get the
compatible httpclient added to the Spark assembly
Another data point on the 1.1.0 FetchFailures:
Running this SQL command works on 1.0.2 but fails on 1.1.0 due to the
exceptions mentioned earlier in this thread: "SELECT stringCol,
SUM(doubleCol) FROM parquetTable GROUP BY stringCol"
The FetchFailure exception has the remote block manager that fa
Forwarding to the dev mailing list for help.
From: Haopu Wang
Sent: September 22, 2014 16:35
To: u...@spark.apache.org
Subject: Spark SQL 1.1.0: NPE when join two cached table
I have two data sets and want to join them on their first fields. Sample data
are below:
data set 1
I've run into this with large shuffles - I assumed that there was
contention between the shuffle output files and the JVM for memory.
Whenever we start getting these fetch failures, it corresponds with high
load on the machines the blocks are being fetched from, and in some cases
complete unresponsiveness.
Hey all. We also had the same problem described by Nishkam, in almost the
same big data setting. We fixed the fetch failures by increasing the timeout
for acks in the driver:
set("spark.core.connection.ack.wait.timeout", "600") // 10 minutes timeout
for acks between nodes
Cheers, Christoph
2014-09
Hello,
In my case, I manually deleted the org/apache/http directory in the
spark-assembly jar file.
I think if we use the latest version of the httpclient (httpcore) library, we
can resolve the problem.
How about upgrading httpclient (or jets3t)?
2014-09-11 19:09 GMT+09:00 Aniket Bhatnagar :
> Thanks
Actually, I ran into a similar issue when doing groupByKey and then count, if
the shuffle size is big, e.g. 1 TB.
Thanks.
Zhan Zhang
> On Sep 21, 2014, at 10:56 PM, Nishkam Ravi wrote:
>
> Thanks for the quick follow up Reynold and Patrick. Tried a run with
> significantly higher ul