GitHub user ksakellis opened a pull request:
https://github.com/apache/spark/pull/3119
[SPARK-4079] [CORE] Default to LZF if Snappy not available
Snappy is the default compression codec. If Snappy is not available, Spark
currently fails and prints a stack trace. With this change, Spark falls back to
LZF when Snappy is not available on the cluster and logs a warning message.
The only exception is when the user has explicitly set
spark.io.compression.codec=snappy. In that case, if Snappy is not available, an
IllegalArgumentException is thrown.
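The selection rule described above can be sketched roughly as follows. This is an illustration only, not the code from the pull request: the class CodecFallback, the method resolveCodec, and the snappyAvailable probe are hypothetical names, and the sketch assumes that an unset spark.io.compression.codec is represented as null.

```java
import java.util.function.BooleanSupplier;
import java.util.logging.Logger;

/** Hypothetical helper that mirrors the described behaviour; not Spark's actual code. */
final class CodecFallback {
    private static final Logger LOG = Logger.getLogger(CodecFallback.class.getName());

    static final String CODEC_KEY = "spark.io.compression.codec";

    /**
     * @param configured      value of spark.io.compression.codec, or null if the user did not set it
     * @param snappyAvailable hypothetical probe reporting whether Snappy can be loaded
     * @return the short name of the codec to use
     */
    static String resolveCodec(String configured, BooleanSupplier snappyAvailable) {
        if (configured == null) {
            // No explicit choice: prefer Snappy, otherwise warn and fall back to LZF.
            if (snappyAvailable.getAsBoolean()) {
                return "snappy";
            }
            LOG.warning("Snappy is not available; falling back to LZF");
            return "lzf";
        }
        if ("snappy".equals(configured) && !snappyAvailable.getAsBoolean()) {
            // The user asked for Snappy explicitly, so fail instead of silently switching codecs.
            throw new IllegalArgumentException(
                CODEC_KEY + "=snappy was set explicitly, but Snappy is not available");
        }
        // Any other explicit choice is passed through unchanged.
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(resolveCodec(null, () -> true));   // snappy
        System.out.println(resolveCodec(null, () -> false));  // lzf (after a warning on stderr)
        System.out.println(resolveCodec("lzf", () -> false)); // lzf
        try {
            resolveCodec("snappy", () -> false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Passing the availability check in as a parameter is a choice made for this sketch so that both branches can be exercised without the native library.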
Because the Snappy library uses static initialization, it was very difficult
to simulate Snappy being unavailable in a unit test. The only way I could think
of was to create multiple classloaders, which seemed excessive. As a result,
most of this was tested ad hoc on a test cluster by setting the system property
org.xerial.snappy.use.systemlib=true, which caused Snappy to fail to load and
thus triggered the fallback logic.
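To illustrate why static initialization gets in the way of a unit test, here is a small sketch using a hypothetical stand-in class. FakeNativeLib and the property demo.fakelib.disable are invented for this example and are not part of snappy-java. The point is that a static initializer runs once per classloader, so changing a system property after the class has been initialized has no effect.

```java
/** Illustration only: a made-up library whose availability is decided in a static initializer. */
final class StaticInitDemo {

    /** Hypothetical stand-in for a natively backed library; not snappy-java's loader. */
    static final class FakeNativeLib {
        static final boolean LOADED;

        static {
            // Evaluated exactly once, when the class is first initialized.
            LOADED = !Boolean.getBoolean("demo.fakelib.disable");
        }

        static boolean isLoaded() {
            return LOADED;
        }
    }

    /** Probes the library, flips the property, and probes again in the same classloader. */
    static boolean[] probeTwice() {
        System.clearProperty("demo.fakelib.disable");
        boolean first = FakeNativeLib.isLoaded();   // triggers static initialization on first use

        System.setProperty("demo.fakelib.disable", "true");
        boolean second = FakeNativeLib.isLoaded();  // initializer does not run again

        System.clearProperty("demo.fakelib.disable");
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        boolean[] result = probeTwice();
        System.out.println("before property change: " + result[0]); // true
        System.out.println("after property change:  " + result[1]); // true
    }
}
```

Making the second probe see the new property value would require loading the class again in a separate classloader, or setting the property before the JVM first touches the class, which is what the cluster-level test did.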
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ksakellis/spark kostas-spark-4079
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3119.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3119
----
commit c8bc9db38f461ed4652ebf1ec70ef967b0d8f040
Author: Kostas Sakellis <[email protected]>
Date: 2014-11-05T02:26:12Z
[SPARK-4079] [CORE] Default to LZF if Snappy not available
----