Hi Reynold and others,
I agree with your comments on mid-tenured objects and GC. In fact, dealing
with mid-tenured objects is the major challenge for all Java GC
implementations. I am wondering if anyone has played with the
-XX:+PrintTenuringDistribution flag and seen what the age distribution
actually looks like.
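In case anyone wants to try, here is a minimal sketch of one way to turn
that flag on for the executors (the app name is illustrative; an entry in
spark-defaults.conf works too):

import org.apache.spark.{SparkConf, SparkContext}

// Ask each executor JVM to print its tenuring histogram to stdout,
// so we can see how long objects survive in the young generation.
val conf = new SparkConf()
  .setAppName("gc-tenuring-probe")
  .set("spark.executor.extraJavaOptions", "-XX:+PrintTenuringDistribution")
val sc = new SparkContext(conf)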
Please vote on releasing the following candidate as Apache Spark version
1.5.0. The vote is open until Friday, Aug 29, 2015 at 5:00 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.
[ ] +1 Release this package as Apache Spark 1.5.0
[ ] -1 Do not release this package because ...
To
This isn't really answering the question, but for what it is worth, I
manage several different branches of Spark and publish custom-named
versions regularly to an internal repository, and this is *much* easier
with SBT than with Maven. You can actually link the Spark SBT build into
an external SBT
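For the curious, here is a minimal build.sbt sketch of the kind of setup I
mean (the version-suffix property and the repository URL are made-up
placeholders, not our actual values):

// build.sbt -- illustrative sketch of custom-named publishing
version := "1.4.1" + sys.props.get("spark.version.suffix").map("-" + _).getOrElse("")
publishTo := Some("internal-releases" at "https://repo.example.com/releases")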
There is a lot of GC activity due to the non-code-gen path being sloppy
about garbage creation. This is not actually what happens, but just as an
example:
rdd.map { i: Int => i + 1 }
Under the hood this becomes a closure that boxes every input and every
output, creating two extra objects per record.
Th
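To make the boxing concrete, here is a rough sketch; this is an assumption
about the erased, non-specialized call path, not a trace of what Spark
actually generates:

val f: Int => Int = i => i + 1
val g = f.asInstanceOf[AnyRef => AnyRef] // erased view of the same closure
val boxedIn: AnyRef = Int.box(1)         // extra object #1: the boxed input
val boxedOut: AnyRef = g(boxedIn)        // extra object #2: the boxed output
println(Int.unbox(boxedOut))             // prints 2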
Thank you for the explanation. The size of the 100M data is ~1.4GB in
memory and each worker has 32GB of memory. There seems to be a lot of free
memory available. I wonder how Spark can hit GC issues with such a setup?
Reynold Xin <r...@databricks.com>
On Fri, Aug 21, 2015 at 11:07 AM, Ulanov, Alexander wrote:
>
>
> It seems that there is a nice improvement with Tungsten enabled when the
> data is persisted in memory: 2x and 3x. However, the improvement is not as
> large for Parquet: 1.5x. What's interesting, with Tungsten enabled
> per
It works for me in cluster mode.
I’m running on Hortonworks 2.2.4.12 in secure mode with Hive 0.14
I built with
./make-distribution.sh --tgz -Phive -Phive-thriftserver -Phbase-provided -Pyarn -Phadoop-2.6
Doug
> On Aug 25, 2015, at 4:56 PM, Tom Graves wrote:
>
> Anyone using HiveContext with
I chatted with Patrick briefly offline. It would be interesting to
know whether the scripts have some way of saying "run a smaller
version of certain tests" (e.g. by setting a system property that the
tests look at to decide what to run). That way, if there are no
changes under sql/, we could still
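Something like this hypothetical ScalaTest sketch is what I have in mind
(the "spark.test.quick" property name is made up, not an existing flag):

import org.scalatest.FunSuite

// Hypothetical: a suite that cancels its expensive cases when the harness
// sets -Dspark.test.quick=true for changes that don't touch sql/.
class CompatibilitySuiteSketch extends FunSuite {
  private val quick = sys.props.get("spark.test.quick").contains("true")
  test("full compatibility run") {
    assume(!quick, "quick mode requested; skipping the long-running checks")
    // ... the long-running checks would go here ...
  }
}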
Anyone using HiveContext with secure Hive with Spark 1.5 and have it working?
We have a non-standard version of Hive, and we were pulling in our Hive
jars, but it's failing to authenticate. It could be something in our Hive
version, but I'm wondering if Spark isn't forwarding credentials properly.
Tom
I'd be okay skipping the HiveCompatibilitySuite for core-only changes.
It does often catch bugs in changes to catalyst or sql, though. Same for
HashJoinCompatibilitySuite/VersionsSuite.
HiveSparkSubmitSuite/CliSuite should probably stay, as they do test things
like addJar that have been broken by
There is already code in place that restricts which tests run
depending on which code is modified. However, changes inside of
Spark's core currently require running all dependent tests. If you
have some ideas about how to improve that heuristic, it would be
great.
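To sketch the shape of that heuristic (the module names and dependency
edges below are illustrative, not the real map):

// Illustrative only: map each changed module to the modules whose tests
// must also run because they depend on it.
val dependents = Map(
  "core" -> Seq("core", "sql", "streaming", "mllib"), // core touches everything
  "sql"  -> Seq("sql", "mllib"),
  "docs" -> Seq.empty[String]
)
def testsToRun(changed: Seq[String]): Seq[String] =
  changed.flatMap(m => dependents.getOrElse(m, Seq(m))).distinct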
- Patrick
On Tue, Aug 25, 2015 a
Hello y'all,
So I've been getting kinda annoyed with how many PR tests have been
timing out. I took the log from one of my PRs and did some crunching on
the data from the output, and here's a list of the 5 slowest suites:
307.14s HiveSparkSubmitSuite
382.641s VersionsSuite
398s
Is there a JIRA to update the SQL/Hive docs? (Link: Spark SQL and
DataFrames - Spark 1.5.0 Documentation)
Final chance to fill out the survey!
http://goo.gl/forms/erct2s6KRR
I'm gonna close it to new responses tonight and send out a summary of the
results.
Nick
On Thu, Aug 20, 2015 at 2:08 PM Nicholas Chammas
wrote:
> I'm planning to close the survey to further responses early next week.
>
> If y
On Tue, Aug 25, 2015 at 2:17 AM, wrote:
> Then, if I wanted to do a build against a specific profile, I could also
> pass in a -Dspark.version=1.4.1-custom-string and have the output artifacts
> correctly named. The default behaviour should be the same. Child pom files
> would need to reference $
This probably means your app is failing and the second attempt is
hitting that issue. You may fix the "directory already exists" error
by setting
spark.eventLog.overwrite=true in your conf, but most probably that
will just expose the actual error in your app.
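For reference, one way to set that programmatically (a spark-defaults.conf
entry works just as well):

import org.apache.spark.SparkConf

// Let the event log writer overwrite an existing application log directory.
val conf = new SparkConf().set("spark.eventLog.overwrite", "true")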
On Tue, Aug 25, 2015 at 9:37 AM, Varad
Here is the error:
yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User
class threw exception: Log directory
hdfs://Sandbox/user/spark/applicationHistory/application_1438113296105_0302
already exists!)
I am using Cloudera 5.3.2 with Spark 1.2.0.
Any help is appreciated.
Th
Thank you for the suggestions; actually, this project has already been on
spark-packages for 1-2 months.
So I think what I need is some promotion :P
2015-08-25 23:51 GMT+08:00 saurfang [via Apache Spark Developers List] <
ml-node+s1001551n1380...@n3.nabble.com>:
> This is very cool. I also have a sbt
You can add it to Spark Packages, I guess: http://spark-packages.org/
Thanks
Best Regards
On Fri, Aug 14, 2015 at 1:45 PM, pishen tsai wrote:
> Sorry for the previous line-breaking format; trying to resend the mail again.
>
> I have written an sbt plugin called spark-deployer, which is able to deploy
I've got an interesting challenge in building Spark. For various reasons we
do a few different builds of Spark, typically with a few different profile
options (e.g. against different versions of Hadoop, some with/without Hive,
etc.). We mirror the Spark repo internally and have a build server that
bu