UPDATE:
The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1023/
The previous repo contains exactly the same content, but it is mutable.
Thanks Patrick for pointing it out!
-Xiangrui
On Thu, Jul 17, 2014 at 7:52 PM, Reynold Xin wrote:
Hey Stephen,
The only change to the build was that we now ask users to run with -Phive
and -Pyarn instead of --with-hive and --with-yarn (which internally just
set -Phive and -Pyarn). I don't think this should affect the dependency
graph.
Just to test this, what happens if you run *without* the CDH profile
and build
Hi Will,
These three environment variables are needed [1].
I have had success with Hive 0.12 and Hadoop 1.0.4. For Hive, getting
the source distribution seems to be required. A docs contribution would
be much appreciated!
[1]
https://github.com/apache/spark/tree/master/sql#other-dependencies-for-d
+1
On Thursday, July 17, 2014, Matei Zaharia wrote:
> +1
>
> Tested on Mac, verified CHANGES.txt is good, verified several of the bug
> fixes.
>
> Matei
>
> On Jul 17, 2014, at 11:12 AM, Xiangrui Meng wrote:
>
> > I start the voting with a +1.
> >
> > Ran tests on the release candidates and some basic operations in
> > spark-shell and pyspark (local and standalone).
Hi all,
What's the preferred environment for generating golden test outputs for new
Hive tests? In particular:
* what Hadoop version and Hive version should I be using,
* are there particular distributions people have run successfully, and
* are there any system properties or environment variables that need to be set?
+1
Tested on my Ubuntu Linux machine.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Thu, Jul 17, 2014 at 6:36 PM, Matei Zaharia wrote:
> +1
>
> Tested on Mac, verified CHANGES.txt is good, verified several of the bug fixes.
+1
Tested on Mac, verified CHANGES.txt is good, verified several of the bug fixes.
Matei
On Jul 17, 2014, at 11:12 AM, Xiangrui Meng wrote:
> I start the voting with a +1.
>
> Ran tests on the release candidates and some basic operations in
> spark-shell and pyspark (local and standalone).
>
Hi all,
Cool discussion! I agree that a more standardized API for clustering, and
easy access to underlying routines, would be useful (we've also been
discussing this when trying to develop streaming clustering algorithms,
similar to https://github.com/apache/spark/pull/1361)
For divisive, hier
Having looked at trunk make-distribution.sh, the --with-hive and --with-yarn
flags are now deprecated.
Here is the way I have built it:
Added to pom.xml:
  profile id: cdh5
  activeByDefault: false
  Hadoop version: 2.3.0-cdh5.0.0
  YARN version: 2.3.0-cdh5.0.0
  HBase version: 0.96.1.1-cdh5.0.0
  ZooKeeper version: 3.4.5-cdh5.0.0
Hello,
I am currently trying to extend some custom InputSplit and RecordReader
classes to provide to SparkContext's hadoopRDD() function.
My question is the following:
Does the value returned by InputSplit.getLength() and/or
RecordReader.getProgress() affect the execution of a map() function in t
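(For anyone unfamiliar with the call being discussed, here is a minimal
sketch of how an InputFormat is handed to hadoopRDD. TextInputFormat stands
in for the custom InputSplit/RecordReader classes, and the input path is a
placeholder, not anything from this thread:)

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("hadoopRDD-sketch"))

    // A custom InputFormat (wrapping custom InputSplit/RecordReader classes)
    // would replace TextInputFormat below; the call shape stays the same.
    val jobConf = new JobConf(sc.hadoopConfiguration)
    FileInputFormat.setInputPaths(jobConf, "hdfs:///data/input")

    val rdd = sc.hadoopRDD(
      jobConf,
      classOf[TextInputFormat],
      classOf[LongWritable],
      classOf[Text],
      4) // minimum number of splits/partitions

    println(rdd.count())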
I start the voting with a +1.
Ran tests on the release candidates and some basic operations in
spark-shell and pyspark (local and standalone).
-Xiangrui
On Thu, Jul 17, 2014 at 3:16 AM, Xiangrui Meng wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 0.9.2!
>
>
On Thu, Jul 17, 2014 at 1:23 AM, Stephen Haberman <
stephen.haber...@gmail.com> wrote:
> I'd be ecstatic if more major changes were this well/succinctly
> explained
>
Ditto on that. The summary of user impact was very nice. It would be good
to repeat that on the user list or release notes when th
Should be an easy rebase for your PR, so I went ahead just to get this fixed up:
https://github.com/apache/spark/pull/1466
On Thu, Jul 17, 2014 at 5:32 PM, Ted Malaska wrote:
> Don't make this change yet. I have a 1642 that needs to get through around
> the same code.
>
> I can make this change after 1642 is through.
OK I will create PR.
thanks
On Thu, Jul 17, 2014 at 7:58 AM, Sean Owen wrote:
> Looks like a real problem. I see it too. I think the same workaround
> found in ClientBase.scala needs to be used here. There, the fact that
> this field can be a String or String[] is handled explicitly. In fact
> I think you can just call to ClientBase for this? PR it, I say.
Don't make this change yet. I have a 1642 that needs to get through around
the same code.
I can make this change after 1642 is through.
On Thu, Jul 17, 2014 at 12:25 PM, Sean Owen wrote:
> CC tmalaska since he touched the line in question. This is a fun one.
> So, here's the line of code added last week:
CC tmalaska since he touched the line in question. This is a fun one.
So, here's the line of code added last week:
val channelFactory = new NioServerSocketChannelFactory
(Executors.newCachedThreadPool(), Executors.newCachedThreadPool());
Scala parses this as two statements: one invoking a no-arg constructor, and
a second that just builds a tuple of two thread pools and discards it.
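(A small sketch of the gotcha and the usual fix, using the same classes as
the snippet above; illustrative only, not the actual patch:)

    import java.util.concurrent.Executors
    import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory

    // Broken: the newline before the parenthesis ends the statement, so this
    // calls the no-arg constructor and then builds a (pool, pool) tuple that
    // is immediately discarded.
    val broken = new NioServerSocketChannelFactory
    (Executors.newCachedThreadPool(), Executors.newCachedThreadPool())

    // Fixed: keep the opening parenthesis on the same line as the constructor.
    val channelFactory = new NioServerSocketChannelFactory(
      Executors.newCachedThreadPool(), Executors.newCachedThreadPool())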
Thank you, TD !
Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108
On Wed, Jul 16, 2014 at 6:53 PM, Tathagata Das
wrote:
> After every checkpointing interval, the latest state RDD is stored to HDFS
> in its entirety. Along with that, the series of DStream transformations
> that was set up is also saved, so the state can be reconstructed after a
> driver failure.
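(A minimal sketch of the kind of stateful streaming job being described in
the quoted explanation; the checkpoint directory, host/port, and batch
interval here are placeholders:)

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._

    val conf = new SparkConf().setAppName("stateful-streaming-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // The latest state RDDs (plus the DStream setup needed to rebuild them)
    // are written here at every checkpoint interval.
    ssc.checkpoint("hdfs:///tmp/streaming-checkpoints")

    val counts = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .updateStateByKey[Int]((values: Seq[Int], state: Option[Int]) =>
        Some(state.getOrElse(0) + values.sum))

    counts.print()
    ssc.start()
    ssc.awaitTermination()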
er, that line being in toDebugString, where it really shouldn't affect
anything (no signature changes or the like)
On Thu, Jul 17, 2014 at 10:58 AM, Nathan Kronenfeld <
nkronenf...@oculusinfo.com> wrote:
> My full build command is:
> ./sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.6.0 clean assembly
>
My full build command is:
./sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.6.0 clean assembly
I've changed one line in RDD.scala, nothing else.
On Thu, Jul 17, 2014 at 10:56 AM, Sean Owen wrote:
> This looks like a Jetty version problem actually. Are you bringing in
> something that might be changing the version of Jetty used by Spark?
Looks like a real problem. I see it too. I think the same workaround
found in ClientBase.scala needs to be used here. There, the fact that
this field can be a String or String[] is handled explicitly. In fact
I think you can just call to ClientBase for this? PR it, I say.
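(For context, a minimal sketch of the kind of version-tolerant handling
being referred to; the method, field lookup, and names here are illustrative,
not the actual ClientBase code:)

    // Some Hadoop versions declare the constant as a String, others as a
    // String[]. Reading it reflectively and matching on the runtime type
    // lets one code path compile and run against both.
    def classPathEntries(clazz: Class[_], fieldName: String): Seq[String] =
      clazz.getField(fieldName).get(null) match {
        case s: String          => s.split(",").map(_.trim).toSeq
        case arr: Array[String] => arr.toSeq
        case _                  => Seq.empty
      }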
On Thu, Jul 17, 2014 at 3
This looks like a Jetty version problem actually. Are you bringing in
something that might be changing the version of Jetty used by Spark?
It depends a lot on how you are building things.
It would be good to specify exactly how you're building here.
On Thu, Jul 17, 2014 at 3:43 PM, Nathan Kronenfeld
wrote:
I'm trying to compile the latest code, with the hadoop-version set for
2.0.0-mr1-cdh4.6.0.
I'm getting the following error, which I don't get when I don't set the
hadoop version:
[error]
/data/hdfs/1/home/nkronenfeld/git/spark-ndk/external/flume/src/main/scala/org/apache/spark/streaming/flume/Flu
@Sean and @Sandy
Thanks for the reply. I used to be able to see yarn-alpha and yarn
directories corresponding to the modules.
I guess that due to the recent SparkBuild.scala changes I no longer see
yarn-alpha (by default), and I thought yarn-alpha had been renamed to "yarn"
and "yarn-stable" is th
Please vote on releasing the following candidate as Apache Spark version 0.9.2!
The tag to be voted on is v0.9.2-rc1 (commit 4322c0ba):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4322c0ba7f411cf9a2483895091440011742246b
The release files, including signatures, digests, etc. can be found at:
To add, we've made some effort to get yarn-alpha to work with the 2.0.x line,
but this was a time when YARN went through wild API changes. The only line
that the yarn-alpha profile is guaranteed to work against is the 0.23 line.
On Thu, Jul 17, 2014 at 12:40 AM, Sean Owen wrote:
> Are you setting -Pyarn-alpha?
Are you setting -Pyarn-alpha? ./sbt/sbt -Pyarn-alpha, followed by
"projects", shows it as a module. You should only build yarn-stable
*or* yarn-alpha at any given time.
I don't remember the modules changing in a while. 'yarn-alpha' is for
YARN before it stabilized, circa early Hadoop 2.0.x. 'yarn-