Hi everyone,
I am interested in contributing new algorithms and optimizing
existing algorithms in the area of graph algorithms and machine learning.
Please give me some ideas on where to start. Is it possible for me to
introduce the notion of neural networks in Apache Spark?
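(A feed-forward neural network did later land in Spark ML as
MultilayerPerceptronClassifier, in Spark 1.5. A minimal sketch, assuming a
DataFrame named data with the usual "features" and "label" columns:

    import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

    // layers: 4 input features, one hidden layer of 5 units, 3 output classes
    val mlp = new MultilayerPerceptronClassifier()
      .setLayers(Array[Int](4, 5, 3))
      .setBlockSize(128)
      .setMaxIter(100)

    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)
    val model = mlp.fit(train)               // trains the network
    val predictions = model.transform(test)  // adds a "prediction" column
)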
Hi Juan,
I have created a ticket for this:
https://issues.apache.org/jira/browse/SPARK-8337
Thanks!
Amit
On Fri, Jun 12, 2015 at 3:17 PM, Juan Rodríguez Hortalá <
juan.rodriguez.hort...@gmail.com> wrote:
> Hi,
>
> If you want I would be happy to work on this. I have worked with
> KafkaUtils.cr
Hi,
If you want I would be happy to work on this. I have worked with
KafkaUtils.createDirectStream before, in a pull request that wasn't
accepted: https://github.com/apache/spark/pull/5367. I'm fluent in Python
and I'm starting to feel comfortable with Scala, so if someone opens a JIRA
I can take
>
> 1. Custom aggregators that do map-side combine.
>
This is something I'm hoping to add in Spark 1.5.
> 2. UDFs with more than 22 arguments, which are not supported by ScalaUdf,
> and to avoid wrapping a Java function interface in one of 22 different
> Scala function interfaces depending on the n
We are using Expression for two things.
1. Custom aggregators that do map-side combine.
2. UDFs with more than 22 arguments, which are not supported by ScalaUdf, and
to avoid wrapping a Java function interface in one of 22 different Scala
function interfaces depending on the number of parameters.
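For comparison, map-side combine is already exposed at the RDD level through
combineByKey; a minimal sketch of that (nothing Expression-specific, just to
illustrate what map-side combine buys):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("MapSideCombine").setMaster("local[*]"))
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // Partial sums are built on each map task before the shuffle
    // (mapSideCombine is on by default), then merged on the reduce side.
    val sums = pairs.combineByKey[Int](
      (v: Int) => v,                  // createCombiner, runs map-side
      (acc: Int, v: Int) => acc + v,  // mergeValue, map-side, pre-shuffle
      (a: Int, b: Int) => a + b       // mergeCombiners, reduce-side
    )
    sums.collect().foreach(println)   // (a,4), (b,2)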
The Hadoop 0.23 (and Hive 0.12) code base in Spark works well from our
perspective, so I'm not sure what you are referring to. As I said, I'm happy
to maintain my own plugins, but as it stands there is no sane way to do so in
Spark because there is no clear separation/developer API for these.
cheers,
Tom
On Fri
I don't like the idea of removing Hadoop 1 unless it becomes a significant
maintenance burden, which I don't think it is. You'd be surprised how many
people use old software, even though various companies may no longer
support it.
With Hadoop 2 in particular, I may be misremembering,
I don't imagine that can be guaranteed to be supported anyway... the
0.x branch has never necessarily worked with Spark, even if it might
happen to. Is this really something you would veto for everyone
because of your deployment?
On Fri, Jun 12, 2015 at 7:18 PM, Thomas Dudziak wrote:
> -1 to this
-1 to this; we use Spark with an old Hadoop version (well, a fork of an old
version, 0.23). That being said, if there were a nice developer API that
separates Spark from Hadoop (or rather, two APIs, one for scheduling and
one for HDFS), then we'd be happy to maintain our own plugins for those.
cheers
On Fri, Jun 12, 2015 at 5:12 PM, Patrick Wendell wrote:
> I would like to understand though Sean - what is the proposal exactly?
> Hadoop 2 itself supports all of the Hadoop 1 APIs, so things like
> removing the Hadoop 1 variant of sc.hadoopFile, etc, I don't think
Not entirely; you can see some
My 2 cents: the biggest reason, from my view, for keeping Hadoop 1 support
was that our EC2 scripts, which launch an environment for benchmarking /
testing / research, only supported Hadoop 1 variants until very recently. We
did add Hadoop 2.4 support a few weeks back, but it is still not the
defau
+1 for Hadoop 2.2+
On Fri, Jun 12, 2015 at 8:45 AM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:
> I'm personally in favor, but I don't have a sense of how many people still
> rely on Hadoop 1.
>
> Nick
>
> On Fri, Jun 12, 2015 at 9:13 AM, Steve Loughran
> <ste...@hortonworks.com> wrote:
>
> +1 f
I feel this is quite different from the Java 6 decision and personally
I don't see sufficient cause to do it.
I would like to understand though Sean - what is the proposal exactly?
Hadoop 2 itself supports all of the Hadoop 1 APIs, so things like
removing the Hadoop 1 variant of sc.hadoopFile, et
I'm personally in favor, but I don't have a sense of how many people still
rely on Hadoop 1.
Nick
On Fri, Jun 12, 2015 at 9:13 AM, Steve Loughran
<ste...@hortonworks.com> wrote:
+1 for 2.2+
>
> Not only are the APIs in Hadoop 2 better, there are more people testing
> Hadoop 2.x & Spark, and bugs in Hadoop
The Scala API has two ways of calling createDirectStream. One of them allows
you to pass a message handler that gets full access to the Kafka
MessageAndMetadata, including the offset.
I don't know why the Python API was developed with only one way to call
createDirectStream, but the first thing I'd loo
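A minimal sketch of the Scala variant with a message handler (the broker
address, topic name, and batch interval are made up for illustration):

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(
      new SparkConf().setAppName("DirectStreamOffsets"), Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val fromOffsets = Map(TopicAndPartition("mytopic", 0) -> 0L)

    // The handler sees the full MessageAndMetadata, offset included.
    val handler = (mmd: MessageAndMetadata[String, String]) =>
      (mmd.topic, mmd.partition, mmd.offset, mmd.message())

    val stream = KafkaUtils.createDirectStream[String, String,
      StringDecoder, StringDecoder, (String, Int, Long, String)](
      ssc, kafkaParams, fromOffsets, handler)

    stream.foreachRDD(rdd => rdd.take(10).foreach(println))
    ssc.start()
    ssc.awaitTermination()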
+1 for 2.2+
Not only are the APIs in Hadoop 2 better, there are more people testing
Hadoop 2.x & Spark, and bugs in Hadoop itself being fixed.
(usual disclaimers, I work off branch-2.7 snapshots I build nightly, etc)
> On 12 Jun 2015, at 11:09, Sean Owen wrote:
>
> How does the idea of removing
1. Yes, because the issues are in JIRA.
2. Nope (at least as far as MLlib is concerned), because most of it consists
of thin wrappers around the underlying Scala functions or methods, and is not
implemented in pure Python.
3. I'm not sure about this. It seems to work fine for me!
HTH
On Fri, Jun 12, 2015 at
Would you mind filing a JIRA for this? Thanks!
Cheng
On 6/11/15 2:40 PM, Dong Lei wrote:
I think in standalone cluster mode, Spark is supposed to:
1. Download jars and files to the driver
2. Set the driver’s class path
3. Have the driver set up an HTTP file server to distribute these files
4. Worker downloa
How does the idea of removing support for Hadoop 1.x for Spark 1.5
strike everyone? Really, I mean, Hadoop < 2.2, as 2.2 seems to me more
consistent with the modern 2.x line than 2.1 or 2.0.
The arguments against are simply, well, someone out there might be
using these versions.
The arguments for
Hi all,
I encountered an error on Spark 1.4.0, and I have made a minimal example of
it, as follows. Both versions of the code run fine in spark-shell, but the
second version fails with spark-submit. The only difference is that the
second version uses a literal function in map(), while the first uses a
d
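A hypothetical reconstruction of the two variants (the original code was cut
off; MapExample and addOne are made-up names for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    object MapExample {
      def addOne(x: Int): Int = x + 1   // a separately defined function

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("MapExample"))
        val rdd = sc.parallelize(1 to 10)

        // Variant 1: pass the defined function to map()
        rdd.map(addOne).collect().foreach(println)

        // Variant 2: a literal (anonymous) function in map() --
        // the variant reported to fail under spark-submit
        rdd.map(x => x + 1).collect().foreach(println)

        sc.stop()
      }
    }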