Re: time for Apache Spark 3.0?

2018-06-15 Thread Andy
/ BigDL / ……)* 3.0 is a very important version for a good open source project. It would be better to shed the historical burden and *focus on new areas*. Spark has been widely used all over the world as a successful big data framework, and it can be better than that. *Andy* On Thu, Apr 5

Re: Question on Spark's graph libraries roadmap

2017-03-14 Thread Andy
GraphFrame is just a Graph Analytics/Query Engine, not a Graph Engine, which GraphX used to be. And I'm sorry to say that, in fact, it doesn’t fit most scenarios at all. Enzo, I don’t think there is any roadmap for Graph libraries in Spark for now. *Andy* On Tue, Mar 14, 2017 at 7:28 AM, Tim H

Re: How to hint Spark to use HashAggregate() for UDAF

2017-01-10 Thread Andy Dang
Thanks. It appears that TypedImperativeAggregate won't be available till 2.2.x. I'm stuck with my RDD approach then :( --- Regards, Andy On Tue, Jan 10, 2017 at 2:01 AM, Liang-Chi Hsieh wrote: > > Hi Andy, > > Because hash-based aggregate uses unsafe row as aggr

Re: How to hint Spark to use HashAggregate() for UDAF

2017-01-09 Thread Andy Dang
based aggregate, but I could be missing something here :). I could achieve hash-based aggregate by converting this query to RDD mode, but that is counterintuitive IMO. --- Regards, Andy On Mon, Jan 9, 2017 at 2:05 PM, Takeshi Yamamuro wrote: > Hi, > > Spark always uses hash-based agg

How to hint Spark to use HashAggregate() for UDAF

2017-01-09 Thread Andy Dang
,value#4L] How can I make Spark use HashAggregate (like the count(*) expression) instead of SortAggregate with my UDAF? Is this behavior intentional? Is there an issue tracking this? --- Regards, Andy
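For readers unfamiliar with the two operator names in the question, here is a minimal sketch of the difference between the strategies. It is plain Java, not Spark code and not Spark's implementation; the class and method names (`AggregateSketch`, `hashAggregate`, `sortAggregate`) are made up for illustration. Both compute a per-key count: the hash-based variant updates one counter per key in a single pass, while the sort-based variant sorts first and then folds each run of equal keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AggregateSketch {
    // Hash-based: a single pass, one mutable counter per key held in a hash map.
    static Map<String, Long> hashAggregate(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    // Sort-based: sort a copy of the input, then fold each run of equal keys.
    static Map<String, Long> sortAggregate(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        Map<String, Long> counts = new LinkedHashMap<>();
        String current = null;
        long run = 0;
        for (String key : sorted) {
            if (!key.equals(current)) {
                if (current != null) {
                    counts.put(current, run); // close the previous run
                }
                current = key;
                run = 0;
            }
            run++;
        }
        if (current != null) {
            counts.put(current, run); // close the last run
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("b", "a", "b", "c", "a", "b");
        System.out.println(hashAggregate(keys));
        System.out.println(sortAggregate(keys));
    }
}
```

Both methods return the same counts; the sort-based one pays for the sort up front, which is presumably why the poster would prefer the hash-based plan.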

Re: Converting an InternalRow to a Row

2017-01-07 Thread Andy Dang
Ah, I missed that bit of documentation, my bad :). That totally explains the behavior! Thanks a lot! --- Regards, Andy On Sat, Jan 7, 2017 at 10:11 AM, Liang-Chi Hsieh wrote: > > Hi Andy, > > Thanks for sharing the code snippet. > > I am not sure if you miss someth

Re: Converting an InternalRow to a Row

2017-01-06 Thread Andy Dang
gards, Andy On Fri, Jan 6, 2017 at 3:48 AM, Liang-Chi Hsieh wrote: > > Can you show how you use the encoder in your UDAF? > > > Andy Dang wrote > > One more question about the behavior of ExpressionEncoder > > > > . > > > > I have a UDAF that ha

Re: Converting an InternalRow to a Row

2017-01-05 Thread Andy Dang
the expected behavior of Encoders? --- Regards, Andy On Thu, Jan 5, 2017 at 10:55 AM, Andy Dang wrote: > Perfect. The API in Java is a bit clumsy though > > What I ended up doing in Java (the val is from lombok, if anyone's > wondering): >

Re: Converting an InternalRow to a Row

2017-01-05 Thread Andy Dang
coder = RowEncoder.apply(schema).resolveAndBind(ScalaUtils.scalaSeq(attributes), SimpleAnalyzer$.MODULE$); --- Regards, Andy On Thu, Jan 5, 2017 at 2:53 AM, Liang-Chi Hsieh wrote: > > You need to resolve and bind the encoder. > > ExpressionEncoder enconder = RowEncoder.apply(struct).

Converting an InternalRow to a Row

2017-01-04 Thread Andy Dang
Row roundTrip = enconder.fromRow(internalRow); System.out.println("Round trip: " + roundTrip.size()); } The code fails at the line encoder.fromRow() with the exception: > Caused by: java.lang.UnsupportedOperationException: Cannot evaluate expression: getcolumnbyordinal(0, IntegerType) --- Regards, Andy

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Andy Dang
We remodel Spark dependencies and ours together and chuck them under the /jars path. There are other ways to do it but we want the classpath to be strictly as close to development as possible. --- Regards, Andy On Fri, Dec 23, 2016 at 6:00 PM, Chetan Khatri wrote: > Andy, Thanks for re

Re: Best Practice for Spark Job Jar Generation

2016-12-23 Thread Andy Dang
ot a big issue for us). --- Regards, Andy On Fri, Dec 23, 2016 at 6:44 AM, Chetan Khatri wrote: > Hello Spark Community, > > For Spark Job Creation I use SBT Assembly to build Uber("Super") Jar and > then submit to spark-submit. > > Example, > > bin/spar

Negative number of active tasks

2016-12-23 Thread Andy Dang
only special thing I'm doing is saving multiple datasets at the same time to HDFS from different threads. Thanks, Andy - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: cutting 1.6.2 rc and 2.0.0 rc this week?

2016-06-15 Thread andy petrella
all of them look like >> they >> > can be retargeted or are just some doc updates. I'm going to be more >> > aggressive in pushing individual people about resolving those, in case >> this >> > drags on forever. >> > -- andy

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread andy petrella
Sure thing, I'll do it today. I was just talking about the page. Thanks btw. On Fri, Oct 2, 2015 at 20:26, Ted Yu wrote: > Andy: > 1.5.1 has many critical bug fixes on top of 1.5.0 > > http://search-hadoop.com/m/q3RTtGrXP31BVt4l1 > > Please consider using 1.5.1 > > C

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread andy petrella
It's an option but not a solution, indeed. On Fri, Oct 2, 2015 at 20:08, Ted Yu wrote: > Andy: > 1.5.1 has been released. > > Maybe you can use this: > > https://repo1.maven.org/maven2/org/apache/spark/spark-streaming_2.10/1.5.1/spark-streaming_2.10-1.5.1.pom > > I

Re: [Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread andy petrella
Yup, looks like it's funky; they may have scalability issues ^^ On Fri, Oct 2, 2015 at 20:11, Reynold Xin wrote: > Both work for me. It's possible maven.org is having problems with some > servers. > > > On Fri, Oct 2, 2015 at 11:08 AM, Ted Yu wrote: > >>

[Build] repo1.maven.org: spark libs 1.5.0 for scala 2.10 poms are broken (404)

2015-10-02 Thread andy petrella
10-1.5.0.pom> Any idea? ps: this happens for streaming too at least -- andy

Fwd: Parallel collection in driver programs

2015-09-22 Thread Andy Huang
Hi Devs, Hopefully one of you knows more about this? Thanks Andy -- Forwarded message -- From: Andy Huang Date: Wed, Sep 23, 2015 at 12:39 PM Subject: Parallel collection in driver programs To: u...@spark.apache.org Hi All, Would like to know if anyone has experience with parallel

Re: [ANNOUNCE] Announcing Spark 1.5.0

2015-09-09 Thread andy petrella
this message in context: >>>> http://apache-spark-developers-list.1001551.n3.nabble.com/ANNOUNCE-Announcing-Spark-1-5-0-tp14013p14015.html >>>> Sent from the Apache Spark Developers List mailing list archive at >>>> Nabble.com. >>>> >>> >> > -- andy

Re: SciSpark: NASA AIST14 proposal

2015-01-15 Thread andy petrella
peration, I thought that IndexedRDD <https://github.com/amplab/spark-indexedrdd> could be interesting to consider (I've been asked to look at options to implement this kind of distributed and resilient R-Tree, so I'll be happy to see how it'd perform ^^). cheers and have fun! andy

Re: Nabble mailing list mirror errors: "This post has NOT been accepted by the mailing list yet"

2014-12-19 Thread Andy Konwinski
gh so not sure if it actually worked after all. Andy On Wed, Dec 17, 2014 at 1:09 PM, Josh Rosen wrote: > > Yeah, it looks like messages that are successfully posted via Nabble end > up on the Apache mailing list, but messages posted directly to Apache > aren't mirrored to Nabble anymore

Re: Nabble mailing list mirror errors: "This post has NOT been accepted by the mailing list yet"

2014-12-18 Thread andy
I just changed the domain name in the mailing list archive settings to remove ".incubator" so maybe it'll work now. Andy -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Nabble-mailing-list-mirror-errors-This-post-has-NOT-been-accepted

Re: Nabble mailing list mirror errors: "This post has NOT been accepted by the mailing list yet"

2014-12-18 Thread andy
I just changed the domain name in the mailing list archive settings to remove ".incubator" so maybe it'll work now. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Nabble-mailing-list-mirror-errors-This-post-has-NOT-been-accepted-by-the-mailing-list-ye

Re: Notes on writing complex spark applications

2014-11-23 Thread andy petrella
Cool! On Sun Nov 23 2014 at 5:58:03 PM Evan R. Sparks wrote: > Hi all, > > Shivaram Venkataraman, Joseph Gonzalez, Tomer Kaftan, and I have been > working on a short document about writing high performance Spark > applications based on our experience developing MLlib, GraphX, ml-matrix, > pipeli

Re: Spark Streaming Metrics

2014-11-21 Thread andy petrella
said the guy that I might poke him today with more materials. Btw, how're you? Tchuss man andy PS: did you try the recent events thingy? On Fri Nov 21 2014 at 11:17:17 AM Gerard Maas wrote: > Looks like metrics are not a hot topic to discuss - yet so important to > sleep we

Re: Implementing TinkerPop on top of GraphX

2014-11-06 Thread andy petrella
Great stuff! I've got some thoughts about that, and I was wondering if it would be interesting to first have something like for spark-core (let's say): 0/ Core API offering basic (or advanced → HeLP) primitives 1/ catalyst optimizer for a text-based system (SPARQL, Cypher, custom SQL3, whatnot) 2/

Re: best IDE for scala + spark development?

2014-10-27 Thread andy petrella
I second the S[B]T combo! I tried ATOM → lack of features and stability (atm) aℕdy ℙetrella about.me/noootsab On Mon, Oct 27, 2014 at 2:15 PM, Dean Wampler wrote: > For what it's worth, I use Sublime Text + the SBT console for ever

Re: [brainsotrming] Generalization of DStream, a ContinuousRDD ?

2014-08-01 Thread andy petrella
l screen ☺) Cheers Andy

Re: [brainsotrming] Generalization of DStream, a ContinuousRDD ?

2014-08-01 Thread andy petrella
ner with neither a time nor a timestamp field's value, but a kind of internal index as range delimiter -- thus defining their own exotic continuum and break function. greetz, aℕdy ℙetrella about.me/noootsab <http://about.me/noootsab> On Thu, Jul 17,

Re: [brainsotrming] Generalization of DStream, a ContinuousRDD ?

2014-07-16 Thread andy petrella
e solved without sacrificing efficiency (e.g. I can imagine > doing multiple pass magic) > > 2. An even more fundamental question is how do you ensure ordering with > delayed records. If you want to process in order of application time, and > records are delayed how do you deal wi

Re: [brainsotrming] Generalization of DStream, a ContinuousRDD ?

2014-07-16 Thread andy petrella
2014 at 12:33 AM, Tathagata Das wrote: > Very interesting ideas Andy! > > Conceptually I think it makes sense. In fact, it is true that dealing with > time series data, windowing over application time, windowing over number of > events, are things that DStream does not natively support

[brainsotrming] Generalization of DStream, a ContinuousRDD ?

2014-07-15 Thread andy petrella
Dear Sparkers, *[sorry for the lengthy email... => head to the gist for a preview :-p]* I would like to share some thinking I did because of a use case I faced. Basically, as the subject announces, it's a generalization of the DStream c

Fwd: 2014 Mesos community survey results

2014-06-24 Thread Andy Konwinski
I think it's cool that the Mesos team did a survey of usage and published the aggregate results. It would be cool to do a survey for the Spark project and publish the results on the Spark website like the Mesos team did. -- Forwarded message -- From: "Dave Lester" Date: Jun 24, 201

Re: encounter jvm problem when integreation spark with mesos

2014-06-17 Thread andy petrella
Yep, but no real resolution or progress on this topic, since in the end we chose to stick with a "compatible" version of Mesos (0.14.1 ftm). But I'm still convinced it has to do with a native libs clash :-s aℕdy ℙetrella about.me/noootsab

Re: [RESULT][VOTE] Release Apache Spark 1.0.0 (RC11)

2014-05-29 Thread Andy Konwinski
"0" vote > > and no > > >>> "-1" vote. > > >>> > > >>> Thanks to everyone who tested the RC and voted. Here are the totals: > > >>> > > >>> +1: (13 votes) > > >>> Matei Zaharia* > > >

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-05-28 Thread Andy Konwinski
+1 On May 28, 2014 7:05 PM, "Xiangrui Meng" wrote: > +1 > > Tested apps with standalone client mode and yarn cluster and client modes. > > Xiangrui > > On Wed, May 28, 2014 at 1:07 PM, Sean McNamara > wrote: > > Pulled down, compiled, and tested examples on OS X and ubuntu. > > Deployed app we a

Re: Scala examples for Spark do not work as written in documentation

2014-05-20 Thread Andy Konwinski
la. Thanks for catching this Glenn. Andy On Fri, May 16, 2014 at 12:38 PM, Mark Hamstra wrote: > Sorry, looks like an extra line got inserted in there. One more try: > > val count = spark.parallelize(1 to NUM_SAMPLES).map { _ => > val x = Math.random() > val y = Mat
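The quoted snippet is cut off, but it appears to be the Monte Carlo pi estimate from the Spark examples page. Assuming that is the case, the same computation can be sketched in plain Java without Spark: draw random points in the unit square and count those that fall inside the quarter circle. The class name `PiEstimate`, the `estimate` method, and the fixed seed are assumptions made for this sketch only.

```java
import java.util.Random;

public class PiEstimate {
    // Estimates pi as 4 * (points inside the quarter circle) / (points sampled).
    static double estimate(int numSamples, long seed) {
        Random rng = new Random(seed);
        int inside = 0;
        for (int i = 0; i < numSamples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y < 1.0) {
                inside++;
            }
        }
        return 4.0 * inside / numSamples;
    }

    public static void main(String[] args) {
        System.out.println("Pi is roughly " + estimate(1_000_000, 42L));
    }
}
```

In the Spark version the sampling loop is what gets distributed; the final ratio is computed the same way.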

Re: can RDD be shared across mutil spark applications?

2014-05-17 Thread Andy Konwinski
RDDs cannot currently be shared across multiple SparkContexts without using something like the Tachyon project (which is a separate project/codebase). Andy On May 16, 2014 2:14 PM, "qingyang li" wrote: > >

Re: Updating docs for running on Mesos

2014-05-11 Thread Andy Konwinski
Thanks for suggesting this and volunteering to do it. On May 11, 2014 3:32 AM, "Andrew Ash" wrote: > > The docs for how to run Spark on Mesos have changed very little since > 0.6.0, but setting it up is much easier now than then. Does it make sense > to revamp with the below changes? > > > You n

Re: Link not working

2014-04-22 Thread Andy Konwinski
Should be fixed now, thanks for reporting this! Andy On Mon, Apr 21, 2014 at 10:59 PM, prabeesh k wrote: > For Spark-0.8.0, the download links are not working. > > Please update the same > > Regarding, > prabeesh >

Updating all references to github.com/apache/incubator-spark on spark website

2014-04-09 Thread Andy Konwinski
Since http://github.com/apache/incubator-spark and any links underneath it now return 404, I propose we do a global search and replace to change all instances to remove "incubator-", including those in docs/0.8.0 docs/0.8.1 and docs/0.9.0. I'm happy to do this. Any discussion before I do? Andy
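As a rough illustration of the proposed search and replace, here is a small Java sketch. It matches the full repository path rather than the bare string "incubator-", so unrelated text is left alone. The class name `LinkRewrite` and the `rewrite` method are hypothetical; the actual change may well have been made with other tooling.

```java
public class LinkRewrite {
    static final String OLD_PATH = "github.com/apache/incubator-spark";
    static final String NEW_PATH = "github.com/apache/spark";

    // Replaces every occurrence of the old repository path in the given text.
    static String rewrite(String text) {
        return text.replace(OLD_PATH, NEW_PATH);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("See http://github.com/apache/incubator-spark for the source."));
    }
}
```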

Re: branch-1.0 cut

2014-04-09 Thread Andy Konwinski
Wow, great work. Very impressive sticking to the schedule! On Wed, Apr 9, 2014 at 2:31 AM, Patrick Wendell wrote: > Hey All, > > In accordance with the scheduled window for the release I've cut a 1.0 > branch. Thanks a ton to everyone for being so active in reviews during the > last week. In th

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Andy Konwinski
Responses about London, Montreal/Toronto, DC, Chicago. Great coverage so far, and keep 'em coming! (still looking for an NYC connection) I'll reply to each of you off-list to coordinate next-steps for setting up a Spark meetup in your home area. Thanks again, this is super exciting.

Calling Spark enthusiasts in NYC

2014-03-31 Thread Andy Konwinski
all of the resources we have to offer (speakers from the core community, a budget for food, help scheduling, etc.), and let's make this happen! Andy

Re: The difference between driver and master Spark

2014-03-31 Thread Andy Konwinski
SparkContext object. Have you read through http://spark.apache.org/docs/latest/index.html yet? If not, it and the other docs might be helpful. In the future, this level of question would probably be better suited for the user list. Best, Andy On Mar 31, 2014 7:48 AM, "Dan" wrote: Hi,

[DISCUSS] Shepherding PRs

2014-03-27 Thread Andy Konwinski
ssing. Andy -- Forwarded message -- From: "Benjamin Mahler" Date: Mar 24, 2014 11:47 PM Subject: Re: Shepherding on ExternalContainerizer To: "dev" Cc: Hey Till, We want to foster a healthy review culture, and so, as you observed, we thought we would try o

Re: Announcing the official Spark Job Server repo

2014-03-24 Thread andy petrella
Thx for answering! see inline for my thoughts (or misunderstanding ? ^^) Andy, doesn't Marathon handle fault tolerance amongst its apps? ie if > you say that N instances of an app are running, and one shuts off, > then it spins up another one no? > Yes indeed, but my wonder is abo

Re: Making RDDs Covariant

2014-03-22 Thread andy petrella
ne Source, Process and Sink of Container of Wagons (Rdds Dstreams themselves) to compose a Job using a (to be defined) DSL. So without covariance I cannot for now define a generic noop Sink. My 0.02c Andy Sent from Tab, sorry for the typos... On 22 March 2014 at 17:00, "Pascal Voitot Dev" wrote

Re: Announcing the official Spark Job Server repo

2014-03-20 Thread andy petrella
expressing something completely dumb ^^). For sure, we'll try to share it when we reach the point of deploying with Marathon (should be planned for April) greetz and again, Nice Work Evan! Ndi On Wed, Mar 19, 2014 at 7:27 AM, Evan Chan wrote: > Andy, > > Yeah, we've thought

Re: Announcing the official Spark Job Server repo

2014-03-18 Thread andy petrella
arding the resources needed (à la Jenkins). Any idea is welcome. Back to the news, Evan + Ooyala team: Great Job again. andy On Tue, Mar 18, 2014 at 11:39 PM, Henry Saputra wrote: > W00t! > > Thanks for releasing this, Evan. > > - Henry > > On Tue, Mar 18, 2014 at 1:51 PM, E

Re: [re-cont] map and flatMap

2014-03-15 Thread andy petrella
o clear this out by either renaming > the function -- or even awesomely better making flatMap able to > redistribute in-between (which can have a big impact at the reconciliation > => my first mail :-D). > Tell me if I'm completely wrong ;-) -- or if I forget something in my > ac

Re: [re-cont] map and flatMap

2014-03-15 Thread andy petrella
Yep. Regarding flatMap, an implicit parameter might work, like in Scala's Future for instance: https://github.com/scala/scala/blob/master/src/library/scala/concurrent/Future.scala#L246 Dunno, still waiting for some insights from the team ^^ andy On Wed, Mar 12, 2014 at 3:23 PM, Pascal V

[re-cont] map and flatMap

2014-03-12 Thread andy petrella
(flatMapValues) So, wouldn't it be better to rename flatMap as flatMapData (or whatever better name)? Or to have flatMap require a Monad instance of RDD? Sorry for the prose, just dropped my thoughts and feelings at once :-/ Cheers, andy PS: and my English maybe, although my name's Andy I'm a native Belgian ^^.

Re: Please update the incubator status for the graduated podlings

2014-03-10 Thread Andy Konwinski
s); remove the "graduating" element. * Remove the "reporting" element from that podlings.xml file. Andy On Sun, Mar 9, 2014 at 9:06 PM, Roman Shaposhnik wrote: > Hi! > > while compiling a board report, I have noticed > that the following podlings still