RDD API patterns

2015-09-14 Thread sim
no nested RDDs mean there are absolutely no high-level abstractions that we can expose via the Iterables born of RDDs? I'd love your thoughts. /Sim http://linkedin.com/in/simeons -- View this message in context: http://apache-spark-developers-lis

Re: RDD API patterns

2015-09-18 Thread sim
Thanks everyone for the comments! I waited for more replies to come before I responded as I was interested in the community's opinion. The thread I'm noticing in this thread (pun intended) is that most responses focus on the nested RDD issue. I think we all agree that it is problematic for many r

Re: RDD API patterns

2015-09-18 Thread sim
Aniket, yes, I've done the separate file trick. :) Still, I think we can solve this problem without nested RDDs. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-API-patterns-tp14116p14192.html Sent from the Apache Spark Developers List mailing lis

Re: RDD API patterns

2015-09-18 Thread sim
e of the former to know what's worth optimizing. Thanks, Sim -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-API-patterns-tp14116p14193.html

Re: RDD API patterns

2015-09-18 Thread sim
Robin, my point exactly. When an API is valuable, let's expose it in a way that it may be used easily for all data Spark touches. It should not require much development work to implement the sampling logic to work for an Iterable as opposed to an RDD. -- View this message in context: http://apa

Re: RDD API patterns

2015-09-18 Thread sim
@debasish83, yes, there are many ways to optimize and work around the limitation of no nested RDDs. The point of this thread is to discuss the API patterns of Spark in order to make the platform more accessible to lots of developers solving interesting problems quickly. We can get API consistency w

Re: RDD API patterns

2015-09-19 Thread sim
tation options. Best, Sim -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-API-patterns-tp14116p14222.html

Scala API: simplifying common patterns

2016-02-07 Thread sim
, low-risk API tweaks could make common use cases more consistent + simpler to code? /Sim -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-API-simplifying-common-patterns-tp16238.html

Re: Scala API: simplifying common patterns

2016-02-07 Thread sim
Sure. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-API-simplifying-common-patterns-tp16238p16241.html

Re: Scala API: simplifying common patterns

2016-02-07 Thread sim
Reynold, I just forked + built master and I'm getting lots of binary compatibility errors when running the tests. https://gist.github.com/ssimeonov/69cb0b41750be776 Nothing in the dev tools section of the wiki on this. Any advice on how to get green before I work on the PRs? Thanks

Re: Scala API: simplifying common patterns

2016-02-07 Thread sim
Same result with both caches cleared. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-API-simplifying-common-patterns-tp16238p16244.html

Re: Scala API: simplifying common patterns

2016-02-07 Thread sim
24 test failures for sql/test: https://gist.github.com/ssimeonov/89862967f87c5c497322 -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-API-simplifying-common-patterns-tp16238p16247.html