Re: non-deprecation compiler warnings are upgraded to build errors now

2015-07-24 Thread Punyashloka Biswal
Would it make sense to isolate the use of deprecated APIs to a subset of projects? That way we could turn on more stringent checks for the other ones. Punya On Thu, Jul 23, 2015 at 12:08 AM Reynold Xin wrote: > Hi all, > > FYI, we just merged a patch that fails a build if there is a scala >

Re: PySpark on PyPi

2015-07-22 Thread Punyashloka Biswal
I agree with everything Justin just said. An additional advantage of publishing PySpark's Python code in a standards-compliant way is that we'll be able to declare transitive dependencies (Pandas, Py4J) in a way that pip can use. Contrast this with the current situation, where df.toPandas(
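
(For context, a minimal sketch of what declaring those dependencies could look like in a standards-compliant setup.py; the package name, version, and pins below are illustrative assumptions, not actual Spark packaging decisions.)

```python
# Hypothetical setup.py sketch -- package name, version, and dependency pins
# are illustrative assumptions, not actual Spark packaging decisions.
from setuptools import setup, find_packages

setup(
    name="pyspark",
    version="1.4.0",
    packages=find_packages(include=["pyspark", "pyspark.*"]),
    # Declaring transitive dependencies here is what lets pip pull them in
    # automatically, e.g. so df.toPandas() works without a manual install.
    install_requires=[
        "py4j==0.8.2.1",
        "pandas>=0.13",
    ],
)
```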

Re: Python UDF performance at large scale

2015-06-24 Thread Punyashloka Biswal
Hi Davies, In general, do we expect people to use CPython only for "heavyweight" UDFs that invoke an external library? Are there any examples of using Jython, especially performance comparisons to Java/Scala and CPython? When using Jython, do you expect the driver to send code to the executor as a
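
(For context, a minimal sketch of the "heavyweight" case under discussion: a CPython UDF that calls into an external library such as numpy. It assumes an existing SparkContext named sc; the data and column name are invented.)

```python
# Sketch of a "heavyweight" CPython UDF that calls an external library.
# Assumes an existing SparkContext `sc`; data and column names are invented.
import numpy as np
from pyspark.sql import SQLContext
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([(1.0,), (4.0,), (9.0,)], ["x"])

# Each row is serialized to a CPython worker, run through numpy, and
# serialized back -- the per-row overhead behind the performance question.
heavy_sqrt = udf(lambda v: float(np.sqrt(v)), DoubleType())
df.select(heavy_sqrt(df.x).alias("sqrt_x")).show()
```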

Re: Spark 1.4.0 pyspark and pylint breaking

2015-05-26 Thread Punyashloka Biswal
Davies: Can we use relative imports (from . import types) in the unit tests in order to disambiguate between the global and local module? Punya On Tue, May 26, 2015 at 3:09 PM Justin Uang wrote: > Thanks for clarifying! I don't understand python package and modules names > that well, but I thought th
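
(For concreteness, a sketch of how the relative import disambiguates, assuming the importing file lives inside the pyspark/sql package; names are illustrative.)

```python
# Sketch: inside a module that lives in the pyspark/sql package, a relative
# import picks the package-local types module, while the absolute import
# picks the top-level (standard library) one.
from __future__ import absolute_import  # Python 2: disable implicit relative imports

import types                       # the standard-library `types` module
from . import types as sql_types   # pyspark/sql/types.py, unambiguously

print(types.FunctionType)          # stdlib attribute
print(sql_types.StringType)        # PySpark SQL type
```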

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Punyashloka Biswal
n a couple of weeks :) Punya On Tue, May 19, 2015 at 12:39 PM Patrick Wendell wrote: > Punya, > > Let me see if I can publish these under rc1 as well. In the future > this will all be automated but currently it's a somewhat manual task. > > - Patrick > > On Tue, May

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Punyashloka Biswal
When publishing future RCs to the staging repository, would it be possible to use a version number that includes the "rc1" designation? In the current setup, when I run a build against the artifacts at https://repository.apache.org/content/repositories/orgapachespark-1092/org/apache/spark/spark-cor

branch-1.4 nightly builds?

2015-05-08 Thread Punyashloka Biswal
Dear Spark devs, Does anyone maintain nightly builds for branch-1.4? I'd like to start testing against it, and having a regularly updated build on a well-publicized repository would be a great help! Punya

Re: Collect inputs on SPARK-7035: compatibility issue with DataFrame.__getattr__

2015-05-08 Thread Punyashloka Biswal
Is there a foolproof way to access methods exclusively (instead of picking between columns and methods at runtime)? Here are two ideas, neither of which seems particularly Pythonic: pyspark.sql.methods(df).name(), or df.__methods__.name(). Punya On Fri, May 8, 2015 at 10:06 AM Nicholas Chamm
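
(To make the ambiguity concrete, a small sketch follows. It assumes an existing SparkContext named sc; the column names are invented, and the two accessors proposed above are shown only as pseudocode, not existing PySpark API.)

```python
# Sketch of the SPARK-7035 ambiguity: attribute access on a DataFrame can
# mean either "give me the column" or "give me the method".
# Assumes an existing SparkContext `sc`; column names are invented.
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([("a", 1)], ["name", "count"])

col = df.name       # no DataFrame method is named "name", so __getattr__
                    # resolves this to the column "name"
m = df.count        # "count" IS a method, so the method wins and the column
                    # "count" is only reachable by indexing:
col2 = df["count"]  # the unambiguous way to get a column

# The two (admittedly non-Pythonic) proposals from the message above, shown
# as pseudocode only -- neither exists in PySpark:
#   pyspark.sql.methods(df).name()
#   df.__methods__.name()
```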

Re: [build infra] quick downtime again tomorrow morning for DOCKER

2015-05-08 Thread Punyashloka Biswal
Just curious: will docker allow new capabilities for the Spark build? (Where can I read more?) Punya On Fri, May 8, 2015 at 10:00 AM shane knapp wrote: > this is happening now. > > On Thu, May 7, 2015 at 3:40 PM, shane knapp wrote: > > > yes, docker. that wonderful little wrapper for linux co

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Punyashloka Biswal
I'm in favor of ending support for Java 6. We should also articulate a policy on how long we want to support current and future versions of Java after Oracle declares them EOL (Java 7 will be in that bucket in a matter of days). Punya On Thu, Apr 30, 2015 at 1:18 PM shane knapp wrote: > somethin

Re: [discuss] DataFrame function namespacing

2015-04-29 Thread Punyashloka Biswal
Do we still have to keep the names of the functions distinct to avoid collisions in SQL? Or is there a plan to allow "importing" a namespace into SQL somehow? I ask because if we have to keep worrying about name collisions then I'm not sure what the added complexity of #2 and #3 buys us. Punya On
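
(For contrast, a small sketch of how the two surfaces differ today: DataFrame-side functions can be qualified by a Python module alias, while functions visible to SQL share one flat namespace. It assumes an existing SparkContext named sc; data and names are illustrative.)

```python
# Sketch of the namespacing contrast: the Python API can qualify functions
# with a module alias, while SQL resolves names from a single flat namespace.
# Assumes an existing SparkContext `sc`; data and names are illustrative.
from pyspark.sql import SQLContext, functions as F

sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([(4.0,)], ["x"])

df.select(F.sqrt(df.x)).show()                   # qualified via the F alias

df.registerTempTable("t")
sqlContext.sql("SELECT sqrt(x) FROM t").show()   # one global SQL namespace
```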

Re: Plans for upgrading Hive dependency?

2015-04-27 Thread Punyashloka Biswal
> getting it to compile is really complicated. > > > > If there's interest in getting the HiveContext part fixed up I can > > send a PR for that code. But at this time I don't really have plans to > > look at the thrift server. > > > > > > On M

Plans for upgrading Hive dependency?

2015-04-27 Thread Punyashloka Biswal
Dear Spark devs, Is there a plan for staying up-to-date with current (and future) versions of Hive? Spark currently supports version 0.13 (June 2014), but the latest version of Hive is 1.1.0 (March 2015). I don't see any Jira tickets about updating beyond 0.13, so I was wondering if this was inten

Re: Design docs: consolidation and discoverability

2015-04-27 Thread Punyashloka Biswal
docs in a repo) is yet > another approach we could take, though if we want to do that on the main > Spark repo we'd need permission from Apache, which may be tough to get... > > On Mon, Apr 27, 2015 at 1:47 PM Punyashloka Biswal > wrote: > >> Nick, I like your idea of kee

Re: Design docs: consolidation and discoverability

2015-04-27 Thread Punyashloka Biswal
; > > - Patrick > > > > > > > > On Fri, Apr 24, 2015 at 4:57 PM, Sean Owen > wrote: > > > >> I know I recently used Google Docs from a JIRA, so am guilty as > > > >> charged. I don't think there are a lot of design docs in general,

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Punyashloka Biswal
> committers want to contribute changes (as opposed to just comments)? > > > > On Fri, Apr 24, 2015 at 2:57 PM, Sean Owen wrote: > > >> Only catch there is it requires commit access to the repo. We need a > >> way for people who aren't committers to

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Punyashloka Biswal
p to maintain it, it would be cool to have a wiki page with > links to all the final design docs posted on JIRA. > > -Sandy > > On Fri, Apr 24, 2015 at 12:01 PM, Punyashloka Biswal < > punya.bis...@gmail.com> wrote: > >> The Gradle dev team keep their design docume

Re: Design docs: consolidation and discoverability

2015-04-24 Thread Punyashloka Biswal
>> On Fri, Apr 24, 2015 at 7:21 AM, Sean Owen wrote: > >> > >> > That would require giving wiki access to everyone or manually adding > >> > people > >> > any time they make a doc. > >> > > >> > I don't see how

Design docs: consolidation and discoverability

2015-04-24 Thread Punyashloka Biswal
Dear Spark devs, Right now, design docs are stored on Google docs and linked from tickets. For someone new to the project, it's hard to figure out what subjects are being discussed, what organization to follow for new feature proposals, etc. Would it make sense to consolidate future design docs i

Re: Graphical display of metrics on application UI page

2015-04-22 Thread Punyashloka Biswal
an > possibly see it on the github. Here's a few of them > https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=d3 > > Thanks > Best Regards > > On Wed, Apr 22, 2015 at 8:08 AM, Punyashloka Biswal < > punya.bis...@gmail.com> wrote: >> Dear Spar

Graphical display of metrics on application UI page

2015-04-21 Thread Punyashloka Biswal
Dear Spark devs, Would people find it useful to have a graphical display of metrics (such as duration, GC time, etc) on the application UI page? Has anybody worked on this before? Punya

Re: [discuss] new Java friendly InputSource API

2015-04-21 Thread Punyashloka Biswal
Reynold, thanks for this! At Palantir we're heavy users of the Java APIs and appreciate being able to stop hacking around with fake ClassTags :) Regarding this specific proposal, is the contract of RecordReader#get intended to be that it returns a fresh object each time? Or is it allowed to mutate