Packaging Java + Python library

2015-04-13 Thread Punya Biswal
Dear Spark users, My team is working on a small library that builds on PySpark and is organized like PySpark as well -- it has a JVM component (that runs in the Spark driver and executor) and a Python component (that runs in the PySpark driver and executor processes). What's a good approach for
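One common answer to the question this thread opens is to publish the JVM half as a jar and the Python half as a zip/egg, then wire both in at submit time. A minimal sketch, with hypothetical artifact names (`mylib-assembly.jar`, `mylib.zip`, `my_job.py`), using the standard `spark-submit` flags:

```shell
# --jars ships the JVM half to the driver and executor classpaths;
# --py-files ships the Python half onto the driver and executors' PYTHONPATH.
spark-submit --jars mylib-assembly.jar --py-files mylib.zip my_job.py
```

The same wiring can be done programmatically with `SparkConf` (`spark.jars`) and `SparkContext.addPyFile` if the library is loaded after the context starts.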

Spark profiler

2014-05-01 Thread Punya Biswal
Hi all, I am thinking of starting work on a profiler for Spark clusters. The current idea is that it would collect jstacks from executor nodes and put them into a central index (either a database or Elasticsearch), and it would present them in a UI that would let people slice and dice th

Re: Separating classloader management from SparkContexts

2014-03-19 Thread Punya Biswal
https://github.com/apache/spark/pull/119

Maven repo for Spark pre-built with CDH4?

2014-03-18 Thread Punya Biswal
Hi all, The Maven central repo contains an artifact for spark 0.9.0 built with unmodified Hadoop, and the Cloudera repo contains an artifact for spark 0.9.0 built with CDH 5 beta. Is there a repo that contains spark-core built against a non-beta version of CDH (such as 4.4.0)? Punya
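Absent a published artifact, the usual workaround in the Spark 0.9 era was to build locally against the desired Hadoop distribution via the `hadoop.version` Maven property. A sketch, with the version string illustrative (check the CDH release notes for the exact one):

```shell
# Build Spark against a specific CDH4 Hadoop release instead of the
# unmodified Hadoop used for the Maven Central artifacts.
mvn -Dhadoop.version=2.0.0-mr1-cdh4.4.0 -DskipTests clean package
```

The resulting jar can then be installed into a local or internal repository for downstream projects to depend on.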

Re: Separating classloader management from SparkContexts

2014-03-18 Thread Punya Biswal
xtras-v2.jar") print(sc2.filter(/* fn that depends on jar */).count) } ... even if classes in extras-v1.jar and extras-v2.jar have name collisions. Punya From: Punya Biswal Reply-To: Date: Sunday, March 16, 2014 at 11:09 AM To: "user@spark.apache.org" Subject: Separating c

Separating classloader management from SparkContexts

2014-03-16 Thread Punya Biswal
Hi all, I'm trying to use Spark to support users who are interactively refining the code that processes their data. As a concrete example, I might create an RDD[String] and then write several versions of a function to map over the RDD until I'm satisfied with the transformation. Right now, once I