Re: CPU/Disk/network performance instrumentation

2014-07-09 Thread Surendranauth Hiraman
+1 on advanced tab. On Wed, Jul 9, 2014 at 5:20 PM, Mridul Muralidharan wrote: > +1 on advanced mode ! > > Regards. > Mridul > > On Thu, Jul 10, 2014 at 12:55 AM, Reynold Xin wrote: > > Maybe it's time to create an advanced mode in the ui. > > > > > > On Wed, Jul 9, 2014 at 12:23 PM, Kay Oust

PySpark Driver from Jython

2014-07-01 Thread Surendranauth Hiraman
Has anyone tried running pyspark driver code in Jython, preferably by calling python code within Java code? I know CPython is the only interpreter tested because of the need to support C extensions. But in my case, C extensions would be called on the worker, not in the driver. And being able to

Re: Trailing Tasks Saving to HDFS

2014-06-19 Thread Surendranauth Hiraman
owse/SPARK-2202 -Suren On Wed, Jun 18, 2014 at 8:35 PM, Surendranauth Hiraman < suren.hira...@velos.io> wrote: > Looks like eventually there was some type of reset or timeout and the > tasks have been reassigned. I'm guessing they'll keep failing until max > failure cou

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-18 Thread Surendranauth Hiraman
, do you get this particular exception if you are not > consolidating shuffle data? > > On Wed, Jun 18, 2014 at 12:15 PM, Mridul Muralidharan > wrote: > > On Wed, Jun 18, 2014 at 6:19 PM, Surendranauth Hiraman > > wrote: > >> Patrick, > >> > >> My

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-18 Thread Surendranauth Hiraman
conf.set("spark.akka.askTimeout", "30") // block manager conf.set("spark.storage.blockManagerTimeoutIntervalMs", "18") conf.set("spark.blockManagerHeartBeatMs", "8") -Suren On Wed, Jun 18,

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-17 Thread Surendranauth Hiraman
Matt/Ryan, Did you make any headway on this? My team is running into this also. Doesn't happen on smaller datasets. Our input set is about 10 GB but we generate 100s of GBs in the flow itself. -Suren On Fri, Jun 6, 2014 at 5:19 PM, Ryan Compton wrote: > Just ran into this today myself. I'm

Compression with DISK_ONLY persistence

2014-06-11 Thread Surendranauth Hiraman
Hi, Will spark.rdd.compress=true enable compression when using DISK_ONLY persistence? SUREN HIRAMAN, VP TECHNOLOGY Velos Accelerating Machine Learning 440 NINTH AVENUE, 11TH FLOOR NEW YORK, NY 10001 O: (917) 525-2466 ext. 105 F: 646.349.4063 E: suren.hiraman@velos.io W: www.velos.io

Re: Error During ReceivingConnection

2014-06-11 Thread Surendranauth Hiraman
16.25.125,45610) 14/06/10 18:51:14 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(172.16.25.125,45610) On Wed, Jun 11, 2014 at 8:38 AM, Surendranauth Hiraman < suren.hira...@velos.io> wrote: > I have a somewhat large job (10 GB input data but generate

Re: Spark 1.0.0 - Java 8

2014-05-30 Thread Surendranauth Hiraman
With respect to virtual hosts, my team uses Vagrant/Virtualbox. We have 3 CentOS VMs with 4 GB RAM each - 2 worker nodes and a master node. Everything works fine, though if you are using MapR, you have to make sure they are all on the same subnet. -Suren On Fri, May 30, 2014 at 12:20 PM, Upend

Re: Clearspring Analytics Version

2014-05-27 Thread Surendranauth Hiraman
Xin wrote: > 2.7 sounds good. I was actually waiting for 2.7 to come out to post a JIRA > (mainly for the serializable HyperLogLogPlus class). > > > On Tue, May 27, 2014 at 3:11 PM, Surendranauth Hiraman < > suren.hira...@velos.io> wrote: > > > Great, I will submit

Re: Clearspring Analytics Version

2014-05-27 Thread Surendranauth Hiraman
t rid of the SerializableHyperLogLog class. (and move to use > HyperLogLogPlus). > > > > > > > On Tue, May 27, 2014 at 3:01 PM, Surendranauth Hiraman < > suren.hira...@velos.io> wrote: > > > Hi, > > > > It looks like the version of Clearspring's stream

Clearspring Analytics Version

2014-05-27 Thread Surendranauth Hiraman
Hi, It looks like the version of Clearspring's stream analytics class in 1.0 branch and master is 2.5. There were some significant bug fixes in 2.6 and version 2.7 is just out now as well. Are there any plans to upgrade? The QDigest deserialization code in 2.5 seems to have bugs that are fixed i

Re: Calling Spark enthusiasts in NYC

2014-03-31 Thread Surendranauth Hiraman
We at Velos are willing to get involved too. Please let me know. -Suren SUREN HIRAMAN, VP TECHNOLOGY Velos Accelerating Machine Learning 440 NINTH AVENUE, 11TH FLOOR NEW YORK, NY 10001 O: (917) 525-2466 ext. 105 F: 646.349.4063 E: suren.hiraman@velos.io W: www.velos.io On Mon, Mar 31, 2014

Re: Largest input data set observed for Spark.

2014-03-20 Thread Surendranauth Hiraman
Reynold, How complex was that job (I guess in terms of number of transforms and actions) and how long did that take to process? -Suren On Thu, Mar 20, 2014 at 2:08 PM, Reynold Xin wrote: > Actually we just ran a job with 70TB+ compressed data on 28 worker nodes - > I didn't count the size of