Re: Shuffle files

2014-10-07 Thread Sunny Khatri
@SK: Make sure the new ulimit has taken effect, as Todd mentioned. You can verify via ulimit -a. Also make sure you have the proper kernel parameters set in /etc/sysctl.conf (MacOSX). On Tue, Oct 7, 2014 at 3:57 PM, Lisonbee, Todd wrote: > > Are you sure the new ulimit has taken effect? > > How many cores are
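
A limit can also be checked from inside a running process, not only from the shell. The following is a minimal sketch, assuming the limit in question is the open-files limit (the preview above does not say which one) and a Unix-like system where Python's `resource` module is available. The helper name `open_file_limits` is made up for illustration.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits seen by this process."""
    # RLIMIT_NOFILE is the per-process cap on open file descriptors,
    # the same value that `ulimit -n` reports in the shell.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


soft_limit, hard_limit = open_file_limits()
print(f"open files: soft={soft_limit}, hard={hard_limit}")
```

If the soft limit printed here is still the old value, the new ulimit was probably not applied to the session that launched the process.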

Re: return probability \ confidence instead of actual class

2014-10-07 Thread Sunny Khatri
1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, >>> 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0] >>> ] >>> Y = [ >>> 0.0, >>> 0.0, >>> 0.0, >>> 1.0, >>> 1.0, >>> 1.0 >>>

Re: Cannot read from s3 using "sc.textFile"

2014-10-07 Thread Sunny Khatri
Not sure if it's supposed to work. Can you try newAPIHadoopFile(), passing in the required configuration object? On Tue, Oct 7, 2014 at 4:20 AM, Tomer Benyamini wrote: > Hello, > > I'm trying to read from s3 using a simple spark java app: > > - > > SparkConf sparkConf = new Sp

Re: return probability \ confidence instead of actual class

2014-10-06 Thread Sunny Khatri
, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, > 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0], > [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, > 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, > 1.0,

Re: Fwd: Spark SQL: ArrayIndexOutofBoundsException

2014-10-02 Thread Sunny Khatri
You can filter with startswith? On Thu, Oct 2, 2014 at 4:04 PM, SK wrote: > Thanks for the help. Yes, I did not realize that the first header line has > a > different separator. > > By the way, is there a way to drop the first line that contains the header? > Something along the following li
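
A rough sketch of the filter-with-startswith idea, in plain Python so it runs without a cluster. The sample lines and the header prefix `user_id` are hypothetical, since the thread's actual data is not shown here. With an RDD of lines, the same predicate could be passed to `filter`.

```python
# Hypothetical sample input: the first line is a header row.
lines = [
    "user_id,item,score",
    "1,apple,0.5",
    "2,pear,0.7",
]

# Assumption: the header can be recognized by a known prefix.
HEADER_PREFIX = "user_id"


def is_data_line(line):
    """True for every line that does not start with the header prefix."""
    return not line.startswith(HEADER_PREFIX)


# Keep only the non-header lines.
data_lines = [line for line in lines if is_data_line(line)]
print(data_lines)
```

This only works if no data line begins with the same prefix as the header.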

Re: return probability \ confidence instead of actual class

2014-09-24 Thread Sunny Khatri
For multi-class you can use the same SVMWithSGD (for binary classification) with a One-vs-All approach, constructing a training corpus for each Class i that uses Class i as the positive samples and the rest of the classes as the negative ones, and then use the same method provided by Aris as a measure of how far Cl
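
The One-vs-All bookkeeping can be sketched in plain Python, leaving the binary classifier itself out. Both helper names below are made up for illustration. The per-class scores are assumed to be whatever confidence measure each binary model returns, with higher meaning more confident.

```python
def one_vs_all_labels(labels, positive_class):
    """Relabel a multi-class label list for one binary problem:
    1.0 for the chosen class, 0.0 for every other class."""
    return [1.0 if y == positive_class else 0.0 for y in labels]


def predict_class(scores_by_class):
    """Pick the class whose binary model gave the highest score.

    scores_by_class maps class id -> score from that class's model
    (assumed: a higher score means a more confident positive)."""
    return max(scores_by_class, key=scores_by_class.get)


# Hypothetical labels for a 3-class problem.
labels = [0, 1, 2, 1]

# One binary label vector per class; each would train its own model.
binary_problems = {c: one_vs_all_labels(labels, c) for c in sorted(set(labels))}

# Hypothetical scores from the three trained models for one test point.
predicted = predict_class({0: -0.3, 1: 1.2, 2: 0.1})
```

Scores from separately trained models are not necessarily calibrated against each other, so the winning class is best treated as a heuristic ranking, not a probability.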

Re: Using Hadoop InputFormat in Python

2014-08-13 Thread Sunny Khatri
Not that familiar with the Python APIs, but you should be able to configure a job object with your custom InputFormat and pass in the required configuration (i.e. job.getConfiguration()) to newAPIHadoopRDD to get the required RDD. On Wed, Aug 13, 2014 at 2:59 PM, Tassilo Klein wrote: > Hi, > > I'

Re: Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Sunny Khatri
n Tue, Aug 12, 2014 at 10:56 AM, Sunny Khatri > wrote: > > Are there any other workarounds that could be used to pass in the values > > from someVariable to the transformation function ? > > > > > > On Tue, Aug 12, 2014 at 10:48 AM, Sean Owen wrote: > >>

Re: Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Sunny Khatri
looking at its local > SampleOuterClass, which is maybe not initialized on the remote JVM. > > On Tue, Aug 12, 2014 at 6:02 PM, Sunny Khatri > wrote: > > I have a class defining an inner static class (map function). The inner > > class tries to refer the variable instant

Reference External Variables in Map Function (Inner class)

2014-08-12 Thread Sunny Khatri
I have a class defining an inner static class (map function). The inner class tries to refer to the variable instantiated in the outer class, which results in a NullPointerException. Sample code as follows: class SampleOuterClass { private static ArrayList someVariable; SampleOuterClass

Re: Spark Memory Issues

2014-08-05 Thread Sunny Khatri
Yeah, ran it on yarn-cluster mode. On Tue, Aug 5, 2014 at 12:17 PM, Akhil Das wrote: > Are you sure that you were not running SparkPi in local mode? > > Thanks > Best Regards > > > On Wed, Aug 6, 2014 at 12:43 AM, Sunny Khatri > wrote: > >> Well I was able t

Re: Spark Memory Issues

2014-08-05 Thread Sunny Khatri
t to make sure your cluster setup is proper and is working. > > Thanks > Best Regards > > > On Wed, Aug 6, 2014 at 12:17 AM, Sunny Khatri > wrote: > >> The only UI I have currently is the Application Master (Cluster mode), >> with the following executor nodes s

Re: Spark Memory Issues

2014-08-05 Thread Sunny Khatri
(as seen in the > top left of the webUI) while creating the SparkContext. > > > > Thanks > Best Regards > > > On Tue, Aug 5, 2014 at 11:38 PM, Sunny Khatri > wrote: > >> Hi, >> >> I'm trying to run a spark application with the executo

Spark Memory Issues

2014-08-05 Thread Sunny Khatri
Hi, I'm trying to run a Spark application with executor-memory 3G, but I'm running into the following error: 14/08/05 18:02:58 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[5] at map at KMeans.scala:123), which has no missing parents 14/08/05 18:02:58 INFO DAGScheduler: Submitting 1 missin