Re: Compute pairwise distance

2016-07-07 Thread Manoj Awasthi
Hi Debasish, All, I see the status of SPARK-4823 [0] is "in-progress" still. I couldn't gather from the relevant pull request [1] if part of it is already in 1.6.0 (it's closed now). We are facing the same problem of computing pairwise distances between vectors where rows are > 5M and columns in t

Re: Confusing RDD function

2016-03-08 Thread Manoj Awasthi
Spark RDDs are lazily computed, so no computation happens until an 'action' that requires the result is applied. You can read more in the Spark docs. On Mar 9, 2016 7:11 AM, "Hemminger Jeff" wrote: > > I'm currently developing a Spark Streaming application. > > I have a functio
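To illustrate the lazy-evaluation point in this reply, here is a minimal sketch that uses a plain Scala Iterator as a stand-in for an RDD, so it runs without a Spark dependency. The analogy is an assumption made for illustration: `map` plays the role of a transformation and `toList` plays the role of an action such as `collect()`. The names `LazyDemo` and `run` are hypothetical.

```scala
object LazyDemo {
  // Returns (evaluations before the "action", evaluations after it, result).
  def run(): (Int, Int, List[Int]) = {
    var evaluated = 0
    // Iterator.map is lazy, much like an RDD transformation: the function
    // body is not executed when this line runs.
    val doubled = Iterator(1, 2, 3).map { x => evaluated += 1; x * 2 }
    val before = evaluated
    // Consuming the iterator forces the computation, much like an RDD action.
    val result = doubled.toList
    (before, evaluated, result)
  }

  def main(args: Array[String]): Unit = {
    val (before, after, result) = run()
    println(s"before=$before after=$after result=$result")
  }
}
```

The counter stays at zero until the iterator is consumed, which mirrors why a function passed to an RDD transformation appears not to run until an action is applied.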

Re: reading the parquet file in spark sql

2016-03-07 Thread Manoj Awasthi
From the parquet file content (dir content) it doesn't look like that parquet write was successful or complete. On Mon, Mar 7, 2016 at 11:17 AM, Angel Angel wrote: > Hello Sir/Madam, > > I am running one spark application having 3 slaves and one master. > > I am wring the my information using t

Re: AM creation in yarn client mode

2016-02-10 Thread Manoj Awasthi
Pardon me for writing that "there is no AM". I realize it! :-) :-) On Wed, Feb 10, 2016 at 7:14 PM, Steve Loughran wrote: > > On 10 Feb 2016, at 13:20, Manoj Awasthi wrote: > > > > On Wed, Feb 10, 2016 at 5:20 PM, Steve Loughran > wrote: > >> >&

Re: AM creation in yarn client mode

2016-02-10 Thread Manoj Awasthi
On Wed, Feb 10, 2016 at 5:20 PM, Steve Loughran wrote: > > On 10 Feb 2016, at 04:42, praveen S wrote: > > Hi, > > I have 2 questions when running the spark jobs on yarn in client mode : > > 1) Where is the AM(application master) created : > > > in the cluster > > > A) is it created on the client

Re: Client versus cluster mode

2016-01-21 Thread Manoj Awasthi
The only difference is that in yarn-cluster mode your driver runs within a yarn container (called AM or application master). You would want to run your production jobs in yarn-cluster mode, while yarn-client mode may do for a development environment. Again, I think this is just a recommendation and

Re: How to group multiple row data ?

2015-04-29 Thread Manoj Awasthi
Sorry but I didn't fully understand the grouping. This line: >> The group must only take the closest previous trigger. The first one hence shows alone. Can you please explain further? On Wed, Apr 29, 2015 at 4:42 PM, bipin wrote: > Hi, I have a ddf with schema (CustomerID, SupplierID, Product

Re: Reading a text file into RDD[Char] instead of RDD[String]

2015-03-19 Thread Manoj Awasthi
sc.textFile().flatMap(_.toIterator) On Thu, Mar 19, 2015 at 6:31 PM, Sean Owen wrote: > val s = sc.parallelize(Array("foo", "bar", "baz")) > > val c = s.flatMap(_.toIterator) > > c.collect() > res8: Array[Char] = Array(f, o, o, b, a, r, b, a, z) > > On Thu, Mar 19, 2015 at 8:46 AM, Michael L
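The `flatMap(_.toIterator)` idea from this thread can be tried without a cluster. The sketch below uses a plain Scala `Seq[String]` in place of the RDD that `sc.textFile` returns; that substitution is an assumption made so the example runs without Spark, and the names `CharsDemo` and `toChars` are hypothetical.

```scala
object CharsDemo {
  // Flatten each line into its characters. On an RDD[String] the same shape
  // would be rdd.flatMap(_.toIterator), as suggested in the thread.
  def toChars(lines: Seq[String]): Seq[Char] =
    lines.flatMap(_.iterator)

  def main(args: Array[String]): Unit = {
    val chars = toChars(Seq("foo", "bar", "baz"))
    println(chars.mkString(", "))
  }
}
```

Note that `sc.textFile` yields one element per line with the line terminators removed, so newline characters would not appear in the resulting characters.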