SparkR and Spark MLlib

2015-07-03 Thread praveen S
Hi, are SparkR and Spark MLlib the same?

Number of SparkContexts per JVM

2016-05-09 Thread praveen S
Hi, as far as I know you can create only one SparkContext per JVM, but I wanted to confirm whether it is one per JVM or one per classloader. For example, would one SparkContext be created per *.war when several are deployed under one Tomcat instance? Regards, Praveen
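Spark supports only one active SparkContext per JVM, shared regardless of classloader. A minimal sketch in Scala, assuming Spark 1.4+ where SparkContext.getOrCreate is available (app name and master are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // One active SparkContext per JVM: getOrCreate returns the existing
    // context if one is already running instead of constructing a second one.
    val conf = new SparkConf().setAppName("shared-context-demo").setMaster("local[2]")

    val sc1 = SparkContext.getOrCreate(conf)
    val sc2 = SparkContext.getOrCreate(conf)   // same instance as sc1
    assert(sc1 eq sc2)                         // both references point to one context

With default settings, separate *.war deployments running in one Tomcat JVM therefore cannot each hold their own independent SparkContext.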

Create an n x n graph given only the vertices

2016-01-08 Thread praveen S
Is it possible in GraphX to create/generate an n x n graph given n vertices?

Re: Create an n x n graph given only the vertices no

2016-01-10 Thread praveen S
Is it possible in GraphX to create/generate an n x n graph given only the vertices? On 8 Jan 2016 23:57, "praveen S" wrote: > Is it possible in GraphX to create/generate an n x n graph given n vertices?

Re: Create an n x n graph given only the vertices no

2016-01-11 Thread praveen S
Action* Michael Malak and Robin East > Manning Publications Co. > http://www.manning.com/books/spark-graphx-in-action > On 11 Jan 2016, at 03:19, praveen S wrote: > Is it possible in GraphX to create/generate an n x n graph given only the vertices? > On 8 Jan 2016 23:57, "praveen S" wrote: >> Is it possible in GraphX to create/generate an n x n graph given n vertices?

Usage of SparkContext within a Web container

2016-01-13 Thread praveen S
Is using a SparkContext from a web container the right way to process Spark jobs, or should we invoke spark-submit through a ProcessBuilder? Are there any pros or cons to using a SparkContext from a web container? How does Zeppelin trigger Spark jobs from the web context?
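One commonly used alternative to holding a SparkContext inside the web container is the SparkLauncher API (org.apache.spark.launcher, Spark 1.4+), which starts a spark-submit-style child process programmatically. A sketch, where the jar path, main class, master and memory setting are placeholders:

    import org.apache.spark.launcher.SparkLauncher

    // Launch a Spark job from a web application without embedding a SparkContext.
    // All values below are illustrative.
    val process = new SparkLauncher()
      .setAppResource("/path/to/my-spark-job.jar")   // hypothetical application jar
      .setMainClass("com.example.MyJob")             // hypothetical main class
      .setMaster("yarn-client")
      .setConf("spark.executor.memory", "2g")
      .launch()                                      // returns a java.lang.Process

    // The web application can monitor the child process or wait for it to finish.
    val exitCode = process.waitFor()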

Re: Reuse Executor JVM across different JobContext

2016-01-19 Thread praveen S
Can you give me more details on Spark's jobserver? Regards, Praveen On 18 Jan 2016 03:30, "Jia" wrote: > I guess all jobs submitted through JobServer are executed in the same JVM, so RDDs cached by one job can be visible to all other jobs executed later. > On Jan 17, 2016, at 3:56 PM, Mark Ham

Re: Create an n x n graph given only the vertices no

2016-01-20 Thread praveen S
. --- Robin East *Spark GraphX in Action* Michael Malak and Robin East Manning Publications Co. http://www.manning.com/books/spark-graphx-in-action On 11 Jan 2016, at 12:30, praveen S wrote: Yes, I was looking for something of that sort. Thank you

Re: Create an n x n graph given only the vertices no

2016-01-20 Thread praveen S
Sorry, found the API. On 21 Jan 2016 10:17, "praveen S" wrote: > Hi Robin, I am using Spark 1.3 and I am not able to find the API Graph.fromEdgeTuples(edge RDD, 1). Regards, Praveen > Well you can use a similar technique to generate an RDD[(Long, Long)]
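For reference, a minimal sketch of the approach mentioned above: generate an RDD[(Long, Long)] of all ordered vertex pairs and pass it to Graph.fromEdgeTuples. The vertex ids (0 until n) and the default attribute value of 1 are illustrative choices:

    import org.apache.spark.SparkContext
    import org.apache.spark.graphx.Graph
    import org.apache.spark.rdd.RDD

    // Build a complete n x n graph from n vertex ids only.
    def completeGraph(sc: SparkContext, n: Long): Graph[Int, Int] = {
      val vertexIds = sc.parallelize(0L until n)
      // All ordered pairs (i, j) with i != j become edges.
      val edgeTuples: RDD[(Long, Long)] =
        vertexIds.cartesian(vertexIds).filter { case (i, j) => i != j }
      // The second argument is the default vertex attribute; each edge gets attribute 1.
      Graph.fromEdgeTuples(edgeTuples, 1)
    }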

AM creation in yarn-client mode

2016-02-09 Thread praveen S
Hi, I have 2 questions when running Spark jobs on YARN in client mode: 1) Where is the AM (ApplicationMaster) created? A) Is it created on the client where the job was submitted, i.e. driver and AM on the same client? Or B) does YARN decide where the AM should be created? 2) Driver and AM

Re: AM creation in yarn-client mode

2016-02-09 Thread praveen S
Can you explain what happens in yarn-client mode? Regards, Praveen On 10 Feb 2016 10:55, "ayan guha" wrote: > It depends on yarn-cluster and yarn-client mode. > On Wed, Feb 10, 2016 at 3:42 PM, praveen S wrote: >> Hi, I have 2 questions when

Re: Best practices of sharing a Spark cluster over a few applications

2016-02-14 Thread praveen S
Even I was trying to launch Spark jobs from a web service, but I thought you could run Spark jobs in YARN mode only through spark-submit. Is my understanding not correct? Regards, Praveen On 15 Feb 2016 08:29, "Sabarish Sasidharan" wrote: > Yes you can look at using the capacity scheduler or the

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Have a look at the spark.streaming.backpressure.enabled property. Regards, Praveen On 18 Feb 2016 00:13, "Abhishek Anand" wrote: > I have a spark streaming application running in production. I am trying to find a solution for a particular use case when my application has a downtime of say 5 hour

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Can having a smaller block interval only resolve this? Regards, Praveen On 18 Feb 2016 21:13, "Cody Koeninger" wrote: > Backpressure won't help you with the first batch, you'd need spark.streaming.kafka.maxRatePerPartition for that > On Thu, Feb 18

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Sorry, rephrasing: can this issue be resolved by having a smaller block interval? Regards, Praveen On 18 Feb 2016 21:30, "praveen S" wrote: > Can having a smaller block interval only resolve this? Regards, Praveen > On 18 Feb 2016 21:13, "Cody Koeninger"
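For reference, a sketch of the two settings discussed in this thread, applied to a SparkConf; the values shown are illustrative, not recommendations:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("streaming-rate-limits")   // illustrative app name
      // Adjusts the ingestion rate based on batch processing times (Spark 1.5+),
      // but does not limit the very first batch after a restart.
      .set("spark.streaming.backpressure.enabled", "true")
      // Caps records per second read from each Kafka partition, which also
      // bounds the size of that first batch.
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")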

Concurrent execution of actions within a driver

2015-10-26 Thread praveen S
Does Spark run different actions of an RDD within a driver in parallel as well? Let's say: class Driver { val rdd1 = sc.textFile("...") val rdd2 = sc.textFile("") rdd1.collect() //Action 1 rdd2.collect() //Action 2 } Does Spark run Actions 1 & 2 in parallel? (some kind of a pass through the dri
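By default the answer is no: each action such as collect() blocks the calling thread, so the two collects above run one after the other. A minimal sketch of submitting the two actions concurrently from separate threads, assuming an existing SparkContext and placeholder input paths:

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration
    import org.apache.spark.SparkContext

    def collectBoth(sc: SparkContext): (Array[String], Array[String]) = {
      val rdd1 = sc.textFile("/path/one")   // hypothetical path
      val rdd2 = sc.textFile("/path/two")   // hypothetical path

      val f1 = Future { rdd1.collect() }    // Action 1, submitted from its own thread
      val f2 = Future { rdd2.collect() }    // Action 2, submitted concurrently

      (Await.result(f1, Duration.Inf), Await.result(f2, Duration.Inf))
    }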

Difference between RandomForestModel and RandomForestClassificationModel

2015-07-29 Thread praveen S
Hi, I wanted to know: what is the difference between RandomForestModel and RandomForestClassificationModel in MLlib? Will they yield the same results for a given dataset?
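For context, RandomForestModel is the type returned by the RDD-based spark.mllib API, while RandomForestClassificationModel is produced by the DataFrame-based spark.ml pipeline API; both train random forests, though the two APIs differ in defaults and interface. A sketch showing where each type comes from, with illustrative parameter values and assumed input data:

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.tree.RandomForest
    import org.apache.spark.mllib.tree.model.RandomForestModel
    import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.DataFrame

    def trainBoth(labeledPoints: RDD[LabeledPoint],
                  trainingDF: DataFrame): (RandomForestModel, RandomForestClassificationModel) = {

      // RDD-based API (spark.mllib): returns a RandomForestModel.
      val mllibModel: RandomForestModel = RandomForest.trainClassifier(
        labeledPoints,
        2,                // numClasses
        Map[Int, Int](),  // categoricalFeaturesInfo
        10,               // numTrees
        "auto",           // featureSubsetStrategy
        "gini",           // impurity
        5,                // maxDepth
        32,               // maxBins
        42)               // seed

      // DataFrame-based API (spark.ml): fit() returns a RandomForestClassificationModel.
      val mlModel: RandomForestClassificationModel = new RandomForestClassifier()
        .setLabelCol("label")        // assumed column names
        .setFeaturesCol("features")
        .setNumTrees(10)
        .fit(trainingDF)

      (mllibModel, mlModel)
    }

Whether the two yield identical predictions on the same data is not guaranteed; it depends on matching every hyperparameter and on random seeds.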

Spark MLlib vs SparkR

2015-08-05 Thread praveen S
I was wondering when one should go for MLlib or SparkR. What are the criteria, or what should be considered, before choosing either solution for data analysis? What are the advantages of Spark MLlib over SparkR, or of SparkR over MLlib?

Re: How to binarize data in spark

2015-08-06 Thread praveen S
Use StringIndexer in MLlib 1.4: https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/ml/feature/StringIndexer.html On Thu, Aug 6, 2015 at 8:49 PM, Adamantios Corais <adamantios.cor...@gmail.com> wrote: > I have a set of data based on which I want to create a classification model. Each
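A minimal sketch of the suggestion above; the column names are illustrative:

    import org.apache.spark.ml.feature.StringIndexer
    import org.apache.spark.sql.DataFrame

    // Maps each distinct string in "category" to a numeric index,
    // with the most frequent value receiving index 0.0.
    def indexCategory(df: DataFrame): DataFrame = {
      val indexer = new StringIndexer()
        .setInputCol("category")       // assumed input column
        .setOutputCol("categoryIndex")
      indexer.fit(df).transform(df)
    }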

Re: Spark MLlib vs SparkR

2015-08-06 Thread praveen S
ing to solve, and then the selection may be evident. On Wednesday, August 5, 2015, praveen S wrote: > I was wondering when one should go for MLlib or SparkR. What are the criteria or what should be considered before choosing either of the solutio

StringIndexer + VectorAssembler equivalent to HashingTF?

2015-08-06 Thread praveen S
Is StringIndexer + VectorAssembler equivalent to HashingTF when converting a document for analysis?
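The two are not interchangeable: HashingTF hashes the words of a document into a fixed-length term-frequency vector, while StringIndexer assigns one categorical index per distinct string and VectorAssembler only concatenates columns into a vector. A sketch contrasting the two, assuming a DataFrame with a string column named "text" (an illustrative name):

    import org.apache.spark.ml.feature.{HashingTF, StringIndexer, Tokenizer, VectorAssembler}
    import org.apache.spark.sql.DataFrame

    def compareApproaches(df: DataFrame): (DataFrame, DataFrame) = {
      // Approach A: tokenize the text, then hash words into a fixed-length TF vector.
      val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
      val hashingTF = new HashingTF()
        .setInputCol("words").setOutputCol("tfFeatures").setNumFeatures(1000)
      val tfDF = hashingTF.transform(tokenizer.transform(df))

      // Approach B: index the whole string as one category, then assemble it into a vector.
      val indexer   = new StringIndexer().setInputCol("text").setOutputCol("textIndex")
      val assembler = new VectorAssembler()
        .setInputCols(Array("textIndex")).setOutputCol("features")
      val indexedDF = assembler.transform(indexer.fit(df).transform(df))

      (tfDF, indexedDF)
    }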

Meaning of local[2]

2015-08-17 Thread praveen S
What does the "2" mean in .setMaster("local[2]")? Is this applicable only for standalone mode? Can I do this in a cluster setup, e.g. .setMaster("[2]")? Is it the number of threads per worker node?
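"local[2]" means: run Spark in local mode inside the current JVM using 2 worker threads; it is not a per-worker-node thread count and only applies when no cluster is involved. On a cluster the master is given as a URL instead. A sketch contrasting the two, with a hypothetical master host and illustrative settings:

    import org.apache.spark.SparkConf

    // Local mode: everything runs in this JVM with 2 threads.
    val localConf = new SparkConf()
      .setAppName("local-demo")
      .setMaster("local[2]")

    // Cluster mode: the master is a URL (standalone shown here, hypothetical host);
    // per-executor parallelism is controlled by settings such as executor cores.
    val clusterConf = new SparkConf()
      .setAppName("cluster-demo")
      .setMaster("spark://master-host:7077")
      .set("spark.executor.cores", "2")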

Regarding rdd.collect()

2015-08-17 Thread praveen S
When I do an rdd.collect(), does the data move back to the driver, or is it still held in memory across the executors?
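collect() pulls the RDD's contents back to the driver as a local array; the executors keep partitions in memory only if the RDD was explicitly persisted or cached. A minimal sketch, assuming an existing SparkContext and a placeholder path:

    import org.apache.spark.SparkContext
    import org.apache.spark.storage.StorageLevel

    def collectExample(sc: SparkContext): Unit = {
      val rdd = sc.textFile("/path/to/input")    // hypothetical path

      val local: Array[String] = rdd.collect()   // all rows copied to the driver

      rdd.persist(StorageLevel.MEMORY_ONLY)      // keep partitions on the executors
      rdd.count()                                // first action after persist fills the cache
    }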