Re: Spark Streaming

2015-01-17 Thread Rohit Pujari
on on the stream. > > On Sat, Jan 17, 2015 at 10:17 AM, Rohit Pujari > wrote: > > Hi Francois: > > > > I tried using "print(kafkaStream)" as the output operator but no luck. It > throws > > the same error. Any other thoughts? > > > > Thanks, > >

Re: Spark Streaming

2015-01-17 Thread Rohit Pujari
Date: Saturday, January 17, 2015 at 4:10 AM To: Rohit Pujari <rpuj...@hortonworks.com> Subject: Re: Spark Streaming Streams are lazy. Their computation is triggered by an output operator, which is apparently missing from your code. See the programming guide: https://spark.apache
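To illustrate the point about lazy streams, here is a minimal sketch rather than the poster's actual job. It assumes a local master and uses an in-memory `queueStream` as a stand-in for the Kafka source; the object name and sample data are made up for illustration. In the DStream API, `print()` is a method on the stream, so the call takes the form `stream.print()`.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical example object; not taken from the thread.
object OutputOperatorSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, one-second batches.
    val conf = new SparkConf().setMaster("local[2]").setAppName("OutputOperatorSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Stand-in source (assumption): a queue of RDDs instead of a Kafka stream.
    val queue = mutable.Queue[RDD[String]](
      ssc.sparkContext.parallelize(Seq("hello spark", "hello streaming"))
    )
    val lines = ssc.queueStream(queue)

    // A transformation only describes the computation; nothing is executed here.
    val words = lines.flatMap(_.split(" "))

    // Output operator: registers an action on the stream so each batch is computed.
    words.print()

    ssc.start()            // begin processing batches
    ssc.awaitTermination() // block until the context is stopped
  }
}
```

If the `words.print()` line is removed, the context has no output operation registered and `ssc.start()` fails instead of processing data.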

Spark Streaming

2015-01-17 Thread Rohit Pujari
streams = (1 to numPartitionsOfInputTopic) map { _ => KafkaUtils.createStream(ssc, kafkaParams, Map(inputTopic -> 1), StorageLevel.MEMORY_ONLY_SER) } val unifiedStream = ssc.union(streams) val sparkProcessingParallelism = 1 unifiedStream.repartition(sparkProcessingPar

Re: Market Basket Analysis

2014-12-05 Thread Rohit Pujari
for frequent item > set algos when they really mean they want to compute item similarity > or make recommendations. What's your use case? > > On Thu, Dec 4, 2014 at 8:23 PM, Rohit Pujari > wrote: > > Sure, I’m looking to perform frequent item set analysis on a POS data set.

Re: Market Basket Analysis

2014-12-04 Thread Rohit Pujari
se to perform a similar task? If there's no spoon to spoon substitute, spoon to fork will suffice too. Hopefully this provides some clarification. Thanks, Rohit From: Tobias Pfeiffer <t...@preferred.jp> Date: Thursday, December 4, 2014 at 7:20 PM To: Rohit Pujari mailto:rpuj...

Market Basket Analysis

2014-12-04 Thread Rohit Pujari
Hello Folks: I'd like to do market basket analysis using Spark. What are my options? Thanks, Rohit Pujari Solutions Architect, Hortonworks -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain i

Python Scientific Libraries in Spark

2014-11-24 Thread Rohit Pujari
hat's possible today and some of the active development in the community that's on the horizon. Thanks, Rohit Pujari Solutions Architect, Hortonworks

Re: Spark job doesn't clean after itself

2014-10-12 Thread Rohit Pujari
Reviving this... any thoughts, experts? On Thu, Oct 9, 2014 at 3:47 PM, Rohit Pujari wrote: > Hello Folks: > > I'm running a Spark job on YARN. After the execution, I would expect the > Spark job to clean the staging area, but it seems every run creates a new > staging director

Debug Spark in Cluster Mode

2014-10-09 Thread Rohit Pujari
Hello Folks: What're some best practices to debug Spark in cluster mode? Thanks, Rohit

Spark job doesn't clean after itself

2014-10-09 Thread Rohit Pujari
Hello Folks: I'm running a Spark job on YARN. After the execution, I would expect the Spark job to clean the staging area, but it seems every run creates a new staging directory. Is there a way to force a Spark job to clean up after itself? Thanks, Rohit

How true is this about spark streaming?

2014-07-28 Thread Rohit Pujari
you please offer some insights? Thanks, Rohit Pujari Solutions Engineer, Hortonworks rpuj...@hortonworks.com 716-430-6899

Re: Can Spark stack scale to petabyte scale without performance degradation?

2014-07-16 Thread Rohit Pujari
://www.nubetech.co/> > > <http://in.linkedin.com/in/sonalgoyal> > > > > > On Wed, Jul 16, 2014 at 9:17 AM, Rohit Pujari > wrote: > >> Hello Folks: >> >> There is a lot of buzz in the Hadoop community around Spark's inability to >> sc

Can Spark stack scale to petabyte scale without performance degradation?

2014-07-15 Thread Rohit Pujari
tter understand the boundaries of the tech and recommend the right solution for the right problem. Thanks, Rohit Pujari Solutions Engineer, Hortonworks rpuj...@hortonworks.com 716-430-6899

Re: KMeansModel Construtor error

2014-07-15 Thread Rohit Pujari
://issues.apache.org/jira/browse/SPARK-2488 and we try to > make sure it is implemented in v1.1. For now, you can modify the > KMeansModel and remove private[mllib] from the constructor. Sorry for > the inconvenience! -Xiangrui > >> On Mon, Jul 14, 2014 at 10:41 PM, Rohit Pujari

KMeansModel Construtor error

2014-07-14 Thread Rohit Pujari
Hello Folks: I have written a simple program to read the already saved model from HDFS and score it. But when I'm trying to read the saved model, I get the following error. Any clues what might be going wrong here .. val x = sc.objectFile[Vector]("/data/model").collect() val y = new KMeansModel(
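For reference, a minimal sketch of the load-and-score pattern described in this message. It assumes a Spark release in which the `KMeansModel` constructor taking `Array[Vector]` is public (as far as I know that is 1.1 and later; on the version discussed in this thread the constructor was still `private[mllib]`), a local master, and that `/data/model` holds cluster centers written with `saveAsObjectFile`. The object name and the all-zeros test point are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical example object; not the poster's program.
object ScoreSavedCenters {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode; on a cluster the master would come from spark-submit.
    val conf = new SparkConf().setMaster("local[2]").setAppName("ScoreSavedCenters")
    val sc = new SparkContext(conf)

    // Path from the original message; assumed to contain the saved cluster centers.
    val centers: Array[Vector] = sc.objectFile[Vector]("/data/model").collect()

    // Requires a Spark version where this constructor is public (assumption, see above).
    val model = new KMeansModel(centers)

    // Hypothetical point to score; its dimension must match that of the centers.
    val point = Vectors.dense(Array.fill(centers.head.size)(0.0))
    println(s"Assigned cluster: ${model.predict(point)}")

    sc.stop()
  }
}
```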