Re: need someone to help clear some questions.

2014-03-07 Thread Mayur Rustagi
groups.google.com/forum/#!forum/shark-users Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Thu, Mar 6, 2014 at 8:08 PM, qingyang li wrote: > Hi, Yana, do you know if there is mailing list for shark like spark's? > > >

how to get size of rdd in memory

2014-03-07 Thread qingyang li
dear community, can anyone tell me: how to get the size of an rdd in memory? thanks.

Re: how to get size of rdd in memory

2014-03-07 Thread Mayur Rustagi
http://&lt;driver&gt;:4040/storage/ Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Fri, Mar 7, 2014 at 12:09 AM, qingyang li wrote: > dear community, can anyone tell me: how to get the size of an rdd in memory? > thanks. >
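
For reference, the storage page only lists RDDs that have been explicitly persisted, and a rough size can also be estimated in code. A minimal sketch, assuming a cached RDD named rdd (SizeEstimator measures deserialized object size on the driver, which can differ from the size actually cached on the executors):

    import org.apache.spark.util.SizeEstimator

    // estimate bytes per record from a small sample, then extrapolate
    val sample = rdd.take(1000)
    val bytesPerRecord = SizeEstimator.estimate(sample) / math.max(sample.length, 1)
    println("approx. " + (bytesPerRecord * rdd.count()) + " bytes if fully cached")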

Re: how to get size of rdd in memory

2014-03-07 Thread qingyang li
that page is empty, it does not show anything. Here is the picture. 2014-03-07 16:14 GMT+08:00 Mayur Rustagi: > http://&lt;driver&gt;:4040/storage/ > > Mayur Rustagi > Ph: +1 (760) 203 3257 > http://www.sigmoidanalytics.com > @mayur_rustagi > > > > On Fri, Mar 7,

Re: how to get size of rdd in memory

2014-03-07 Thread qingyang li
addition: 1. i have run LOAD DATA INPATH '/user/root/input/test.txt' into table b; in shark. i think this will create an rdd in memory, right? 2. when i run "free -g", the result shows something has been stored into memory. the file is almost 4g. [root@bigdata001 spark-0.9.0-incubating-bin-hadoop2

Class not found in Kafka-Stream due to multi-thread without correct ClassLoader?

2014-03-07 Thread Aries Kong
Hi, I'm trying to run a kafka-stream and get a strange exception. The streaming is created by following code: val lines = KafkaUtils.createStream[String, VtrRecord, StringDecoder, VtrRecordDeserializer](ssc, kafkaParams.toMap, topicpMap, StorageLevel.MEMORY_AND_DISK_SER_2)
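
A workaround sometimes suggested for receiver-side ClassNotFoundExceptions, sketched here as an assumption rather than a confirmed fix, is to ship the jar containing the custom classes explicitly and pin the thread context classloader before creating the stream (the jar path is a placeholder):

    // make sure the jar with VtrRecord / VtrRecordDeserializer reaches the executors
    ssc.sparkContext.addJar("target/my-kafka-app.jar")  // hypothetical assembly jar
    // use the classloader that loaded the application classes, not the thread default
    Thread.currentThread().setContextClassLoader(getClass.getClassLoader)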

Re: Kryo serialization does not compress

2014-03-07 Thread pradeeps8
Hi Patrick, Thanks for your reply. I am guessing even an array type will be registered automatically. Is this correct? Thanks, Pradeep -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-serialization-does-not-compress-tp2042p2400.html Sent from the Apac
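
For reference, array types can also be registered explicitly through a custom KryoRegistrator if there is any doubt. A minimal sketch, with MyClass standing in for the actual element type:

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.serializer.KryoRegistrator

    case class MyClass(id: Int, name: String)  // hypothetical stand-in for the real type

    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[MyClass])         // the element type
        kryo.register(classOf[Array[MyClass]])  // the array type, registered explicitly
      }
    }

    // wiring it up, Spark 0.9-style system properties:
    System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    System.setProperty("spark.kryo.registrator", "MyRegistrator")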

java.lang.ClassNotFoundException in spark 0.9.0, shark 0.9.0 (pre-release) and hadoop 2.2.0

2014-03-07 Thread pradeeps8
Hi, We are currently trying to migrate to hadoop 2.2.0 and hence we have installed spark 0.9.0 and the pre-release version of shark 0.9.0. When we execute the script (script.txt) we get the following error: org.apache.

Setting properties in core-site.xml for Spark and Hadoop to access

2014-03-07 Thread Nicholas Chammas
On spinning up a Spark cluster in EC2, I'd like to set a few configs that will allow me to access files in S3 without having to specify my AWS access and secret keys over and over, as described here. The properties are fs.s3.awsAccessKeyId and fs.s3.awsS
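
For reference, the same two properties can also be set programmatically on the job's Hadoop configuration. A minimal sketch, reading the keys from environment variables instead of hard-coding them (the bucket name is a placeholder):

    import org.apache.spark.SparkContext

    val sc = new SparkContext("local", "s3-config-sketch")
    // fs.s3.awsAccessKeyId / fs.s3.awsSecretAccessKey are the standard Hadoop property names
    sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))
    val lines = sc.textFile("s3://my-bucket/some/path")  // hypothetical bucket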

Re: major Spark performance problem

2014-03-07 Thread elyast
Hi, There is also an option to run spark applications on top of mesos in fine-grained mode; then fair scheduling is possible (applications will run in parallel and mesos is responsible for scheduling all tasks), so in a sense all applications will progress in parallel, obviously it total in
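
A minimal sketch of the relevant configuration (the Mesos master URL is a placeholder; fine-grained is the default mode on Mesos, so the flag is shown only for clarity):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("mesos://mesos-master:5050")   // placeholder Mesos master URL
      .setAppName("fair-shared-app")
      .set("spark.mesos.coarse", "false")       // fine-grained: Mesos schedules each task
    val sc = new SparkContext(conf)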

Can anyone offer any insight at all?

2014-03-07 Thread Ognen Duzlevski
What is wrong with this code? A condensed set of this code works in the spark-shell. It does not work when deployed via a jar. def calcSimpleRetention(start:String,end:String,event1:String,event2:String):List[Double] = { val spd = new PipelineDate(start) val epd = new PipelineDate(en

Re: Can anyone offer any insight at all?

2014-03-07 Thread Ognen Duzlevski
Strike that. Figured it out. Don't you just hate it when you fire off an email and you figure it out as it is being sent? ;) Ognen On 3/7/14, 12:41 PM, Ognen Duzlevski wrote: What is wrong with this code? A condensed set of this code works in the spark-shell. It does not work when deployed vi

Re: Can anyone offer any insight at all?

2014-03-07 Thread Mayur Rustagi
was the issue with print? printing on the worker? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Fri, Mar 7, 2014 at 10:43 AM, Ognen Duzlevski < og...@plainvanillagames.com> wrote: > Strike that. Figured it out. Don't you j

Re: Setting properties in core-site.xml for Spark and Hadoop to access

2014-03-07 Thread Mayur Rustagi
Set them as environment variables at boot & configure both stacks to call on that. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Fri, Mar 7, 2014 at 9:32 AM, Nicholas Chammas wrote: > On spinning up a Spark cluster in

Re: Can anyone offer any insight at all?

2014-03-07 Thread Ognen Duzlevski
No. It was a logical error. val ev1rdd = f.filter(_.split(",")(0).split(":")(1).replace("\"","") == event1).map(line => (line.split(",")(2).split(":")(1).replace("\"",""),1)).cache should have mapped to ,0, not ,1. I have had the most awful time figuring out these "looped" things. It seems
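
Spelled out, the corrected line from the message above would look like this (reformatted for readability, with the pair value changed to 0 as described):

    // corrected version: the pair value should be 0, not 1
    val ev1rdd = f.filter(_.split(",")(0).split(":")(1).replace("\"", "") == event1)
                  .map(line => (line.split(",")(2).split(":")(1).replace("\"", ""), 0))
                  .cache()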

Re: Job aborted: Spark cluster looks down

2014-03-07 Thread Mayur Rustagi
It seems your workers are disassociating. Can you try setting STANDALONE_SPARK_MASTER_HOST=`hostname -f` in spark-env.sh? I think the issue is in the way the workers resolve the IP & the master resolves the IP: the master has a full dns & the slaves don't. All guesses here; can you try to resolve all the hostnames on each o

Re: Running actions in loops

2014-03-07 Thread Mayur Rustagi
Most likely the job you are executing is not serializable; this typically happens when you have a library that is not serializable. Are you using any library like jodatime etc.? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On

Re: Streaming JSON string from REST Api in Spring

2014-03-07 Thread Mayur Rustagi
Easiest is to use a queue, Kafka for example. So push your json request string into kafka, connect spark streaming to kafka & pull data from it & execute it. Spark streaming will split up the jobs & pipeline the data. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rusta
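
A minimal sketch of that pipeline, assuming a topic named json-requests and placeholder ZooKeeper and consumer-group settings:

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext("local[2]", "json-from-kafka", Seconds(5))
    // Map(topic -> number of consumer threads); zk quorum and group id are placeholders
    val stream = KafkaUtils.createStream(ssc, "zk-host:2181", "json-consumers", Map("json-requests" -> 1))
    val jsonLines = stream.map(_._2)                   // the Kafka message value is the raw JSON string
    jsonLines.foreachRDD(rdd => rdd.foreach(println))  // replace println with real processing
    ssc.start()
    ssc.awaitTermination()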

Help connecting to the cluster

2014-03-07 Thread Yana Kadiyska
Hi Spark users, could someone help me out? My company has a fully functioning spark cluster with shark running on top of it (as part of the same cluster, on the same LAN). I'm interested in running raw spark code against it but am running into the following issue -- it seems like the machine

[BLOG] Spark on Cassandra w/ Calliope

2014-03-07 Thread Brian O'Neill
FWIW - I posted some notes to help people get started quickly with Spark on C*. http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html (tnx again to Rohit and team for all of their help) -brian -- Brian ONeill CTO, Health Market Science (http://healthmarketscience.com) mobil

Re: [BLOG] Spark on Cassandra w/ Calliope

2014-03-07 Thread Ognen Duzlevski
Nice, thanks :) Ognen On 3/7/14, 2:48 PM, Brian O'Neill wrote: FWIW - I posted some notes to help people get started quickly with Spark on C*. http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html (tnx again to Rohit and team for all of their help) -brian -- Brian ONei

Re: Running actions in loops

2014-03-07 Thread Ognen Duzlevski
Mayur, have not thought of that. Yes, I use jodatime. What is the scope that this serialization issue applies to? Only the method making a call into / using such a library? The whole class the method using such a library belongs to? Sorry if it is a dumb question :) Ognen On 3/7/14, 1:29 PM,

Re: Setting properties in core-site.xml for Spark and Hadoop to access

2014-03-07 Thread Nicholas Chammas
Mayur, So looking at the section on environment variables here, are you saying to set these options via SPARK_JAVA_OPTS -D? On a related note, in looking around I just discovered this command line tool for modi

Re: Running actions in loops

2014-03-07 Thread Mayur Rustagi
So the whole function closure you want to apply on your RDD needs to be serializable so that it can be "serialized" & sent to the workers to operate on the RDD. Objects of jodatime cannot be serialized & sent, hence jodatime is out of work. 2 bad answers: 1. initialize jodatime for each row & complete wor

Re: Help connecting to the cluster

2014-03-07 Thread Mayur Rustagi
The driver contains the DAG scheduler, which manages the stages of jobs & needs to talk back & forth with the workers. So you can run the Driver on any machine that can reach the master & workers (even your laptop). But the Driver will need to be reachable by all machines. I think 0.9.0 added an ability for the driver t
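
A minimal sketch of a driver launched from a laptop against a remote standalone master (the master URL and jar path are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://master-host:7077")  // must match the URL on the master's web UI
      .setAppName("remote-driver-test")
      .setJars(Seq("target/my-app.jar"))      // ships your classes to the workers
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).sum())   // quick end-to-end sanity check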

Re: Explain About Logs NetworkWordcount.scala

2014-03-07 Thread Tathagata Das
I am not sure how to debug this without any more information about the source. Can you monitor on the receiver side that data is being accepted by the receiver but not reported? TD On Wed, Mar 5, 2014 at 7:23 AM, eduardocalfaia wrote: > Hi TD, > I have seen in the web UI the stage number that r

Re: Running actions in loops

2014-03-07 Thread Nick Pentreath
There is #3, which is to use mapPartitions and init one jodatime obj per partition, which is less overhead for large objects. Sent from Mailbox for iPhone On Sat, Mar 8, 2014 at 2:54 AM, Mayur Rustagi wrote: > So the whole function closure you want to apply on your RDD needs to be > serializable so
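
A minimal sketch of option #3, assuming dateStrings is an RDD[String] and joda-time's formatter is the non-serializable object:

    import org.joda.time.format.DateTimeFormat

    // one formatter per partition: created on the worker, never shipped in the closure
    val parsed = dateStrings.mapPartitions { iter =>
      val fmt = DateTimeFormat.forPattern("yyyy-MM-dd")
      iter.map(s => fmt.parseDateTime(s))
    }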