HDFS Rest Service not available

2015-06-01 Thread Su She
Hello All, A bit scared I did something stupid...I killed a few PIDs that were listening to ports 2183 (kafka), 4042 (spark app). Some of the PIDs didn't even seem to be stopped, as they still show as running when I do lsof -i:[port number]. I'm not sure if the problem started after or before I did th

Re: HDFS Rest Service not available

2015-06-02 Thread Su She
g sbin/stop-dfs.sh and > then sbin/start-dfs.sh > > Thanks > Best Regards > > On Tue, Jun 2, 2015 at 5:03 AM, Su She wrote: >> >> Hello All, >> >> A bit scared I did something stupid...I killed a few PIDs that were >> listening to ports 2183 (kafka),

[X-post] Saving SparkSQL result RDD to Cassandra

2015-07-09 Thread Su She
Hello All, I also posted this on the Spark/Datastax thread, but thought it was also 50% a Spark question (or mostly a Spark question). I was wondering what is the best practice for saving streaming Spark SQL ( https://github.com/Intel-bigdata/spark-streamingsql/blob/master/src/main/scala/org/apach

Re: [X-post] Saving SparkSQL result RDD to Cassandra

2015-07-09 Thread Su She
is is an output operator, > so 'this' DStream will be registered as an output stream and therefore > materialized. > > Change it to a map, foreach or some other form of transform. > > HTH > > -Todd > > > On Thu, Jul 9, 2015 at 5:24 PM, Su She wrote: >

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-21 Thread Su She
e command for shutting down storage or can I simply stop hdfs in Cloudera Manager? Thank you for the help! On Sat, Jan 17, 2015 at 12:58 PM, Su She wrote: > Thanks Akhil and Sean for the responses. > > I will try shutting down spark, then storage and then the instances. > Initia

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-26 Thread Su She
if you do the steps correctly across your whole cluster. > I'm not sure if the stock stop-all.sh script is supposed to work. > Certainly, if you are using CM, by far the easiest is to start/stop > all of these things in CM. > > On Wed, Jan 21, 2015 at 6:08 PM, Su She wrote: >

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-27 Thread Su She
( mostly, it binds to localhost in that case) > On 27 Jan 2015 07:25, "Su She" wrote: > >> Hello Sean and Akhil, >> >> I shut down the services on Cloudera Manager. I shut them down in the >> appropriate order and then stopped all services of CM. I then shu

Trouble deploying spark program because of soft link?

2015-01-30 Thread Su She
Hello Everyone, A bit confused on this one...I have set up the KafkaWordCount found here: https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java Everything runs fine when I run it using this on instance A: reposito

Best tools for visualizing Spark Streaming data?

2015-02-05 Thread Su She
Hello Everyone, I wanted to hear the community's thoughts on what (open-source) tools have been used to visualize data from Spark/Spark Streaming? I've taken a look at Zeppelin, but had some trouble working with it. A couple of questions: 1) I've looked at a couple blog posts and it seems like spar

Can spark job server be used to visualize streaming data?

2015-02-09 Thread Su She
Hello Everyone, I was reading this blog post: http://homes.esat.kuleuven.be/~bioiuser/blog/a-d3-visualisation-from-spark-as-a-service/ and was wondering if this approach can be taken to visualize streaming data...not just historical data? Thank you! -Suh

Re: Can spark job server be used to visualize streaming data?

2015-02-11 Thread Su She
wrote: > Checkout > > https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html > > In there are links to how that is done. > > > --- Original Message --- > > From: "Kelvin Chu" <2dot7kel...@gmail.com> > Sent: February 10, 201

Re: Can spark job server be used to visualize streaming data?

2015-02-11 Thread Su She
-receivers > > --- Original Message --- > > From: "Su She" > Sent: February 11, 2015 10:23 AM > To: "Felix C" > Cc: "Kelvin Chu" <2dot7kel...@gmail.com>, user@spark.apache.org > Subject: Re: Can spark job server be used to visualize strea

Re: Can spark job server be used to visualize streaming data?

2015-02-12 Thread Su She
dating graphs >> periodically. I haven’t used it myself yet so not sure how well it works. >> See here: https://github.com/andypetrella/spark-notebook >> >> From: Su She >> Date: Thursday, February 12, 2015 at 1:55 AM >> To: Felix C >> Cc: Kelvin Chu, "

Why are there different "parts" in my CSV?

2015-02-12 Thread Su She
Hello Everyone, I am writing simple word counts to hdfs using messages.saveAsHadoopFiles("hdfs://user/ec2-user/","csv",String.class, String.class, (Class) TextOutputFormat.class); 1) However, every 2 seconds I am getting a new *directory* that is titled as a csv. So I'll have test.csv, which will be

Re: Why are there different "parts" in my CSV?

2015-02-13 Thread Su She
artition before the saveAs* > call. > > messages.repartition(1).saveAsHadoopFiles("hdfs://user/ec2-user/","csv",String.class, > String.class, (Class) TextOutputFormat.class); > > > Thanks > Best Regards > > On Fri, Feb 13, 2015 at 11:59 AM, Su She

Re: Why are there different "parts" in my CSV?

2015-02-14 Thread Su She
rue(to delete the original dir),null) > > > > Thanks > Best Regards > > On Sat, Feb 14, 2015 at 2:18 AM, Su She wrote: > >> Thanks Akhil for the suggestion, it is now only giving me one part - >> . Is there anyway I can just create a file rather than a dire
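The copyMerge tip quoted above concatenates Hadoop part-files into a single output file. As a rough, non-Hadoop sketch of what that merge amounts to (file names and paths here are illustrative, not from the thread):

```scala
import java.io.File
import java.nio.file.{Files, Paths, StandardOpenOption}

// Local illustration of what FileUtil.copyMerge does: concatenate the
// per-partition part-* files from a batch directory, in name order,
// into a single output file.
def mergeParts(inputDir: String, outputFile: String): Unit = {
  val parts = new File(inputDir).listFiles
    .filter(_.getName.startsWith("part-"))
    .sortBy(_.getName)
  val out = Paths.get(outputFile)
  Files.deleteIfExists(out)
  Files.createFile(out)
  for (p <- parts)
    Files.write(out, Files.readAllBytes(p.toPath), StandardOpenOption.APPEND)
}
```

The real `FileUtil.copyMerge` additionally works over HDFS paths and can delete the source directory; this sketch only shows the concatenation step.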

Re: Why are there different "parts" in my CSV?

2015-02-14 Thread Su She
's an example for doing > that https://issues.apache.org/jira/browse/SPARK-944 > > Thanks > Best Regards > > On Sat, Feb 14, 2015 at 2:55 PM, Su She wrote: > >> Hello Akhil, thank you for your continued help! >> >> 1) So, if I can write it programmatically aft

Re: Why are there different "parts" in my CSV?

2015-02-14 Thread Su She
http://stackoverflow.com/questions/23527941/how-to-write-to-csv-in-spark Just read this...seems like it should be easily readable. Thanks! On Sat, Feb 14, 2015 at 1:36 AM, Su She wrote: > Thanks Akhil for the link. Is there a reason why there is a new directory > created for each bat

Re: Why are there different "parts" in my CSV?

2015-02-14 Thread Su She
til.copyMerge(FileSystem of source(hdfs), /output-location, > FileSystem > > of destination(hdfs), Path to the merged files /merged-ouput, true(to > delete > > the original dir),null) > > > > > > > > Thanks > > Best Regards > > > > On Sat, Fe

Re: Why are there different "parts" in my CSV?

2015-02-14 Thread Su She
ectory of these files. > > On Sat, Feb 14, 2015 at 9:05 PM, Su She wrote: > > Thanks Sean and Akhil! I will take out the repartition(1). Please let me > > know if I understood this correctly, Spark Streaming writes data like > this: > > > > foo-1001.csv/part-x

What joda-time dependency does spark submit use/need?

2015-02-27 Thread Su She
Hello Everyone, I'm having some issues launching (non-Spark) applications via the spark-submit commands. The common error I am getting is copy/pasted below. I am able to submit a Spark Streaming/Kafka Spark application, but can't start a DynamoDB Java app. The common error is related to joda-time. 1) I r

Re: What joda-time dependency does spark submit use/need?

2015-03-02 Thread Su She
specify these jars (joda-time-2.7.jar, joda-convert-1.7.jar) > either as part of your build and assembly or via the --jars option to > spark-submit. > > HTH. > > On Fri, Feb 27, 2015 at 2:48 PM, Su She wrote: > >> Hello Everyone, >> >> I'm having some is

Running Scala Word Count Using Maven

2015-03-15 Thread Su She
Hello Everyone, I am trying to run the Word Count from here: https://github.com/holdenk/learning-spark-examples/blob/master/mini-complete-example/src/main/scala/com/oreilly/learningsparkexamples/mini/scala/WordCount.scala I was able to successfully run the app using SBT, but not Maven. I don't se

Re: Running Scala Word Count Using Maven

2015-03-16 Thread Su She
Hello, So I actually solved the problem...see point 3. Here are a few of the approaches I tried and the errors I was getting: 1) mvn package exec:java -Dexec.mainClass=HelloWorld Error: java.lang.ClassNotFoundException: HelloWorld 2) http://stackoverflow.com/questions/26929100/running-a-scala-application-in-maven-pro
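A likely cause of the ClassNotFoundException above is that the Scala sources were never compiled: a stock Maven build needs a Scala plugin bound to the compile phase before `exec:java` can find the main class. A hypothetical pom.xml fragment using the widely used scala-maven-plugin (coordinates and version are assumptions, not taken from the thread):

```xml
<!-- Hypothetical fragment: binds Scala compilation into the Maven lifecycle
     so that `mvn package exec:java -Dexec.mainClass=...` can find the class. -->
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <version>3.2.2</version>
  <executions>
    <execution>
      <goals>
        <goal>compile</goal>
        <goal>testCompile</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```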

MLlib Spam example gets stuck in Stage X

2015-03-18 Thread Su She
Hello Everyone, I am trying to run this MLlib example from Learning Spark: https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala#L48 Things I'm doing differently: 1) Using spark shell instead of an application 2) instead of t

Re: MLlib Spam example gets stuck in Stage X

2015-03-19 Thread Su She
on Stage 1. > See if its a GC time, then try increasing the level of parallelism or > repartition it like sc.getDefaultParallelism*3. > > Thanks > Best Regards > > On Thu, Mar 19, 2015 at 12:15 PM, Su She wrote: > >> Hello Everyone, >> >> I am trying to ru

Re: MLlib Spam example gets stuck in Stage X

2015-03-20 Thread Su She
> > Thanks > > Best Regards > > > > On Thu, Mar 19, 2015 at 1:15 PM, Su She wrote: > >> > >> Hi Akhil, > >> > >> 1) How could I see how much time it is spending on stage 1? Or what if, > >> like above, it doesn't get past stag

Re: MLlib Spam example gets stuck in Stage X

2015-03-30 Thread Su She
tly. > > On Mon, Mar 30, 2015 at 10:16 AM, Xiangrui Meng wrote: >> >> +Holden, Joseph >> >> It seems that there is something wrong with the sample data file: >> https://github.com/databricks/learning-spark/blob/master/files/ham.txt >> >> -Xiangrui >>

value reduceByKeyAndWindow is not a member of org.apache.spark.streaming.dstream.DStream[(String, Int)]

2015-04-07 Thread Su She
Hello Everyone, I am trying to implement this example (Spark Streaming with Twitter). https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/TwitterPopularTags.scala I am able to do: hashTags.print() to get a live stream of filtered hashtags, but
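The compile error in the subject line usually means the pair-DStream operations are not in scope; in Spark 1.x, adding `import org.apache.spark.streaming.StreamingContext._` typically resolves it. Semantically, reduceByKeyAndWindow reduces all (key, value) pairs across a sliding window of recent batches, which can be sketched in plain Scala (batch data and window length below are made up for illustration; in Spark this runs over a DStream[(String, Int)]):

```scala
// Plain-Scala sketch of reduceByKeyAndWindow semantics: for each window of
// the most recent `windowLen` micro-batches, reduce all (key, value) pairs
// with the given function.
def reduceByKeyAndWindow(
    batches: Seq[Seq[(String, Int)]],
    windowLen: Int,
    reduce: (Int, Int) => Int): Seq[Map[String, Int]] =
  batches.indices.map { i =>
    // Flatten the pairs from the last `windowLen` batches ending at batch i.
    val window = batches.slice(math.max(0, i - windowLen + 1), i + 1).flatten
    window.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(reduce) }
  }
```

With hashtag counts per 2-second batch and a 60-second window, each output map would hold the reduced count of every hashtag seen in the last 60 seconds.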

Getting error running MLlib example with new cluster

2015-04-23 Thread Su She
I had asked this question before, but wanted to ask again as I think it is related to my pom file or project setup. I have been trying on/off for the past month to try to run this MLlib example:

Getting error running MLlib example with new cluster

2015-04-23 Thread Su She
Sorry, accidentally sent the last email before finishing. I had asked this question before, but wanted to ask again as I think it is now related to my pom file or project setup. Really appreciate the help! I have been trying on/off for the past month to try to run this MLlib example: https://git

Re: Getting error running MLlib example with new cluster

2015-04-27 Thread Su She
e/ec2-user/sparkApps/learning-spark/target/simple-project-1.1.jar Thank you for the help! Best, Su On Mon, Apr 27, 2015 at 9:58 AM, Xiangrui Meng wrote: > How did you run the example app? Did you use spark-submit? -Xiangrui > > On Thu, Apr 23, 2015 at 2:27 PM, Su She wrote: >>

Re: Getting error running MLlib example with new cluster

2015-05-11 Thread Su She
! On Mon, Apr 27, 2015 at 11:48 AM, Su She wrote: > Hello Xiangrui, > > I am using this spark-submit command (as I do for all other jobs): > > /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/spark-submit > --class MLlib --master local[2] --jars $(echo > /ho

Trouble trying to run ./spark-ec2 script

2015-05-13 Thread Su She
I'm trying to set up my own cluster and am having trouble running this script: ./spark-ec2 --key-pair=xx --identity-file=xx.pem --region=us-west-2 --zone=us-west-2c --num-slaves=1 launch my-spark-cluster based on: https://spark.apache.org/docs/latest/ec2-scripts.html It just tries to open the s

Re: Trouble trying to run ./spark-ec2 script

2015-05-13 Thread Su She
Hmm, just tried to run it again, but opened the script with python; the command window seemed to pop up really quickly and then exit. On Wed, May 13, 2015 at 2:06 PM, Su She wrote: > Hi Ted, Yes I do have Python 3.5 installed. I just ran "py" from the > ec2 directory and it started up

Getting Output From a Cluster

2015-01-08 Thread Su She
Hello Everyone, Thanks in advance for the help! I successfully got my Kafka/Spark WordCount app to print locally. However, I want to run it on a cluster, which means that I will have to save it to HDFS if I want to be able to read the output. I am running Spark 1.1.0, which means according to th

Re: Getting Output From a Cluster

2015-01-08 Thread Su She
ext files on the DStream --looks like it? Look > at the section called "Design Patterns for using foreachRDD" in the link > you sent -- you want to do dstream.foreachRDD(rdd => rdd.saveAs) > > On Thu, Jan 8, 2015 at 5:20 PM, Su She wrote: > >> Hello Everyone,

Re: Getting Output From a Cluster

2015-01-08 Thread Su She
; yourStream.saveAsNewAPIHadoopFiles(hdfsUrl, "/output-location",Text.class, > Text.class, outputFormatClass); > > > > Thanks > Best Regards > > On Fri, Jan 9, 2015 at 10:22 AM, Su She wrote: > >> Yes, I am calling the saveAsHadoopFiles on the Dstream. How

Re: Getting Output From a Cluster

2015-01-12 Thread Su She
Hello Everyone, Quick followup: is there any way I can append output to one file rather than create a new directory/file every X milliseconds? Thanks! Suhas Shekar University of California, Los Angeles B.A. Economics, Specialization in Computing 2014 On Thu, Jan 8, 2015 at 11:41 PM, Su She
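Spark Streaming's saveAs*Files operations create a fresh directory per batch interval; appending to a single file has to be done manually, e.g. inside foreachRDD after collecting a small batch. A local sketch of just the append step (the file name is illustrative, and collecting to the driver only makes sense for small batches):

```scala
import java.nio.file.{Files, Paths, StandardOpenOption}

// Append one batch's lines to a single output file, creating it on first use.
// In a streaming job this would be called from within foreachRDD.
def appendBatch(file: String, lines: Seq[String]): Unit = {
  val path = Paths.get(file)
  if (!Files.exists(path)) Files.createFile(path)
  Files.write(path, (lines.mkString("\n") + "\n").getBytes,
    StandardOpenOption.APPEND)
}
```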

Re: Getting Output From a Cluster

2015-01-12 Thread Su She
on the data to 1 before saving. > Another way would be to use hadoop's copy merge command/api(available from > 2.0 versions) > On 13 Jan 2015 01:08, "Su She" wrote: > >> Hello Everyone, >> >> Quick followup, is there any way I can append output to one fi

HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-16 Thread Su She
Hello Everyone, I am encountering trouble running Spark applications after I shut down my EC2 instances. Everything else seems to work except Spark. When I try running a simple Spark application, like sc.parallelize(), I get the message that the HDFS NameNode is in safe mode. Has anyone else had this i

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-17 Thread Su She
> > stop-all.sh would do) and then shut down the machines. > > > > You can execute the following command to disable safe mode: > > > >> hdfs dfsadmin -safemode leave > > > > > > > > Thanks > > Best Regards > > > > On Sat, Jan 17, 201
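For reference, the safe-mode commands discussed in this thread look roughly like the following on a standard Hadoop install (this is an ops sketch; binary names and paths may differ under CDH parcels, where stopping/starting services through Cloudera Manager is usually preferred):

```shell
# Check whether the NameNode is currently in safe mode.
hdfs dfsadmin -safemode get

# Force the NameNode out of safe mode (only after confirming blocks are
# reported; leaving safe mode with missing blocks can hide data problems).
hdfs dfsadmin -safemode leave
```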