Hello All,
A bit scared I did something stupid... I killed a few PIDs that were
listening on ports 2183 (Kafka) and 4042 (Spark app). Some of the PIDs
didn't even seem to be stopped, as they still show up when I do
lsof -i :[port number]
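In case it matters, this is the exact sequence I have been using to check
and re-kill them (the port is just one of mine; <pid> is a placeholder):

    lsof -i :2183
    kill <pid>    # falling back to kill -9 <pid> only as a last resort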
I'm not sure if the problem started before or after I ran sbin/stop-dfs.sh and
> then sbin/start-dfs.sh
>
> Thanks
> Best Regards
>
> On Tue, Jun 2, 2015 at 5:03 AM, Su She wrote:
>>
>> Hello All,
>>
>> A bit scared I did something stupid... I killed a few PIDs that were
>> listening on ports 2183 (Kafka),
Hello All,
I also posted this on the Spark/Datastax thread, but thought it was also
50% a Spark question (or mostly a Spark question).
I was wondering what the best practice is for saving streaming Spark SQL (
https://github.com/Intel-bigdata/spark-streamingsql/blob/master/src/main/scala/org/apach
This is an output operator,
> so 'this' DStream will be registered as an output stream and therefore
> materialized.
>
> Change it to a map, foreach or some other form of transform.
>
> HTH
>
> -Todd
>
>
> On Thu, Jul 9, 2015 at 5:24 PM, Su She wrote:
>
e command for shutting down storage, or can I simply stop
HDFS in Cloudera Manager?
Thank you for the help!
On Sat, Jan 17, 2015 at 12:58 PM, Su She wrote:
> Thanks Akhil and Sean for the responses.
>
> I will try shutting down spark, then storage and then the instances.
> Initia
if you do the steps correctly across your whole cluster.
> I'm not sure if the stock stop-all.sh script is supposed to work.
> Certainly, if you are using CM, by far the easiest is to start/stop
> all of these things in CM.
>
> On Wed, Jan 21, 2015 at 6:08 PM, Su She wrote:
(mostly, it binds to localhost in that case)
> On 27 Jan 2015 07:25, "Su She" wrote:
>
>> Hello Sean and Akhil,
>>
>> I shut down the services on Cloudera Manager. I shut them down in the
>> appropriate order and then stopped all services of CM. I then shu
Hello Everyone,
A bit confused on this one... I have set up the KafkaWordCount found here:
https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
Everything runs fine when I run it using this on instance A: reposito
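For context, the core of that example is roughly the following (a minimal
Java sketch against the Spark 1.x Kafka receiver API; the ZooKeeper quorum,
group id, and topic name below are placeholders):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Duration;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaWordCount");
    JavaStreamingContext jssc =
        new JavaStreamingContext(sparkConf, new Duration(2000));

    // One receiver thread for the topic; both values are placeholders.
    Map<String, Integer> topicMap = new HashMap<String, Integer>();
    topicMap.put("my-topic", 1);

    JavaPairDStream<String, String> messages =
        KafkaUtils.createStream(jssc, "zk-host:2181", "my-group", topicMap);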
Hello Everyone,
I wanted to hear the community's thoughts on what (open-source) tools
have been used to visualize data from Spark/Spark Streaming? I've taken a
look at Zeppelin, but had some trouble working with it.
A couple of questions:
1) I've looked at a couple blog posts and it seems like spar
Hello Everyone,
I was reading this blog post:
http://homes.esat.kuleuven.be/~bioiuser/blog/a-d3-visualisation-from-spark-as-a-service/
and was wondering if this approach can be taken to visualize streaming
data... not just historical data?
Thank you!
-Suh
wrote:
> Checkout
>
> https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
>
> In there are links to how that is done.
>
>
> --- Original Message ---
>
> From: "Kelvin Chu" <2dot7kel...@gmail.com>
> Sent: February 10, 201
-receivers
>
> --- Original Message ---
>
> From: "Su She"
> Sent: February 11, 2015 10:23 AM
> To: "Felix C"
> Cc: "Kelvin Chu" <2dot7kel...@gmail.com>, user@spark.apache.org
> Subject: Re: Can spark job server be used to visualize strea
>> updating graphs
>> periodically. I haven’t used it myself yet so not sure how well it works.
>> See here: https://github.com/andypetrella/spark-notebook
>>
>> From: Su She
>> Date: Thursday, February 12, 2015 at 1:55 AM
>> To: Felix C
>> Cc: Kelvin Chu,
Hello Everyone,
I am writing simple word counts to HDFS using
messages.saveAsHadoopFiles("hdfs://user/ec2-user/","csv",String.class,
String.class, (Class) TextOutputFormat.class);
1) However, every 2 seconds I'm getting a new *directory* that is titled as a
csv. So I'll have test.csv, which will be
> Do a repartition before the saveAs*
> call.
>
> messages.repartition(1).saveAsHadoopFiles("hdfs://user/ec2-user/","csv",String.class,
> String.class, (Class) TextOutputFormat.class);
>
>
> Thanks
> Best Regards
>
> On Fri, Feb 13, 2015 at 11:59 AM, Su She
rue (to delete the original dir), null)
>
>
>
> Thanks
> Best Regards
>
> On Sat, Feb 14, 2015 at 2:18 AM, Su She wrote:
>
>> Thanks Akhil for the suggestion, it is now only giving me one part -
>> . Is there any way I can just create a file rather than a dire
Here's an example for doing
> that https://issues.apache.org/jira/browse/SPARK-944
>
> Thanks
> Best Regards
>
> On Sat, Feb 14, 2015 at 2:55 PM, Su She wrote:
>
>> Hello Akhil, thank you for your continued help!
>>
>> 1) So, if I can write it programmatically aft
http://stackoverflow.com/questions/23527941/how-to-write-to-csv-in-spark
Just read this... seems like it should be easily readable. Thanks!
On Sat, Feb 14, 2015 at 1:36 AM, Su She wrote:
> Thanks Akhil for the link. Is there a reason why there is a new directory
> created for each bat
> > FileUtil.copyMerge(FileSystem of source (hdfs), /output-location,
> > FileSystem of destination (hdfs), path to the merged file /merged-output,
> > true (to delete the original dir), null)
> >
> >
> >
> > Thanks
> > Best Regards
> >
> > On Sat, Fe
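Spelled out in Java, that copyMerge suggestion would look something like
this (a sketch, assuming source and destination live on the same HDFS; the
paths are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Concatenate every part file under /output-location into the single
    // file /merged-output, deleting the source directory when done.
    FileUtil.copyMerge(fs, new Path("/output-location"),
                       fs, new Path("/merged-output"),
                       true,   // delete the original directory
                       conf,
                       null);  // no string appended between files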
ectory of these files.
>
> On Sat, Feb 14, 2015 at 9:05 PM, Su She wrote:
> > Thanks Sean and Akhil! I will take out the repartition(1). Please let me
> > know if I understood this correctly, Spark Streaming writes data like
> this:
> >
> > foo-1001.csv/part-x
Hello Everyone,
I'm having some issues launching (non-Spark) applications via the
spark-submit command. The common error I am getting is copy/pasted below. I am
able to submit a Spark Streaming/Kafka application, but can't start a
DynamoDB Java app. The common error is related to joda-time.
1) I r
specify these jars (joda-time-2.7.jar, joda-convert-1.7.jar)
> either as part of your build and assembly or via the --jars option to
> spark-submit.
>
> HTH.
>
> On Fri, Feb 27, 2015 at 2:48 PM, Su She wrote:
>
>> Hello Everyone,
>>
>> I'm having some is
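Concretely, that --jars suggestion would look something like this (a sketch;
the class and jar names are placeholders):

    spark-submit --class MyDynamoApp --master local[2] \
      --jars joda-time-2.7.jar,joda-convert-1.7.jar \
      my-dynamo-app.jar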
Hello Everyone,
I am trying to run the Word Count from here:
https://github.com/holdenk/learning-spark-examples/blob/master/mini-complete-example/src/main/scala/com/oreilly/learningsparkexamples/mini/scala/WordCount.scala
I was able to successfully run the app using SBT, but not Maven. I don't
se
Hello,
So I actually solved the problem... see point 3.
Here are a few approaches/errors I was getting:
1) mvn package exec:java -Dexec.mainClass=HelloWorld
Error: java.lang.ClassNotFoundException: HelloWorld
2)
http://stackoverflow.com/questions/26929100/running-a-scala-application-in-maven-pro
Hello Everyone,
I am trying to run this MLlib example from Learning Spark:
https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/MLlib.scala#L48
Things I'm doing differently:
1) Using spark shell instead of an application
2) instead of t
on Stage 1.
> See if it's GC time; if so, try increasing the level of parallelism or
> repartition it like sc.getDefaultParallelism*3.
>
> Thanks
> Best Regards
>
> On Thu, Mar 19, 2015 at 12:15 PM, Su She wrote:
>
>> Hello Everyone,
>>
>> I am trying to ru
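In code, that repartition suggestion would be something like the following
(a sketch; rdd stands for whatever RDD feeds that stage and sc for the
JavaSparkContext):

    rdd.repartition(sc.defaultParallelism() * 3)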
> > Thanks
> > Best Regards
> >
> > On Thu, Mar 19, 2015 at 1:15 PM, Su She wrote:
> >>
> >> Hi Akhil,
> >>
> >> 1) How could I see how much time it is spending on stage 1? Or what if,
> >> like above, it doesn't get past stag
tly.
>
> On Mon, Mar 30, 2015 at 10:16 AM, Xiangrui Meng wrote:
>>
>> +Holden, Joseph
>>
>> It seems that there is something wrong with the sample data file:
>> https://github.com/databricks/learning-spark/blob/master/files/ham.txt
>>
>> -Xiangrui
>>
Hello Everyone,
I am trying to implement this example (Spark Streaming with Twitter).
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/TwitterPopularTags.scala
I am able to do:
hashTags.print() to get a live stream of filtered hashtags, but
I had asked this question before, but wanted to ask again as I think
it is related to my pom file or project setup.
I have been trying on/off for the past month to run this MLlib example:
Sorry, accidentally sent the last email before finishing.
I had asked this question before, but wanted to ask again as I think
it is now related to my pom file or project setup. Really appreciate the help!
I have been trying on/off for the past month to run this MLlib
example:
https://git
e/ec2-user/sparkApps/learning-spark/target/simple-project-1.1.jar
Thank you for the help!
Best,
Su
On Mon, Apr 27, 2015 at 9:58 AM, Xiangrui Meng wrote:
> How did you run the example app? Did you use spark-submit? -Xiangrui
>
> On Thu, Apr 23, 2015 at 2:27 PM, Su She wrote:
>>
!
On Mon, Apr 27, 2015 at 11:48 AM, Su She wrote:
> Hello Xiangrui,
>
> I am using this spark-submit command (as I do for all other jobs):
>
> /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/spark-submit
> --class MLlib --master local[2] --jars $(echo
> /ho
I'm trying to set up my own cluster and am having trouble running this script:
./spark-ec2 --key-pair=xx --identity-file=xx.pem --region=us-west-2
--zone=us-west-2c --num-slaves=1 launch my-spark-cluster
based off: https://spark.apache.org/docs/latest/ec2-scripts.html
It just tries to open the s
Hmm, just tried to run it again, but opened the script with Python;
the cmd window seemed to pop up really quickly and then exited.
On Wed, May 13, 2015 at 2:06 PM, Su She wrote:
> Hi Ted, Yes I do have Python 3.5 installed. I just ran "py" from the
> ec2 directory and it started up
Hello Everyone,
Thanks in advance for the help!
I successfully got my Kafka/Spark WordCount app to print locally. However,
I want to run it on a cluster, which means that I will have to save it to
HDFS if I want to be able to read the output.
I am running Spark 1.1.0, which means according to th
ext files on the DStream -- looks like it? Look
> at the section called "Design Patterns for using foreachRDD" in the link
> you sent -- you want to do dstream.foreachRDD(rdd => rdd.saveAs)
>
> On Thu, Jan 8, 2015 at 5:20 PM, Su She wrote:
>
>> Hello Everyone,
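In the Spark 1.x Java API that pattern comes out roughly as follows (a
minimal sketch, assuming messages is a JavaPairDStream<String, Integer> of
word counts; the HDFS path is a placeholder):

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.function.Function;

    messages.foreachRDD(new Function<JavaPairRDD<String, Integer>, Void>() {
      @Override
      public Void call(JavaPairRDD<String, Integer> rdd) {
        // Each batch still lands in its own directory of part files.
        rdd.saveAsTextFile("hdfs:///user/ec2-user/wordcounts-"
            + System.currentTimeMillis());
        return null;
      }
    });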
> yourStream.saveAsNewAPIHadoopFiles(hdfsUrl, "/output-location", Text.class,
> Text.class, outputFormatClass);
>
>
>
> Thanks
> Best Regards
>
> On Fri, Jan 9, 2015 at 10:22 AM, Su She wrote:
>
>> Yes, I am calling the saveAsHadoopFiles on the DStream. How
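(Side note on the directory-per-batch behavior that keeps coming up in this
thread: the two string arguments to saveAsHadoopFiles /
saveAsNewAPIHadoopFiles are a prefix and a suffix, and each batch interval is
written out under

    prefix-TIME_IN_MS.suffix    e.g. wordcounts-1423872000000.csv

so a fresh directory of part files appears every batch; the example name
above is a placeholder.)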
Hello Everyone,
Quick follow-up: is there any way I can append output to one file rather
than create a new directory/file every X milliseconds?
Thanks!
Suhas Shekar
University of California, Los Angeles
B.A. Economics, Specialization in Computing 2014
On Thu, Jan 8, 2015 at 11:41 PM, Su She
on the data to 1 before saving.
> Another way would be to use Hadoop's copyMerge command/API (available from
> 2.0 versions)
> On 13 Jan 2015 01:08, "Su She" wrote:
>
>> Hello Everyone,
>>
>> Quick follow-up: is there any way I can append output to one fi
Hello Everyone,
I am encountering trouble running Spark applications when I shut down my
EC2 instances. Everything else seems to work except Spark. When I try
running a simple Spark application, like sc.parallelize(), I get the message
that the HDFS NameNode is in safe mode.
Has anyone else had this i
> > stop-all.sh would do) and then shut down the machines.
> >
> > You can execute the following command to disable safe mode:
> >
> >> hdfs dfsadmin -safemode leave
> >
> >
> >
> > Thanks
> > Best Regards
> >
> > On Sat, Jan 17, 201