Re: building spark over proxy

2014-03-10 Thread Bharath Vissapragada
http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3ccaaqhkj48japuzqc476es67c+rrfime87uprambdoofhcl0k...@mail.gmail.com%3E On Tue, Mar 11, 2014 at 11:44 AM, hades dark wrote: > Can someone help me on how to build spark over proxy settings .. > > -- > REGARDS > ASHUTOSH JAIN > IIT-

building spark over proxy

2014-03-10 Thread hades dark
Can someone help me with how to build Spark behind a proxy? -- REGARDS ASHUTOSH JAIN IIT-BHU VARANASI
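For anyone else hitting this: builds behind a proxy generally need the standard JVM proxy properties passed to the build tool. A sketch for the sbt build, with placeholder proxy host and port:

```shell
# Placeholder proxy host/port -- substitute your own values.
export SBT_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=3128 -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=3128"
echo "$SBT_OPTS"
# Then run the build as usual, e.g.: sbt/sbt assembly
```

For a Maven build, the equivalent host/port go in the proxies section of ~/.m2/settings.xml.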

pyspark broadcast error

2014-03-10 Thread Brad Miller
Hi All, When I run the program shown below, I receive the error shown below. I am running the current version of branch-0.9 from github. Note that I do not receive the error when I replace "2 ** 29" with "2 ** X", where X < 29. More interestingly, I do not receive the error when X = 30, and when

Block

2014-03-10 Thread David Thomas
What is the concept of Block and BlockManager in Spark? How is a Block related to a Partition of a RDD?

Re: Pig on Spark

2014-03-10 Thread Xiangrui Meng
Hi Sameer, Lin (cc'ed) could also give you some updates about Pig on Spark development on her side. Best, Xiangrui On Mon, Mar 10, 2014 at 12:52 PM, Sameer Tilak wrote: > Hi Mayur, > We are planning to upgrade our distribution MR1 -> MR2 (YARN) and the goal is > to get SPROK set up next month. I

Re: SPARK_JAVA_OPTS not picked up by the application

2014-03-10 Thread Linlin
Thanks! So SPARK_DAEMON_JAVA_OPTS is for the worker, and SPARK_JAVA_OPTS is for the master? I only set SPARK_JAVA_OPTS in spark-env.sh, and the JVM opt is applied to both the master and worker daemons. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-JAVA-OPTS-not

Re: SPARK_JAVA_OPTS not picked up by the application

2014-03-10 Thread Linlin
Thanks! Since my worker is on the same node, and -Xss is the JVM option for setting the maximum thread stack size, my worker does show this option now. Now I realize I accidentally ran the app in local mode, as I didn't give the master URL when initializing the Spark context. For local mode, how to p

Using s3 instead of broadcast

2014-03-10 Thread Aureliano Buendia
Hi, My spark app has to broadcast 5 GB of RDD to about 100 workers at the beginning of each job. Obviously, this takes some time, and this time linearly increases as the number of workers increases. Does it make sense instead of broadcasting the 5 GB RDD, to ask each worker to download it from s3

Re: SPARK_JAVA_OPTS not picked up by the application

2014-03-10 Thread Aaron Davidson
It's interesting that the setting was applied to the master/worker processes, as those have been using a different environment variable called SPARK_DAEMON_JAVA_OPTS since around spark 0.8.0. Is it being set in the driver? On Mon, Mar 10, 2014 at 9:15 PM, Linlin wrote: > my cluster only has 1 n
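To summarize the distinction for the archives, a spark-env.sh sketch (Spark 0.9-era variables; the -Xss value is illustrative, not a recommendation):

```shell
# conf/spark-env.sh
export SPARK_DAEMON_JAVA_OPTS="-Xss8m"   # JVM opts for the standalone master/worker daemons
export SPARK_JAVA_OPTS="-Xss8m"          # JVM opts for the driver and executors
```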

Re: how to use the log4j for the standalone app

2014-03-10 Thread lihu
Thanks, but I do not want to log my own program's info; I just do not want Spark to output all its info to my console. I want Spark to write its log to a file that I specify. On Tue, Mar 11, 2014 at 11:49 AM, Robin Cjc wrote: > Hi lihu, > > you can extends the org.apache.spark.logging class

Re: SPARK_JAVA_OPTS not picked up by the application

2014-03-10 Thread Linlin
my cluster only has 1 node (master/worker). -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-JAVA-OPTS-not-picked-up-by-the-application-tp2483p2506.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: SPARK_JAVA_OPTS not picked up by the application

2014-03-10 Thread Robin Cjc
The properties in spark-env.sh are machine-specific, so you need to specify them on your worker as well. I guess what you are asking about is System.setProperty(); you can call it before you initialize your SparkContext. Best Regards, Chen Jingci On Tue, Mar 11, 2014 at 6:47 AM, Linlin wrote: > > Hi, > > I have a jav
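A minimal sketch of that property-based approach (Spark 0.9 style; the master URL is a placeholder, and the context creation is elided since it needs a running cluster):

```java
// Set the master before the SparkContext is constructed;
// SparkConf picks up system properties starting with "spark.".
public class SetMasterSketch {
    public static void main(String[] args) {
        System.setProperty("spark.master", "spark://master-host:7077"); // placeholder URL
        // ... then create the SparkContext / JavaSparkContext as usual.
        System.out.println(System.getProperty("spark.master"));
    }
}
```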

Re: how to use the log4j for the standalone app

2014-03-10 Thread Robin Cjc
Hi lihu, you can extend the org.apache.spark.Logging trait, then use functions like logInfo(). It will log according to the config in your log4j.properties. Best Regards, Chen Jingci On Tue, Mar 11, 2014 at 11:36 AM, lihu wrote: > Hi, >I use the spark0.9, and when i run the spark-sh

how to use the log4j for the standalone app

2014-03-10 Thread lihu
Hi, I use Spark 0.9, and when I run the spark-shell, logging works according to the log4j.properties in the SPARK_HOME/conf directory. But when I run a standalone app, I do not know how to configure logging. I use SparkConf to set it, such as: *val conf = new SparkConf()* * conf.set("*log4
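One approach that applies to a standalone app as well (a sketch; the file path is a placeholder): start the app's JVM with -Dlog4j.configuration=file:/path/to/log4j.properties pointing at a config that routes everything to a FileAppender, e.g.:

```properties
# Send all Spark log output to a file instead of the console
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/spark-app.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```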

Re: is spark 0.9.0 HA?

2014-03-10 Thread qingyang li
ok , thanks. 2014-03-11 10:51 GMT+08:00 Aaron Davidson : > Spark 0.9.0 does include standalone scheduler HA, but it requires running > multiple masters. The docs are located here: > https://spark.apache.org/docs/0.9.0/spark-standalone.html#high-availability > > 0.9.0 also includes driver HA (for

Re: is spark 0.9.0 HA?

2014-03-10 Thread Aaron Davidson
Spark 0.9.0 does include standalone scheduler HA, but it requires running multiple masters. The docs are located here: https://spark.apache.org/docs/0.9.0/spark-standalone.html#high-availability 0.9.0 also includes driver HA (for long-running normal or streaming jobs), allowing you to submit a dri

Re: SPARK_JAVA_OPTS not picked up by the application

2014-03-10 Thread hequn cheng
Have you sent spark-env.sh to the slave nodes? 2014-03-11 6:47 GMT+08:00 Linlin : > > Hi, > > I have a java option (-Xss) setting specified in SPARK_JAVA_OPTS in > spark-env.sh, noticed after stop/restart the spark cluster, the > master/worker daemon has the setting being applied, but this se

Re: sequenceFile and groupByKey

2014-03-10 Thread Yishu Lin
Need this to solve the problem: import org.apache.spark.SparkContext._ Yishu On Mar 10, 2014, at 2:46 PM, Yishu Lin wrote: > I have the same question and tried with 1, but get compilation error: > > [error] …. could not find implicit value for parameter kcf: () => > org.apache.spark.Writable

Re: Sharing SparkContext

2014-03-10 Thread abhinav chowdary
HDFS 1.0.4, but we primarily use Cassandra + Spark (Calliope); I tested it with both. Are you using it with HDFS? What version of Hadoop? 1.0.4? Ognen On 3/10/14, 8:49 PM, abhinav chowdary wrote: for any one who is interested to know about job server from Ooyala.. we started using it recently and

is spark 0.9.0 HA?

2014-03-10 Thread qingyang li
Is Spark 0.9.0 HA? We only have one master server, so I think it is not. Does anyone know how to support HA for Spark?

Re: Sharing SparkContext

2014-03-10 Thread Ognen Duzlevski
Are you using it with HDFS? What version of Hadoop? 1.0.4? Ognen On 3/10/14, 8:49 PM, abhinav chowdary wrote: for any one who is interested to know about job server from Ooyala.. we started using it recently and been working great so far.. On Feb 25, 2014 9:23 PM, "Ognen Duzlevski"

Re: Sharing SparkContext

2014-03-10 Thread abhinav chowdary
0.8.1. We used branch 0.8 and pulled the request into our local repo. I remember we had to deal with a few issues, but once we were through those it worked great. On Mar 10, 2014 6:51 PM, "Mayur Rustagi" wrote: > Which version of Spark are you using? > > > Mayur Rustagi > Ph: +1 (760) 203 3257 > http://

Re: Sharing SparkContext

2014-03-10 Thread Mayur Rustagi
Which version of Spark are you using? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Mon, Mar 10, 2014 at 6:49 PM, abhinav chowdary < abhinav.chowd...@gmail.com> wrote: > for any one who is interested to know about jo

Re: Sharing SparkContext

2014-03-10 Thread abhinav chowdary
for any one who is interested to know about job server from Ooyala.. we started using it recently and been working great so far.. On Feb 25, 2014 9:23 PM, "Ognen Duzlevski" wrote: > In that case, I must have misunderstood the following (from > http://spark.incubator.apache.org/docs/0.8.1/job-sch

Re: [BLOG] Spark on Cassandra w/ Calliope

2014-03-10 Thread abhinav chowdary
+1. We have been using Calliope for a few months and it's working out really great for us. Any plans on integrating it into Spark? On Mar 10, 2014 1:58 PM, "Rohit Rai" wrote: > We are happy that you found Calliope useful and glad we could help. > > *Founder & CEO, **Tuplejump, Inc.* > _

Re: Custom RDD

2014-03-10 Thread Prashant Sharma
Hi David, There are many implementations of RDD available in org.apache.spark. All you have to do is extend the RDD class. Of course, this is not possible from Java, AFAIK. Prashant Sharma On Tue, Mar 11, 2014 at 1:00 AM, David Thomas wrote: > Is there any guide available on creating a custom RDD

Re: How to create RDD from Java in-memory data?

2014-03-10 Thread wallacemann
I was right ... I was missing something obvious. The answer to my question is to use JavaSparkContext.parallelize which works with List or List>. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-RDD-from-Java-in-memory-data-tp2486p2487.html Sen
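Spelled out as a sketch, assuming Spark 0.9 on the classpath (class and app names are placeholders):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelizeSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local", "parallelize-sketch");

        // Any in-memory List of serializable objects can become an RDD.
        List<String> data = Arrays.asList("a", "b", "c");
        JavaRDD<String> rdd = sc.parallelize(data);

        System.out.println(rdd.count());
        sc.stop();
    }
}
```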

is there a shark 0.9 build that can be downloaded?

2014-03-10 Thread qingyang li
Does anyone know if there is a Shark 0.9 build that can be downloaded? If not, when will there be a Shark 0.9 build?

Unsubscribe

2014-03-10 Thread Aditya Devarakonda
Unsubscribe

Unsubscribe

2014-03-10 Thread Shalini Singh
Unsubscribe

How to create RDD from Java in-memory data?

2014-03-10 Thread wallacemann
I would like to construct an RDD from data I already have in memory as POJO objects. Is this possible? For example, is it possible to create an RDD from Iterable? I'm running Spark from Java as a stand-alone application. The JavaWordCount example runs fine. In the example, the initial RDD is p

Re: "Too many open files" exception on reduceByKey

2014-03-10 Thread Patrick Wendell
Hey Matt, The best way is definitely just to increase the ulimit if possible, this is sort of an assumption we make in Spark that clusters will be able to move it around. You might be able to hack around this by decreasing the number of reducers but this could have some performance implications f
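For readers hitting the same exception: a shuffle can open on the order of (mappers x reducers) files at once, so the first thing to check is the per-process limit. A sketch:

```shell
# Print the current per-process open-file limit (1024 is a common low default).
current=$(ulimit -n)
echo "open-file limit: $current"
# If permitted, raise it for this shell before starting a worker, e.g.:
# ulimit -n 16384
```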

Re: computation slows down 10x because of cached RDDs

2014-03-10 Thread Koert Kuipers
hey matei, it happens repeatedly. we are currently running on java 6 with spark 0.9. i will add -XX:+PrintGCDetails and collect details, and also look into java 7 G1. thanks On Mon, Mar 10, 2014 at 6:27 PM, Matei Zaharia wrote: > Does this happen repeatedly if you keep running the computa

SPARK_JAVA_OPTS not picked up by the application

2014-03-10 Thread Linlin
Hi, I have a java option (-Xss) setting specified in SPARK_JAVA_OPTS in spark-env.sh. I noticed that after stopping/restarting the spark cluster, the master/worker daemon has the setting applied, but this setting is not being propagated to the executor; my application continues to behave the same. I am not

Re: computation slows down 10x because of cached RDDs

2014-03-10 Thread Matei Zaharia
Does this happen repeatedly if you keep running the computation, or just the first time? It may take time to move these Java objects to the old generation the first time you run queries, which could lead to a GC pause that also slows down the small queries. If you can run with -XX:+PrintGCDetai

computation slows down 10x because of cached RDDs

2014-03-10 Thread Koert Kuipers
hello all, i am observing a strange result. i have a computation that i run on a cached RDD in spark-standalone. it typically takes about 4 seconds. but when other RDDs that are not relevant to the computation at hand are cached in memory (in same spark context), the computation takes 40 seconds o

Fwd: test

2014-03-10 Thread Yishu Lin
-- Forwarded message -- From: Yishu Lin Date: Mon, Mar 10, 2014 at 2:47 PM Subject: test To: user@spark.apache.org please ignore if you can see it ...

test

2014-03-10 Thread Yishu Lin
please ignore if you can see it …

Re: sequenceFile and groupByKey

2014-03-10 Thread Yishu Lin
I have the same question and tried with 1, but get compilation error: [error] …. could not find implicit value for parameter kcf: () => org.apache.spark.WritableConverter[String] [error] val t2 = sc.sequenceFile[String, Int](“/test/data", 20) Yishu On Mar 9, 2014, at 12:21 AM, Shixiong Zhu

Re: [External] Re: no stdout output from worker

2014-03-10 Thread Patrick Wendell
Hey Sen, Suarav is right, and I think all of your print statements are inside of the driver program rather than inside of a closure. How are you running your program (i.e. what do you run that starts this job)? Where you run the driver you should expect to see the output. - Patrick On Mon, Mar

Java example of using broadcast

2014-03-10 Thread Sen, Ranjan [USA]
Hi Patrick Yes I get it. I have a different question now - (changed the sub) Can anyone point me to a Java example of using broadcast variables? - Ranjan From: Patrick Wendell mailto:pwend...@gmail.com>> Reply-To: "user@spark.apache.org" mailto:user@spark.apache.
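In the absence of a pointer in the thread, here is a minimal sketch of broadcast usage from the Java API (Spark 0.9 era; class name, app name, and data are placeholders, and it assumes Spark on the classpath):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.broadcast.Broadcast;

public class BroadcastSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local", "broadcast-sketch");

        // Ship a read-only lookup table to every worker once.
        final Broadcast<List<Integer>> lookup =
                sc.broadcast(Arrays.asList(10, 20, 30));

        JavaRDD<Integer> indices = sc.parallelize(Arrays.asList(0, 1, 2));
        JavaRDD<Integer> values = indices.map(new Function<Integer, Integer>() {
            public Integer call(Integer i) {
                // Read the broadcast value on the worker side.
                return lookup.value().get(i);
            }
        });

        System.out.println(values.collect());
        sc.stop();
    }
}
```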

Re: [BLOG] Spark on Cassandra w/ Calliope

2014-03-10 Thread Rohit Rai
We are happy that you found Calliope useful and glad we could help. *Founder & CEO, **Tuplejump, Inc.* www.tuplejump.com *The Data Engineering Platform* On Sat, Mar 8, 2014 at 2:18 AM, Brian O'Neill wrote: > > FWIW - I posted some notes to help people get started q

RE: Pig on Spark

2014-03-10 Thread Sameer Tilak
Hi Mayur, We are planning to upgrade our distribution MR1 -> MR2 (YARN) and the goal is to get SPROK set up next month. I will keep you posted. Can you please keep me informed about your progress as well? From: mayur.rust...@gmail.com Date: Mon, 10 Mar 2014 11:47:56 -0700 Subject: Re: Pig on Spark T

Re: Custom RDD

2014-03-10 Thread Mayur Rustagi
copy paste? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Mon, Mar 10, 2014 at 12:30 PM, David Thomas wrote: > Is there any guide available on creating a custom RDD? >

Custom RDD

2014-03-10 Thread David Thomas
Is there any guide available on creating a custom RDD?

Re: Room for rent in Aptos

2014-03-10 Thread Ognen Duzlevski
Probably unintentional :) Ognen P.S. I have a house for rent avail nah, just kidding! :) On 3/10/14, 1:54 PM, Muttineni, Vinay wrote: Why's this here? *From:*vaquar khan [mailto:vaquar.k...@gmail.com] *Sent:* Monday, March 10, 2014 11:43 AM *To:* user@spark.apache.org *Subject:* Re: Room

RE: Room for rent in Aptos

2014-03-10 Thread Muttineni, Vinay
Why's this here? From: vaquar khan [mailto:vaquar.k...@gmail.com] Sent: Monday, March 10, 2014 11:43 AM To: user@spark.apache.org Subject: Re: Room for rent in Aptos :) good one On 10 Mar 2014 23:21, "arjun biswas" mailto:arjunbiswas@gmail.com>> wrote: Hello , My name is Arjun and i am 30

Re: Pig on Spark

2014-03-10 Thread Mayur Rustagi
Hi Sameer, Did you make any progress on this? My team is also trying it out and would love to know some details of your progress. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Thu, Mar 6, 2014 at 2:20 PM, Sameer Tilak wrote: >

Re: Room for rent in Aptos

2014-03-10 Thread vaquar khan
:) good one On 10 Mar 2014 23:21, "arjun biswas" wrote: > Hello , > > My name is Arjun and i am 30 years old and I was inquiring about the room > ad that you have put up on craigslist in Aptos. I am very much interested > in the room and can move in pretty early . My annual income is around 105K

Re-distribute cache on new slave nodes for better performance

2014-03-10 Thread Praveen Rachabattuni
I have observed a query responding faster when dataset A is cached on 2 slave nodes rather than on 1 slave node. I wanted to add more slave nodes and check the performance, but I can only use the new node once the data is re-cached. Is there any way the cached dataset can be re-distributed (lesser time

Re: Sbt Permgen

2014-03-10 Thread Koert Kuipers
hey sandy, i think that pull request is not relevant to the 0.9 branch i am using. switching to java 7 for sbt/sbt test made it work. not sure why... On Sun, Mar 9, 2014 at 11:44 PM, Sandy Ryza wrote: > There was an issue related to this fixed recently: > https://github.com/apache/spark/pull/103 > >

Room for rent in Aptos

2014-03-10 Thread arjun biswas
Hello , My name is Arjun and i am 30 years old and I was inquiring about the room ad that you have put up on craigslist in Aptos. I am very much interested in the room and can move in pretty early . My annual income is around 105K and I am a software engineer working in the silicon valley for abou

Unsubscribe

2014-03-10 Thread arjun biswas

"Too many open files" exception on reduceByKey

2014-03-10 Thread Matthew Cheah
Hi everyone, My team (cc'ed in this e-mail) and I are running a Spark reduceByKey operation on a cluster of 10 slaves where I don't have the privileges to set "ulimit -n" to a higher number. I'm running on a cluster where "ulimit -n" returns 1024 on each machine. When I attempt to run this job wi

Re: what is shark's mailing list?

2014-03-10 Thread Mayur Rustagi
This is the Shark mailing list; there is also one for Shark issues. I haven't found a Shark dev list either :) Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Mon, Mar 10, 2014 at 12:06 AM, qingyang li wrote: > Does anyone know what i

Log Analyze

2014-03-10 Thread Eduardo Costa Alfaia
Hi Guys, Could anyone help me to understand this piece of log in red? Why did this happen? Thanks 14/03/10 16:55:20 INFO SparkContext: Starting job: first at NetworkWordCount.scala:87 14/03/10 16:55:20 INFO JobScheduler: Finished job streaming job 1394466892000 ms.0 from job set of time 139

Re: [External] Re: no stdout output from worker

2014-03-10 Thread Sen, Ranjan [USA]
Hi Sourav That makes so much sense. Thanks much. Ranjan From: Sourav Chandra mailto:sourav.chan...@livestream.com>> Reply-To: "user@spark.apache.org" mailto:user@spark.apache.org>> Date: Sunday, March 9, 2014 at 10:37 PM To: "user@spark.apache.org

Re: Explain About Logs NetworkWordcount.scala

2014-03-10 Thread eduardocalfaia
Hi TD, Today I have seen these differences between the logs: Result different from zero: 14/03/10 10:55:27 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-0-1394445287000 on computer1.ant-net:51441 in memory (size: 10.1 MB, free: 1447.3 MB) 14/03/10 10:55:27 INFO BlockManagerMasterA

Using flume to create stream for spark streaming.

2014-03-10 Thread Ravi Hemnani
Hey, I am using the following flume flow: Flume agent 1, consisting of RabbitMQ -> source, file -> channel, Avro -> sink, sending data to a slave node of the spark cluster. Flume agent 2, on a slave node of the spark cluster, consists of Avro -> source, file -> channel; now for the sink I tried avro, hdfs, file

Unsubscribe

2014-03-10 Thread Aditya Devarakonda

Re: subscribe

2014-03-10 Thread He-chien Tsai
send this to 'user-request', not 'user' 2014-03-10 17:32 GMT+08:00 hequn cheng : > hi >

subscribe

2014-03-10 Thread hequn cheng
hi

Re: Explain About Logs NetworkWordcount.scala

2014-03-10 Thread eduardocalfaia
Hi TD, I have attached the source code of the application that I use to send the words to the workers. BR On 3/8/14, 4:21, Tathagata Das [via Apache Spark User List] wrote: > I am not sure how to debug this without any more information about the > source. Can you monitor on the receiver side that

Re: Streaming JSON string from REST Api in Spring

2014-03-10 Thread sonyjv
Thanks Mayur for your clarification. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Streaming-JSON-string-from-REST-Api-in-Spring-tp2358p2451.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

what is shark's mailing list?

2014-03-10 Thread qingyang li
Does anyone know what Shark's mailing list is? I have tried shark-uesr@googlegroups, but that is not it. It is also very slow to open groups.google.com/forum/#!forum/shark-users. Is there any other way to communicate with Shark developers?