WebUI's Application count doesn't get updated

2014-06-03 Thread MrAsanjar .
- Hi all, - the Application running and completed counts do not get updated; they are always zero. I have run the SparkPi application at least 10 times. Please help. - *Workers:* 3 - *Cores:* 24 Total, 0 Used - *Memory:* 43.7 GB Total, 0.0 B Used - *Applications:* 0 Running, 0 C

Need equallyWeightedPartitioner Algorithm

2014-06-03 Thread Joe L
I need to partition my data into equally weighted partitions: suppose I have 20GB of data and I want 4 partitions, each holding 5GB of the data. Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Need-equallyWeightedPartitioner-Algorithm-tp6788
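The simplest route to roughly equal-sized partitions is RDD.repartition(), which shuffles records evenly across the requested number of partitions. A minimal sketch, assuming "equally weighted" means roughly equal record counts and using an illustrative input path:

    import org.apache.spark.{SparkConf, SparkContext}

    object EqualPartitions {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("EqualPartitions"))
        // repartition() performs a full shuffle and spreads records roughly
        // evenly across the requested number of partitions (4 here).
        val balanced = sc.textFile("hdfs:///path/to/20gb-input").repartition(4)
        println(balanced.partitions.length) // 4
        sc.stop()
      }
    }

If individual records vary widely in size, a custom Partitioner keyed on a size estimate would be needed instead; the sketch above only balances record counts.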

Re: WebUI's Application count doesn't get updated

2014-06-03 Thread Andrew Ash
Your applications are probably not connecting to your existing cluster and instead running in local mode. Are you passing the master URL to the SparkPi application? Andrew On Tue, Jun 3, 2014 at 12:30 AM, MrAsanjar . wrote: > >- HI all, >- Application running and completed count does

Upgrading to Spark 1.0.0

2014-06-03 Thread MEETHU MATHEW
Hi, I am currently using Spark 0.9 configured with a Hadoop 1.2.1 cluster. What should I do if I want to upgrade it to Spark 1.0.0? Do I need to download the latest version, replace the existing Spark with the new one, and make the configuration changes again from scratch, or is there any oth

Re: WebUI's Application count doesn't get updated

2014-06-03 Thread MrAsanjar .
Thanks for your reply Andrew. I am running applications directly on the master node. My cluster also contains three worker nodes, all visible on the WebUI. Spark Master at spark://sanjar-local-machine-1:7077 - *URL:* spark://sanjar-local-machine-1:7077 - *Workers:* 3 - *Cores:* 24 Total,

Re: WebUI's Application count doesn't get updated

2014-06-03 Thread Akhil Das
As Andrew said, your application is running in local mode rather than against the standalone cluster. You need to pass MASTER=spark://sanjar-local-machine-1:7077 before running your SparkPi example. Thanks Best Regards On Tue, Jun 3, 2014 at 1:12 PM, MrAsanjar . wrote: > Thanks for your reply Andrew. I am running applications
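For a standalone application the same fix can be expressed in code by setting the master URL on the SparkConf. A minimal sketch, using the master address from this thread and an illustrative app name:

    import org.apache.spark.{SparkConf, SparkContext}

    // Point the driver at the standalone master instead of local mode,
    // so the application shows up under the web UI's application counts.
    val conf = new SparkConf()
      .setAppName("SparkPi")
      .setMaster("spark://sanjar-local-machine-1:7077")
    val sc = new SparkContext(conf)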

Re: WebUI's Application count doesn't get updated

2014-06-03 Thread MrAsanjar .
thanks guys, that fixed my problem. As you might have noticed, I am VERY new to spark. Building a spark cluster using LXC has been a challenge. On Tue, Jun 3, 2014 at 2:49 AM, Akhil Das wrote: > ​As Andrew said, your application is running on Standalone mode. You need > to pass > > MASTER=spark

Re: How can I make Spark 1.0 saveAsTextFile to overwrite existing file

2014-06-03 Thread Sean Owen
Ah, the output directory check was just not executed in the past. I thought it deleted the files. A third way indeed. FWIW I also think (B) is best. (A) and (C) both have their risks, but if they're non-default and everyone's willing to entertain a new arg to the API method, sure. (A) seems more s

Re: Having spark-ec2 join new slaves to existing cluster

2014-06-03 Thread sirisha_devineni
Nick, did you open a JIRA ticket for this feature to be implemented in spark-ec2? If so, can you please point me to the ticket? I would also like to do autoscaling of Spark nodes (add/remove slave nodes), so I am curious to know how you achieved this. Sirisha Nick Chammas wrote > Sweet, thanks f

Reg: Add/Remove slave nodes spark-ec2

2014-06-03 Thread Sirisha Devineni
Hi All, I have created a Spark cluster on EC2 using the spark-ec2 script. Whenever there is more data to be processed I would like to add new slaves to the existing cluster, and I would like to remove slave nodes when the data to be processed is low. It seems spark-ec2 currently doesn't have an option to add

Spark block manager registration extreme slow

2014-06-03 Thread Denes
Hi, my Spark installations (both 0.9.1 and 1.0.0) start up extremely slowly when starting a simple Spark Streaming job. I have to wait 6 (!) minutes at the INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager stage and another 4 (!) minutes at INFO util.MetadataCleaner: R

Kryo deserialisation error

2014-06-03 Thread Denes
I tried to use Kryo as a serialiser in Spark Streaming and did everything according to the guide posted on the Spark website, i.e. added the following lines: conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"); conf.set("spark.kryo.registrator", "MyKryoRegistrator"); I also a
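For reference, a minimal registrator matching the settings quoted above might look like the following sketch; MyRecord is a placeholder for whatever classes the job actually serializes:

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    // Placeholder class standing in for the types the job actually ships around.
    case class MyRecord(id: Long, value: String)

    class MyKryoRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[MyRecord])
      }
    }

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "MyKryoRegistrator")

Note that spark.kryo.registrator must be the fully qualified class name, so a registrator defined in a package needs its package prefix here.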

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-03 Thread Jeremy Lee
Thanks for that, Matei! I'll look at that once I get a spare moment. :-) If you like, I'll keep documenting my newbie problems and frustrations... perhaps it might make things easier for others. Another issue I seem to have found (now that I can get small clusters up): some of the examples (the s

Re: Is uberjar a recommended way of running Spark/Scala applications?

2014-06-03 Thread Pierre Borckmans
You might want to look at another great plugin: “sbt-pack” https://github.com/xerial/sbt-pack. It collects all the dependency JARs and creates launch scripts for *nix (including Mac OS) and Windows. HTH Pierre On 02 Jun 2014, at 17:29, Andrei wrote: > Thanks! This is even closer to what

Error related to serialisation in spark streaming

2014-06-03 Thread nilmish
I am using the following code segment : countPerWindow.foreachRDD(new Function, Void>() { @Override public Void call(JavaPairRDD rdd) throws Exception { Comparator> comp = new Comparator >() { public int compare(Tuple2

Reconnect to an application/RDD

2014-06-03 Thread Oleg Proudnikov
Hi All, is it possible to run a standalone app that would compute and persist/cache an RDD and then run other standalone apps that would gain access to that RDD? -- Thank you, Oleg

Re: Error related to serialisation in spark streaming

2014-06-03 Thread Sean Owen
Sorry if I'm dense, but is OptimisingSort your class? It's saying you have included something from it in a function that is shipped off to remote workers, but something in it is not java.io.Serializable. OptimisingSort$6$1 needs to be Serializable. On Tue, Jun 3, 2014 at 2:23 PM, nilmish wrote: > I

Prepare spark executor

2014-06-03 Thread yaoxin
Hi, is there any way to prepare a Spark executor? Like what we do in MapReduce, where we implement a setup and a cleanup method. For my case, I need this prepare method to init StaticParser based on the env (dev, production). Then I can directly use this StaticParser on the executor, like this: object Sta
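One common pattern (a sketch, not the poster's code) is to keep the parser in a singleton object behind a lazy val, so it is initialized at most once per executor JVM the first time a task touches it; mapPartitions offers a per-partition setup hook if that is needed instead. The APP_ENV variable and parsing logic below are illustrative:

    import org.apache.spark.rdd.RDD

    // Initialized lazily, once per executor JVM, when a task first uses it.
    object StaticParser {
      lazy val parse: String => Array[String] = {
        val env = sys.env.getOrElse("APP_ENV", "dev") // illustrative env switch
        line => line.split(if (env == "production") "\t" else ",")
      }
    }

    def parseAll(lines: RDD[String]): RDD[Array[String]] =
      lines.mapPartitions { iter =>
        // Per-partition setup work could also go here, before consuming the iterator.
        iter.map(StaticParser.parse)
      }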

Re: Reconnect to an application/RDD

2014-06-03 Thread Gerard Maas
I don't think that's supported by default: when the standalone context closes, the related RDDs will be GC'ed. You should explore the Spark Job Server, which allows you to cache RDDs by name and reuse them within a context. https://github.com/ooyala/spark-jobserver -kr, Gerard. On Tue, Jun 3, 2

Spark not working with mesos

2014-06-03 Thread praveshjain1991
I set up Spark-0.9.1 to run on mesos-0.13.0 using the steps mentioned here. The Mesos UI is showing two workers registered. I want to run these commands in the Spark shell: > scala> val data = 1 to 1 data: > scala.collection.immutable.R

spark 1.0 not using properties file from SPARK_CONF_DIR

2014-06-03 Thread Eugen Cepoi
Is it on purpose that, when setting SPARK_CONF_DIR, spark-submit still loads the properties file from SPARK_HOME/conf/spark-defaults.conf? IMO it would be more natural to override what is defined in SPARK_HOME/conf with SPARK_CONF_DIR when defined (and SPARK_CONF_DIR being overridden by command line ar

Re: Spark not working with mesos

2014-06-03 Thread Akhil Das
1. Make sure the spark-*.tgz that you created with make_distribution.sh is accessible by all the slave nodes. 2. Check the worker node logs. Thanks Best Regards On Tue, Jun 3, 2014 at 8:13 PM, praveshjain1991 wrote: > I set up Spark-0.9.1 to run on mesos-0.13.0 using the steps mentioned he

--cores option in spark-shell

2014-06-03 Thread Marek Wiewiorka
Hi All, there is information in the Spark 1.0.0 documentation that there is an option "--cores" that one can use to set the number of cores that spark-shell uses on the cluster: You can also pass an option --cores to control the number of cores that spark-shell uses on the cluster. This option doe

Re: --cores option in spark-shell

2014-06-03 Thread Matt Kielo
I haven't been able to set the cores with that option in Spark 1.0.0 either. To work around that, setting the environment variable SPARK_JAVA_OPTS="-Dspark.cores.max=" seems to do the trick. Matt Kielo Data Scientist Oculus Info Inc. On Tue, Jun 3, 2014 at 11:15 AM, Marek Wiewiorka wrote: > Hi

Re: --cores option in spark-shell

2014-06-03 Thread Mikhail Strebkov
Try -c instead, works for me, e.g. bin/spark-shell -c 88 On Tue, Jun 3, 2014 at 8:15 AM, Marek Wiewiorka wrote: > Hi All, > there is information in 1.0.0 Spark's documentation that > there is an option "--cores" that one can use to set the number of cores > that spark-shell uses on the clust

Re: --cores option in spark-shell

2014-06-03 Thread Marek Wiewiorka
That used to work with version 0.9.1 and earlier and does not seem to work with 1.0.0. M. 2014-06-03 17:53 GMT+02:00 Mikhail Strebkov : > Try -c instead, works for me, e.g. > > bin/spark-shell -c 88 > > > > On Tue, Jun 3, 2014 at 8:15 AM, Marek Wiewiorka > wrote: > >> Hi All, >> there is inf
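For a standalone application (as opposed to spark-shell) the same cap can be set programmatically; a hedged sketch, with the value 24 purely illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.cores.max limits the total number of cores the application
    // claims from a standalone (or coarse-grained Mesos) cluster.
    val conf = new SparkConf()
      .setAppName("CoreCapped")
      .set("spark.cores.max", "24")
    val sc = new SparkContext(conf)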

Re: Re: how to construct a ClassTag object as a method parameter in Java

2014-06-03 Thread Michael Armbrust
Ah, this is a bug that was fixed in 1.0. I think you should be able to work around it by using a fake class tag: scala.reflect.ClassTag$.MODULE$.AnyRef() On Mon, Jun 2, 2014 at 8:22 PM, bluejoe2008 wrote: > spark 0.9.1 > textInput is a JavaRDD object > i am programming in Java > > 2014-06-03 >

Re: Using MLLib in Scala

2014-06-03 Thread Xiangrui Meng
Hi Suela, (Please subscribe our user mailing list and send your questions there in the future.) For your case, each file contains a column of numbers. So you can use `sc.textFile` to read them first, zip them together, and then create labeled points: val xx = sc.textFile("/path/to/ex2x.dat").map(
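Put together, the pattern described above might look like the following sketch; the first path is the one from the thread, the second is assumed by analogy, and the parsing assumes one number per line:

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    def loadData(sc: SparkContext) = {
      val xx = sc.textFile("/path/to/ex2x.dat").map(_.trim.toDouble)
      val yy = sc.textFile("/path/to/ex2y.dat").map(_.trim.toDouble)
      // zip pairs the i-th feature with the i-th label; both RDDs must have
      // the same number of partitions and elements per partition.
      xx.zip(yy).map { case (x, y) => LabeledPoint(y, Vectors.dense(x)) }
    }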

Spark 1.0.0 fails if mesos.coarse set to true

2014-06-03 Thread Marek Wiewiorka
Hi All, I'm trying to run my code that used to work with mesos-0.14 and spark-0.9.0 with mesos-0.18.2 and spark-1.0.0, and I'm getting a weird error when I use coarse mode (see below). If I use the fine-grained mode everything is OK. Has any of you experienced a similar error? more stderr

Re: wholeTextFiles() : java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

2014-06-03 Thread Sean Owen
"Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected" is the classic error meaning "you compiled against Hadoop 1, but are running against Hadoop 2" I think you need to override the hadoop-client artifact that Spark depends on to be a Hadoop 2.x version. On Tue,

wholeTextFiles() : java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

2014-06-03 Thread toivoa
Hi, I set up the project under Eclipse using Maven with the dependency org.apache.spark : spark-core_2.10 : 1.0.0. A simple example fails: def main(args: Array[String]): Unit = { val conf = new SparkConf() .setMaster("local")

Re: wholeTextFiles() : java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

2014-06-03 Thread toivoa
Wow! What a quick reply! Adding org.apache.hadoop : hadoop-client : 2.4.0 solved the problem. But now I get 14/06/03 19:52:50 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could

mounting SSD devices of EC2 r3.8xlarge instances

2014-06-03 Thread Andras Barjak
Hi, I have noticed that upon launching a cluster consisting of r3.8xlarge high-memory instances, the standard /mnt /mnt2 /mnt3 /mnt4 temporary directories get created and set up for temp usage; however, they point to the root 8GB filesystem. The 2x320GB SSDs are not mounted and also they are no

Re: Failed to remove RDD error

2014-06-03 Thread Michael Chang
Thanks Tathagata, Thanks for all your hard work! In the future, is it possible to mark "experimental" features as such on the online documentation? Thanks, Michael On Mon, Jun 2, 2014 at 6:12 PM, Tathagata Das wrote: > Spark.streaming.unpersist was an experimental feature introduced with > S

Re: NoSuchElementException: key not found

2014-06-03 Thread Michael Chang
I only had the warning level logs, unfortunately. There were no other references of 32855 (except a repeated stack trace, I believe). I'm using Spark 0.9.1 On Mon, Jun 2, 2014 at 5:50 PM, Tathagata Das wrote: > Do you have the info level logs of the application? Can you grep the value > "3285

Re: mounting SSD devices of EC2 r3.8xlarge instances

2014-06-03 Thread Matei Zaharia
Those instance types are not yet supported by the scripts, but https://issues.apache.org/jira/browse/SPARK-1790 is tracking this issue and it will soon be fixed in both branch-0.9 and 1.0. The problem is that those drives are not formatted on r3 machines, whereas they are on the other instance t

Re: How to create RDDs from another RDD?

2014-06-03 Thread Gerard Maas
Hi Andrew, Thanks for your answer. The reason of the question: I've been trying to contribute to the community by helping answering Spark-related questions on Stack Overflow. (note on that: Given the growing volume on the user list lately, I think it will need to scale out to other venues, so he

Re: spark 1.0 not using properties file from SPARK_CONF_DIR

2014-06-03 Thread Patrick Wendell
You can set an arbitrary properties file by adding --properties-file argument to spark-submit. It would be nice to have spark-submit also look in SPARK_CONF_DIR as well by default. If you opened a JIRA for that I'm sure someone would pick it up. On Tue, Jun 3, 2014 at 7:47 AM, Eugen Cepoi wrote:

Re: wholeTextFiles() : java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

2014-06-03 Thread Sean Owen
I'd try the internet / SO first -- these are actually generic Hadoop-related issues. Here I think you don't have HADOOP_HOME or similar set. http://stackoverflow.com/questions/19620642/failed-to-locate-the-winutils-binary-in-the-hadoop-binary-path On Tue, Jun 3, 2014 at 5:54 PM, toivoa wrote: >

Problems with connecting Spark to Hive

2014-06-03 Thread Lars Selsaas
Hi, I've installed Spark 1.0.0 on a HDP 2.1 I moved the hive-site.xml file into the conf directory for Spark in an attempt to connect Spark with my existing Hive. Below is the full log from me starting Spark till I get the error. It seems to be building the assembly with hive so that part should

Re: Failed to remove RDD error

2014-06-03 Thread Tathagata Das
It was not intended to be experimental, as this improves general performance. We have tested the feature since 0.9 and didn't see any problems. We need to investigate the cause of this. Can you give us the logs showing this error so that we can analyze it? TD On Tue, Jun 3, 2014 at 10:08 AM, Michael

Re: wholeTextFiles() : java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.TaskAttemptContext, but interface was expected

2014-06-03 Thread Matei Zaharia
Yeah unfortunately Hadoop 2 requires these binaries on Windows. Hadoop 1 runs just fine without them. Matei On Jun 3, 2014, at 10:33 AM, Sean Owen wrote: > I'd try the internet / SO first -- these are actually generic > Hadoop-related issues. Here I think you don't have HADOOP_HOME or > simila

Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-03 Thread Marek Wiewiorka
Hi All, I've been experiencing a very strange error after upgrading from Spark 0.9 to 1.0 - it seems that the saveAsTextFile function is throwing a java.lang.UnsupportedOperationException that I have never seen before. Any hints appreciated. scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoun

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-03 Thread Gerard Maas
Have you tried re-compiling your job against the 1.0 release? On Tue, Jun 3, 2014 at 8:46 PM, Marek Wiewiorka wrote: > Hi All, > I've been experiencing a very strange error after upgrade from Spark 0.9 > to 1.0 - it seems that saveAsTestFile function is throwing > java.lang.UnsupportedOperation

Re: NoSuchElementException: key not found

2014-06-03 Thread Tathagata Das
I think I know what is going on! This is probably a race condition in the DAGScheduler. I have added a JIRA for this. The fix is not trivial though. https://issues.apache.org/jira/browse/SPARK-2002 A "not-so-good" workaround for now would be to not use coalesced RDDs, which avoids the race condition.

Re: Problems with connecting Spark to Hive

2014-06-03 Thread Yin Huai
Hello Lars, Can you check the value of "hive.security.authenticator.manager" in hive-site.xml? I guess the value is "org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator". This class was introduced in hive 0.13, but Spark SQL is based on hive 0.12 right now. Can you change the value of "hive.

SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

2014-06-03 Thread k.tham
I'm trying to save an RDD as a Parquet file through the saveAsParquetFile() API, with code that looks something like: val sc = ... val sqlContext = new org.apache.spark.sql.SQLContext(sc) import sqlContext._ val someRDD: RDD[SomeCaseClass] = ... someRDD.saveAsParquetFile("someRDD.parquet") How

Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

2014-06-03 Thread Michael Armbrust
This thread seems to be about the same issue: https://www.mail-archive.com/user@spark.apache.org/msg04403.html On Tue, Jun 3, 2014 at 12:25 PM, k.tham wrote: > I'm trying to save an RDD as a parquet file through the > saveAsParquestFile() > api, > > With code that looks something like: > > val

Re: How to create RDDs from another RDD?

2014-06-03 Thread Andrew Ash
Hmm that sounds like it could be done in a custom OutputFormat, but I'm not familiar enough with custom OutputFormats to say that's the right thing to do. On Tue, Jun 3, 2014 at 10:23 AM, Gerard Maas wrote: > Hi Andrew, > > Thanks for your answer. > > The reason of the question: I've been tryin

Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

2014-06-03 Thread k.tham
Oh, I missed that thread. Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-s-saveAsParquetFile-throws-java-lang-IncompatibleClassChangeError-tp6837p6839.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Error related to serialisation in spark streaming

2014-06-03 Thread Mayur Rustagi
So are you using Java 7 or 8? Java 7 doesn't clean closures properly, so you need to define a static class as a function and then call that in your operations. Else it'll try to send the whole enclosing class along with the function. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi

Re: Problems with connecting Spark to Hive

2014-06-03 Thread Lars Selsaas
Thanks a lot, That worked great! Thanks, Lars On Tue, Jun 3, 2014 at 12:17 PM, Yin Huai wrote: > Hello Lars, > > Can you check the value of "hive.security.authenticator.manager" in > hive-site.xml? I guess the value is > "org.apache.hadoop.hive.ql.security.ProxyUserAuthenticator". This class

Re: Reg: Add/Remove slave nodes spark-ec2

2014-06-03 Thread Mayur Rustagi
You'll have to restart the cluster: create a copy of your existing slave, add it to the slaves file on the master, and restart the cluster. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Tue, Jun 3, 2014 at 4:30 PM, Sirisha Devineni

Re: WebUI's Application count doesn't get updated

2014-06-03 Thread Mayur Rustagi
Did you use docker or plain lxc specifically? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Tue, Jun 3, 2014 at 1:40 PM, MrAsanjar . wrote: > thanks guys, that fixed my problem. As you might have noticed, I am VERY >

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-03 Thread Marek Wiewiorka
Yes, I have - I compiled both Spark and my software from source. Actually, the whole processing executes fine; only saving the results is failing. 2014-06-03 21:01 GMT+02:00 Gerard Maas : > Have you tried re-compiling your job against the 1.0 release? > > > On Tue, Jun 3, 2014 at 8:46 PM, Marek W

Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

2014-06-03 Thread k.tham
I've read through that thread, and it seems for him, he needed to add a particular hadoop-client dependency. However, I don't think I should be required to do that as I'm not reading from HDFS. I'm just running a straight up minimal example, in local mode, and out of the box. Here's an example m

Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

2014-06-03 Thread Sean Owen
All of that support code uses Hadoop-related classes, like OutputFormat, to do the writing to Parquet format. There's a Hadoop code dependency in play here even if the bytes aren't going to HDFS. On Tue, Jun 3, 2014 at 10:10 PM, k.tham wrote: > I've read through that thread, and it seems for him,

Re: SchemaRDD's saveAsParquetFile() throws java.lang.IncompatibleClassChangeError

2014-06-03 Thread k.tham
I see, thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-s-saveAsParquetFile-throws-java-lang-IncompatibleClassChangeError-tp6837p6848.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

RDD with a Map

2014-06-03 Thread Amit Kumar
Hi Folks, I am new to Spark - and this is probably a basic question. I have a file on HDFS: 1, one 1, uno 2, two 2, dos I want to create a multi-map RDD, RDD[Map[String,List[String]]]: {"1"->["one","uno"], "2"->["two","dos"]} First I read the file: val identityData:RDD[String] = sc.textFile(

Re: RDD with a Map

2014-06-03 Thread Ian O'Connell
So if your data can be kept in memory on the driver node then you don't really need Spark. If you want to use it for Hadoop reading, then I'd immediately call collect after you open it, and then you can do normal Scala collections operations. On Tue, Jun 3, 2014 at 2:56 PM, Amit Kumar wrote: > Hi

Re: RDD with a Map

2014-06-03 Thread Doris Xin
Hey Amit, you might want to check out PairRDDFunctions. For your use case in particular, you can load the file as an RDD[(String, String)] and then use the groupByKey() function in PairRDDFunctions to g
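A sketch of that approach, assuming the two-column comma-separated input from the question and an illustrative HDFS path:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._ // brings PairRDDFunctions into scope in Spark 1.0

    def buildMultiMap(sc: SparkContext): Map[String, Seq[String]] = {
      val pairs = sc.textFile("hdfs:///path/to/identity.txt").map { line =>
        val Array(key, value) = line.split(",").map(_.trim)
        (key, value)
      }
      // groupByKey gathers all translations per key; collectAsMap brings the
      // (assumed small) result back to the driver as an ordinary Scala Map.
      pairs.groupByKey().mapValues(_.toSeq).collectAsMap().toMap
    }

As Ian notes above, collecting like this only makes sense if the grouped result fits comfortably in driver memory.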

Re: Having spark-ec2 join new slaves to existing cluster

2014-06-03 Thread Nicholas Chammas
On Tue, Jun 3, 2014 at 6:52 AM, sirisha_devineni < sirisha_devin...@persistent.co.in> wrote: > Did you open a JIRA ticket for this feature to be implemented in spark-ec2? > If so can you please point me to the ticket? > Just created it: https://issues.apache.org/jira/browse/SPARK-2008 Nick

Re: Error related to serialisation in spark streaming

2014-06-03 Thread Andrew Ash
Hi Mayur, is that closure cleaning a JVM issue or a Spark issue? I'm used to thinking of closure cleaner as something Spark built. Do you have somewhere I can read more about this? On Tue, Jun 3, 2014 at 12:47 PM, Mayur Rustagi wrote: > So are you using Java 7 or 8. > 7 doesnt clean closures

Re: Window slide duration

2014-06-03 Thread Vadim Chekan
Ok, it's a bug in spark. I've submitted a patch: https://issues.apache.org/jira/browse/SPARK-2009 On Mon, Jun 2, 2014 at 8:39 PM, Vadim Chekan wrote: > Thanks for looking into this Tathagata. > > Are you looking for traces of ReceiveInputDStream.clearMetadata call? > Here is the log: http://wep

Re: NoSuchElementException: key not found

2014-06-03 Thread Michael Chang
Hi Tathagata, Thanks for your help! By not using coalesced RDD, do you mean not repartitioning my Dstream? Thanks, Mike On Tue, Jun 3, 2014 at 12:03 PM, Tathagata Das wrote: > I think I know what is going on! This probably a race condition in the > DAGScheduler. I have added a JIRA for thi

Re: NoSuchElementException: key not found

2014-06-03 Thread Tathagata Das
I am not sure what DStream operations you are using, but some operation is internally creating CoalescedRDDs. That is causing the race condition. I might be able help if you can tell me what DStream operations you are using. TD On Tue, Jun 3, 2014 at 4:54 PM, Michael Chang wrote: > Hi Tathagat

Re: Window slide duration

2014-06-03 Thread Vadim Chekan
Better to build it from parts. http://www.newegg.com/Product/Product.aspx?Item=N82E16813157497 Passive cooling, and you can install 16GB of memory. The one you sent takes 4GB at most, which won't do. Pick a small case and be done with it. On Tue, Jun 3, 2014 at 4:35 PM, Vadim Chekan wrote: > Ok, it's

Invalid Class Exception

2014-06-03 Thread Suman Somasundar
Hi all, I get the following exception when using Spark to run example k-means program. I am using Spark 1.0.0 and running the program locally. java.io.InvalidClassException: scala.Tuple2; invalid descriptor for field _1 at java.io.ObjectStreamClass.readNonProxy(ObjectStreamClass.jav

Better line number hints for logging?

2014-06-03 Thread John Salvatier
I have created some extension methods for RDDs in RichRecordRDD and these are working exceptionally well for me. However, when looking at the logs, it's impossible to tell what's going on because all the line number hints point to RichRecordRDD.scala rather than the code that uses it. For example:

Re: Better line number hints for logging?

2014-06-03 Thread Matei Zaharia
You can use RDD.setName to give it a name. There’s also a creationSite field that is private[spark] — we may want to add a public setter for that later. If the name isn’t enough and you’d like this, please open a JIRA issue for it. Matei On Jun 3, 2014, at 5:22 PM, John Salvatier wrote: > I h
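The setName call is a one-liner on any RDD; a small self-contained sketch (the names and label string are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("NamedRDDs").setMaster("local"))
    // setName makes log lines and the web UI refer to a recognizable label
    // instead of the call site inside the extension-method file.
    val squares = sc.parallelize(1 to 100).map(x => x * x).setName("squares at MyJob.scala:42")
    squares.count()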

Re: Better line number hints for logging?

2014-06-03 Thread John Salvatier
Ok, I will probably open a Jira. On Tue, Jun 3, 2014 at 5:29 PM, Matei Zaharia wrote: > You can use RDD.setName to give it a name. There’s also a creationSite > field that is private[spark] — we may want to add a public setter for that > later. If the name isn’t enough and you’d like this, plea

Re: how to construct a ClassTag object as a method parameter in Java

2014-06-03 Thread Gino Bustelo
A better way seems to be to use ClassTag$.apply(Class). I'm going by memory since I'm on my phone, but I just did that today. Gino B. > On Jun 3, 2014, at 11:04 AM, Michael Armbrust wrote: > > Ah, this is a bug that was fixed in 1.0. > > I think you should be able to workaround it by using

Re: Interactive modification of DStreams

2014-06-03 Thread Gino Bustelo
Thanks for the reply. Are there plans to allow these runtime interactions with a DStream context? On the surface they seem doable. What is preventing this from working? Also... I implemented the modifiable WindowedDStream and it seemed to work well. Thanks for the pointer. Gino B. > On Jun 2, 2014

Re: Interactive modification of DStreams

2014-06-03 Thread Tobias Pfeiffer
Gino, On Wed, Jun 4, 2014 at 9:51 AM, Gino Bustelo wrote: > Thanks for the reply. Are there plans to allow these runtime interactions > with a DStream context? I would be interested in that as well. > Also... I implemented the modifiable WindowedDStream and it seemed to work > well. Thanks for the

spark is dead and pid file exists

2014-06-03 Thread Sophia
When I run Spark in Cloudera CDH5 with the service spark-master start command, it turns out that the Spark master is dead and a pid file exists. What can I do to solve the problem? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-is-dead-and-pid-file-exists-tp68

Re: A single build.sbt file to start Spark REPL?

2014-06-03 Thread Tobias Pfeiffer
Hi, I guess it should be possible to dig through the scripts bin/spark-shell, bin/spark-submit etc. and convert them to a long sbt command that you can run. I just tried sbt "run-main org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main" but that fails with Faile

Re: Invalid Class Exception

2014-06-03 Thread Matei Zaharia
What Java version do you have, and how did you get Spark (did you build it yourself by any chance or download a pre-built one)? If you build Spark yourself you need to do it with Java 6 — it’s a known issue because of the way Java 6 and 7 package JAR files. But I haven’t seen it result in this p

Re: Yay for 1.0.0! EC2 Still has problems.

2014-06-03 Thread Matei Zaharia
Ah, sorry to hear you had more problems. Some thoughts on them: > Thanks for that, Matei! I'll look at that once I get a spare moment. :-) > > If you like, I'll keep documenting my newbie problems and frustrations... > perhaps it might make things easier for others. > > Another issue I seem to

Re: Upgrading to Spark 1.0.0

2014-06-03 Thread Matei Zaharia
You can copy your configuration from the old one. I’d suggest just downloading it to a different location on each node first for testing, then you can delete the old one if things work. On Jun 3, 2014, at 12:38 AM, MEETHU MATHEW wrote: > Hi , > > I am currently using SPARK 0.9 configured wit

Re: access hdfs file name in map()

2014-06-03 Thread Xu (Simon) Chen
I don't quite get it.. mapPartitionWithIndex takes a function that maps an integer index and an iterator to another iterator. How does that help with retrieving the hdfs file name? I am obviously missing some context.. Thanks. On May 30, 2014 1:28 AM, "Aaron Davidson" wrote: > Currently there
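One route that avoids custom InputFormats entirely (a different technique from the mapPartitionsWithIndex suggestion referenced above, and only suitable for reasonably small files) is sc.wholeTextFiles, which pairs each file's path with its content; a hedged sketch with an illustrative directory:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("FileNames"))
    // wholeTextFiles yields (fullPath, fileContent) pairs, so the file name
    // travels with the data into subsequent map() calls.
    val linesWithFile = sc.wholeTextFiles("hdfs:///path/to/dir").flatMap {
      case (path, content) => content.split("\n").map(line => (path, line))
    }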

How to stop a running SparkContext in the proper way?

2014-06-03 Thread MEETHU MATHEW
Hi, I want to know how I can stop a running SparkContext in a proper way so that the next time I start a new SparkContext, the web UI can be launched on the same port 4040. Now, when I quit the job using ctrl+z, the new SparkContexts are launched on new ports. I have the same problem with the IPython notebook.

Re: spark is dead and pid file exists

2014-06-03 Thread Theodore Wong
Look in the directory /var/run/spark to see if a spark-master.pid file is left over from a crashed master, and remove it. - -- Theodore Wong www.tmwong.org -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-is-dead-and-pid-file-ex

KMeans.train() throws NotSerializableException

2014-06-03 Thread bluejoe2008
when i called KMeans.train(), an error happened: 14/06/04 13:02:29 INFO scheduler.DAGScheduler: Submitting Stage 3 (MappedRDD[12] at map at KMeans.scala:123), which has no missing parents 14/06/04 13:02:29 INFO scheduler.DAGScheduler: Failed to run takeSample at KMeans.scala:260 Exception in thr

Re: How to stop a running SparkContext in the proper way?

2014-06-03 Thread Xiangrui Meng
Did you try sc.stop()? On Tue, Jun 3, 2014 at 9:54 PM, MEETHU MATHEW wrote: > Hi, > > I want to know how I can stop a running SparkContext in a proper way so that > next time when I start a new SparkContext, the web UI can be launched on the > same port 4040.Now when i quit the job using ctrl+z t

Re: KMeans.train() throws NotSerializableException

2014-06-03 Thread Matei Zaharia
How is your RDD created? It might mean that something used in the process of creating it was not serializable. Matei On Jun 3, 2014, at 10:11 PM, bluejoe2008 wrote: > when i called KMeans.train(), an error happened: > > 14/06/04 13:02:29 INFO scheduler.DAGScheduler: Submitting Stage 3 > (Ma
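For comparison, a minimal end-to-end KMeans.train call whose input RDD is built only from serializable pieces; the path, k and iteration count are illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val sc = new SparkContext(new SparkConf().setAppName("KMeansExample"))
    // Build the input purely from RDD transformations and avoid capturing
    // non-serializable objects (readers, contexts, UI handles) in the closures.
    val points = sc.textFile("hdfs:///path/to/points.txt")
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
      .cache()
    val model = KMeans.train(points, 2, 20) // k = 2, maxIterations = 20
    println(model.clusterCenters.mkString(", "))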

ZeroMQ Stream -> stack guard problem and no data

2014-06-03 Thread Tobias Pfeiffer
Hi, I am trying to use Spark Streaming (1.0.0) with ZeroMQ, i.e. I say def bytesToStringIterator(x: Seq[ByteString]) = (x.map(_.utf8String)).iterator val lines: DStream[String] = ZeroMQUtils.createStream(ssc, "tcp://localhost:5556", Subscribe("mytopic"), byte

Re: How to stop a running SparkContext in the proper way?

2014-06-03 Thread Akhil Das
ctrl + z will stop the job from being executed ( If you do a *fg/bg *you can resume the job). You need to press ctrl + c to terminate the job! Thanks Best Regards On Wed, Jun 4, 2014 at 10:24 AM, MEETHU MATHEW wrote: > Hi, > > I want to know how I can stop a running SparkContext in a proper wa

Re: ZeroMQ Stream -> stack guard problem and no data

2014-06-03 Thread Prashant Sharma
Hi, what is your ZeroMQ version? It is known to work well with 2.2; the output of `sudo ldconfig -v | grep zmq` would be helpful in this regard. Thanks Prashant Sharma On Wed, Jun 4, 2014 at 11:40 AM, Tobias Pfeiffer wrote: > Hi, > > I am trying to use Spark Streaming (1.0.0) with ZeroMQ, i.e.