Bouncing Mails

2015-01-17 Thread Akhil Das
My mails to the mailing list are getting rejected; I have opened a Jira issue. Can someone take a look at it? https://issues.apache.org/jira/browse/INFRA-9032 Thanks Best Regards

Re: Futures timed out during unpersist

2015-01-17 Thread Akhil Das
What is the data size? Have you tried increasing the driver memory? Thanks Best Regards On Sat, Jan 17, 2015 at 1:01 PM, Kevin (Sangwoo) Kim wrote: > Hi experts, > I got an error while unpersisting an RDD. > Any ideas? > > java.util.concurrent.TimeoutException: Futures timed out after [30 > seconds
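
One low-risk workaround (a sketch, not from the thread; assumes an existing SparkContext sc): unpersist blocks by default until every executor acknowledges within a bounded future, and the non-blocking variant avoids waiting on that future entirely:

    // a minimal sketch; the RDD contents are illustrative
    val rdd = sc.parallelize(1 to 1000000).cache()
    rdd.count() // materialize the cache
    rdd.unpersist(blocking = false) // fire-and-forget; no 30-second future to time out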

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-17 Thread Akhil Das
Safest way would be to first shut down HDFS and then shut down Spark (calling stop-all.sh would do) and then shut down the machines. You can execute the following command to disable safe mode: *hdfs dfsadmin -safemode leave* Thanks Best Regards On Sat, Jan 17, 2015 at 8:31 AM, Su She wrote: > Hello E

Re: Problem with File Streams

2015-01-17 Thread Akhil Das
Try: JavaPairDStream foo = ssc.fileStream("/sigmoid/foo"); Thanks Best Regards On Sat, Jan 17, 2015 at 4:24 AM, Leonidas Fegaras wrote: > Dear Spark users, > I have a problem using File Streams in Java on Spark 1.2.0. I can process > hadoop files in local mode using: > > spark_context.newAPI
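
For reference, a typed fileStream call looks like this in Scala (a sketch; the input format, batch interval, and path are illustrative):

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(new SparkConf().setAppName("fs"), Seconds(10))
    // the three type parameters (key, value, input format) must be supplied explicitly
    val stream = ssc.fileStream[LongWritable, Text, TextInputFormat]("/sigmoid/foo")
    stream.map(_._2.toString).print()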

No Output

2015-01-17 Thread Deep Pradhan
Hi, I am using Spark-1.0.0 in a single node cluster. When I run a job with a small data set it runs perfectly, but when I use a data set of 350 KB, no output is produced, and when I try to run it a second time it gives me an exception saying that the SparkContext was shut down. Can anyone help

Re: remote Akka client disassociated - some timeout?

2015-01-17 Thread Akhil Das
Try setting the following property: .set("spark.akka.frameSize","50") Also make sure that Spark is able to read from HBase (you can try it with a small amount of data). Thanks Best Regards On Fri, Jan 16, 2015 at 11:30 PM, Antony Mayi wrote: > Hi, > > I believe this is some kind of timeout problem

RE: using hiveContext to select a nested Map-data-type from an AVROmodel+parquet file

2015-01-17 Thread Cheng, Hao
Wow, glad to know that it works well. And sorry, the Jira is another issue; it is not the same case here. From: Bagmeet Behera [mailto:bagme...@gmail.com] Sent: Saturday, January 17, 2015 12:47 AM To: Cheng, Hao Subject: Re: using hiveContext to select a nested Map-data-type from an AVROmode

Re: No Output

2015-01-17 Thread Akhil Das
Can you paste the code? Also you can try updating your spark version. Thanks Best Regards On Sat, Jan 17, 2015 at 2:40 PM, Deep Pradhan wrote: > Hi, > I am using Spark-1.0.0 in a single node cluster. When I run a job with > small data set it runs perfectly but when I use a data set of 350 KB, n

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-17 Thread Sean Owen
You would not want to turn off storage underneath Spark. Shut down Spark first, then storage, then shut down the instances. Reverse the order when restarting. HDFS will be in safe mode for a short time after being started before it becomes writeable. I would first check that it's not just that. Ot

Spark Streaming

2015-01-17 Thread Rohit Pujari
Hello Folks: I'm running into the following error while executing relatively straightforward spark-streaming code. Am I missing anything? *Exception in thread "main" java.lang.AssertionError: assertion failed: No output streams registered, so nothing to execute* Code: val conf = new SparkConf().s

Re: Spark Streaming

2015-01-17 Thread Akhil Das
You need to trigger some action (stream.print(), stream.foreachRDD, stream.saveAs*) over the stream that you created for the entire pipeline to execute. In your code add the following line: *unifiedStream.print()* Thanks Best Regards On Sat, Jan 17, 2015 at 3:35 PM, Rohit Pujari wrote: > He
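
Putting those pieces together, a minimal runnable skeleton (a sketch; the socket source and batch interval are illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("StreamingSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.print() // an output operation; without one, start() fails the assertion above
    ssc.start()
    ssc.awaitTermination()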

Re: Spark Streaming

2015-01-17 Thread Rohit Pujari
Hi Francois: I tried using "print(kafkaStream)" as an output operator but no luck. It throws the same error. Any other thoughts? Thanks, Rohit From: "francois.garil...@typesafe.com" <francois.garil...@typesafe.com> Date: Saturday, January 17, 2015 at

Re: Spark Streaming

2015-01-17 Thread Sean Owen
Not print(kafkaStream), which would just print some String description of the stream to the console, but kafkaStream.print(), which actually invokes the print operation on the stream. On Sat, Jan 17, 2015 at 10:17 AM, Rohit Pujari wrote: > Hi Francois: > > I tried using "print(kafkaStream)" as ou

Re: ALS.trainImplicit running out of mem when using higher rank

2015-01-17 Thread Sean Owen
I'm not sure how you are setting these values though. Where is spark.yarn.executor.memoryOverhead=6144 ? Env variables aren't the best way to set configuration either. Again have a look at http://spark.apache.org/docs/latest/running-on-yarn.html ... --executor-memory 22g --conf "spark.yarn.executo

Re: Spark Streaming

2015-01-17 Thread Rohit Pujari
That was it. Thanks Akhil and Owen for your quick response. On Sat, Jan 17, 2015 at 4:27 AM, Sean Owen wrote: > Not print(kafkaStream), which would just print some String description > of the stream to the console, but kafkaStream.print(), which actually > invokes the print operation on the stre

Re: Maven out of memory error

2015-01-17 Thread Sean Owen
Hm, this test hangs for me in IntelliJ. It could be a real problem, and a combination of a) just recently actually enabling Java tests, b) recent updates to the complicated Guava shading situation. The manifestation of the error usually suggests that something totally failed to start (because of,

Re: spark 1.2 compatibility

2015-01-17 Thread Chitturi Padma
Yes. I built Spark 1.2 with Apache Hadoop 2.2. No compatibility issues. On Sat, Jan 17, 2015 at 4:47 AM, bhavyateja [via Apache Spark User List] < ml-node+s1001560n21197...@n3.nabble.com> wrote: > Is spark 1.2 compatible with HDP 2.1 > > -- > If you reply to this em

Re: remote Akka client disassociated - some timeout?

2015-01-17 Thread Ted Yu
Antony: Please check hbase master log to see if there was something noticeable in that period of time. If the hbase cluster is not big, check region server log as well. Cheers > On Jan 16, 2015, at 10:00 AM, Antony Mayi > wrote: > > Hi, > > I believe this is some kind of timeout problem

Re: Discourse: A proposed alternative to the Spark User list

2015-01-17 Thread pzecevic
Hi, guys! I'm reviving this old question from Nick Chammas with a new proposal: what do you think about creating a separate Stack Exchange 'Apache Spark' site (like 'philosophy' and 'English' etc.)? I'm not sure what would be the best way to deal with user and dev lists, though - to merge them in

Error occurs when running Spark SQL example

2015-01-17 Thread bit1...@163.com
When I run the following Spark SQL example within IDEA, I got a StackOverflowError; it looks like the scala.util.parsing.combinator.Parsers are calling themselves recursively and infinitely. Has anyone encountered this? package spark.examples import org.apache.spark.{SparkContext, SparkConf} import org.apa

spark error in yarn-client mode

2015-01-17 Thread Kyounghyun Park
Hi, I'm running Spark 1.2 in yarn-client mode (using Hadoop 2.6.0). On VirtualBox, I can run "spark-shell --master yarn-client" without any error. However, on a physical machine, I got the following error. Does anyone know why this happens? Any help would be appreciated. Thanks, Kyounghyun --

Re: ALS.trainImplicit running out of mem when using higher rank

2015-01-17 Thread Antony Mayi
the values are definitely applied as expected - confirmed using the Spark UI environment page... it comes from my defaults configured using 'spark.yarn.executor.memoryOverhead=8192' (yes, now increased even more) in /etc/spark/conf/spark-defaults.conf and 'export SPARK_EXECUTOR_MEMORY=24G' in /et

Re: Futures timed out during unpersist

2015-01-17 Thread Kevin (Sangwoo) Kim
The data size is about 300~400GB; I'm using an 800GB cluster and set driver memory to 50GB. On Sat Jan 17 2015 at 6:01:46 PM Akhil Das wrote: > What is the data size? Have you tried increasing the driver memory? > > Thanks > Best Regards > > On Sat, Jan 17, 2015 at 1:01 PM, Kevin (Sangwoo) Kim > wrot

Is cluster mode is supported by the submit command for standalone clusters?

2015-01-17 Thread guxiaobo1982
Hi, The submitting applications guide at http://spark.apache.org/docs/latest/submitting-applications.html says: Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network laten
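
For what it's worth, spark-submit in 1.2 does accept cluster deploy mode against a standalone master; a sketch of the invocation, with host, class, and jar as placeholders: spark-submit --master spark://master-host:7077 --deploy-mode cluster --class com.example.Main app.jar. Note that standalone cluster mode does not support Python applications in this release.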

Re: Discourse: A proposed alternative to the Spark User list

2015-01-17 Thread Andrew Ash
People can continue using the stack exchange sites as is with no additional work from the Spark team. I would not support migrating our mailing lists yet again to another system like Discourse because I fear fragmentation of the community between the many sites. On Sat, Jan 17, 2015 at 6:24 AM, p

Join DStream With Other Datasets

2015-01-17 Thread Ji ZHANG
Hi, I want to join a DStream with some other dataset, e.g. join a click stream with a spam IP list. I can think of two possible solutions: one is to use a broadcast variable, and the other is to use the transform operation as described in the manual. But the problem is the spam IP list will be updated out
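
One pattern for this (a sketch, not from the thread): transform's closure is re-evaluated on every batch, so a mutable reference to the blacklist RDD can simply be swapped when the list changes. Here loadSpamIps and clickStream are hypothetical:

    import org.apache.spark.SparkContext._
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    // a minimal sketch; loadSpamIps rebuilds the (ip -> flag) RDD on demand
    @volatile var spamIps: RDD[(String, Boolean)] = loadSpamIps()
    val clean: DStream[(String, String)] = clickStream.transform { rdd =>
      // leftOuterJoin on IP; keep only records with no match in the spam list
      rdd.leftOuterJoin(spamIps).filter(_._2._2.isEmpty).mapValues(_._1)
    }
    // elsewhere (e.g. a background thread): spamIps = loadSpamIps()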

Re: spark 1.2 compatibility

2015-01-17 Thread bhavyateja
Hi, Did you try using spark 1.2 on HDP 2.1 YARN? Can you please go thru the thread http://apache-spark-user-list.1001560.n3.nabble.com/Troubleshooting-Spark-tt21189.html and check where I am going wrong? My word count program is erroring out when using spark 1.2 on YARN, but it's getting execut

Re: Why Parquet Predicate Pushdown doesn't work?

2015-01-17 Thread Yana Kadiyska
Just wondering if you've made any progress on this -- I'm having the same issue. My attempts to help myself are documented here http://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAJ4HpHFVKvdNgKes41DvuFY=+f_nTJ2_RT41+tadhNZx=bc...@mail.gmail.com%3E . I don't believe I have the valu
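
One thing worth checking (a sketch, assuming the 1.2 conf key; sc is an existing SparkContext): Parquet filter pushdown is disabled by default in Spark 1.2 and has to be switched on explicitly:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    // off by default in 1.2 due to a known Parquet bug
    sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")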

Re: Discourse: A proposed alternative to the Spark User list

2015-01-17 Thread Nicholas Chammas
The Stack Exchange community will not support creating a whole new site just for Spark (otherwise you’d see dedicated sites for much larger topics like “Python”). Their tagging system works well enough to separate questions about different topics, and the apache-spark

Re: Problem with File Streams

2015-01-17 Thread Leonidas Fegaras
My key/value classes are custom serializable classes. It looks like a bug. So I filed it on JIRA as SPARK-5297 Thanks Leonidas On 01/17/2015 03:07 AM, Akhil Das wrote: Try: JavaPairDStream foo = ssc.fileStream("/sigmoid/foo"); Thanks Best Regards On Sat, Jan 17, 2015 at 4:24 AM, Leonida

Re: spark 1.2 compatibility

2015-01-17 Thread bhavyateja
Hi all, Thanks for your contribution. We have checked and confirmed that HDP 2.1 YARN doesn't work with Spark 1.2. On Sat, Jan 17, 2015 at 9:11 AM, bhavya teja potineni < bhavyateja.potin...@gmail.com> wrote: > Hi > > Did you try using spark 1.2 on hdp 2.1 YARN > > Can you please go thru the thread

Re: spark 1.2 compatibility

2015-01-17 Thread Chitturi Padma
It worked for me. spark 1.2.0 with hadoop 2.2.0 On Sat, Jan 17, 2015 at 9:39 PM, bhavyateja [via Apache Spark User List] < ml-node+s1001560n21207...@n3.nabble.com> wrote: > Hi all, > > Thanks for your contribution. We have checked and confirmed that HDP 2.1 > YARN don't work with Spark 1.2 > > On

Re: Maven out of memory error

2015-01-17 Thread Andrew Musselman
Failing for me and another team member on the command line, for what it's worth. > On Jan 17, 2015, at 2:39 AM, Sean Owen wrote: > > Hm, this test hangs for me in IntelliJ. It could be a real problem, > and a combination of a) just recently actually enabling Java tests, b) > recent updates to th

Re: Spark SQL Custom Predicate Pushdown

2015-01-17 Thread Corey Nolet
I did an initial implementation. There are two assumptions I had from the start that I was very surprised were not a part of the predicate pushdown API: 1) The fields in the SELECT clause are not pushed down to the predicate pushdown API. I have many optimizations that allow fields to be filtered

Re: Row similarities

2015-01-17 Thread Andrew Musselman
Thanks Reza, interesting approach. I think what I actually want is to calculate pair-wise distance, on second thought. Is there a pattern for that? > On Jan 16, 2015, at 9:53 PM, Reza Zadeh wrote: > > You can use K-means with a suitably large k. Each cluster should correspond > to rows that

Re: Row similarities

2015-01-17 Thread Suneel Marthi
Andrew, you would be better off using Mahout's RowSimilarityJob for what you are trying to accomplish. 1. It does give you pair-wise distances 2. You can specify the distance measure you are looking to use 3. There's the old MapReduce impl and the Spark DSL impl per your preference. From: Andrew Mus

Re: spark 1.2 compatibility

2015-01-17 Thread bhavyateja
Yes it works with 2.2 but we are trying to use spark 1.2 on HDP 2.1 On Sat, Jan 17, 2015, 11:18 AM Chitturi Padma [via Apache Spark User List] < ml-node+s1001560n21208...@n3.nabble.com> wrote: > It worked for me. spark 1.2.0 with hadoop 2.2.0 > > On Sat, Jan 17, 2015 at 9:39 PM, bhavyateja [via A

Re: Maven out of memory error

2015-01-17 Thread Ted Yu
The test passed here: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/1215/consoleFull It passed locally with the following command: mvn -DHADOOP_PROFILE=hadoop-2.4 -Phadoop-2.4 -Pyarn -Phive test -Dtest=JavaAPISuite FYI

Re: Row similarities

2015-01-17 Thread Andrew Musselman
Yeah that's the kind of thing I'm looking for; was looking at SPARK-4259 and poking around to see how to do things. https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-4259 > On Jan 17, 2015, at 8:35 AM, Suneel Marthi wrote: > > Andrew, u would be better off using Mahout's RowSim

Re: Row similarities

2015-01-17 Thread Pat Ferrel
Mahout’s Spark implementation of rowsimilarity is in the Scala SimilarityAnalysis class. It actually does either row or column similarity but only supports LLR at present. It does [AA’] for columns or [A’A] for rows first then calculates the distance (LLR) for non-zero elements. This is a major

Re: Row similarities

2015-01-17 Thread Andrew Musselman
Excellent, thanks Pat. > On Jan 17, 2015, at 9:27 AM, Pat Ferrel wrote: > > Mahout’s Spark implementation of rowsimilarity is in the Scala > SimilarityAnalysis class. It actually does either row or column similarity > but only supports LLR at present. It does [AA’] for columns or [A’A] for row

Cluster hangs in 'ssh-ready' state using Spark 1.2 EC2 launch script

2015-01-17 Thread Nathan Murthy
Originally posted here: http://stackoverflow.com/questions/28002443/cluster-hangs-in-ssh-ready-state-using-spark-1-2-ec2-launch-script I'm trying to launch a standalone Spark cluster using its pre-packaged EC2 scripts, but it just indefinitely hangs in an 'ssh-ready' state: ubuntu@machine:~/s

Re: Join DStream With Other Datasets

2015-01-17 Thread Jörn Franke
Can't you send a special event through spark streaming once the list is updated? So you have your normal events and a special reload event. On 17 Jan 2015 15:06, "Ji ZHANG" wrote: > Hi, > > I want to join a DStream with some other dataset, e.g. join a click > stream with a spam ip list. I can

Re: Performance issue

2015-01-17 Thread TJ Klein
I suspect that putting a function into a shared variable incurs additional overhead. Any suggestions on how to avoid that?

Re: Row similarities

2015-01-17 Thread Pat Ferrel
BTW it looks like row and column similarities (cosine based) are coming to MLlib through DIMSUM. Andrew said rowSimilarity doesn’t seem to be in the master yet. Does anyone know the status? See: https://databricks.com/blog/2014/10/20/efficient-similarity-algorithm-now-in-spark-twitter.html

Re: Row similarities

2015-01-17 Thread Reza Zadeh
Pat, columnSimilarities is what that blog post is about, and is already part of Spark 1.2. rowSimilarities in a RowMatrix is a little more tricky because you can't transpose a RowMatrix easily, and is being tracked by this JIRA: https://issues.apache.org/jira/browse/SPARK-4823 Andrew, sometimes (
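
For reference, a sketch of the 1.2 API being described (the input vectors are illustrative; sc is an existing SparkContext):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 0.0, 2.0),
      Vectors.dense(0.0, 3.0, 4.0)))
    val mat = new RowMatrix(rows)
    val exact = mat.columnSimilarities()      // brute-force all-pairs cosine
    val approx = mat.columnSimilarities(0.1)  // DIMSUM sampling, threshold 0.1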

Re: Row similarities

2015-01-17 Thread Andrew Musselman
Yeah okay, thanks. > On Jan 17, 2015, at 11:15 AM, Reza Zadeh wrote: > > Pat, columnSimilarities is what that blog post is about, and is already part > of Spark 1.2. > > rowSimilarities in a RowMatrix is a little more tricky because you can't > transpose a RowMatrix easily, and is being track

Re: Spark SQL Custom Predicate Pushdown

2015-01-17 Thread Michael Armbrust
> > 1) The fields in the SELECT clause are not pushed down to the predicate > pushdown API. I have many optimizations that allow fields to be filtered > out before the resulting object is serialized on the Accumulo tablet > server. How can I get the selection information from the execution plan? >

Re: Cluster hangs in 'ssh-ready' state using Spark 1.2 EC2 launch script

2015-01-17 Thread gen tang
Hi, This is because "ssh-ready" in the ec2 script means that all the instances are in the "running" state and all their status checks are "OK". In other words, the instances are ready to download and install software, just as EMR is ready for bootstrap actions. Before, the script just

Re: Spark SQL Custom Predicate Pushdown

2015-01-17 Thread Corey Nolet
Michael, What I'm seeing (in Spark 1.2.0) is that the required columns being pushed down to the DataRelation are not the product of the SELECT clause but rather just the columns explicitly included in the WHERE clause. Examples from my testing: SELECT * FROM myTable --> The required columns are
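
For context, a sketch of the 1.2 sources API hook under discussion (the relation and its schema are illustrative, and this assumes the 1.2 shape of the API, where PrunedFilteredScan is an abstract class):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext, StructType, StructField, StringType}
    import org.apache.spark.sql.sources.{Filter, PrunedFilteredScan}

    case class MyRelation(sqlContext: SQLContext) extends PrunedFilteredScan {
      override def schema: StructType = StructType(
        StructField("ip", StringType, nullable = true) ::
        StructField("host", StringType, nullable = true) :: Nil)
      override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
        // requiredColumns reflects what the plan actually needs (e.g. only the
        // WHERE columns for a count()); filters carries the pushed-down predicates
        sqlContext.sparkContext.emptyRDD[Row]
      }
    }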

Spark job stuck at RangePartitioner at Exchange.scala:79

2015-01-17 Thread Sunita Arvind
Hi, My spark jobs suddenly started getting stuck, and here is the debug leading to it: Following the program, it seems to be stuck whenever I do any collect(), count(), or rdd.saveAsParquetFile(). AFAIK, any operation that requires data to flow back to the master causes this. I increased the memory to 5 MB. Al

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-17 Thread Su She
Thanks Akhil and Sean for the responses. I will try shutting down spark, then storage and then the instances. Initially, when hdfs was in safe mode, I waited for >1 hour and the problem still persisted. I will try this new method. Thanks! On Sat, Jan 17, 2015 at 2:03 AM, Sean Owen wrote: > Y

Re: Spark SQL Custom Predicate Pushdown

2015-01-17 Thread Michael Armbrust
How are you running your test here? Are you perhaps doing a .count()? On Sat, Jan 17, 2015 at 12:54 PM, Corey Nolet wrote: > Michael, > > What I'm seeing (in Spark 1.2.0) is that the required columns being pushed > down to the DataRelation are not the product of the SELECT clause but > rather j

Re: Spark SQL Custom Predicate Pushdown

2015-01-17 Thread Corey Nolet
I see now. It optimizes the selection semantics so that fewer columns need to be included just to do a count(). Very nice. I did a collect() instead of a count just to see what would happen, and it looks like all the expected select fields were propagated down as expected. Thanks. On Sat, Jan

Re: Row similarities

2015-01-17 Thread Pat Ferrel
In the Mahout Spark R-like DSL [A’A] and [AA’] doesn’t actually do a transpose—it’s optimized out. Mahout has had a stand alone row matrix transpose since day 1 and supports it in the Spark version. Can’t really do matrix algebra without it even though it’s often possible to optimize it away.

maven doesn't build dependencies with Scala 2.11

2015-01-17 Thread Walrus theCat
Hi, When I run this: dev/change-version-to-2.11.sh mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package as per here, maven doesn't build Spark's dependencies. Only when I run: dev/change-version-to-2.11

Directory / File Reading Patterns

2015-01-17 Thread Steve Nunez
Hello Users, I've got a real-world use case that seems common enough that its pattern would be documented somewhere, but I can't find any references to a simple solution. The challenge is that data is getting dumped into a directory structure, and that directory structure itself contains featur
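
One simple approach (a sketch, assuming the feature of interest is a directory name embedded in each file's path; the glob and parsing are illustrative, and sc is an existing SparkContext):

    // wholeTextFiles yields (path, contents) pairs, so the path itself is
    // available for feature extraction; best suited to many smallish files
    val files = sc.wholeTextFiles("hdfs:///data/*/*")
    val records = files.flatMap { case (path, contents) =>
      val feature = path.split("/").takeRight(2).head // the parent directory name
      contents.split("\n").map(line => (feature, line))
    }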

Re: Bouncing Mails

2015-01-17 Thread Patrick Wendell
Akhil, Those are handled by ASF infrastructure, not anyone in the Spark project. So this list is not the appropriate place to ask for help. - Patrick On Sat, Jan 17, 2015 at 12:56 AM, Akhil Das wrote: > My mails to the mailing list are getting rejected, have opened a Jira issue, > can someone t

Re: maven doesn't build dependencies with Scala 2.11

2015-01-17 Thread Ted Yu
I did the following: 1655 dev/change-version-to-2.11.sh 1657 mvn -DHADOOP_PROFILE=hadoop-2.4 -Pyarn,hive -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package And mvn command passed. Did you see any cross-compilation errors ? Cheers BTW the two links you mentioned are consistent in terms of b

Spark UI and Spark Version on Google Compute Engine

2015-01-17 Thread Soumya Simanta
I'm deploying Spark using the "Click to Deploy" Hadoop -> "Install Apache Spark" on Google Compute Engine. I can run Spark jobs on the REPL and read data from Google storage. However, I'm not sure how to access the Spark UI in this deployment. Can anyone help? Also, it deploys Spark 1.1. Is there

Spark attempts to de/serialize using JavaSerializer despite being configured to use Kryo

2015-01-17 Thread waymost
I'm new to Spark and have run into issues using Kryo for serialization instead of Java. I have my SparkConf configured as such: val conf = new SparkConf().setMaster("local").setAppName("test") .set("spark.kryo.registrationRequired","false") .set("spark.serializer", classOf[KryoSeri
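
Two things worth noting here (a sketch; MyClass stands in for the poster's custom type): Spark 1.2 added SparkConf.registerKryoClasses for registration, and closures are always serialized with the Java serializer regardless of spark.serializer, which is a common source of this surprise:

    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoSerializer

    case class MyClass(id: Int) // stand-in for the custom type

    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("test")
      .set("spark.serializer", classOf[KryoSerializer].getName)
      .registerKryoClasses(Array(classOf[MyClass])) // data only; closures still use Java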

How to get the master URL at runtime inside driver program?

2015-01-17 Thread guxiaobo1982
Hi, Driver programs submitted by the spark-submit script will get the runtime spark master URL, but how does the program get the URL inside the main method when creating the SparkConf object? Regards,
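
For reference (a sketch): spark-submit injects the master into the spark.master property, so a default-constructed SparkConf already carries it, and the running context exposes it too:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf() // picks up spark.* properties set by spark-submit
    val master = conf.get("spark.master") // throws if no master has been set
    val sc = new SparkContext(conf)
    println(sc.master) // same value, via the running context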

Re: Spark UI and Spark Version on Google Compute Engine

2015-01-17 Thread Matei Zaharia
Unfortunately we don't have anything to do with Spark on GCE, so I'd suggest asking in the GCE support forum. You could also try to launch a Spark cluster by hand on nodes in there. Sigmoid Analytics published a package for this here: http://spark-packages.org/package/9 Matei > On Jan 17, 2015

Re: Row similarities

2015-01-17 Thread Reza Zadeh
We're focused on providing block matrices, which makes transposition simple: https://issues.apache.org/jira/browse/SPARK-3434 On Sat, Jan 17, 2015 at 3:25 PM, Pat Ferrel wrote: > In the Mahout Spark R-like DSL [A’A] and [AA’] doesn’t actually do a > transpose—it’s optimized out. Mahout has had a

Re: maven doesn't build dependencies with Scala 2.11

2015-01-17 Thread Ted Yu
There are 3 jars under the lib_managed/jars directory with and without the -Dscala-2.11 flag. The difference between the scala-2.10 and scala-2.11 profiles is that the scala-2.10 profile has the following: external/kafka FYI On Sat, Jan 17, 2015 at 4:07 PM, Ted Yu wrote: > I did the following

Re: Bouncing Mails

2015-01-17 Thread Akhil Das
Yep. They have sorted it out it seems. On 18 Jan 2015 03:58, "Patrick Wendell" wrote: > Akhil, > > Those are handled by ASF infrastructure, not anyone in the Spark > project. So this list is not the appropriate place to ask for help. > > - Patrick > > On Sat, Jan 17, 2015 at 12:56 AM, Akhil Das

Re: Row similarities

2015-01-17 Thread Andrew Musselman
Makes sense. > On Jan 17, 2015, at 6:27 PM, Reza Zadeh wrote: > > We're focused on providing block matrices, which makes transposition simple: > https://issues.apache.org/jira/browse/SPARK-3434 > >> On Sat, Jan 17, 2015 at 3:25 PM, Pat Ferrel wrote: >> In the Mahout Spark R-like DSL [A’A] and

Re: Multiple Spark Streaming receiver model

2015-01-17 Thread aglowik
I'm new to Spark. From my experience, when I use a "single StreamingContext to create different input streams from different sources" I get multiple errors and problems downstream. This seems like it is not the way to go. From what I read, creating multiple StreamingContexts is not advised. It appears
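
For reference, multiple receivers are meant to hang off a single StreamingContext; a minimal sketch (sources, ports, and core counts are illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // local[4] leaves cores free for processing alongside the three receivers
    val conf = new SparkConf().setAppName("multi").setMaster("local[4]")
    val ssc = new StreamingContext(conf, Seconds(5))
    val streams = (1 to 3).map(i => ssc.socketTextStream("localhost", 9990 + i))
    val unified = ssc.union(streams) // one DStream fed by all three receivers
    unified.print()
    ssc.start()
    ssc.awaitTermination()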

spark-submit --py-files remote: "Only local additional python files are supported"

2015-01-17 Thread voukka
Hi all! I found this problem when I tried running a python application on Amazon's EMR yarn cluster. It is possible to run the bundled example applications on EMR, but I cannot figure out how to run a slightly more complex python application which depends on some other python scripts. I tried adding t

Re: Is spark suitable for large scale pagerank, such as 200 millionnodes, 2 billion edges?

2015-01-17 Thread txw
I’ve read these pages. In the paper "GraphX: Graph Processing in a Distributed Dataflow Framework", the authors claim that it only takes 400 seconds for the uk-2007-05 dataset, which is a similar size to my dataset. Is the current GraphX the same version as the GraphX in that paper? And how many part

Not able to run spark job from code on EC2 with spark 1.2.0

2015-01-17 Thread rahulkumar-aws
Hi, I am trying to run a simple count on an S3 bucket, but with spark 1.2.0 on EC2 it is not able to run. I started my cluster using the ec2 script that came with spark 1.2.0. Some part of the code: It is working with spark 1.1.1, but not with 1.2.0 - Software Developer SigmoidAna

Re: SparkSQL 1.2.0 sources API error

2015-01-17 Thread Walrus theCat
I'm getting this also, with Scala 2.11 and Scala 2.10: 15/01/18 07:34:51 INFO slf4j.Slf4jLogger: Slf4jLogger started 15/01/18 07:34:51 INFO Remoting: Starting remoting 15/01/18 07:34:51 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher