Re: ElasticSearch enrich

2014-06-26 Thread boci
That's okay, but Hadoop has ES integration. What happens if I run saveAsHadoopFile without Hadoop? Or must I bring up Hadoop programmatically (if I can)? b0c1 --

Re: Spark standalone network configuration problems

2014-06-26 Thread Akhil Das
Hi Shannon, It should be a configuration issue, check in your /etc/hosts and make sure localhost is not associated with the SPARK_MASTER_IP you provided. Thanks Best Regards On Thu, Jun 26, 2014 at 6:37 AM, Shannon Quinn wrote: > Hi all, > > I have a 2-machine Spark network I've set up: a ma

Re: ElasticSearch enrich

2014-06-26 Thread Nick Pentreath
You can just add elasticsearch-hadoop as a dependency to your project to use the ESInputFormat and ESOutputFormat ( https://github.com/elasticsearch/elasticsearch-hadoop). Some other basics here: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/spark.html For testing, yes I thin

Spark Streaming Window without slideDuration parameter

2014-06-26 Thread haopu
If a window is defined without the slideDuration parameter, how will it slide? I guess it will use context's batchInterval as the slideDuration? Thanks for any help. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Window-without-slideDuration
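For what it's worth, the one-argument window does inherit the parent stream's slide, i.e. the batch interval. A minimal local-mode sketch (queueStream is just a convenient sourceless stand-in; the 5-second interval is illustrative):

```scala
import scala.collection.mutable.Queue
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("window-default-slide")
val ssc = new StreamingContext(conf, Seconds(5))   // batch interval = 5s
val stream = ssc.queueStream(new Queue[RDD[String]]())
// No slideDuration given: the window slides once per batch
val windowed = stream.window(Seconds(30))
println(windowed.slideDuration)                    // inherits the batch interval
ssc.stop(stopSparkContext = true)
```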

Re: Using CQLSSTableWriter to batch load data from Spark to Cassandra.

2014-06-26 Thread Rohit Rai
Hi Gerard, Which versions of Spark, Hadoop, Cassandra and Calliope are you using? We never built Calliope for Hadoop 2, as we (or our clients) don't use Hadoop in their deployments, or use it only as the infra component for Spark, in which case H1/H2 doesn't make a difference for them. I know atlea

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-26 Thread Ulanov, Alexander
Hi, I cannot argue about other use-cases, however MLLib doesn’t support working with text classification out of the box. There was basic support in MLI (thanks Sean for correcting me that it is MLI not MLLib), but I don’t know why it is not developed anymore. For text classification in general

Re: Spark vs Google cloud dataflow

2014-06-26 Thread Sean Owen
Dataflow is a hosted service and tries to abstract an entire pipeline; Spark maps to some components in that pipeline and is software. My first reaction was that Dataflow mapped more to Summingbird, as part of it is a higher-level system for doing a specific thing in batch/streaming -- aggregations

Re: Hadoop interface vs class

2014-06-26 Thread Sean Owen
You seem to have the binary for Hadoop 2, since it was compiled expecting that TaskAttemptContext is an interface. So the error indicates that Spark is also seeing Hadoop 1 classes somewhere. On Wed, Jun 25, 2014 at 4:41 PM, Robert James wrote: > After upgrading to Spark 1.0.0, I get this error:

What's the best practice to deploy spark on Big SMP servers?

2014-06-26 Thread guxiaobo1982
Hi, We have a big SMP server (with 128G RAM and 32 CPU cores) to run small-scale analytical workloads. What's the best practice to deploy a standalone Spark on the server to achieve good performance? How many instances should be configured, and how much RAM and how many CPU cores should be allocated for ea

Re: Spark executor error

2014-06-26 Thread Surendranauth Hiraman
I unfortunately haven't seen this directly. But some typical things I try when debugging are as follows. Do you see a corresponding error on the other side of that connection (alpinenode7.alpinenow.local)? Or is that the same machine? Also, do the driver logs show any longer stack trace and have

running multiple applications at the same time

2014-06-26 Thread jamborta
Hi all, not sure if this is a config issue or it's by design, but when I run the spark shell, and try to submit another application from elsewhere, the second application waits for the first to finish and outputs the following: Initial job has not accepted any resources; check your cluster UI to

Re: running multiple applications at the same time

2014-06-26 Thread Akhil Das
Hi Jamborta, You can use the following options in your application to limit the usage of resources, like - spark.cores.max - spark.executor.memory It's better to use Mesos if you want to run multiple applications on the same cluster smoothly. Thanks Best Regards On Thu, Jun 26, 2014
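For reference, those two properties can be set on the SparkConf before the context is created; the values below are illustrative, not recommendations:

```scala
import org.apache.spark.SparkConf

// Cap this app's share of a standalone/Mesos cluster so other apps can get cores too
val conf = new SparkConf()
  .setAppName("app-one")
  .set("spark.cores.max", "4")         // max total cores across the cluster for this app
  .set("spark.executor.memory", "2g")  // heap per executor
// val sc = new SparkContext(conf)     // then submit jobs as usual
println(conf.get("spark.cores.max"))
```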

About StorageLevel

2014-06-26 Thread tomsheep...@gmail.com
Hi all, I have a newbie question about StorageLevel in Spark. I came across these sentences in the Spark documentation: If your RDDs fit comfortably with the default storage level (MEMORY_ONLY), leave them that way. This is the most CPU-efficient option, allowing operations on the RDDs to run as fast
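The passage quoted above is about explicit persist calls; a small local-mode sketch of the default MEMORY_ONLY level (sizes illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("storage-level"))
val r = sc.parallelize(1 to 1000)
r.persist(StorageLevel.MEMORY_ONLY)  // equivalent to r.cache()
val first = r.count()                // materializes and caches the partitions
val second = r.count()               // now served from the in-memory cache
sc.stop()
```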

Re: running multiple applications at the same time

2014-06-26 Thread jamborta
thanks a lot. I have tried restricting the memory usage before, but it seems it was the issue with the number of cores available. I am planning to run this on a yarn cluster, I assume yarn's resource manager will take care of allocating resources, too? -- View this message in context: http:/

[ANNOUNCE] Apache MRQL 0.9.2-incubating released

2014-06-26 Thread Leonidas Fegaras
The Apache MRQL team is pleased to announce the release of Apache MRQL 0.9.2-incubating. This is our second Apache release. Apache MRQL is a query processing and optimization system for large-scale distributed data analysis, built on top of Apache Hadoop, Hama, and Spark. The release artifacts ar

Re: running multiple applications at the same time

2014-06-26 Thread Akhil Das
Yep, it does. Thanks Best Regards On Thu, Jun 26, 2014 at 6:11 PM, jamborta wrote: > thanks a lot. I have tried restricting the memory usage before, but it > seems > it was the issue with the number of cores available. > > I am planning to run this on a yarn cluster, I assume yarn's resource >

Re: Spark standalone network configuration problems

2014-06-26 Thread Shannon Quinn
Still running into the same problem. /etc/hosts on the master says "127.0.0.1 localhost"; machine1 is the same address set in spark-env.sh for SPARK_MASTER_IP. Any other ideas? On 6/26/14, 3:11 AM, Akhil Das wrote: Hi Shannon, It should be a configuration issue, check in your /

Re: Spark standalone network configuration problems

2014-06-26 Thread Akhil Das
Do you have machine1 in your workers /etc/hosts also? If so try telneting from your machine2 to machine1 on port 5060. Also make sure nothing else is running on port 5060 other than Spark (*lsof -i:5060*) Thanks Best Regards On Thu, Jun 26, 2014 at 6:35 PM, Shannon Quinn wrote: >

Re: Spark standalone network configuration problems

2014-06-26 Thread Shannon Quinn
Both /etc/hosts have each other's IP addresses in them. Telneting from machine2 to machine1 on port 5060 works just fine. Here's the output of lsof: user@machine1:~/spark/spark-1.0.0-bin-hadoop2$ lsof -i:5060 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME java 23985 user 30u I

Re: jsonFile function in SQLContext does not work

2014-06-26 Thread Yin Huai
Yes. It will be added in later versions. Thanks, Yin On Wed, Jun 25, 2014 at 3:39 PM, durin wrote: > Hi Yin an Aaron, > > thanks for your help, this was indeed the problem. I've counted 1233 blank > lines using grep, and the code snippet below works with those. > > From what you said, I guess

Re: Spark standalone network configuration problems

2014-06-26 Thread Akhil Das
Can you paste your spark-env.sh file? Thanks Best Regards On Thu, Jun 26, 2014 at 7:01 PM, Shannon Quinn wrote: > Both /etc/hosts have each other's IP addresses in them. Telneting from > machine2 to machine1 on port 5060 works just fine. > > Here's the output of lsof: > > user@machine1:~/spar

Re: Spark vs Google cloud dataflow

2014-06-26 Thread Aureliano Buendia
On Thu, Jun 26, 2014 at 10:58 AM, Sean Owen wrote: > My first reaction was that Dataflow mapped more to Summingbird, as part > Summingbird is for map/reduce. Dataflow is the third generation of google's map/reduce, and it generalizes map/reduce the way Spark does. See more about this here: http:

Re: Spark standalone network configuration problems

2014-06-26 Thread Shannon Quinn
export SPARK_MASTER_IP="192.168.1.101" export SPARK_MASTER_PORT="5060" export SPARK_LOCAL_IP="127.0.0.1" That's it. If I comment out the SPARK_LOCAL_IP or set it to be the same as SPARK_MASTER_IP, that's when it throws the "address already in use" error. If I leave it as the localhost IP, that'

Re: Where Can I find the full documentation for Spark SQL?

2014-06-26 Thread Gianluca Privitera
Not that I know of, they will probably add more with next version since Spark SQL is getting a lot of attention. Gianluca On 26 Jun 2014, at 00:10, guxiaobo1982 mailto:guxiaobo1...@qq.com>> wrote: the api only says this : public JavaSchemaRDD

Re: LiveListenerBus throws exception and weird web UI bug

2014-06-26 Thread Baoxu Shi(Dash)
Hi Pei-Lun, I have the same problem here. The issue is SPARK-2228; someone also posted a pull request for it, but it only eliminates the exception, not the side effects. I think the problem may be due to the hard-coded private val EVENT_QUEUE_CAPACITY = 1 in core/src/main/scala/

Re: Hadoop interface vs class

2014-06-26 Thread Robert James
Yes. As far as I can tell, Spark seems to be including Hadoop 1 via its transitive dependency: http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10/1.0.0 - shows a dependency on Hadoop 1.0.4, which I'm perplexed by. On 6/26/14, Sean Owen wrote: > You seem to have the binary for Had

Fine-grained mesos execution hangs on Debian 7.4

2014-06-26 Thread Fedechicco
Hello, as per the subject, when I run the Scala spark-shell on our Mesos (0.19) cluster, some Spark slaves just hang at the end of the staging phase for any given job. The cluster has mixed OSes (Ubuntu 14.04 / Debian 7.4), but if I run the same shell and commands using coarse-grained mode everyth

Re: Hadoop interface vs class

2014-06-26 Thread Sean Owen
Yes it does. The idea is to override the dependency if needed. I thought you mentioned that you had built for Hadoop 2. On Jun 26, 2014 11:07 AM, "Robert James" wrote: > Yes. As far as I can tell, Spark seems to be including Hadoop 1 via > its transitive dependency: > http://mvnrepository.com/ar

Serialization of objects

2014-06-26 Thread Sameer Tilak
Hi everyone, Aaron, thanks for your help so far. I am trying to serialize objects that I instantiate from a 3rd party library namely instances of com.wcohen.ss.Jaccard, and com.wcohen.ss.BasicStringWrapper. However, I am having problems with serialization. I am (at least trying to) using Kryo fo

Re: Where Can I find the full documentation for Spark SQL?

2014-06-26 Thread Michael Armbrust
The programming guide is part of the standard documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html Regarding specifics about SQL syntax and functions, I'd recommend using a HiveContext and the HQL method currently, as that is much more complete than the basic SQL parser pr

Re: Spark vs Google cloud dataflow

2014-06-26 Thread Nicholas Chammas
On Thu, Jun 26, 2014 at 10:15 AM, Aureliano Buendia wrote: > On a good day, it takes AWS spot instances 15 - 20 minutes to bring up a > 30 node cluster. This makes it non-efficient for computations which may > take only 10 - 15 minutes. I feel like there should be an issue or something to track

Re: About StorageLevel

2014-06-26 Thread Andrew Or
Hi Kang, You raise a good point. Spark does not automatically cache all your RDDs. Why? Simply because the application may create many RDDs, and not all of them are to be reused. After all, there is only so much memory available to each executor, and caching an RDD adds some overhead especially if

Re: Spark vs Google cloud dataflow

2014-06-26 Thread Michael Bach Bui
"The current problem with Spark is the big overhead and cost of bringing up a cluster. On a good day, it takes AWS spot instances 15 - 20 minutes to bring up a 30 node cluster. This makes it non-efficient for computations which may take only 10 - 15 minutes." Hmm, this is a misleading message. The

Spark Streaming RDD transformation

2014-06-26 Thread Bill Jay
Hi all, I am currently working on a project that requires transforming each RDD in a DStream to a Map. Basically, when we get a list of data in each batch, we would like to update the global map. I would like to return the map as a single RDD. I am currently trying to use the function *transform*.

Re: Hadoop interface vs class

2014-06-26 Thread Robert James
On 6/26/14, Sean Owen wrote: > Yes it does. The idea is to override the dependency if needed. I thought > you mentioned that you had built for Hadoop 2. I'm very confused :-( I downloaded the Spark distro for Hadoop 2, and installed it on my machine. But the code doesn't have a reference to tha

Improving Spark multithreaded performance?

2014-06-26 Thread Kyle Ellrott
I'm working to set up a calculation that involves calling mllib's SVMWithSGD.train several thousand times on different permutations of the data. I'm trying to run the separate jobs using a threadpool to dispatch the different requests to a Spark context connected to a Mesos cluster, using coarse sch

Running new code on a Spark Cluster

2014-06-26 Thread Pat Ferrel
I’ve created a CLI driver for a Spark version of a Mahout job called "item similarity" with several tests that all work fine on local[4] Spark standalone. The code even reads and writes to clustered HDFS. But switching to clustered Spark has a problem that seems tied to a broadcast and/or serial

Re: Fine-grained mesos execution hangs on Debian 7.4

2014-06-26 Thread Sébastien Rainville
Hello Federico, is it working with the 1.0 branch? In either branch, make sure that you have this commit: https://github.com/apache/spark/commit/1132e472eca1a00c2ce10d2f84e8f0e79a5193d3 I never saw the behavior you are describing, but that commit is important if you are running in fine-grained mod

Re: Spark Streaming RDD transformation

2014-06-26 Thread Sean Owen
If you want to transform an RDD to a Map, I assume you have an RDD of pairs. The method collectAsMap() creates a Map from the RDD in this case. Do you mean that you want to update a Map object using data in each RDD? You would use foreachRDD() in that case. Then you can use RDD.foreach to do somet
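Both suggestions rest on having an RDD of pairs; the collectAsMap() route looks like this in local mode:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("rdd-to-map"))
val pairs = sc.parallelize(Seq("a" -> 1, "b" -> 2, "c" -> 3))
// An RDD of pairs can be pulled back to the driver as a Map
val asMap = pairs.collectAsMap()
sc.stop()
```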

Re: Hadoop interface vs class

2014-06-26 Thread Sean Owen
On Thu, Jun 26, 2014 at 1:44 PM, Robert James wrote: > I downloaded the Spark distro for Hadoop 2, and installed it on my > machine. But the code doesn't have a reference to that path - it uses > sbt for dependencies. As far as I can tell, using sbt or maven or ivy > will always result in a tran

Re: Spark Streaming RDD transformation

2014-06-26 Thread Bill Jay
Thanks, Sean! I am currently using foreachRDD to update the global map using data in each RDD. The reason I want to return a map as RDD instead of just updating the map is that RDD provides many handy methods for output. For example, I want to save the global map into files in HDFS for each batch
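One hedged way to get the RDD output methods back for the driver-side map is to re-parallelize it each batch; the HDFS path below is hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("map-back-to-rdd"))
val globalMap = Map("a" -> 1, "b" -> 2)      // the driver-side map kept across batches
val asRdd = sc.parallelize(globalMap.toSeq)  // back to an RDD...
// asRdd.saveAsTextFile("hdfs://namenode/out/batch-0")  // ...so saveAsTextFile etc. work again
val roundTrip = asRdd.collect().toMap
sc.stop()
```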

Re: Spark vs Google cloud dataflow

2014-06-26 Thread Nicholas Chammas
On Thu, Jun 26, 2014 at 2:26 PM, Michael Bach Bui wrote: The overhead of bringing up a AWS Spark spot instances is NOT the > inherent problem of Spark. That’s technically true, but I’d be surprised if there wasn’t a lot of room for improvement in spark-ec2 regarding cluster launch+config times.

Re: Spark vs Google cloud dataflow

2014-06-26 Thread Aureliano Buendia
On Thu, Jun 26, 2014 at 9:42 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > > That’s technically true, but I’d be surprised if there wasn’t a lot of > room for improvement in spark-ec2 regarding cluster launch+config times. > Unfortunately, this is a spark support issue, but an AWS on

RE: Running new code on a Spark Cluster

2014-06-26 Thread Muttineni, Vinay
Hi Pat, Did you try accessing the broadcast variable value outside the Map? https://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/HashBiMap.html As per the document in the link above, it looks like HashBiMap can indeed be serialized. -Original Message- F

Spark job tracker.

2014-06-26 Thread abhiguruvayya
How can I track map/reduce tasks in real time? In Hadoop map/reduce I do it by creating a job and printing the status of the running application in real time. Is there a similar way to do this in Spark? Please let me know. -- View this message in context: http://apache-spark-user-list.1001560.

Spark-submit failing on cluster

2014-06-26 Thread ajatix
My setup --- I have a private cluster running on 4 nodes. I want to use the spark-submit script to execute spark applications on the cluster. I am using Mesos to manage the cluster. This is the command I ran on local mode, which ran successfully --- ./bin/spark-submit --master local --class org.

Re: Spark job tracker.

2014-06-26 Thread abhiguruvayya
I don't want to track it on the cluster UI. Once I launch the job I would like to print the status. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-job-tracker-tp8367p8370.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark vs Google cloud dataflow

2014-06-26 Thread Nicholas Chammas
Hmm, I remember a discussion on here about how the way in which spark-ec2 rsyncs stuff to the cluster for setup could be improved, and I’m assuming there are other such improvements to be made. Perhaps those improvements don’t matter much when compared to EC2 instance launch times, but I’m not sure

Re: ElasticSearch enrich

2014-06-26 Thread boci
Thanks. Without the local option I can connect to the remote ES; now I only have one problem. How can I use elasticsearch-hadoop with Spark Streaming? I mean, DStream doesn't have a "saveAsHadoopFiles" method, and my second problem is that the output index depends on the input data. Thanks ---

Re: Spark standalone network configuration problems

2014-06-26 Thread Shannon Quinn
My *best guess* (please correct me if I'm wrong) is that the master (machine1) is sending the command to the worker (machine2) with the localhost argument as-is; that is, machine2 isn't doing any weird address conversion on its end. Consequently, I've been focusing on the settings of the maste

Re: ElasticSearch enrich

2014-06-26 Thread Holden Karau
Hi b0c1, I have an example of how to do this in the repo for my talk as well, the specific example is at https://github.com/holdenk/elasticsearchspark/blob/master/src/main/scala/com/holdenkarau/esspark/IndexTweetsLive.scala . Since DStream doesn't have a saveAsHadoopDataset we use foreachRDD and t
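A sketch of the foreachRDD pattern described above; the ES-specific pieces are left as comments since they need the es-hadoop jar, and the names EsOutputFormat / es.resource are assumptions to verify against the es-hadoop docs for your version:

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable.Queue
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("per-batch-save")
val ssc = new StreamingContext(conf, Seconds(1))
val queue = new Queue[RDD[String]]()
val tweets = ssc.queueStream(queue)
val batches = new AtomicInteger(0)
// DStream has no saveAsHadoopDataset, so drop down to each batch's RDD.
// With es-hadoop you would build a JobConf here carrying EsOutputFormat and an
// "es.resource" target index (which can be computed from the batch's data) and
// call rdd.saveAsHadoopDataset(jobConf) instead of the println below.
tweets.foreachRDD { (rdd, time) =>
  batches.incrementAndGet()
  println(s"batch at $time: ${rdd.count()} records")
}
queue += ssc.sparkContext.makeRDD(Seq("tweet1", "tweet2"))
ssc.start()
Thread.sleep(3000)
ssc.stop(stopSparkContext = true)
```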

SparkSQL- saveAsParquetFile

2014-06-26 Thread anthonyjschu...@gmail.com
Hi all: I am attempting to execute a simple test of the SparkSQL system capability of persisting to parquet files... My code is: val conf = new SparkConf() .setMaster( """local[1]""") .setAppName("test") implicit val sc = new SparkContext(conf) val sqlContext = new org.apache.spark

SparkSQL- Nested CaseClass Parquet failure

2014-06-26 Thread anthonyjschu...@gmail.com
Hello all: I am attempting to persist a parquet file comprised of a SchemaRDD of nested case classes... Creating a schemaRDD object seems to work fine, but exception is thrown when I attempt to persist this object to a parquet file... my code: case class Trivial(trivial: String = "trivial", l

Re: Spark-submit failing on cluster

2014-06-26 Thread ajatix
Rectified the issue by providing the executor uri location in the input ./bin/spark-submit --master mesos://:5050 --class org.apache.spark.examples.SparkPi --driver-java-options -Dspark.executor.uri=hdfs://:9000/new/spark-1.0.0-hadoop-2.4.0.tgz /opt/spark-examples-1.0.0-hadoop2.4.0.jar 10 I am st

Re: ElasticSearch enrich

2014-06-26 Thread boci
Wow, thanks for your fast answer, it helps a lot... b0c1 -- Skype: boci13, Hangout: boci.b...@gmail.com On Thu, Jun 26, 2014 at 11:48 PM, Holden Karau wrote: > Hi b0c1,

Re: ElasticSearch enrich

2014-06-26 Thread Holden Karau
Just your luck I happened to be working on that very talk today :) Let me know how your experiences with Elasticsearch & Spark go :) On Thu, Jun 26, 2014 at 3:17 PM, boci wrote: > Wow, thanks your fast answer, it's help a lot... > > b0c1 > > > ---

Re: Running new code on a Spark Cluster

2014-06-26 Thread Pat Ferrel
No, what did you have in mind? I assumed they’d work from the docs and it does using local[4] but not sure if the broadcast does any actual serializing in that case. I certainly could be off base about my suspicions since I’m just learning to interpret Spark error messages. On Jun 26, 2014,

Re: SparkSQL- Nested CaseClass Parquet failure

2014-06-26 Thread Michael Armbrust
Nested parquet is not supported in 1.0, but is part of the upcoming 1.0.1 release. On Thu, Jun 26, 2014 at 3:03 PM, anthonyjschu...@gmail.com < anthonyjschu...@gmail.com> wrote: > Hello all: > I am attempting to persist a parquet file comprised of a SchemaRDD of > nested > case classes... > > Cr

Task progress in ipython?

2014-06-26 Thread Xu (Simon) Chen
I am pretty happy with using pyspark with ipython notebook. The only issue is that I need to look at the console output or spark ui to track task progress. I wonder if anyone thought of or better wrote something to display some progress bars on the same page when I evaluate a cell in ipynb? I know

Google Cloud Engine adds out of the box Spark/Shark support

2014-06-26 Thread Mayur Rustagi
https://groups.google.com/forum/#!topic/gcp-hadoop-announce/EfQms8tK5cE I suspect they are using their own builds... has anybody had a chance to look at it? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi

Re: Spark job tracker.

2014-06-26 Thread Mayur Rustagi
You can use SparkListener interface to track the tasks.. another is to use JSON patch (https://github.com/apache/spark/pull/882) & track tasks with json api Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi On Fri, Jun 27, 201
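A minimal driver-side SparkListener sketch along those lines (local mode; it just counts task completions as they finish):

```scala
import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("task-progress"))
val finished = new AtomicInteger(0)
sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
    println(s"tasks finished: ${finished.incrementAndGet()}")
})
sc.parallelize(1 to 100, 4).count()  // one stage, four tasks
sc.stop()                            // stop drains the listener bus
```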

Re: SparkSQL- Nested CaseClass Parquet failure

2014-06-26 Thread anthonyjschu...@gmail.com
Thanks. That might be a good note to add to the official Programming Guide... On Thu, Jun 26, 2014 at 5:05 PM, Michael Armbrust [via Apache Spark User List] wrote: > Nested parquet is not supported in 1.0, but is part of the upcoming 1.0.1 > release. > > > On Thu, Jun 26, 2014 at 3:03 PM, [hidd

Re: Spark standalone network configuration problems

2014-06-26 Thread Shannon Quinn
In the interest of completeness, this is how I invoke spark: [on master] > sbin/start-all.sh > spark-submit --py-files extra.py main.py > On Jun 26, 2014, at 17:29, Shannon Quinn wrote: > > My *best guess* (please correct me if I'm wrong) is that the master > (machine1) is sending t

numpy + pyspark

2014-06-26 Thread Avishek Saha
Hi all, Instead of installing numpy in each worker node, is it possible to ship numpy (via --py-files option maybe) while invoking the spark-submit? Thanks, Avishek

Re: Improving Spark multithreaded performance?

2014-06-26 Thread Aaron Davidson
I don't have specific solutions for you, but the general things to try are: - Decrease task size by broadcasting any non-trivial objects. - Increase duration of tasks by making them less fine-grained. How many tasks are you sending? I've seen in the past something like 25 seconds for ~10k total m
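The first suggestion, broadcasting a non-trivial object so it ships once per executor instead of being serialized into every task, can be sketched like this (sizes illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("broadcast-table"))
// A lookup table that would otherwise be captured in every task closure
val table = (1 to 100000).map(i => i -> i * 2).toMap
val bTable = sc.broadcast(table)  // shipped once per executor
val doubled = sc.parallelize(1 to 10).map(i => bTable.value(i)).collect()
sc.stop()
```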

Re: About StorageLevel

2014-06-26 Thread tomsheep...@gmail.com
Thank you Andrew, that's very helpful. I still have some doubts on a simple trial: I opened a spark shell in local mode, and typed in val r=sc.parallelize(0 to 50) val r2=r.keyBy(x=>x).groupByKey(10) and then I invoked the count action several times on it, r2.count (multiple times) The first

RE: About StorageLevel

2014-06-26 Thread Liu, Raymond
I think there is a shuffle stage involved. And the future count jobs will depend on the first job's shuffle stage's output data directly, as long as it is still available. Thus it will be much faster. Best Regards, Raymond Liu From: tomsheep...@gmail.com [mailto:tomsheep...@gmail.com] Sent: Frid

RE: About StorageLevel

2014-06-26 Thread tomsheep...@gmail.com
Thanks Raymond! I was just reading the source code of ShuffledRDD, and found that the ShuffleFetcher, which wraps BlockManager, does the magic. The shuffled partitions will be stored on disk(?) just as what cacheManager does in a persist operation. Is that to say, whenever there is a shuffle stag

Re: LiveListenerBus throws exception and weird web UI bug

2014-06-26 Thread Pei-Lun Lee
Hi Baoxu, thanks for sharing. 2014-06-26 22:51 GMT+08:00 Baoxu Shi(Dash) : > Hi Pei-Lun, > > I have the same problem there. The Issue is SPARK-2228, there also someone > posted a pull request on that, but he only eliminate this exception but not > the side effects. > > I think the problem may du

RE: Prediction using Classification with text attributes in Apache Spark MLLib

2014-06-26 Thread lmk
Thanks Alexander, That gave me a clear idea of what I can look for in MLLib. Regards, lmk -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Prediction-using-Classification-with-text-attributes-in-Apache-Spark-MLLib-tp8166p8395.html Sent from the Apache Spark

Re: Spark standalone network configuration problems

2014-06-26 Thread Akhil Das
Hi Shannon, How about a setting like the following? (just removed the quotes) export SPARK_MASTER_IP=192.168.1.101 export SPARK_MASTER_PORT=5060 #export SPARK_LOCAL_IP=127.0.0.1 Not sure what's happening in your case; it could be that your system is not able to bind to the 192.168.1.101 address. What

Re: Spark standalone network configuration problems

2014-06-26 Thread sujeetv
Try to explicitly set the "spark.driver.host" property to the master's IP. Sujeet -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-standalone-network-configuration-problems-tp8304p8396.html Sent from the Apache Spark User List mailing list archive

Re: JavaRDD.mapToPair throws NPE

2014-06-26 Thread Andrew Ash
I think this may be similar to https://issues.apache.org/jira/browse/SPARK-2292 so follow that ticket to see how it gets resolved. Andrew On Tue, Jun 24, 2014 at 5:44 PM, Mingyu Kim wrote: > Hi all, > > I’m trying to use JavaRDD.mapToPair(), but it fails with NPE on the > executor. The PairFun

Spark Streaming to capture packets from interface

2014-06-26 Thread swezzz
Hi.. I am new to Spark . Is it possible to capture live packets from a network interface through spark streaming? Is there a library or any built in classes to bind to the network interface directly? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Stre

org.jboss.netty.channel.ChannelException: Failed to bind to: master/1xx.xx..xx:0

2014-06-26 Thread MEETHU MATHEW
Hi all, My Spark (standalone mode) was running fine till yesterday. But now I am getting the following exception when I run start-slaves.sh or start-all.sh: slave3: failed to launch org.apache.spark.deploy.worker.Worker: slave3: at java.util.concurrent.ThreadPoolExecutor$Worker.run(Thre

Map with filter on JavaRdd

2014-06-26 Thread ajay garg
Hi All, Is it possible to map and filter a javardd in a single operation? Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Map-with-filter-on-JavaRdd-tp8401.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
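Yes: besides chaining .filter(...).map(...) (which Spark pipelines into one pass anyway), flatMap can express both in a single operation. A sketch in Scala; JavaRDD has an analogous flatMap:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("map-and-filter"))
val nums = sc.parallelize(1 to 10)
// Keep even numbers (filter) and square them (map) in one operation
val squaredEvens = nums.flatMap(n => if (n % 2 == 0) Some(n * n) else None).collect()
sc.stop()
```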