Re: Reading SequenceFiles from S3 with PySpark on EMR causes RACK_LOCAL locality

2015-07-17 Thread michal.klo...@gmail.com
Make sure you are setting num executors correctly.

M

> On Jul 17, 2015, at 9:16 PM, Charles Menguy wrote:
>
> I am trying to use PySpark on EMR to analyze some data stored as SequenceFiles on S3, but running into performance issues due to data locality. Here is a very simple sample that …
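A minimal sketch of what setting the executor count looks like in practice, assuming an EMR/YARN cluster; the instance counts, memory sizes, and S3 path are placeholders, not values from the original thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Explicitly size the executor fleet so the S3 read is spread across the
// cluster. All numbers and the S3 path are placeholders -- tune per node type.
val conf = new SparkConf()
  .setAppName("s3-sequencefile-read")
  .set("spark.executor.instances", "10")  // same effect as --num-executors 10
  .set("spark.executor.cores", "4")
  .set("spark.executor.memory", "6g")

val sc = new SparkContext(conf)

// SequenceFile read straight from S3. S3 offers no HDFS-style node locality,
// so RACK_LOCAL/ANY tasks are expected; the goal is to keep every executor busy.
val rdd = sc.sequenceFile[String, Array[Byte]]("s3://my-bucket/path/to/seqfiles/")
println(rdd.count())
```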

Re: Spark 1.5.1 standalone cluster - wrong Akka remoting config?

2015-10-08 Thread michal.klo...@gmail.com
Try setting spark.driver.host to the actual IP or hostname of the box submitting the work. More info in the networking section of this link: http://spark.apache.org/docs/latest/configuration.html

Also check the Spark config for your application for these driver settings in the application web UI a…
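For reference, a minimal sketch of pinning the driver address when building the application's SparkConf; the IP, master URL, and port below are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Pin the address/port the executors use to call back into the driver, so the
// remoting layer binds to an interface the cluster can actually reach.
val conf = new SparkConf()
  .setAppName("driver-host-example")
  .setMaster("spark://master-host:7077")    // placeholder master URL
  .set("spark.driver.host", "192.0.2.10")   // IP or hostname of the submitting box
  .set("spark.driver.port", "51000")        // optional: fix the port if a firewall is involved

val sc = new SparkContext(conf)
```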

Re: Scalable JDBCRDD

2015-03-01 Thread michal.klo...@gmail.com
Jorn: Vertica

Cody: I posited the limit just as an example of how JdbcRDD could be used least invasively. Let's say we used a partition on a time field -- we would still need to have N executions of those queries. The queries we have are very intense and concurrency is an issue even if the …
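A sketch of the time-partitioned JdbcRDD idea being discussed, with placeholder table, column, and connection details (a Vertica JDBC URL is used only because the thread mentions Vertica); note that each partition still issues its own copy of the query, which is exactly the concurrency concern raised here:

```scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD

val sc = new SparkContext(
  new SparkConf().setAppName("jdbcrdd-sketch").setMaster("local[8]")) // placeholder; point at your cluster

// Connection factory; URL and credentials are placeholders.
def createConnection() =
  DriverManager.getConnection("jdbc:vertica://db-host:5433/db", "user", "password")

// JdbcRDD substitutes the two '?' placeholders with per-partition slices of the
// [lowerBound, upperBound] range -- here an epoch-seconds time window -- so
// numPartitions = 8 means 8 concurrent executions of the query.
val rows = new JdbcRDD(
  sc,
  createConnection _,
  "SELECT id, payload FROM events WHERE event_ts >= ? AND event_ts <= ?",
  1420070400L,  // window start (epoch seconds), placeholder
  1425167999L,  // window end (epoch seconds), placeholder
  8,            // numPartitions => 8 concurrent queries against the database
  (rs: ResultSet) => (rs.getLong("id"), rs.getString("payload"))
)
println(rows.count())
```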

Re: Scalable JDBCRDD

2015-03-01 Thread michal.klo...@gmail.com
> I assume it's not viable to throw the query results into another table in your database and then query that using the normal approach?
>
> --eric
>
>> On 3/1/15 4:28 AM, michal.klo...@gmail.com wrote:
>> Jorn: Vertica
>>
>> Cody: I posited th…

Re: Job submission API

2015-04-07 Thread michal.klo...@gmail.com
A SparkContext can submit jobs remotely. The spark-submit options in general can be populated into a SparkConf and passed in when you create a SparkContext. We personally have not had too much success with yarn-client remote submission, but standalone cluster mode was easy to get going.

M
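A minimal sketch of that approach against a standalone cluster; the master URL, jar path, and settings are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Everything you would normally pass to spark-submit goes into the SparkConf;
// the SparkContext built from it then submits work to the cluster directly.
val conf = new SparkConf()
  .setAppName("programmatic-submission")
  .setMaster("spark://master-host:7077")      // placeholder standalone master
  .set("spark.executor.memory", "4g")
  .setJars(Seq("/path/to/application.jar"))   // code the executors need

val sc = new SparkContext(conf)
println(sc.parallelize(1 to 1000).sum())      // trivial job to prove submission works
```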

Re: Querying Cluster State

2015-04-26 Thread michal.klo...@gmail.com
Not sure if there's a Spark-native way, but we've been using Consul for this.

M

> On Apr 26, 2015, at 5:17 AM, James King wrote:
>
> Thanks for the response.
>
> But no, this does not answer the question.
>
> The question was: Is there a way (via some API call) to query the number and type …
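For illustration, a sketch of the Consul side of this, assuming the workers have been registered under a service name such as "spark-worker" (that registration is your own plumbing; Spark does not do it):

```scala
import scala.io.Source

// Consul's catalog endpoint returns one JSON entry per node registered under
// the service, which is enough to count and inspect the workers.
// Agent host and service name are placeholders.
val consulAgent = "localhost"
val service     = "spark-worker"
val json = Source.fromURL(s"http://$consulAgent:8500/v1/catalog/service/$service").mkString
println(json)  // parse with your JSON library of choice
```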

Re: submitting to multiple masters

2015-04-28 Thread michal.klo...@gmail.com
According to the docs it should go like this: spark://host1:port1,host2:port2

https://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper

Thanks,
M

> On Apr 28, 2015, at 8:13 AM, James King wrote:
>
> I have multiple masters running and I'm trying to submit an ap…
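In code form (the same comma-separated list works with spark-submit's --master flag); hostnames and ports are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// With ZooKeeper-backed HA, list every master in the URL; the driver registers
// with whichever one is currently the leader and fails over automatically.
val conf = new SparkConf()
  .setAppName("multi-master-example")
  .setMaster("spark://host1:7077,host2:7077")

val sc = new SparkContext(conf)
```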

Re: How to get Master UI with ZooKeeper HA setup?

2015-05-12 Thread michal.klo...@gmail.com
I've been querying ZooKeeper directly via the ZooKeeper client tools; it has the IP of the current master leader in the master_status data. We are also running Exhibitor for ZooKeeper, which has a nice UI for exploring if you want to look it up manually.

Thanks,
Michal

> On May 12, 2015, at 1:28 …
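A rough sketch of pulling that information out of ZooKeeper programmatically, assuming the default spark.deploy.zookeeper.dir of /spark; the znode layout and payload format are not a public API, so verify the path against your own cluster (e.g. with zkCli.sh first). The quorum address is a placeholder.

```scala
import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}
import scala.collection.JavaConverters._

// Connect to the quorum and dump whatever Spark has written under its HA dir.
val zk = new ZooKeeper("zk-host:2181", 5000, new Watcher {
  override def process(event: WatchedEvent): Unit = ()  // no-op watcher
})

val statusPath = "/spark/master_status"  // default spark.deploy.zookeeper.dir + "/master_status"
for (child <- zk.getChildren(statusPath, false).asScala) {
  val data = zk.getData(s"$statusPath/$child", false, null)
  // The payload is whatever the master serialized; printing it raw is usually
  // enough to spot the current leader's host/IP.
  println(s"$child -> ${new String(data, "UTF-8")}")
}
zk.close()
```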