Try setting spark.driver.host to the actual IP or hostname of the box
submitting the work. More info is in the networking section at this link:
http://spark.apache.org/docs/latest/configuration.html
Also check the spark config for your application for these driver settings in
the application web UI a
Make sure you are setting --num-executors correctly.
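As a rough sketch of the two settings above, here is how the driver address and executor count could be expressed as Spark properties. The helper name driver_conf_pairs, the example address 10.0.0.5, and the executor counts are made up for illustration; spark.driver.host and spark.executor.instances are the standard property names.

```python
import socket


def driver_conf_pairs(driver_host=None, num_executors=4):
    """Return (key, value) pairs suitable for SparkConf().setAll(...).

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Fall back to this machine's fully qualified hostname. This assumes
    # the name is resolvable from the cluster nodes, which may not hold.
    host = driver_host or socket.getfqdn()
    return [
        ("spark.driver.host", host),
        ("spark.executor.instances", str(num_executors)),
    ]


# Print the pairs in the form spark-submit accepts via --conf.
pairs = driver_conf_pairs("10.0.0.5", num_executors=8)
for key, value in pairs:
    print(f"--conf {key}={value}")
```

The same pairs can be compared against what the application web UI reports for the running application.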
M
> On Jul 17, 2015, at 9:16 PM, Charles Menguy wrote:
>
> I am trying to use PySpark on EMR to analyze some data stored as
> SequenceFiles on S3, but running into performance issues due to data
> locality. Here is a very simple sample that
I've been querying ZooKeeper directly via the ZooKeeper client tools. It has
the IP of the current master leader in the master_status data. We are also
running Exhibitor for ZooKeeper, which has a nice UI for exploring if you want
to look it up manually.
Thanks,
Michal
> On May 12, 2015, at 1:28
According to the docs it should go like this:
spark://host1:port1,host2:port2
https://spark.apache.org/docs/latest/spark-standalone.html#standby-masters-with-zookeeper
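A small sketch of building that URL from a list of masters. The function name standalone_master_url and the hosts are placeholders; 7077 is used here only because it is the default standalone master port.

```python
def standalone_master_url(masters):
    """Build a spark:// URL listing every master as host:port.

    `masters` is a sequence of (host, port) tuples. Hypothetical helper,
    not part of Spark itself.
    """
    if not masters:
        raise ValueError("at least one master is required")
    # All masters go into a single URL, separated by commas.
    return "spark://" + ",".join(f"{host}:{port}" for host, port in masters)


url = standalone_master_url([("host1", 7077), ("host2", 7077)])
print(url)  # spark://host1:7077,host2:7077
```

The resulting string is what you would pass as the master when submitting, so the application can find whichever master is currently the leader.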
Thanks
M
> On Apr 28, 2015, at 8:13 AM, James King wrote:
>
> I have multiple masters running and I'm trying to submit an ap
Not sure if there's a Spark-native way, but we've been using Consul for this.
M
> On Apr 26, 2015, at 5:17 AM, James King wrote:
>
> Thanks for the response.
>
> But no this does not answer the question.
>
> The question was: Is there a way (via some API call) to query the number and
> type
A SparkContext can submit jobs remotely.
The spark-submit options in general can be populated into a SparkConf and
passed in when you create a SparkContext.
We personally have not had too much success with yarn-client remote submission,
but standalone cluster mode was easy to get going.
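To illustrate the idea of populating a SparkConf from spark-submit options, here is a minimal PySpark-oriented sketch. The helper names, the master URL, the app name, and the memory and core values are all made up; the flag-to-property mapping covers only a few common options and is not exhaustive.

```python
# A few spark-submit flags and the Spark properties they correspond to.
SUBMIT_TO_CONF = {
    "--master": "spark.master",
    "--name": "spark.app.name",
    "--driver-memory": "spark.driver.memory",
    "--executor-memory": "spark.executor.memory",
    "--executor-cores": "spark.executor.cores",
    "--total-executor-cores": "spark.cores.max",
}


def submit_options_to_pairs(options):
    """Translate {flag: value} into (property, value) pairs.

    Hypothetical helper; raises KeyError for flags not in the table above.
    """
    pairs = []
    for flag, value in options.items():
        if flag not in SUBMIT_TO_CONF:
            raise KeyError(f"no known Spark property for {flag}")
        pairs.append((SUBMIT_TO_CONF[flag], str(value)))
    return pairs


def make_context(pairs):
    """Create a SparkContext from the pairs (requires PySpark)."""
    # Imported lazily so the mapping above can be used without PySpark.
    from pyspark import SparkConf, SparkContext
    return SparkContext(conf=SparkConf().setAll(pairs))


pairs = submit_options_to_pairs({
    "--master": "spark://host1:7077",   # placeholder master URL
    "--name": "remote-job",             # placeholder app name
    "--executor-memory": "2g",
})
print(pairs)
# sc = make_context(pairs)  # uncomment on a machine that can reach the cluster
```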
M
>
> I assume it's not viable to throw the query results into another table in
> your database and then query that using the normal approach?
>
> --eric
>
>> On 3/1/15 4:28 AM, michal.klo...@gmail.com wrote:
>> Jorn: Vertica
>>
>> Cody: I posited th
Jorn: Vertica
Cody: I posited the limit just as an example of how JdbcRDD could be used least
invasively. Let's say we used a partition on a time field -- we would still
need to have N executions of those queries. The queries we have are very
intense and concurrency is an issue even if the