Digging up this thread to ask a follow-up question:

What is the intended use for /root/spark/conf/core-site.xml?

It seems that both /root/spark/bin/pyspark and
/root/ephemeral-hdfs/bin/hadoop point to /root/ephemeral-hdfs/conf/core-site.xml.
If I specify S3 access keys in spark/conf, Spark doesn't seem to pick them up.
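
For what it's worth, here is a quick way to check what the running
SparkContext actually picked up (a rough sketch; _jsc is pyspark's private
handle to the underlying JavaSparkContext, so it may change between releases):

# Print whatever value, if any, the SparkContext's Hadoop configuration
# holds for the access key property; None means it wasn't picked up.
print(sc._jsc.hadoopConfiguration().get("fs.s3.awsAccessKeyId"))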

Nick


On Fri, Mar 7, 2014 at 4:10 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Mayur,
>
> So looking at the section on environment variables here
> <http://spark.incubator.apache.org/docs/latest/configuration.html#environment-variables>,
> are you saying to set these options via SPARK_JAVA_OPTS -D? On a related
> note, while looking around I just discovered XMLStarlet
> <http://xmlstar.sourceforge.net/overview.php>, a command-line tool for
> modifying XML files. Perhaps I should instead set these S3 keys directly in
> the right core-site.xml using XMLStarlet.
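>
> In case it's useful to anyone reading later, here is a rough Python sketch
> of the same idea (appending a <property> element to a core-site.xml file);
> the target path is the spark/conf copy from the listing below, and the key
> values are placeholders:
>
> # Append a <property><name>...</name><value>...</value></property> entry
> # to a Hadoop-style config file.
> import xml.etree.ElementTree as ET
>
> def add_property(conf_path, name, value):
>     tree = ET.parse(conf_path)
>     root = tree.getroot()  # the <configuration> element
>     prop = ET.SubElement(root, 'property')
>     ET.SubElement(prop, 'name').text = name
>     ET.SubElement(prop, 'value').text = value
>     tree.write(conf_path)
>
> add_property('/root/spark/conf/core-site.xml',
>              'fs.s3.awsAccessKeyId', 'YOUR_ACCESS_KEY_ID')
> add_property('/root/spark/conf/core-site.xml',
>              'fs.s3.awsSecretAccessKey', 'YOUR_SECRET_ACCESS_KEY')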
>
> Devs/Everyone,
>
> On a related note, I discovered that Spark (on EC2) reads Hadoop options
> from /root/ephemeral-hdfs/conf/core-site.xml.
>
> This is surprising given the variety of copies of core-site.xml on the EC2
> cluster that gets built by spark-ec2. A quick search yields the following
> relevant results (snipped):
>
> find / -name core-site.xml 2> /dev/null
>
> /root/mapreduce/conf/core-site.xml
> /root/persistent-hdfs/conf/core-site.xml
> /root/ephemeral-hdfs/conf/core-site.xml
> /root/spark/conf/core-site.xml
>
>
> It looks like both pyspark and ephemeral-hdfs/bin/hadoop read configs from
> the ephemeral-hdfs core-site.xml file. The latter is expected; the former
> is not. Is this intended behavior?
>
> I expected pyspark to read configs from the spark core-site.xml file. The
> moment I remove my AWS credentials from the ephemeral-hdfs config file,
> pyspark cannot open files in S3 without me providing the credentials
> in-line.
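>
> (By "in-line" I mean something along the lines of embedding the keys
> directly in the S3 URL; the bucket and path here are made up:
>
> sc.textFile("s3n://ACCESS_KEY_ID:SECRET_ACCESS_KEY@my-bucket/some/path").count()
>
> That works, but it gets old quickly, and I believe it breaks if the secret
> key contains a slash.)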
>
> I also guessed that the config file under /root/mapreduce might be a kind
> of base config file that both Spark and Hadoop would read from first, and
> then override with configs from the other files. The path to the config
> suggests that, but it doesn't appear to be the case. Adding my AWS keys to
> that file seemed to affect neither Spark nor ephemeral-hdfs/bin/hadoop.
>
> Nick
>
>
> On Fri, Mar 7, 2014 at 2:07 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>
>> Set them as environment variables at boot & configure both stacks to read
>> them from there.
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>>  @mayur_rustagi <https://twitter.com/mayur_rustagi>
>>
>>
>>
>> On Fri, Mar 7, 2014 at 9:32 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>
>>> On spinning up a Spark cluster in EC2, I'd like to set a few configs
>>> that will allow me to access files in S3 without having to specify my AWS
>>> access and secret keys over and over, as described here
>>> <http://stackoverflow.com/a/3033403/877069>.
>>>
>>> The properties are fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey.
>>>
>>> Is there a way to set these properties programmatically so that Spark
>>> (via the shell) and Hadoop (via distcp) are both aware of and use the
>>> values?
>>>
>>> I don't think SparkConf does what I need because I want Hadoop to also
>>> be aware of my AWS keys. When I set those properties using conf.set() in
>>> pyspark, distcp didn't appear to be aware of them.
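>>>
>>> (For the Spark half alone, one option that seems plausible is setting the
>>> keys on the SparkContext's Hadoop configuration from pyspark. A rough
>>> sketch, with made-up key values and bucket; note that _jsc is a private
>>> handle to the JavaSparkContext:
>>>
>>> hconf = sc._jsc.hadoopConfiguration()
>>> hconf.set("fs.s3.awsAccessKeyId", "YOUR_ACCESS_KEY_ID")
>>> hconf.set("fs.s3.awsSecretAccessKey", "YOUR_SECRET_ACCESS_KEY")
>>> sc.textFile("s3n://my-bucket/some/path").count()
>>>
>>> Even if that works, though, it only helps the running Spark job; a
>>> separate hadoop distcp process would still read its own core-site.xml.)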
>>>
>>> Nick
>>>
>>>
>>
>>
>
