Dear Spark community,
I faced the following issue when trying to access data on S3a; my code is
the following:

import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
val sc = new SparkContext(sparkConf)
sc.hadoopConfiguration.set("fs.s3a.impl",
  "org.apache.hadoop.fs.s3a.S3AFileSystem")
// the remaining lines set the S3a credentials (actual values elided here)
sc.hadoopConfiguration.set("fs.s3a.access.key", "...")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "...")
Hi Blaz,
I did, the same result
Thank you,
Konstantin Kudryavtsev
On Wed, Dec 30, 2015 at 12:54 PM, Blaž Šnuderl wrote:
> Try setting s3 credentials using keys specified here
> https://github.com/Aloisius/hadoop-s3a/blob/master/README.md
>
> Blaz
> On Dec 30, 2015 6:48
all the Executor JVMs
> on each Worker?
>
> On Dec 30, 2015, at 12:45 PM, KOSTIANTYN Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
> Dear Spark community,
>
> I faced the following issue when trying to access data on S3a, my code is
> the following:
>
EC2 instances in the cluster - and handles autoscaling
> very well - and at some point, you will want to autoscale.
>
> On Wed, Dec 30, 2015 at 1:08 PM, KOSTIANTYN Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
>> Chris,
>>
>> good question, as you
> Can you define those properties in hdfs-site.xml and make sure it is
> visible in the class path when you spark-submit? It looks like a conf
> sourcing issue to me.
>
> Cheers,
>
> Sent from my iPhone
>
> On 30 Dec, 2015, at 1:59 pm, KOSTIANTYN Kudriavtsev <
s,
>
> Jerry
>
> Sent from my iPhone
>
> On 30 Dec, 2015, at 2:31 pm, KOSTIANTYN Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
> Hi Jerry,
>
> I want to run different jobs on different S3 buckets - different AWS creds
> - on the same instances. Could you shed some light if it's possible?
Hi all,
I'm trying to use a different spark-defaults.conf per user, i.e. I want to
have spark-user1.conf, etc. Is there a way to pass a path to the appropriate
conf file when I'm using a standalone Spark installation?
Also, is it possible to configure a different hdfs-site.xml and pass it as
well with spark-submit?
> What parameters do you plan to change in hdfs-site.xml?
> If the parameters only affect the HDFS NN / DN, passing hdfs-site.xml
> wouldn't take effect, right?
>
> Cheers
>
> On Thu, Dec 31, 2015 at 10:48 AM, KOSTIANTYN Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
each user using a different spark.conf via
> --properties-file when spark-submit
>
> HTH,
>
> Jerry
>
> Sent from my iPhone
>
> On 31 Dec, 2015, at 2:06 pm, KOSTIANTYN Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
> Hi Jerry,
>
> what you suggested
Hi guys,
the one big issue with this approach:
> spark.hadoop.s3a.access.key is now visible everywhere, in logs, in the Spark
> web UI, and is not secured at all...
On Jan 2, 2016, at 11:13 AM, KOSTIANTYN Kudriavtsev
wrote:
> thanks Jerry, it works!
> really appreciate your help
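For anyone finding this thread later, a rough sketch of the setup described above; the file name, job class, jar, and key values are placeholders, not taken from this thread:

# spark-user1.conf -- one properties file per user / per job
spark.hadoop.fs.s3a.impl        org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.access.key  ACCESS_KEY_FOR_THIS_JOB
spark.hadoop.fs.s3a.secret.key  SECRET_KEY_FOR_THIS_JOB

spark-submit --properties-file /path/to/spark-user1.conf --class com.example.MyJob my-job.jar

Every spark.hadoop.* property is copied into the job's Hadoop configuration, which is what the earlier snippets were setting on sc.hadoopConfiguration by hand.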
filter.params="type=kerberos,kerberos.principal=HTTP/mybox@MYDOMAIN,kerberos.keytab=/some/keytab"
>
>
>
>
> On Thu, Jan 7, 2016 at 10:35 AM, Kostiantyn Kudriavtsev
> wrote:
> I’m afraid I missed where this property must be specified? I added it to
> spark-xxx
I know, but I only need to hide/protect the web UI, at least with the servlet/filter API
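A minimal sketch of such a filter, assuming HTTP basic auth is good enough; the class name and credentials are made up, and it would be enabled with spark.ui.filters=com.example.BasicAuthFilter:

import java.util.Base64
import javax.servlet._
import javax.servlet.http.{HttpServletRequest, HttpServletResponse}
import scala.util.Try

class BasicAuthFilter extends Filter {
  // placeholder credentials; in practice read them from filter init params or a file
  private val user = "admin"
  private val password = "secret"

  override def init(config: FilterConfig): Unit = {}

  override def doFilter(req: ServletRequest, res: ServletResponse, chain: FilterChain): Unit = {
    val request  = req.asInstanceOf[HttpServletRequest]
    val response = res.asInstanceOf[HttpServletResponse]
    val header   = Option(request.getHeader("Authorization"))
    // accept only "Authorization: Basic base64(user:password)"
    val authorized = header.exists { h =>
      h.startsWith("Basic ") &&
        Try(new String(Base64.getDecoder.decode(h.stripPrefix("Basic ")), "UTF-8"))
          .toOption.contains(s"$user:$password")
    }
    if (authorized) {
      chain.doFilter(req, res)
    } else {
      response.setHeader("WWW-Authenticate", "Basic realm=\"Spark UI\"")
      response.sendError(HttpServletResponse.SC_UNAUTHORIZED)
    }
  }

  override def destroy(): Unit = {}
}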
On Jan 7, 2016, at 4:59 PM, Ted Yu wrote:
> Without kerberos you don't have true security.
>
> Cheers
>
> On Thu, Jan 7, 2016 at 1:56 PM, Kostiantyn Kudriavtsev
> wrote:
> can I
No, I don't.
Why do I need to have HDP installed? I don't use Hadoop at all, and I'd like to
read data from the local filesystem.
On Jul 2, 2014, at 9:10 PM, Denny Lee wrote:
> By any chance do you have HDP 2.1 installed? You may need to install the
> utils and update the env variables per
> http:/
>> You don't actually need it per se - it's just that some of the Spark
>> libraries are referencing Hadoop libraries even if they ultimately do not
>> call them. When I was doing some early builds of Spark on Windows, I
>> admittedly had Hadoop on Windows running as well and
Hi all,
Could you please share your best practices on writing logs in Spark? I'm
running it on YARN, so when I check the logs I'm a bit confused…
Currently, I'm calling System.err.println to put a message in the log and access it
via the YARN history server. But I don't like this way… I'd like to use log4j
instead.
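In case it is useful, a minimal sketch of that, assuming the log4j that ships with Spark; the object name and messages are made up:

import org.apache.log4j.Logger

object MyEtlJob {
  def main(args: Array[String]): Unit = {
    val log = Logger.getLogger(getClass)
    log.info("job started")              // driver messages land in the driver container's log on YARN
    log.warn("input partition is empty") // levels and appenders come from conf/log4j.properties
  }
}

With YARN log aggregation enabled, "yarn logs -applicationId <appId>" collects the driver and executor logs after the application finishes.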
Hi Sam,
I tried Spark on Cloudera a couple of months ago, and there were a lot of issues…
Fortunately, I was able to switch to Hortonworks and everything works perfectly. In
general, you can try two modes: standalone and via YARN. Personally, I found
using Spark via YARN more comfortable, especially for administration
Hi,
try this one
http://simpletoad.blogspot.com/2014/07/runing-spark-unit-test-on-windows-7.html
it's more about fixing a Windows-specific issue, but the code snippet gives the general
idea
just run the ETL and check the output with assert(s)
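For what it's worth, a minimal sketch of that pattern with ScalaTest and a local master; the suite name and the tiny word count stand in for the real ETL:

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.FunSuite

class EtlSuite extends FunSuite {
  test("etl produces the expected counts") {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("etl-test"))
    try {
      // stand-in for the real ETL: count words and assert on the output
      val counts = sc.parallelize(Seq("a", "b", "a"))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
        .collectAsMap()
      assert(counts("a") === 2)
      assert(counts("b") === 1)
    } finally {
      sc.stop()
    }
  }
}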
On Jul 29, 2014, at 6:29 PM, soumick86 wrote:
> Is there any example
Hi there,
I've started using Spark recently and am evaluating possible use cases in our
company.
I'm trying to save an RDD as a compressed SequenceFile. I'm able to save a
non-compressed file by calling:
counts.saveAsSequenceFile(output)
where counts is my RDD of (IntWritable, Text). However, I didn't find how to enable compression.
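If it helps, saveAsSequenceFile takes an optional compression codec as a second argument, so the compressed variant would look roughly like this (GzipCodec is just one choice):

import org.apache.hadoop.io.compress.GzipCodec

// same call as above, with the codec class passed explicitly
counts.saveAsSequenceFile(output, Some(classOf[GzipCodec]))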
I'd prefer to find a good example of using saveAsNewAPIHadoopFile with different
OutputFormat implementations (not only ORC, but EsOutputFormat, etc.). Any
common example?
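As a generic illustration (not ORC- or ES-specific), the pattern with the new-API SequenceFileOutputFormat looks roughly like the sketch below; any other OutputFormat subclass slots into the same type parameter, and the output path is a placeholder:

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat

// counts: RDD[(IntWritable, Text)] - the OutputFormat's key/value types must match the pair RDD's
counts.saveAsNewAPIHadoopFile[SequenceFileOutputFormat[IntWritable, Text]]("/tmp/counts-out")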
On Apr 16, 2014, at 4:51 PM, Brock Bose wrote:
> Howdy all,
> I recently saw that the OrcInputFormat/OutputFormat's have