SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
Dear Spark community, I faced the following issue when trying to access data on S3a; my code is the following: val sparkConf = new SparkConf() val sc = new SparkContext(sparkConf) sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") sc.hadoopConfiguration.set("fs.s
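The snippet above is cut off by the archive. For context, an S3a setup along these lines typically looks like the following sketch; the properties after the truncation point are assumptions based on the standard hadoop-aws configuration, and the bucket and keys are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf().setAppName("s3a-read")
val sc = new SparkContext(sparkConf)

// Force the S3A filesystem implementation, as in the original post.
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
// Assumed continuation: the standard hadoop-aws credential properties.
sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

// Placeholder bucket/path.
val rdd = sc.textFile("s3a://some-bucket/some/path")
```

Running this also requires the hadoop-aws jar (and its AWS SDK dependency) on the driver and executor classpaths.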

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
Hi Blaz, I did; the same result. Thank you, Konstantin Kudryavtsev On Wed, Dec 30, 2015 at 12:54 PM, Blaž Šnuderl wrote: > Try setting s3 credentials using keys specified here > https://github.com/Aloisius/hadoop-s3a/blob/master/README.md > > Blaz > On Dec 30, 2015 6:48

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
all the Executor JVMs > on each Worker? > > On Dec 30, 2015, at 12:45 PM, KOSTIANTYN Kudriavtsev < > kudryavtsev.konstan...@gmail.com> wrote: > > Dear Spark community, > > I faced the following issue with trying accessing data on S3a, my code is > the following: >

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
EC2 instances in the cluster - and handles autoscaling > very well - and at some point, you will want to autoscale. > > On Wed, Dec 30, 2015 at 1:08 PM, KOSTIANTYN Kudriavtsev < > kudryavtsev.konstan...@gmail.com> wrote: > >> Chris, >> >> good question, as you

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
> Can you define those properties in hdfs-site.xml and make sure it is > visible in the class path when you spark-submit? It looks like a conf > sourcing issue to me. > > Cheers, > > Sent from my iPhone > > On 30 Dec, 2015, at 1:59 pm, KOSTIANTYN Kudriavtsev <
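The suggestion quoted above, keeping the properties in a Hadoop config file on the spark-submit classpath, could look roughly like this sketch; the property names are the standard hadoop-aws ones, and the values are placeholders:

```xml
<!-- hdfs-site.xml (or core-site.xml) visible on the spark-submit classpath -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```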

Re: SparkSQL integration issue with AWS S3a

2015-12-31 Thread KOSTIANTYN Kudriavtsev
s, > > Jerry > > Sent from my iPhone > > On 30 Dec, 2015, at 2:31 pm, KOSTIANTYN Kudriavtsev < > kudryavtsev.konstan...@gmail.com> wrote: > > Hi Jerry, > > I want to run different jobs on different S3 buckets - different AWS creds > - on the same instance

pass custom spark-conf

2015-12-31 Thread KOSTIANTYN Kudriavtsev
Hi all, I'm trying to use a different spark-default.conf per user, i.e. I want to have spark-user1.conf etc. Is there a way to pass a path to the appropriate conf file when I'm using a standalone Spark installation? Also, is it possible to configure different hdfs-site.xml and pass it as well with spark-
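For the first half of the question, spark-submit accepts an alternate defaults file via --properties-file, and HADOOP_CONF_DIR controls which hdfs-site.xml gets picked up. A sketch, with hypothetical file and class names:

```shell
# Per-user Spark defaults (paths and job name are placeholders).
export HADOOP_CONF_DIR=/etc/hadoop/conf-user1   # directory holding this user's hdfs-site.xml
spark-submit \
  --properties-file /etc/spark/spark-user1.conf \
  --class com.example.MyJob \
  my-job.jar
```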

Re: SparkSQL integration issue with AWS S3a

2015-12-31 Thread KOSTIANTYN Kudriavtsev
e > > On 30 Dec, 2015, at 2:31 pm, KOSTIANTYN Kudriavtsev < > kudryavtsev.konstan...@gmail.com> wrote: > > Hi Jerry, > > I want to run different jobs on different S3 buckets - different AWS creds > - on the same instances. Could you shed some light if it's possible

Re: pass custom spark-conf

2015-12-31 Thread KOSTIANTYN Kudriavtsev
rameters do you plan to change in hdfs-site.xml ? > If the parameter only affects hdfs NN / DN, passing hdfs-site.xml > wouldn't take effect, right ? > > Cheers > > On Thu, Dec 31, 2015 at 10:48 AM, KOSTIANTYN Kudriavtsev < > kudryavtsev.konstan...@gmail.com> wrote:

Re: SparkSQL integration issue with AWS S3a

2016-01-02 Thread KOSTIANTYN Kudriavtsev
t user using different spark.conf via > --properties-file when spark-submit > > HTH, > > Jerry > > Sent from my iPhone > > On 31 Dec, 2015, at 2:06 pm, KOSTIANTYN Kudriavtsev < > kudryavtsev.konstan...@gmail.com> wrote: > > Hi Jerry, > > what you su

Re: SparkSQL integration issue with AWS S3a

2016-01-06 Thread Kostiantyn Kudriavtsev
Hi guys, the one big issue with this approach: > spark.hadoop.s3a.access.key is now visible everywhere, in logs, in spark > webui and is not secured at all... On Jan 2, 2016, at 11:13 AM, KOSTIANTYN Kudriavtsev wrote: > thanks Jerry, it works! > really appreciate your help
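One hedged alternative for keeping the keys out of plain-text config and logs is the Hadoop credential provider API (picked up by S3A in sufficiently recent hadoop-aws releases); the jceks path below is a placeholder:

```shell
# Store the keys in an encrypted credential store instead of spark-defaults.
hadoop credential create fs.s3a.access.key -provider jceks://hdfs@namenode/user/alice/s3.jceks
hadoop credential create fs.s3a.secret.key -provider jceks://hdfs@namenode/user/alice/s3.jceks

# Then point jobs at the store, e.g. in a per-user properties file:
#   spark.hadoop.hadoop.security.credential.provider.path=jceks://hdfs@namenode/user/alice/s3.jceks
```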

Re: spark ui security

2016-01-07 Thread Kostiantyn Kudriavtsev
ilter.params="type=kerberos,kerberos.principal=HTTP/mybox@MYDOMAIN,kerberos.keytab=/some/keytab" > > > > > On Thu, Jan 7, 2016 at 10:35 AM, Kostiantyn Kudriavtsev > wrote: > I’m afraid I missed where this property must be specified? I added it to > spark-xxx
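Pieced together, the filter configuration quoted above is wired in through Spark's servlet-filter support; a sketch of the relevant spark-defaults.conf lines (the principal and keytab are placeholders from the quoted mail):

```properties
# spark-defaults.conf (or equivalent --conf flags)
spark.ui.filters=org.apache.hadoop.security.authentication.server.AuthenticationFilter
spark.org.apache.hadoop.security.authentication.server.AuthenticationFilter.params="type=kerberos,kerberos.principal=HTTP/mybox@MYDOMAIN,kerberos.keytab=/some/keytab"
```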

Re: spark ui security

2016-01-07 Thread Kostiantyn Kudriavtsev
I know, but I need only to hide/protect web ui at least with servlet/filter api On Jan 7, 2016, at 4:59 PM, Ted Yu wrote: > Without kerberos you don't have true security. > > Cheers > > On Thu, Jan 7, 2016 at 1:56 PM, Kostiantyn Kudriavtsev > wrote: > can I

Re: Run spark unit test on Windows 7

2014-07-02 Thread Kostiantyn Kudriavtsev
No, I don’t see why I need to have HDP installed. I don’t use Hadoop at all and I’d like to read data from the local filesystem. On Jul 2, 2014, at 9:10 PM, Denny Lee wrote: > By any chance do you have HDP 2.1 installed? you may need to install the > utils and update the env variables per > http:/

Re: Run spark unit test on Windows 7

2014-07-03 Thread Kostiantyn Kudriavtsev
ou don't actually need it per se - its just that some of the Spark >> libraries are referencing Hadoop libraries even if they ultimately do not >> call them. When I was doing some early builds of Spark on Windows, I >> admittedly had Hadoop on Windows running as well and
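The env-variable fix being pointed at here is the well-known winutils.exe requirement: Spark's bundled Hadoop libraries look for %HADOOP_HOME%\bin\winutils.exe on Windows even when HDFS is never used. A sketch (the install path is a placeholder):

```
:: Windows cmd; C:\hadoop is a placeholder directory containing bin\winutils.exe
set HADOOP_HOME=C:\hadoop
set PATH=%PATH%;%HADOOP_HOME%\bin
```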

Spark logging strategy on YARN

2014-07-03 Thread Kostiantyn Kudriavtsev
Hi all, Could you please share your best practices on writing logs in Spark? I’m running it on YARN, so when I check the logs I’m a bit confused… Currently, I write System.err.println to put a message in the log and access it via the YARN history server. But I don’t like this way… I’d like to use l
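A common replacement for System.err.println on YARN is plain log4j with a custom log4j.properties shipped to the containers. A minimal sketch; the appender layout is illustrative:

```properties
# log4j.properties; ship with --files log4j.properties and point at it via
# spark.driver.extraJavaOptions / spark.executor.extraJavaOptions
# (-Dlog4j.configuration=log4j.properties)
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

Messages logged this way land in the container stdout/stderr files and remain visible through the YARN history server.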

Re: Starting with spark

2014-07-24 Thread Kostiantyn Kudriavtsev
Hi Sam, I tried Spark on Cloudera a couple of months ago, and there were a lot of issues… Fortunately, I was able to switch to Hortonworks and everything works perfectly. In general, you can try two modes: standalone and via YARN. Personally, I found using Spark via YARN more comfortable, especially for adm

Re: Unit Testing (JUnit) with Spark

2014-07-29 Thread Kostiantyn Kudriavtsev
Hi, try this one: http://simpletoad.blogspot.com/2014/07/runing-spark-unit-test-on-windows-7.html. It’s more about fixing a Windows-specific issue, but the code snippet gives the general idea: just run the ETL and check the output with Assert(s). On Jul 29, 2014, at 6:29 PM, soumick86 wrote: > Is there any example
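The "run the ETL and assert on the output" pattern described above can be sketched with JUnit and a local-mode SparkContext; the class, job, and data here are hypothetical, not from the linked post:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.junit.{After, Before, Test}
import org.junit.Assert.assertEquals

class WordCountTest {
  private var sc: SparkContext = _

  // A local master needs no cluster; local[2] gives two worker threads.
  @Before def setUp(): Unit =
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("test"))

  @After def tearDown(): Unit = sc.stop()

  @Test def countsWords(): Unit = {
    // Run the (toy) ETL, then assert on the collected output.
    val counts = sc.parallelize(Seq("a", "b", "a"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collectAsMap()
    assertEquals(2, counts("a"))
  }
}
```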

Spark output compression on HDFS

2014-04-02 Thread Kostiantyn Kudriavtsev
Hi there, I've started using Spark recently and am evaluating possible use cases in our company. I'm trying to save an RDD as a compressed Sequence file. I'm able to save a non-compressed file by calling: counts.saveAsSequenceFile(output) where counts is my RDD (IntWritable, Text). However, I didn't
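For the compressed case being asked about, saveAsSequenceFile takes an optional codec class as a second argument; a sketch using Gzip (any Hadoop CompressionCodec subclass works):

```scala
import org.apache.hadoop.io.compress.GzipCodec

// counts: RDD[(IntWritable, Text)] as in the post; output is the target path.
counts.saveAsSequenceFile(output, Some(classOf[GzipCodec]))
```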

Re: using saveAsNewAPIHadoopFile with OrcOutputFormat

2014-04-16 Thread Kostiantyn Kudriavtsev
I’d prefer to find a good example of using saveAsNewAPIHadoopFile with different OutputFormat implementations (not only ORC, but EsOutputFormat, etc.). Any common example On Apr 16, 2014, at 4:51 PM, Brock Bose wrote: > Howdy all, > I recently saw that the OrcInputFormat/OutputFormat's have
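A generic shape for saveAsNewAPIHadoopFile, sketched here with the stock new-API TextOutputFormat; swapping in OrcNewOutputFormat, EsOutputFormat, etc. mainly changes the key/value types the format expects (this example is an illustration, not from the thread):

```scala
import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
import org.apache.spark.SparkContext

val sc: SparkContext = ??? // an existing context

// Any RDD of (key, value) Writables can be written through a new-API OutputFormat.
val pairs = sc.parallelize(Seq("a", "b")).map(v => (NullWritable.get(), new Text(v)))
pairs.saveAsNewAPIHadoopFile[TextOutputFormat[NullWritable, Text]]("hdfs:///tmp/out")
```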