Re: Spark3 on k8S reading encrypted data from HDFS with KMS in HA

2020-08-19 Thread Michel Sumbul
…I'm not the only one using Spark with HDFS encrypted with KMS :-) Thanks, Michel. On Thu, Aug 13, 2020 at 14:32, Michel Sumbul wrote: > Hi guys, > Has anyone tried Spark3 on k8s reading data from HDFS …

Re: Spark3 on k8S reading encrypted data from HDFS with KMS in HA

2020-08-19 Thread Prashant Sharma
…not the only one using Spark with HDFS encrypted with KMS :-) Thanks, Michel. On Thu, Aug 13, 2020 at 14:32, Michel Sumbul wrote: > Hi guys, > Has anyone tried Spark3 on k8s reading data from HDFS encrypted with KMS in HA mode (with kerberos)? …

Re: Spark3 on k8S reading encrypted data from HDFS with KMS in HA

2020-08-15 Thread Michel Sumbul
…encrypted with KMS :-) Thanks, Michel. On Thu, Aug 13, 2020 at 14:32, Michel Sumbul wrote: > Hi guys, > Has anyone tried Spark3 on k8s reading data from HDFS encrypted with KMS in HA mode (with kerberos)? > I have a wordcount job running with Spark3 reading data on HDFS (Hadoop …

Spark3 on k8S reading encrypted data from HDFS with KMS in HA

2020-08-13 Thread Michel Sumbul
Hi guys, Has anyone tried Spark3 on k8s reading data from HDFS encrypted with KMS in HA mode (with kerberos)? I have a wordcount job running with Spark3 reading data on HDFS (Hadoop 3.1), everything secured with kerberos. Everything works fine if the data folder is not encrypted (Spark on k8s). If …
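For context, a minimal sketch of the client-side configuration such a setup usually involves; the hostnames, port, principal, and keytab path below are placeholders, not details from this thread. In HA mode the KMS key provider URI lists the KMS hosts separated by semicolons, which the Hadoop client load-balances over:

```shell
# Hedged sketch: Spark 3 on k8s against kerberized HDFS with KMS in HA.
# All hostnames, the principal, and the keytab path are placeholders.
spark-submit \
  --master k8s://https://k8s-apiserver:6443 \
  --conf spark.kerberos.principal=user@EXAMPLE.COM \
  --conf spark.kerberos.keytab=/path/to/user.keytab \
  --conf spark.hadoop.hadoop.security.key.provider.path="kms://https@kms1.example.com;kms2.example.com:9600/kms" \
  ...
```

The `spark.hadoop.` prefix passes the key-provider setting through to the Hadoop configuration on the executors; the same value can instead live in `core-site.xml` inside the image or a mounted config map.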

Data from HDFS

2018-04-22 Thread Zois Theodoros
Hello, I am reading data from HDFS in a Spark application, and as far as I have read, each HDFS block is one partition for Spark by default. Is there any way to select only one block from HDFS to read in my Spark application? Thank you, Thodoris
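One common workaround (a sketch, not from the thread): since by default one HDFS block maps to one partition, you can keep a single partition with `RDD.mapPartitionsWithIndex`. The helper below is plain Python, demonstrated standalone with lists standing in for partitions; the partition index to keep is an assumed parameter.

```python
# Sketch: keep only one partition of an RDD, relying on the default
# "one HDFS block = one partition" mapping described in the message.
# In PySpark you would pass the helper to
#     rdd.mapPartitionsWithIndex(keep_partition(2))

def keep_partition(wanted_index):
    """Build a function usable with RDD.mapPartitionsWithIndex that
    yields rows only from the chosen partition."""
    def fn(index, iterator):
        if index == wanted_index:
            for row in iterator:
                yield row
        # other partitions yield nothing
    return fn

# Standalone demonstration, with plain lists standing in for partitions:
partitions = [["a1", "a2"], ["b1"], ["c1", "c2", "c3"]]
selected = [row
            for i, part in enumerate(partitions)
            for row in keep_partition(1)(i, iter(part))]
print(selected)  # ['b1'] — only rows from partition 1
```

Note this still schedules a (trivial) task per partition; it avoids materializing the other blocks, not scanning their metadata.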

Re: Spark loads data from HDFS or S3

2017-12-13 Thread Jörn Franke
…wrote: > Hi, > I have a few questions about the structure of HDFS and S3 when Spark loads data from the two storage systems. > Generally, when Spark loads data from HDFS, HDFS supports data locality and already owns the distributed files on the datanodes, right? Spark could just …

Re: Spark loads data from HDFS or S3

2017-12-13 Thread Sebastian Nagel
> I have a few questions about the structure of HDFS and S3 when Spark loads data from the two storage systems. > Generally, when Spark loads data from HDFS, HDFS supports data locality and already owns the distributed files on the datanodes, right? Spark could just …

Spark loads data from HDFS or S3

2017-12-13 Thread Philip Lee
Hi, I have a few questions about the structure of HDFS and S3 when Spark loads data from the two storage systems. Generally, when Spark loads data from HDFS, HDFS supports data locality and already owns the distributed files on the datanodes, right? Spark could just process the data on the workers. What about S3 …
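On the S3 side there is no data locality: executors pull objects over the network through the s3a connector, and input splits are computed from object size rather than block locations. A hedged sketch of the usual client settings (bucket name, keys, and endpoint are placeholders, not from this thread):

```
# core-site.xml keys, or their spark.hadoop.* equivalents; placeholder values.
fs.s3a.access.key=<access-key>
fs.s3a.secret.key=<secret-key>
fs.s3a.endpoint=s3.us-east-1.amazonaws.com
# Then read as usual, e.g. spark.read.text("s3a://my-bucket/input/")
```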

Re: Can Spark read input data from HDFS centralized cache?

2016-01-25 Thread Ted Yu
Please see also: http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html According to Chris Nauroth, an HDFS committer, it's extremely difficult to use the feature correctly, and it also brings operational complexity. Since off-heap memory is used …

Re: Can Spark read input data from HDFS centralized cache?

2016-01-25 Thread Ted Yu
Have you read this thread? http://search-hadoop.com/m/uOzYttXZcg1M6oKf2/HDFS+cache&subj=RE+hadoop+hdfs+cache+question+do+client+processes+share+cache+ Cheers. On Mon, Jan 25, 2016 at 1:23 PM, Jia Zou wrote: > I configured HDFS to cache a file in HDFS's cache, like the following: > hdfs cacheadmin …

Can Spark read input data from HDFS centralized cache?

2016-01-25 Thread Jia Zou
I configured HDFS to cache a file in HDFS's cache, like the following: hdfs cacheadmin -addPool hibench; hdfs cacheadmin -addDirective -path /HiBench/Kmeans/Input -pool hibench. But I didn't see much performance impact, no matter how I configured dfs.datanode.max.locked.memory. Is it possible that Spark …
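Before blaming Spark, it is worth checking whether the directive is actually cached. A sketch of the usual checks (the pool and path come from the message above; everything else is generic):

```
# Show bytes actually cached vs. bytes needed for each directive.
hdfs cacheadmin -listDirectives -stats

# Show per-pool cache totals.
hdfs cacheadmin -listPools -stats

# dfs.datanode.max.locked.memory (hdfs-site.xml) must be > 0, and the
# datanode process's locked-memory ulimit ("ulimit -l") must allow at
# least that much, otherwise directives silently stay uncached.
```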

Re: How to load partial data from HDFS using Spark SQL

2016-01-02 Thread swetha kasireddy
…// filtered data frame > df.count > On Sat, Jan 2, 2016 at 11:56 AM, SRK wrote: >> Hi, >> How to load partial data from HDFS using Spark SQL? Suppose I want to load data based on a filter like >> "Select * from table where id = " …

Re: How to load partial data from HDFS using Spark SQL

2016-01-01 Thread UMESH CHAUDHARY
Ok, so what's wrong with using: var df = HiveContext.sql("Select * from table where id = ") // filtered data frame df.count On Sat, Jan 2, 2016 at 11:56 AM, SRK wrote: > Hi, > How to load partial data from HDFS using Spark SQL? Suppose I want to load data based on a filter …

How to load partial data from HDFS using Spark SQL

2016-01-01 Thread SRK
Hi, How to load partial data from HDFS using Spark SQL? Suppose I want to load data based on a filter like "Select * from table where id = " using Spark SQL with DataFrames, how can that be done? The idea here is that I do not want to load the whole data into memory when I use the …
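The concern above, not loading the whole dataset just to apply a filter, is what Spark's lazy evaluation and predicate pushdown address: the filter is applied while scanning, not after a full load, so only matching rows are materialized. The same idea can be illustrated in plain Python with generators (the data and the id value here are made up for the example):

```python
# Illustration of filter-while-scanning vs. load-then-filter, using plain
# Python generators. Spark SQL does the analogous thing when you write
# spark.sql("Select * from table where id = ...") on a DataFrame source.

rows_scanned = 0

def scan(rows):
    """Simulates reading records one at a time from storage."""
    global rows_scanned
    for row in rows:
        rows_scanned += 1
        yield row

data = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 2, "v": "c"}]

# Lazy pipeline: every row is scanned once, but only matching rows are
# ever collected into memory.
matching = [r for r in scan(data) if r["id"] == 2]
print(matching)      # [{'id': 2, 'v': 'b'}, {'id': 2, 'v': 'c'}]
print(rows_scanned)  # 3 — scanned, but only 2 rows kept
```

With columnar formats such as Parquet, pushdown can additionally skip whole row groups using min/max statistics, so not even every row is scanned.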

Re: ClassCastException while reading data from HDFS through Spark

2015-10-07 Thread UMESH CHAUDHARY
…reading data from HDFS through Spark. It throws *java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.BytesWritable* at line no. 6. I never used LongWritable in my code; no idea how the data was in that format. > Note: I …

ClassCastException while reading data from HDFS through Spark

2015-10-07 Thread Vinoth Sankar
I'm just reading data from HDFS through Spark. It throws *java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.BytesWritable* at line no. 6. I never used LongWritable in my code; no idea how the data was in that format. Note: I'm …

Re: SparkSQL: Reading data from hdfs and storing into multiple paths

2015-10-02 Thread Michael Armbrust
Once you convert your data to a DataFrame (look at spark-csv), try df.write.partitionBy("yyyy", "mm").save("..."). On Thu, Oct 1, 2015 at 4:11 PM, haridass saisriram <haridass.saisri...@gmail.com> wrote: > Hi, > I am trying to find a simple example of reading a data file on HDFS. The file has …

SparkSQL: Reading data from hdfs and storing into multiple paths

2015-10-01 Thread haridass saisriram
Hi, I am trying to find a simple example of reading a data file on HDFS. The file has the following format: a,b,c,yyyy,mm a1,b1,c1,2015,09 a2,b2,c2,2014,08. I would like to read this file and store it in HDFS partitioned by year and month, something like /path/to/hdfs/yyyy/mm. I want to …
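The partitionBy answer above maps each row's year/month values to a directory. As a plain-Python sketch of the resulting layout, with no Spark required (the column names "yyyy" and "mm" are assumed from the sample file header):

```python
# Sketch of the directory layout df.write.partitionBy("yyyy", "mm")
# would produce for the sample rows in the question.

rows = [
    {"a": "a1", "b": "b1", "c": "c1", "yyyy": "2015", "mm": "09"},
    {"a": "a2", "b": "b2", "c": "c2", "yyyy": "2014", "mm": "08"},
]

layout = {}
for row in rows:
    # Partition columns become directories; the remaining columns
    # are what actually gets written under that path.
    path = "/path/to/hdfs/{}/{}".format(row["yyyy"], row["mm"])
    payload = {k: v for k, v in row.items() if k not in ("yyyy", "mm")}
    layout.setdefault(path, []).append(payload)

for path, contents in sorted(layout.items()):
    print(path, contents)
```

Note that Spark names the directories `yyyy=2015/mm=09` (column=value) rather than bare values; the sketch keeps the question's `/path/to/hdfs/yyyy/mm` shape for clarity.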