Hello,

I just finished reading the Hive architecture PDF but couldn't find the
answers I was hoping for, so here I am, hoping this community can shed some
light.

I think I know what the answers will be, but I need that bolted down and
secured.

We are concerned about how data is transferred between DataNodes and Hive,
especially on clusters where there's no SSL between nodes.

Here is the use case:

1. Table employee is a Hive table with a SerDe.

2. A MapReduce job accesses table employee, which holds encrypted data.

3. The SerDe decrypts the data.

4. The post-SerDe output is returned to the MapReduce job and saved to a
   new Hive table using a new encryption implementation.
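
To make steps 2-4 concrete, here is a minimal sketch of what happens to a
single record inside one task. It assumes AES-GCM for both the old and the
new encryption (the actual algorithms aren't specified above) and uses only
the JDK's javax.crypto; it is NOT Hive's SerDe API, and the class and method
names (ReEncryptSketch, reEncrypt, etc.) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical stand-in for the decrypt (SerDe) and re-encrypt (new table) steps.
// Assumption: both tables use AES-GCM; the real implementations may differ.
public class ReEncryptSketch {
    private static final int IV_LEN = 12;     // 96-bit IV, the usual size for GCM
    private static final int TAG_BITS = 128;  // GCM authentication tag length
    private static final SecureRandom RNG = new SecureRandom();

    public static SecretKey generateKey() throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        return kg.generateKey();
    }

    // Encrypts with a fresh random IV; output layout is IV || ciphertext+tag.
    public static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        RNG.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(plain);
        byte[] out = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    // Reads the IV from the front of the blob, then decrypts and verifies the rest.
    public static byte[] decrypt(SecretKey key, byte[] blob) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_LEN));
        return c.doFinal(blob, IV_LEN, blob.length - IV_LEN);
    }

    // Steps 3-4 for one record: decrypt with the source key, re-encrypt with the
    // destination key. Between the two calls, `plain` is cleartext in this JVM.
    public static byte[] reEncrypt(SecretKey srcKey, SecretKey dstKey, byte[] oldBlob)
            throws GeneralSecurityException {
        byte[] plain = decrypt(srcKey, oldBlob);
        return encrypt(dstKey, plain);
    }

    public static void main(String[] args) throws Exception {
        SecretKey oldKey = generateKey();
        SecretKey newKey = generateKey();
        byte[] stored = encrypt(oldKey, "1001,alice".getBytes(StandardCharsets.UTF_8));
        byte[] rewritten = reEncrypt(oldKey, newKey, stored);
        // Prints: 1001,alice
        System.out.println(new String(decrypt(newKey, rewritten), StandardCharsets.UTF_8));
    }
}
```

The sketch only shows that each record exists as plaintext in the task's
memory between decrypt and re-encrypt; it does not model which bytes, if
any, leave the node in that state, which is the part I'm asking about.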

As I understand it, the flow currently is:

MapReduce job --> read table metadata --> SerDe creates MapReduce job -->
distributes across nodes

That would mean the data is decrypted on the local nodes and then sent in
cleartext back to the original MapReduce job to be saved in the new table.

Is that correct? :(