Thank you Alan.

-----Original Message-----
From: Alan Gates [mailto:ga...@hortonworks.com]
Sent: Thursday, July 18, 2013 5:45 PM
To: user@hive.apache.org
Subject: Re: Hive Architecture - Execution on nodes

On Jul 18, 2013, at 1:40 PM, Tzur Turkenitz wrote:

> Hello,
> Just finished reading the Hive-Architecture pdf, and failed to find the answers I was hoping for. So here I am, hoping this community will shed some light.
> I think I know what the answers will be, but I need that bolted down and secured.
>
> We are concerned about how data is transferred between data nodes and Hive, especially when it comes to clusters where there's no SSL between nodes.
>
> And this is the use case:
> 1. Table employee is a Hive table, with SerDe
> 2. MapReduce job accesses the table Employees, which holds encrypted data
> 3. SerDe decrypts the data
> 4. Post-SerDe output is returned to the MapReduce job and saved to a new Hive table using a new encryption implementation
>
> The flow, as I think it currently is, should be:
> MapReduce job --> Read table metadata --> SerDe creates map-reduce job --> distributes across nodes
>
> Which means that data is decrypted on the local nodes and then sent in clear text back to the original map-reduce job to be saved in a new table.
> Is that correct?

No. Data deserialization (which is what a serde does, not decryption) is done as part of reading in the map reduce job. Mainly only query parsing, validation, and planning are done on the client node.

Alan.
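
To make the split of work in Alan's answer concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions: none of these names (`client_plan`, `map_task`, `toy_deserialize`, `METADATA`, `STORAGE`) are Hive APIs, and the "SerDe" here is just a comma split. The only point it illustrates is that the client step works from the query and table metadata, while rows are deserialized inside each map task.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the metastore and for per-node storage.
METADATA = {"employee": ["split-0", "split-1"]}
STORAGE = {"split-0": [b"1,alice"], "split-1": [b"2,bob"]}


@dataclass
class Plan:
    table: str
    splits: list  # one entry per input split; each is handled by one map task


def client_plan(query, metadata):
    """Client side: parse, validate, and plan. Row data is never read here."""
    table = query.split("FROM")[1].split()[0]
    if table not in metadata:
        raise ValueError(f"unknown table: {table}")
    return Plan(table=table, splits=metadata[table])


def toy_deserialize(raw):
    """Toy SerDe: turns stored bytes into a row of fields."""
    return raw.decode("utf-8").split(",")


def map_task(split_rows):
    """Cluster side: a map task reads its split and deserializes rows there."""
    return [toy_deserialize(raw) for raw in split_rows]


plan = client_plan("SELECT * FROM employee", METADATA)
rows = [row for split in plan.splits for row in map_task(STORAGE[split])]
```

In this model `client_plan` only ever sees `METADATA`, which mirrors the statement that mainly parsing, validation, and planning happen on the client node.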