Most enterprise databases provide data encryption in some form. For
example, see Oracle's Introduction to Transparent Data Encryption:
<https://docs.oracle.com/database/121/ASOAG/introduction-to-transparent-data-encryption.htm#ASOAG10272>

As far as I know, Hive supports column-level encryption for text and
sequence files, which in turn relies on HDFS data encryption. See
<https://support.huawei.com/enterprise/en/doc/EDOC1100020163/742cbdb6/using-the-hive-column-encryption-function#:~:text=Hive%20supports%20encryption%20of%20one,the%20related%20columns%20are%20encrypted.>


In general, this seems to be left to the underlying storage. Most customers
rely on tokenization solutions such as Protegrity
<https://www.protegrity.com/> before data is stored in a data warehouse
like Hive, a cloud database, etc.
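
To illustrate the tokenization idea (this is not how Protegrity itself
works, which I have not inspected), here is a minimal Python sketch that
assumes a one-way keyed HMAC-SHA256 token is acceptable. The tokenize
helper, the demo key and the sample rows are all made up for illustration.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Return a deterministic, one-way token for value (hypothetical helper)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: in practice the key would come from a key management system.
key = b"demo-key-not-for-production"

# Sample rows of (sensitive id, non-sensitive name), invented for this sketch.
rows = [("1001", "alice"), ("1002", "bob")]

# Replace the sensitive column with its token before the data is stored.
tokenized = [(tokenize(row_id, key), name) for row_id, name in rows]
```

Because the same input always maps to the same token, joins and group-bys
on the tokenized column still work. Unlike vault-based tokenization, the
original value cannot be recovered from the token.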

There should be no reason for Spark not to support it, at least in its
simplest form. For example, in PySpark one can create the table explicitly
on Hive and try to encrypt the columns ID and CLUSTERED, as below:

from pyspark.sql import SparkSession

# Assumes a Spark build with Hive support and an existing database "test".
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
fullyQualifiedTableName = "test.randomDataPy"

# Count the rows if the table already exists; otherwise create it.
if spark.sql("SHOW TABLES IN test LIKE 'randomDataPy'").count() == 1:
  rows = spark.sql(f"SELECT COUNT(1) FROM {fullyQualifiedTableName}").collect()[0][0]
  print("number of rows is ", rows)
else:
  print(f"\nTable {fullyQualifiedTableName} does not exist, creating table ")
  # The SERDEPROPERTIES ask the SerDe to encrypt columns ID and CLUSTERED.
  sqltext = """
     CREATE TABLE test.randomDataPy(
       ID INT
     , CLUSTERED INT
     , SCATTERED INT
     , RANDOMISED INT
     , RANDOM_STRING VARCHAR(50)
     , SMALL_VC VARCHAR(50)
     , PADDING  VARCHAR(4000)
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
       WITH SERDEPROPERTIES ('column.encode.columns'='ID, CLUSTERED',
                             'column.encode.classname'='org.apache.hadoop.hive.serde2.AESRewriter')
       STORED AS TEXTFILE
    """
  spark.sql(sqltext)

Disclaimer: I have not tried it myself, but it is worth trying to see if it works.
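
If you want to try this on other tables, the DDL can be generated rather
than hand-written. The sketch below is plain Python and only builds the
string; build_encrypted_table_ddl is a hypothetical helper, and the
'column.encode.*' properties and the AESRewriter class are taken from the
Huawei page linked above, so they may not exist in a stock Apache Hive
build. Joining the column names with a bare comma is also an assumption
about what the SerDe expects.

```python
def build_encrypted_table_ddl(
    table,
    columns,
    encrypted_columns,
    rewriter="org.apache.hadoop.hive.serde2.AESRewriter",
):
    """Build a Hive CREATE TABLE statement with column-encryption properties.

    columns is a list of (name, type) pairs; encrypted_columns lists the
    names to encrypt. Hypothetical helper, untested against a real cluster.
    """
    known = {name for name, _ in columns}
    unknown = [c for c in encrypted_columns if c not in known]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")

    column_list = "\n  , ".join(f"{name} {dtype}" for name, dtype in columns)
    encoded = ",".join(encrypted_columns)
    return (
        f"CREATE TABLE {table} (\n    {column_list}\n)\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'\n"
        f"WITH SERDEPROPERTIES ('column.encode.columns'='{encoded}',\n"
        f"  'column.encode.classname'='{rewriter}')\n"
        "STORED AS TEXTFILE"
    )


ddl = build_encrypted_table_ddl(
    "test.randomDataPy",
    [("ID", "INT"), ("CLUSTERED", "INT"), ("PADDING", "VARCHAR(4000)")],
    ["ID", "CLUSTERED"],
)
print(ddl)
```

The resulting string can then be passed to spark.sql(ddl) in the same way
as the hand-written statement.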

HTH


LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw





*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 21 Jan 2021 at 11:44, Jacek Laskowski <[email protected]> wrote:

> Hi,
>
> Never heard of it (and have once been tasked to explore a similar use
> case). I'm curious how you'd like it to work? (no idea how Hive does this
> either)
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
> <https://twitter.com/jaceklaskowski>
>
>
> On Sat, Dec 19, 2020 at 2:38 AM john washington <[email protected]>
> wrote:
>
>> Dear Spark team members,
>>
>> Can you please advise if Column-level encryption is available in Spark
>> SQL?
>> I am aware that HIVE supports column level encryption.
>>
>> Appreciate your response.
>>
>> Thanks,
>> John
>>
>
