Hi, Why don't add a random regexp in regexp substitution, i.e. https://onlinerandomtools.com/generate-random-data-from-regexp
@*JB*Δ <http://jbigdata.fr/jbigdata/index.html> Le sam. 16 mars 2019 à 18:39, Mich Talebzadeh <mich.talebza...@gmail.com> a écrit : > Hi, > > I am looking at Description column of a bank statement (CSV download) that > has the following format > > scala> account_table.printSchema > root > |-- TransactionDate: date (nullable = true) > |-- TransactionType: string (nullable = true) > |-- Description: string (nullable = true) > |-- Value: double (nullable = true) > |-- Balance: double (nullable = true) > |-- AccountName: string (nullable = true) > |-- AccountNumber: string (nullable = true) > > The column description for BACS payments contains the name of the > individual who paid into the third party account. I need to mask the name > but cannot simply use a literal as below for all contents of descriptions > column! > > f1.withColumn("Description", lit("*** Masked > ***")).select('Description.as("Who paid") > > So I try the following combination > > f1.select(trim(substring(substring_index('Description, ",", > 1),2,50)).as("name in clear"), > reverse(regexp_replace(regexp_replace(regexp_replace(substring(regexp_replace('Description, > "^['A-Z]", "XX"),2,6),"[A-F]","X")," ","X"),"[,]","R")).as("Masked")).show > +------------------+------+ > | in clear|Masked| > +------------------+------+ > | FATAH SABAH|HXTXXX| > | C HIGGINSON|GIHXXX| > | SOLTA A|XTLOSX| > +------------------+------+ > > This seems to work as it not only masks the name but also makes it > consistent for all names (in other words, the same username gets the same > mask). > > Are there any better alternatives? > > Thanks > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* > > > > http://talebzadehmich.wordpress.com > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > >