[ https://issues.apache.org/jira/browse/SPARK-53154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063658#comment-18063658 ]

Holden Karau commented on SPARK-53154:
--------------------------------------

This could be good, we'd want to impl it on the Scala side though.

> Add HMAC to pyspark.sql.functions
> ---------------------------------
>
>                 Key: SPARK-53154
>                 URL: https://issues.apache.org/jira/browse/SPARK-53154
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 3.5.6, 4.0.0
>            Reporter: Andrew Gross
>            Priority: Minor
>
> It would be extremely helpful to have access to the HMAC function in PySpark. 
> I run into a lot of situations where I need to generate pre-signed S3 URLs 
> across a large DataFrame, and it can be quite slow to implement with a UDF.
>  
> I was able to create a [working HMAC implementation in 
> PySpark|https://github.com/andrewgross/pyspark_utils/blob/main/src/pyspark_utils/hmac.py#L10], 
> but it hangs when trying to generate the signature for S3. Best I can figure, 
> the call graph of PySpark functions gets too deep and hangs the VM (usually 
> around the third nested HMAC call).
>  
> It seems like the best option would be to expose the HMAC function in PySpark 
> functions.
>  
> Suggested Interface
> {{pyspark.sql.functions.hmac(key: ColumnOrStr, message: ColumnOrStr, 
> hash_function: str = "sha256") -> BinaryColumn}} (similar to the result of 
> {{to_binary}})
>  
> Not sure how hard it would be to expose other hash functions, but sha256 is 
> the priority for most use cases I have seen. Exact types for the input 
> columns are flexible; strings or binary columns of byte arrays would all work.
>  
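For context on the nested-call pattern the reporter describes: AWS Signature Version 4 derives an S3 signing key through four chained HMAC-SHA256 calls, which is trivial with Python's standard library but must be expressed as deeply nested column expressions in pure PySpark. A minimal stdlib sketch of that chain (the key and date values below are illustrative, not real credentials):

```python
import hashlib
import hmac


def sigv4_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive an AWS SigV4 signing key via four nested HMAC-SHA256 calls."""
    def sign(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    k_date = sign(("AWS4" + secret_key).encode("utf-8"), date)  # 1st nested call
    k_region = sign(k_date, region)                             # 2nd
    k_service = sign(k_region, service)                         # 3rd
    return sign(k_service, "aws4_request")                      # 4th


# Illustrative values only.
key = sigv4_signing_key("EXAMPLEKEY", "20250101", "us-east-1", "s3")
```

Expressing this same chain with PySpark column functions is what reportedly drives the expression tree too deep; a built-in `hmac` expression evaluated on the Scala side would collapse each `sign` step into a single expression node.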



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
