Why do you want to use this function instead of the built-in stddev function?
On Wed, Dec 23, 2020 at 2:52 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> This is a shot in the dark, so to speak.
>
> I would like to use the standard deviation std offered by numpy in
> PySpark. I am using SQL for now.
>
> The code is as below:
>
> sqltext = f"""
> SELECT
>     rs.Customer_ID
>   , rs.Number_of_orders
>   , rs.Total_customer_amount
>   , rs.Average_order
>   , rs.Standard_deviation
> FROM
> (
>     SELECT cust_id AS Customer_ID,
>            COUNT(amount_sold) AS Number_of_orders,
>            SUM(amount_sold) AS Total_customer_amount,
>            AVG(amount_sold) AS Average_order,
>            STDDEV(amount_sold) AS Standard_deviation
>     FROM {DB}.{table}
>     GROUP BY cust_id
>     HAVING SUM(amount_sold) > 94000
>     AND AVG(amount_sold) < STDDEV(amount_sold)
> ) rs
> ORDER BY 3 DESC
> """
>
> spark.sql(sqltext)
>
> Now, if I wanted to use a UDF based on the numpy std function, I can do:
>
> import numpy as np
> from pyspark.sql.functions import UserDefinedFunction
> from pyspark.sql.types import DoubleType
>
> udf = UserDefinedFunction(np.std, DoubleType())
>
> How can I use that udf with Spark SQL? I gather this is only possible
> through functional programming?
>
> Thanks,
>
> Mich
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
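To answer the mechanics of the question regardless: a Python function becomes
callable from Spark SQL once it is registered by name with spark.udf.register.
One caveat: a plain Python UDF is applied row by row, so it cannot act as an
aggregate the way STDDEV does; to feed np.std a whole group you would first
have to gather the group's values into an array, e.g. with collect_list. A
minimal sketch along those lines (the np_std name, the sales view and the demo
data are made up for illustration, not taken from your environment):

import numpy as np
from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("np_std_demo").getOrCreate()

# Register the function under a SQL-visible name. A plain Python UDF sees
# one value per row, so each group's values are collected into an array
# first and np.std is applied to that array.
spark.udf.register("np_std", lambda xs: float(np.std(xs)), DoubleType())

# Hypothetical stand-in for {DB}.{table} in the original query.
df = spark.createDataFrame(
    [(1, 10.0), (1, 12.0), (1, 14.0), (2, 5.0), (2, 9.0)],
    ["cust_id", "amount_sold"])
df.createOrReplaceTempView("sales")

spark.sql("""
    SELECT cust_id,
           np_std(collect_list(amount_sold)) AS numpy_std,
           STDDEV(amount_sold)               AS spark_stddev
    FROM sales
    GROUP BY cust_id
""").show()

Note too that np.std defaults to the population standard deviation (ddof=0),
while Spark's STDDEV is an alias for STDDEV_SAMP, the sample standard
deviation, so the two columns above will generally differ; np.std(xs, ddof=1)
would match the built-in.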