Why do you want to use this function instead of the built-in stddev function?
On Wed, Dec 23, 2020 at 2:52 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> This is a shot in the dark, so to speak.
>
> I would like to use the standard deviation std offered by numpy in
> PySpark. I am using SQL for now.
>
> The code is as below:
>
> sqltext = f"""
> SELECT
>     rs.Customer_ID
>   , rs.Number_of_orders
>   , rs.Total_customer_amount
>   , rs.Average_order
>   , rs.Standard_deviation
> FROM
> (
>     SELECT cust_id AS Customer_ID,
>            COUNT(amount_sold) AS Number_of_orders,
>            SUM(amount_sold) AS Total_customer_amount,
>            AVG(amount_sold) AS Average_order,
>            STDDEV(amount_sold) AS Standard_deviation
>     FROM {DB}.{table}
>     GROUP BY cust_id
>     HAVING SUM(amount_sold) > 94000
>     AND AVG(amount_sold) < STDDEV(amount_sold)
> ) rs
> ORDER BY 3 DESC
> """
>
> spark.sql(sqltext)
>
> Now, if I wanted to use a UDF based on the numpy std function, I can do:
>
> import numpy as np
> from pyspark.sql.functions import UserDefinedFunction
> from pyspark.sql.types import DoubleType
>
> udf = UserDefinedFunction(np.std, DoubleType())
>
> How can I use that udf with Spark SQL? I gather this is only possible
> through functional programming?
>
> Thanks,
>
> Mich
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
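To answer the mechanics of the question regardless: a Python function becomes
callable from Spark SQL once it is registered by name with spark.udf.register.
One caveat: a plain Python UDF is applied row by row, so it cannot act as an
aggregate the way STDDEV does; to feed np.std a whole group you would first
have to gather the group's values into an array, e.g. with collect_list. A
minimal sketch along those lines (the np_std name, the sales view and the demo
data are made up for illustration, not taken from your environment):

import numpy as np
from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName("np_std_demo").getOrCreate()

# Register the function under a SQL-visible name. A plain Python UDF sees
# one value per row, so each group's values are collected into an array
# first and np.std is applied to that array.
spark.udf.register("np_std", lambda xs: float(np.std(xs)), DoubleType())

# Hypothetical stand-in for {DB}.{table} in the original query.
df = spark.createDataFrame(
    [(1, 10.0), (1, 12.0), (1, 14.0), (2, 5.0), (2, 9.0)],
    ["cust_id", "amount_sold"])
df.createOrReplaceTempView("sales")

spark.sql("""
    SELECT cust_id,
           np_std(collect_list(amount_sold)) AS numpy_std,
           STDDEV(amount_sold)               AS spark_stddev
    FROM sales
    GROUP BY cust_id
""").show()

Note too that np.std defaults to the population standard deviation (ddof=0),
while Spark's STDDEV is an alias for STDDEV_SAMP, the sample standard
deviation, so the two columns above will generally differ; np.std(xs, ddof=1)
would match the built-in.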