Hi, We are using a haversine (great-circle) distance function for this and wrapping it in a UDF:
from math import radians, sin, cos, acos

from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

def haversine_distance(long_x, lat_x, long_y, lat_y):
    # Great-circle distance in kilometres between (lat_x, long_x) and
    # (lat_y, long_y), using a mean Earth radius of 6371 km.
    return 6371.0 * acos(
        sin(radians(lat_x)) * sin(radians(lat_y))
        + cos(radians(lat_x)) * cos(radians(lat_y))
        * cos(radians(long_x) - radians(long_y))
    )

# Register it as a UDF so it can be applied to DataFrame columns.
distudf = udf(haversine_distance, FloatType())

In case you want to use just Spark SQL, the same trigonometric functions (radians, sin, cos, acos) also exist as built-in Spark SQL functions, so you can write the same expression directly in SQL. Any reason you do not want to use a UDF? (If you do want to avoid one, see the sketch at the end of this mail.)

Credit: <https://stackoverflow.com/questions/38994903/how-to-sum-distances-between-data-points-in-a-dataset-using-pyspark>

On Fri, Apr 9, 2021 at 10:19 PM Rao Bandaru <rao.m...@outlook.com> wrote:

> Hi All,
>
> I have a requirement to calculate the distance between two points given as
> four coordinates (Latitude1, Longtitude1, Latitude2, Longtitude2) in a
> PySpark DataFrame, with the help of "from geopy import distance", without
> using a UDF (user defined function). Please help with how to achieve this
> scenario.
>
> Thanks,
> Ankamma Rao B

--
Best Regards,
Ayan Guha
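PS: Since your mail said you want to avoid a UDF: the same formula can be built purely from Spark's built-in column functions (this is essentially what the Stack Overflow answer linked above does), in which case it returns a Column expression and can be applied with withColumn, no UDF at all. A rough sketch, assuming your DataFrame is called df and the columns are named as in your mail:

from pyspark.sql.functions import acos, cos, sin, radians, col

def haversine_distance_cols(long_x, lat_x, long_y, lat_y):
    # Same formula, but every operand is a Column, so the whole
    # expression is evaluated by Spark itself rather than row by row
    # in Python.
    return acos(
        sin(radians(lat_x)) * sin(radians(lat_y))
        + cos(radians(lat_x)) * cos(radians(lat_y))
        * cos(radians(long_x) - radians(long_y))
    ) * 6371.0

df = df.withColumn(
    "distance_km",
    haversine_distance_cols(col("Longtitude1"), col("Latitude1"),
                            col("Longtitude2"), col("Latitude2")))

Note that geopy's distance functions work on plain Python values, so as far as I know you cannot apply geopy to DataFrame columns without going through a UDF (or collecting the data); the built-in-function approach above is the usual way to stay UDF-free.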