Normally the family of joins (left, right, outer, inner) is performed on two DataFrames using columns for the comparison, i.e. left("acol") === right("acol"). The comparison operator on the "left" DataFrame's column does something internally and produces a Column, which I assume is what the join uses.
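For reference, this is the kind of ordinary join I mean. It is only a minimal sketch: the object name, the local SparkSession settings and the sample rows are all made up for illustration.

```scala
import org.apache.spark.sql.{Column, SparkSession}

object EquiJoinSketch {
  def main(args: Array[String]): Unit = {
    // Local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("equi-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val left  = Seq((1, "apple"), (2, "pear")).toDF("id", "acol")
    val right = Seq((10, "apple"), (20, "plum")).toDF("rid", "acol")

    // === does not compare anything eagerly; it only builds a Column
    // holding a boolean expression over the two input columns.
    val cond: Column = left("acol") === right("acol")

    // The join takes that Column as its condition.
    left.join(right, cond, "inner").show()

    spark.stop()
  }
}
```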
What I want is to create my own comparison operation (I have a case where I want to use fuzzy matching between rows, and if they fall within some threshold the join is allowed to happen), so it would look something like left.join(right, my_fuzzy_udf(left("cola"), right("cola"))), where my_fuzzy_udf is my own UDF. My main concern is the column that has to be output: what would its value be, i.e. what would the function need to return so that the UDF subsystem can turn it into a Column that the join evaluates? Thanks in advance for any advice.
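To make the idea concrete, here is a rough sketch of what I am trying to do. My working assumption is that the function should return a Boolean, so that the UDF yields a boolean Column just like === does. Everything else here is hypothetical: myFuzzyUdf stands in for my_fuzzy_udf, the edit-distance rule and the threshold of 2 are placeholders for my real matching logic, and the sample rows are invented.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object FuzzyJoinSketch {
  // Placeholder threshold: pairs at most this many edits apart count as a match.
  val maxDistance = 2

  // Plain Levenshtein edit distance between two strings (row-by-row DP).
  def editDistance(a: String, b: String): Int = {
    var prev = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i
      for (j <- 1 to b.length) {
        val cost = if (a(i - 1) == b(j - 1)) 0 else 1
        curr(j) = math.min(math.min(curr(j - 1) + 1, prev(j) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(b.length)
  }

  // The function returns a Boolean, so calling the UDF on two Columns
  // gives a boolean Column that can be passed as the join condition.
  val myFuzzyUdf = udf((l: String, r: String) =>
    l != null && r != null && editDistance(l, r) <= maxDistance)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fuzzy-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val left  = Seq("apple", "banana").toDF("cola")
    val right = Seq("appel", "cherry").toDF("cola")

    left.join(right, myFuzzyUdf(left("cola"), right("cola")), "inner").show()

    spark.stop()
  }
}
```

As far as I understand, a condition like this cannot be planned as an equi-join, so I would expect the UDF to be evaluated for every pair of rows, which may be slow on large inputs.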