Hi all,

I have two tables with the same schema but different data. I need to join the tables on one column and then group by that same column.
However, the values in that column may not match exactly between the two tables. For example, the column is "title", with Table1.title = "doctor" and Table2.title = "doc"; "doctor" and "doc" are actually the same title.

From a performance point of view, with data volumes in the TB range, I am not sure whether I can achieve this with a SQL statement. What would be the best approach to solving this problem? Should I look at the MLlib APIs? Spark gurus, any pointers?

Thanks,
Suniti
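
P.S. To make the problem concrete, here is a small sketch in plain Python (not Spark) of the logic I am after. It assumes the variants can be mapped to one canonical title before joining. The ALIASES map, the canonical() and join_and_count() helpers, and the sample rows are all made up for illustration. At TB scale the mapping would have to come from somewhere else, which is really what I am asking about.

```python
from collections import defaultdict

# Hypothetical alias map: variant title -> canonical title (made up for illustration).
ALIASES = {"doc": "doctor", "dr": "doctor"}


def canonical(title):
    """Trim, lower-case, and map known variants to one canonical title."""
    key = title.strip().lower()
    return ALIASES.get(key, key)


def join_and_count(table1, table2):
    """Inner-join two lists of row dicts on the canonical title,
    then count the joined row pairs per title (the group-by step)."""
    # Index table2 by canonical title so each table1 row is matched by lookup.
    by_title = defaultdict(list)
    for row in table2:
        by_title[canonical(row["title"])].append(row)

    counts = defaultdict(int)
    for row in table1:
        key = canonical(row["title"])
        counts[key] += len(by_title.get(key, []))

    # Drop titles with no match, as an inner join would.
    return {k: v for k, v in counts.items() if v > 0}


# Made-up sample rows: "doc" and "Doctor " should both match "doctor".
table1 = [{"title": "doctor", "id": 1}, {"title": "nurse", "id": 2}]
table2 = [
    {"title": "doc", "id": 10},
    {"title": "Doctor ", "id": 11},
    {"title": "nurse", "id": 12},
]

result = join_and_count(table1, table2)
print(result)
```

An exact-match join on the raw "title" values would miss both "doc" and "Doctor " here. Joining on the canonical key picks them up, but only for variants that are already listed in the alias map.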