Hi All,

I have two tables with the same schema but different data. I need to join
the tables on one column and then group by that same column.

However, the values in that column may not match exactly between the two
tables. For example, the column is "title", with Table1.title = "doctor"
and Table2.title = "doc"; "doctor" and "doc" are actually the same title.
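
To make the problem concrete, here is a minimal plain-Python sketch on
made-up toy data. It is not Spark code, and the "amount" column, the
join_and_group helper and the ALIASES map are all hypothetical names I
made up for illustration. It only shows that an exact-match join drops
the doctor/doc pair, while a join on a normalized key keeps it. I do not
actually have such an alias map for the real data.

```python
from collections import defaultdict

# Toy stand-ins for the two tables; rows are (title, amount). Values are made up.
table1 = [("doctor", 10), ("nurse", 5)]
table2 = [("doc", 7), ("nurse", 3)]


def join_and_group(left, right, key=lambda title: title):
    """Inner-join rows on key(title), then sum the amounts per key."""
    right_by_key = defaultdict(list)
    for title, amount in right:
        right_by_key[key(title)].append(amount)

    totals = defaultdict(int)
    for title, amount in left:
        k = key(title)
        # .get() avoids creating empty entries for unmatched keys
        for other_amount in right_by_key.get(k, []):
            totals[k] += amount + other_amount
    return dict(totals)


# Exact match on title: "doctor" and "doc" never meet, so that group is lost.
exact = join_and_group(table1, table2)

# Hypothetical alias map that folds variants into one canonical title.
ALIASES = {"doc": "doctor"}
normalized = join_and_group(table1, table2, key=lambda t: ALIASES.get(t, t))

print(exact)
print(normalized)
```

The exact join only produces the "nurse" group; the normalized join also
produces "doctor". What I do not know is how to get the equivalent of the
normalization step at this data volume without a hand-written map.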

From a performance point of view, with data volumes in the terabytes, I am
not sure whether I can achieve this with a SQL statement. What would be the
best approach to this problem? Should I look at the MLlib APIs?

Spark gurus, any pointers?

Thanks,
Suniti
