Is it always the case that one title is a substring of another? -- Not
always. A title can take forms such as D.O.C, doctor_{areacode}, or
doc_{dep,areacode}.
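
Given those variants, one option is to reduce every title to a canonical key
first and then do an ordinary equi-join and group by on that key, rather than
a pairwise fuzzy comparison. Below is a minimal plain-Python sketch of such a
key function. The alias table CANONICAL and the function name canonical_title
are hypothetical, and the rule that everything after the first underscore is
a suffix (area code, department) is an assumption based only on the examples
above.

```python
import re

# Hypothetical alias table: normalized stem -> canonical title.
# Real data would need a larger, curated mapping.
CANONICAL = {"doc": "doctor", "doctor": "doctor"}


def canonical_title(raw):
    """Reduce a raw title to a canonical join key.

    Assumes anything after the first underscore is a suffix
    (e.g. an area code or department), not part of the title.
    """
    # "doctor_{areacode}" -> "doctor", "doc_{dep,areacode}" -> "doc"
    stem = raw.split("_", 1)[0]
    # Lowercase and keep letters only: "D.O.C" -> "doc"
    stem = re.sub(r"[^a-z]", "", stem.lower())
    # Map known abbreviations to one title; unknown stems pass through.
    return CANONICAL.get(stem, stem)


if __name__ == "__main__":
    for title in ["D.O.C", "doctor_415", "doc_cardio,415", "nurse"]:
        print(title, "->", canonical_title(title))
```

In Spark the same idea could probably be expressed with built-in column
functions such as lower, split and regexp_replace, or by wrapping the function
in a UDF, so that both tables get a derived key column to join and group on.
Titles that no alias rule covers would still need a fuzzy-matching step.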

On Mon, Mar 14, 2016 at 10:39 PM, Wail Alkowaileet <wael....@gmail.com>
wrote:

> I think you need some sort of fuzzy join.
> Is it always the case that one title is a substring of another?
>
> On Tue, Mar 15, 2016 at 6:46 AM, Suniti Singh <suniti.si...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I have two tables with the same schema but different data. I have to join
>> the tables on one column and then group by that same column.
>>
>> The data in that column might not match exactly between the two tables.
>> (Ex - the column name is "title"; Table1.title = "doctor" and Table2.title
>> = "doc".) doctor and doc are actually the same title.
>>
>> From a performance point of view, with data volume in the TB range, I am
>> not sure I can achieve this with a SQL statement. What would be the best
>> approach to solving this problem? Should I look at the MLlib APIs?
>>
>> Spark gurus, any pointers?
>>
>> Thanks,
>> Suniti
>>
>>
>>
>
>
> --
>
> *Regards,*
> Wail Alkowaileet
>
