The best way to run this today is probably to convert the query into a join manually: create a DataFrame that holds all the numbers, then join (or outer join) it with the other table. That way you avoid parsing a gigantic string.
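For concreteness, a minimal Scala sketch of that approach, assuming a SQLContext in scope as sqlContext and a registered table named "events" with an integer column "id" (the table and column names are illustrative, not from the thread):

    // Build a one-column DataFrame from the list of numbers.
    // Tuple1 wrapping is needed because createDataFrame expects Products.
    val ids = (1 to 1000000).map(Tuple1.apply)
    val idsDF = sqlContext.createDataFrame(ids).toDF("id")

    val events = sqlContext.table("events")

    // Inner join on "id": equivalent to WHERE id IN (...),
    // keeping only the rows whose id appears in the list.
    val matched = events.join(idsDF, "id")

    // Left outer join from the id list: also surfaces ids
    // that had no matching row in the table.
    val withMisses = idsDF.join(events, idsDF("id") === events("id"), "left_outer")

If the list is large, Spark can broadcast the small side or shuffle both sides as appropriate, instead of parsing and optimizing a million-element IN expression.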
On Fri, Dec 4, 2015 at 10:36 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Have you seen this JIRA ?
>
> [SPARK-8077] [SQL] Optimization for TreeNodes with large numbers of
> children
>
> From the numbers Michael published, 1 million numbers would still need 250
> seconds to parse.
>
> On Fri, Dec 4, 2015 at 10:14 AM, Madabhattula Rajesh Kumar <
> mrajaf...@gmail.com> wrote:
>
>> Hi,
>>
>> How to use/best practices "IN" clause in Spark SQL.
>>
>> Use Case :- Read the table based on number. I have a List of numbers.
>> For example, 1million.
>>
>> Regards,
>> Rajesh