Hi, may be I am getting confused as always :) , but the requirement looked pretty simple to me to be implemented in SQL, or it is just the euphoria of Christmas eve
Anyways, in case the above can be implemented in SQL, then I can have a look at it. Yes, indeed there are bespoke scenarios where dataframes may apply and RDD are used, but for UDF's I prefer SQL as well, but that may be a personal idiosyncrasy. The Oreilly book on data algorithms using SPARK, pyspark uses dataframes and RDD API's :) Regards, Gourav Sengupta On Fri, Dec 24, 2021 at 6:11 PM Sean Owen <sro...@gmail.com> wrote: > This is simply not generally true, no, and not in this case. The > programmatic and SQL APIs overlap a lot, and where they do, they're > essentially aliases. Use whatever is more natural. > What I wouldn't recommend doing is emulating SQL-like behavior in custom > code, UDFs, etc. The native operators will be faster. > Sometimes you have to go outside SQL where necessary, like in UDFs or > complex aggregation logic. Then you can't use SQL. > > On Fri, Dec 24, 2021 at 12:05 PM Gourav Sengupta < > gourav.sengu...@gmail.com> wrote: > >> Hi, >> >> yeah I think that in practice you will always find that dataframes can >> give issues regarding a lot of things, and then you can argue. In the SPARK >> conference, I think last year, it was shown that more than 92% or 95% use >> the SPARK SQL API, if I am not mistaken. >> >> I think that you can do the entire processing at one single go. >> >> Can you please write down the end to end SQL and share without the 16000 >> iterations? >> >> >> Regards, >> Gourav Sengupta >> >> >> On Fri, Dec 24, 2021 at 5:16 PM Andrew Davidson <aedav...@ucsc.edu> >> wrote: >> >>> Hi Sean and Gourav >>> >>> >>> >>> Thanks for the suggestions. I thought that both the sql and dataframe >>> apis are wrappers around the same frame work? Ie. catalysts. >>> >>> >>> >>> I tend to mix and match my code. Sometimes I find it easier to write >>> using sql some times dataframes. What is considered best practices? >>> >>> >>> >>> Here is an example >>> >>> >>> >>>