There's another thing that hasn't been mentioned … it's primarily a problem for Scala. Due to static typing, we need a very large number of overloads for the Scala version of each function, whereas in SQL/Python each one is just a single function. There's a limit on how many functions we can add, and a large number of functions also makes the docs difficult to browse.
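To make the contrast concrete, here is a minimal sketch of the Python side, where one definition covers every call shape. The `Column` class, the `ColumnOrName` alias, and this `regexp_extract_all` are hypothetical stand-ins written for illustration, not the actual Spark API; a statically typed API would typically need a separate overload for each argument-type combination it accepts.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column expression (not the real Spark class)."""
    expr: str


# Assumption for this sketch: a function argument is either a Column or a column name.
ColumnOrName = Union[Column, str]


def _to_column(col: ColumnOrName) -> Column:
    # A plain string is treated as a column name; a Column passes through unchanged.
    return col if isinstance(col, Column) else Column(col)


def regexp_extract_all(col: ColumnOrName, pattern: str, idx: int = 1) -> Column:
    # One dynamically typed definition handles both call shapes below.
    # A statically typed API would usually declare one overload per shape.
    c = _to_column(col)
    return Column(f"regexp_extract_all({c.expr}, '{pattern}', {idx})")


by_name = regexp_extract_all("text", r"\d+")
by_column = regexp_extract_all(Column("text"), r"\d+")
```

Both calls build the same expression, so nothing has to be duplicated per argument type.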
On Thu, Jan 28, 2021 at 1:09 PM, Maciej <mszymkiew...@gmail.com> wrote:

> Just my two cents on the R side.
>
> On 1/28/21 10:00 PM, Nicholas Chammas wrote:
>
>> On Thu, Jan 28, 2021 at 3:40 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> It isn't that regexp_extract_all (for example) is useless outside SQL,
>>> just, where do you draw the line? Supporting 10s of random SQL functions
>>> across 3 other languages has a cost, which has to be weighed against
>>> benefit, which we can never measure well except anecdotally: one or two
>>> people say "I want this" in a sea of hundreds of thousands of users.
>>
>> +1 to this, but I will add that Jira and Stack Overflow activity can
>> sometimes give good signals about API gaps that are frustrating users. If
>> there is an SO question with 30K views about how to do something that
>> should have been easier, then that's an important signal about the API.
>>
>>> For this specific case, I think there is a fine argument that
>>> regexp_extract_all should be added simply for consistency with
>>> regexp_extract. I can also see the argument that regexp_extract was a step
>>> too far, but, what's public is now a public API.
>>
>> I think in this case a few references to where/how people are having to
>> work around missing a direct function for regexp_extract_all could help
>> guide the decision. But that itself means we are making these decisions on
>> a case-by-case basis.
>>
>> From a user perspective, it's definitely conceptually simpler to have SQL
>> functions be consistent and available across all APIs.
>>
>> Perhaps if we had a way to lower the maintenance burden of keeping
>> functions in sync across SQL/Scala/Python/R, it would be easier for
>> everyone to agree to just have all the functions be included across the
>> board all the time.
>
> Python aligns quite well with Scala so that might be fine, but R is a bit
> of a tricky thing. Especially the lack of proper namespaces makes it rather
> risky to have packages that export hundreds of functions. sparklyr handles
> this neatly with NSE, but I don't think we're going to go this way.
>
>> Would, for example, some sort of automatic testing mechanism for SQL
>> functions help here? Something that uses a common function testing
>> specification to automatically test SQL, Scala, Python, and R functions,
>> without requiring maintainers to write tests for each language's version
>> of the functions. Would that address the maintenance burden?
>
> With R we don't really test most of the functions beyond the simple
> "callability". Only the complex ones, that require some non-trivial
> transformations of arguments, are fully tested.
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> Keybase: https://keybase.io/zero323
> Gigs: https://www.codementor.io/@zero323
> PGP: A30CEF0C31A501EC
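For what it's worth, the "common function testing specification" idea from the quoted message could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `SPEC` format, the `BINDINGS` registry, and the plain-Python stand-ins for each language's functions are hypothetical, and nothing like this exists in Spark as far as this thread says.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical shared specification: (function name, arguments, expected result).
SPEC: List[Tuple[str, Tuple[Any, ...], Any]] = [
    ("regexp_extract_all", ("a1b22", r"\d+"), ["1", "22"]),
    ("upper", ("spark",), "SPARK"),
]

# Hypothetical per-API registries. Plain Python callables stand in for the
# real SQL/Python/R bindings; the "r" entry deliberately lacks one function.
BINDINGS: Dict[str, Dict[str, Callable[..., Any]]] = {
    "sql": {"regexp_extract_all": lambda s, p: re.findall(p, s), "upper": str.upper},
    "python": {"regexp_extract_all": lambda s, p: re.findall(p, s), "upper": str.upper},
    "r": {"upper": str.upper},
}


def run_spec(
    spec: List[Tuple[str, Tuple[Any, ...], Any]],
    bindings: Dict[str, Dict[str, Callable[..., Any]]],
) -> List[str]:
    """Run every spec case against every API and collect readable failures."""
    failures = []
    for api, funcs in bindings.items():
        for name, args, expected in spec:
            func = funcs.get(name)
            if func is None:
                failures.append(f"{api}: {name} is missing")
            elif func(*args) != expected:
                failures.append(f"{api}: {name}{args} returned an unexpected result")
    return failures


failures = run_spec(SPEC, BINDINGS)
```

With these stand-ins, the only reported problem is the function missing from the "r" registry, which is the kind of cross-API gap such a spec would surface without per-language test code.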