There's another thing that hasn't been mentioned … it's primarily a problem for
Scala. Due to static typing, each function needs a very large number of
overloads in the Scala API, whereas in SQL/Python there is just one. There's a
limit on how many functions we can add, and it also makes it difficult to
browse through the docs when there are a lot of functions.
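
To illustrate the contrast, here is a minimal Python sketch of how a single
dynamically typed function can accept either a column name or a column
object. The `Column` class and the `_to_col` helper are hypothetical
stand-ins written for this example, not the real PySpark classes, and the
`regexp_extract_all` signature is an assumption for illustration only.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a Spark column expression (not pyspark.sql.Column)."""
    expr: str


# A single alias covers both ways a caller may refer to a column.
ColumnOrName = Union[Column, str]


def _to_col(c: ColumnOrName) -> Column:
    """Normalize a column name or Column object into a Column."""
    return c if isinstance(c, Column) else Column(c)


def regexp_extract_all(col: ColumnOrName, pattern: str, idx: int = 1) -> Column:
    """One definition handles both argument types and the optional index.

    In a statically typed API, each accepted combination of argument types
    would typically be declared as its own overload.
    """
    return Column(f"regexp_extract_all({_to_col(col).expr}, '{pattern}', {idx})")


by_name = regexp_extract_all("s", r"\d+")
by_column = regexp_extract_all(Column("s"), r"\d+")
print(by_name == by_column)
```

Both calls build the same expression here, so the Python side needs only one
documented entry per function.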

On Thu, Jan 28, 2021 at 1:09 PM, Maciej < mszymkiew...@gmail.com > wrote:

> 
> Just my two cents on the R side.
> 
> 
> 
> On 1/28/21 10:00 PM, Nicholas Chammas wrote:
> 
> 
>> On Thu, Jan 28, 2021 at 3:40 PM Sean Owen <srowen@gmail.com> wrote:
>> 
>> 
>>> It isn't that regexp_extract_all (for example) is useless outside SQL,
>>> just, where do you draw the line? Supporting 10s of random SQL functions
>>> across 3 other languages has a cost, which has to be weighed against
>>> benefit, which we can never measure well except anecdotally: one or two
>>> people say "I want this" in a sea of hundreds of thousands of users.
>>> 
>> 
>> 
>> 
>> +1 to this, but I will add that Jira and Stack Overflow activity can
>> sometimes give good signals about API gaps that are frustrating users. If
>> there is an SO question with 30K views about how to do something that
>> should have been easier, then that's an important signal about the API.
>> 
>> 
>> 
>>> For this specific case, I think there is a fine argument that
>>> regexp_extract_all should be added simply for consistency with
>>> regexp_extract. I can also see the argument that regexp_extract was a step
>>> too far, but, what's public is now a public API.
>>> 
>> 
>> 
>> 
>> I think in this case a few references to where/how people are having to
>> work around missing a direct function for regexp_extract_all could help
>> guide the decision. But that itself means we are making these decisions on
>> a case-by-case basis.
>> 
>> 
>> From a user perspective, it's definitely conceptually simpler to have SQL
>> functions be consistent and available across all APIs.
>> 
>> 
>> 
>> Perhaps if we had a way to lower the maintenance burden of keeping
>> functions in sync across SQL/Scala/Python/R, it would be easier for
>> everyone to agree to just have all the functions be included across the
>> board all the time.
>> 
> 
> 
> 
> Python aligns quite well with Scala so that might be fine, but R is a bit
> tricky. In particular, the lack of proper namespaces makes it rather risky
> to have packages that export hundreds of functions. sparklyr handles this
> neatly with NSE, but I don't think we're going to go this way.
> 
> 
> 
>> 
>> 
>> Would, for example, some sort of automatic testing mechanism for SQL
>> functions help here? Something that uses a common function testing
>> specification to automatically test SQL, Scala, Python, and R functions,
>> without requiring maintainers to write tests for each language's version
>> of the functions. Would that address the maintenance burden?
>> 
> 
> 
> 
> With R we don't really test most of the functions beyond simple
> "callability". Only the complex ones, which require some non-trivial
> transformations of arguments, are fully tested.
> 
> 
> -- 
> Best regards,
> Maciej Szymkiewicz
> 
> Web: https://zero323.net
> Keybase: https://keybase.io/zero323
> Gigs: https://www.codementor.io/@zero323
> PGP: A30CEF0C31A501EC
>
