Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-09 Thread Josh McKenzie
> I wonder if there’s any value in deciding on a general approach to guide > these discussions in future? Are we aiming to look like SQL as we evolve, and > if so which products do we want to be informed by? CQL's proximity to SQL is a blessing and a curse IMO. How close do we expect ourselves t

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-09 Thread Benedict
Why would that require a fancy query optimiser? It doesn’t seem like it would be much more difficult than modifying our function binding rules and having a specialised version. > On 9 Dec 2022, at 13:26, Andrés de la Peña wrote: > >  > Note that specialized collection functions are also an

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-09 Thread Andrés de la Peña
Note that specialized collection functions are also an opportunity for optimization. For example, COLLECTION_COUNT reads only the first bytes of a serialized collection, since those bytes contain the number of elements in that collection. The most simple implementation of COUNT(UNNEST(collection))

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-09 Thread Benedict
Right, this is basically my view - it can be syntactic sugar for UNNEST subqueries as and when we offer those (if not now), but I think we should be able to apply any UDA or aggregate to collections with some syntax that’s ergonomic. I don’t think APPLY is the right way to express it, my versio

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-09 Thread Benedict
The idea isn’t that dissent is important, but that interrogating changes like this in a broader forum is valuable - and should be the norm. Nobody is suggesting that anything untoward has happened previously, only that we can do better going forward. It was by chance that I saw one of these tick

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-09 Thread Claude Warren, Jr via dev
I still think that semantically it makes sense to have a function that applies an aggregate to various collection types. So rather than building ARRAY_MAX do APPLY(MAX, column)) or APPLY(MAX(column)) it is clear what is being requested and APPLY can be the source of truth for which aggregate funct

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-09 Thread Andrés de la Peña
Indeed this discussion is useful now that we know that there is dissension about these changes. However, as those changes were happening none of the persons involved on them felt the need of a discuss thread, and I opened this thread as soon as Benedict objected the changes. I think this is perfect

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-08 Thread J. D. Jordan
I think this thread proves the point that a DISCUSS thread for API changes on dev@ will get more viewpoints than just having something in JIRA. I think this thread has been useful and should result in us having a better user facing API than without it.On Dec 8, 2022, at 1:57 PM, Andrés de la Peña

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-08 Thread Andrés de la Peña
> > I expect we’ll rehash it every API thread otherwise. Since you bring up the topic, I understand that opposing to every single reviewed decision that has been taken on CASSANDRA-17811, CASSANDRA-8877, CASSANDRA-17425 and CASSANDRA-18085 could make an argument in favour of the policy demanding

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-08 Thread Benedict
It feels like this is a recurring kind of discussion, and I wonder if there’s any value in deciding on a general approach to guide these discussions in future? Are we aiming to look like SQL as we evolve, and if so which products do we want to be informed by? I expect we’ll rehash it every API

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-08 Thread Benedict
>> 1) Do they offer ARRAY_SUM or ARRAY_AVG? > Yes, a quick search on Google shows some examples: > https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/68fdFR3LWhx7KtHc9Iv5Qg > https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/gxz1nB7GclxNO5mBd~rn8g > https://docs.upsolver.com/sqlake/functions-and-

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-08 Thread Benedict
>> 1) Do they offer ARRAY_SUM or ARRAY_AVG? > Yes, a quick search on Google shows some examples: > https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/68fdFR3LWhx7KtHc9Iv5Qg > https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/gxz1nB7GclxNO5mBd~rn8g > https://docs.upsolver.com/sqlake/functions-and-

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-08 Thread Andrés de la Peña
> > 1) Do they offer ARRAY_SUM or ARRAY_AVG? Yes, a quick search on Google shows some examples: https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/68fdFR3LWhx7KtHc9Iv5Qg https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/gxz1nB7GclxNO5mBd~rn8g https://docs.upsolver.com/sqlake/functions-and-operato

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-08 Thread Benedict
1) Do they offer ARRAY_SUM or ARRAY_AVG? 2) Do they define ARRAY_COUNT or ARRAY_LENGTH? 3) A map is a collection in C* parlance, but I gather from below you expect these methods not to operate on them? Does ARRAY_MAX operate on single values? If we are to base our decisions on norms elsewhere, w

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-08 Thread Andrés de la Peña
"ARRAY_MAX" and "ARRAY_MIN" functions to get the max/min element in a list are not an uncommon practice. You can find them in SparkSQL, Amazon Timestream, Teradata, etc. Since we have what we call collections instead or arrays, it makes sense to call the analogous functions "COLLECTION_MAX", "COLLE

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-08 Thread Claude Warren, Jr via dev
I think the semantics of the situation is important here. Let’s take MAX as our example aggregate function.. We all expect that in a DB context MAX(column) will return the value of the column with the maximum value. That is the expected semantics of MAX. The question here is that there are d

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-08 Thread Benedict
I meant unnest, not unwrap. > On 8 Dec 2022, at 10:34, Benedict wrote: > >  >  >> I do not think we should have functions that aggregate across rows and >> functions that operate within a row use the same name. > > I’m sympathetic to that view for sure. I wouldn’t be too disappointed by that

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-08 Thread Avi Kivity via dev
IMO it's wrong to change an aggregate's meaning from "aggregate across GROUPs or entire SELECT" to "aggregate within column". Aggregation is long established in SQL and it will just confuse experienced database users. PostgresQL maintains the meaning of max: CREATE TABLE tab (     x int[] ); INS

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-08 Thread Benedict
 > I do not think we should have functions that aggregate across rows and > functions that operate within a row use the same name. I’m sympathetic to that view for sure. I wouldn’t be too disappointed by that outcome, and SQL engines seem to take a similar approach, however they mostly rely on

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-07 Thread Andrés de la Peña
The comparator for collections is the lexicographical compare on the collection items. That might nor be the more useful thing but it's not impossible to imagine cases where that ordering can be useful. To make a random example, you can use a list column to store the name and surnames of a person,

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-06 Thread J. D. Jordan
If the functionality truly has never actually worked, then throwing an error that MAX is not supported for collections seems reasonable. But we should throw an error, I do not think we should have functions that aggregate across rows and functions that operate within a row use the same name. My

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-06 Thread Benedict
As far as I am aware it has never worked in a release, and so deprecating it is probably not as challenging as you think. Only folk that have been able to parse the raw bytes of the collection in storage format would be affected - which we can probably treat as zero. > On 6 Dec 2022, at 17:31,

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-06 Thread Jeremiah D Jordan
> 1. I think it is a mistake to offer a function MAX that operates over rows > containing collections, returning the collection with the most elements. This > is just a nonsensical operation to support IMO. We should decide as a > community whether we “fix” this aggregation, or remove it. The c

Re: Aggregate functions on collections, collection functions and MAXWRITETIME

2022-12-06 Thread Benedict
Thanks Andres, I think community input on direction here will be invaluable. There’s a bunch of interrelated tickets, and my opinions are as follows: 1. I think it is a mistake to offer a function MAX that operates over rows containing collections, returning the collection with the most element