I'm afraid you cannot have a proper tabular formatting or an expand to multiple rows (which changes significantly the semantics)
Indeed the result of the final func is returned by CQL as a whole column and currently there is no way to change the output formatting On Thu, Sep 27, 2018 at 6:55 AM, Riccardo Ferrari <ferra...@gmail.com> wrote: > Thank you Doan, > > Indeed I'm using a FINALFUNC to compute the average already. > Bit more context, I'm working on bucketized data, each bucket has already > an 'event count' and an 'average' in it. > My functions look as follow: > > //SFUNC > CREATE OR REPLACE FUNCTION summaryState(state map<text, frozen< > tuple<bigint, bigint, bigint, bigint> >>, name text, avgloadtime int, > eventcount int) > CALLED ON NULL INPUT > RETURNS map<text, frozen< tuple<bigint, bigint, bigint, bigint> >> > LANGUAGE java > AS $$ > if (name != null) { > com.datastax.driver.core.TupleValue stats = > (com.datastax.driver.core.TupleValue)state.get(name); > > if (stats == null) { > com.datastax.driver.core.TupleType statsType = > com.datastax.driver.core.TupleType.of(com.datastax. > driver.core.ProtocolVersion.NEWEST_SUPPORTED, com.datastax.driver.core. > CodecRegistry.DEFAULT_INSTANCE, com.datastax.driver.core.DataType.bigint(), > com.datastax.driver.core.DataType.bigint(),com. > datastax.driver.core.DataType.bigint(),com.datastax.driver. > core.DataType.bigint()); > > stats = statsType.newValue(Long.MAX_VALUE, 0L, 0L, 0L); > } > > //Track min > Long min_ = (Long) stats.getLong(0); > min_ = min_ < avgloadtime ? min_ : avgloadtime; > stats.setLong(0, min_); > > //Track max > Long max_ = (Long) stats.getLong(1); > max_ = max_ > avgloadtime ? max_ : avgloadtime; > stats.setLong(1, max_); > > //Unroll average > Long avgSum = (Long) stats.getLong(2); > avgSum = avgSum + avgloadtime; > stats.setLong(2, avgSum); > > //Event count > Long sampleSum = (Long) stats.getLong(3); > sampleSum = sampleSum + eventcount; > stats.setLong(3, sampleSum); > > state.put(name, stats); > } > return state; > $$; > > //FINALFUNC > CREATE OR REPLACE FUNCTION summaryFinal (state map<text, frozen< > tuple<bigint, bigint, bigint, bigint> >>) > CALLED ON NULL INPUT > RETURNS map<text, frozen< tuple<bigint, bigint, bigint, bigint> >> > LANGUAGE java > AS $$ > for (Object name : state.keySet()) { > com.datastax.driver.core.TupleValue stats = > (com.datastax.driver.core.TupleValue) state.get(name); > > Long avgSum = stats.getLong(2); > Long sampleSum = stats.getLong(3); > > // Workaround: I can't escame the '/' using cql and had to use > Math.pow > double avg_ = avgSum * Math.pow(sampleSum, -1); > stats.setLong(2, new Double(avg_).longValue()); > > state.put(name, stats); > } > return state; > $$; > //AGGREGATE > CREATE OR REPLACE AGGREGATE summary(text, int, int) > SFUNC summaryState > STYPE map<text, frozen< tuple<bigint, bigint, bigint, bigint> >> > FINALFUNC summaryFinal > INITCOND {}; > > This gives me the following output: > <TABLE>.summary(event, averageloadtime, count) > ------------------------------------------------------------ > ------------------ > {'<item1>': (365, 870, 617, 2), '<item2>'': (381, 11668, 6024, 2)} > > I would like to have something lke: > | item | min | max | average | count | > ----------------------------------------------------------------------- > | <item1> | 365 | 870 | 617 | 2 | > | <item2> | 381 | 11668 | 6024 | 2 | > > Do you know if that is possible? > > On Wed, Sep 26, 2018 at 10:21 PM DuyHai Doan <doanduy...@gmail.com> wrote: > >> A hint to answer your Q3 is to use a final function to perform the >> flattening or transformation on the result of the aggregation >> >> The syntax of an UDA is: >> >> CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS] >> aggregateName(type1, type2, …) >> SFUNC accumulatorFunction >> STYPE stateType >> [FINALFUNC finalFunction] >> INITCOND initCond; >> >> >> The final return type will be the return type of the FINALFUNC and not >> necessarily the stateType >> >> More details by reading my blog post on it: http://www.doanduyhai.com/ >> blog/?p=1876 >> >> On Wed, Sep 26, 2018 at 3:58 PM, Riccardo Ferrari <ferra...@gmail.com> >> wrote: >> >>> Hi users! >>> >>> Given my Cassandra version 3.0.x I don't have the famous GROUP BY >>> operator available. So looking around I turned to UDAs. >>> >>> I'm aware all/most of the magic happens on the coordinator and the plan >>> is to keep the data volume low to avoid too much pressure. >>> >>> Q1: How much is low volume. It's obvious the answer is depends but, has >>> anyone some experience to share? >>> >>> Q2: Do I undestand correctly is does not support pagination? >>> >>> I need something as simple as extract `min`, `max`, `average` and >>> `count` per group where I don't know the actual group - I can't fire a >>> query per each group name. - so something like `SELECT my_uda(field1, >>> field2) WHERE ...;` >>> This leads to: >>> - a function that tracks min, max and sum up count and average. The >>> state is a tuple >>> - a final function that computes the average. >>> - the aggregate function that uses the previous two >>> the result is something like >>> | 'item': (min_value, max_value, avg_value, count) , 'item2': (...), >>> ...| >>> Q3: Is there a way to `flatten` or `explode` the result into multiple >>> lines ? >>> If Q3 answer is yes: Is there a way to create multiple columns out of >>> the result: >>> ||other_fileds | item | min | max | avg | count|| >>> >>> BONUS: Are there altenative? Should I really take into account upgrading >>> to 3.11.X ? >>> Thanks! >>> >> >>