Re: About UDF/UDA

DuyHai Doan Thu, 27 Sep 2018 00:15:01 -0700

I'm afraid you cannot have a proper tabular formatting or an expand to
multiple rows (which changes significantly the semantics)


Indeed the result of the final func is returned by CQL as a whole column
and currently there is no way to change the output formatting


On Thu, Sep 27, 2018 at 6:55 AM, Riccardo Ferrari <ferra...@gmail.com>
wrote:

> Thank you Doan,
>
>  Indeed I'm using a FINALFUNC to compute the average already.
> Bit more context, I'm working on bucketized data, each bucket has already
> an 'event count' and an 'average' in it.
> My functions look as follow:
>
> //SFUNC
> CREATE OR REPLACE FUNCTION summaryState(state map<text, frozen<
> tuple<bigint, bigint, bigint, bigint> >>, name text, avgloadtime int,
> eventcount int)
>     CALLED ON NULL INPUT
>     RETURNS map<text, frozen< tuple<bigint, bigint, bigint, bigint> >>
>     LANGUAGE java
>     AS $$
>         if (name != null) {
>             com.datastax.driver.core.TupleValue stats =
> (com.datastax.driver.core.TupleValue)state.get(name);
>
>             if (stats == null) {
>                 com.datastax.driver.core.TupleType statsType =
> com.datastax.driver.core.TupleType.of(com.datastax.
> driver.core.ProtocolVersion.NEWEST_SUPPORTED, com.datastax.driver.core.
> CodecRegistry.DEFAULT_INSTANCE, com.datastax.driver.core.DataType.bigint(),
> com.datastax.driver.core.DataType.bigint(),com.
> datastax.driver.core.DataType.bigint(),com.datastax.driver.
> core.DataType.bigint());
>
>                 stats = statsType.newValue(Long.MAX_VALUE, 0L, 0L, 0L);
>             }
>
>             //Track min
>             Long min_ = (Long) stats.getLong(0);
>             min_ = min_ < avgloadtime ?  min_ : avgloadtime;
>             stats.setLong(0, min_);
>
>             //Track max
>             Long max_ = (Long) stats.getLong(1);
>             max_ = max_ > avgloadtime ?  max_ : avgloadtime;
>             stats.setLong(1, max_);
>
>             //Unroll average
>             Long avgSum = (Long) stats.getLong(2);
>             avgSum = avgSum + avgloadtime;
>             stats.setLong(2, avgSum);
>
>             //Event count
>             Long sampleSum = (Long) stats.getLong(3);
>             sampleSum = sampleSum + eventcount;
>             stats.setLong(3, sampleSum);
>
>             state.put(name, stats);
>         }
>         return state;
>     $$;
>
> //FINALFUNC
> CREATE OR REPLACE FUNCTION summaryFinal (state map<text, frozen<
> tuple<bigint, bigint, bigint, bigint> >>)
>     CALLED ON NULL INPUT
>     RETURNS map<text, frozen< tuple<bigint, bigint, bigint, bigint> >>
>     LANGUAGE java
>     AS $$
>         for (Object name : state.keySet()) {
>             com.datastax.driver.core.TupleValue stats =
> (com.datastax.driver.core.TupleValue) state.get(name);
>
>             Long avgSum = stats.getLong(2);
>             Long sampleSum = stats.getLong(3);
>
>             // Workaround: I can't escame the '/' using cql and had to use
> Math.pow
>             double avg_ = avgSum * Math.pow(sampleSum, -1);
>             stats.setLong(2, new Double(avg_).longValue());
>
>             state.put(name, stats);
>         }
>         return state;
>     $$;
> //AGGREGATE
> CREATE OR REPLACE AGGREGATE summary(text, int, int)
>     SFUNC summaryState
>     STYPE map<text, frozen< tuple<bigint, bigint, bigint, bigint> >>
>     FINALFUNC summaryFinal
>     INITCOND {};
>
> This gives me the following output:
> <TABLE>.summary(event, averageloadtime, count)
> ------------------------------------------------------------
> ------------------
>  {'<item1>': (365, 870, 617, 2), '<item2>'': (381, 11668, 6024, 2)}
>
> I would like to have something lke:
> | item        | min          | max        | average     | count |
> -----------------------------------------------------------------------
> | <item1>  | 365         | 870         | 617            | 2        |
> | <item2>  | 381         | 11668     | 6024          | 2        |
>
> Do you know if that is possible?
>
> On Wed, Sep 26, 2018 at 10:21 PM DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> A hint to answer your Q3 is to use a final function to perform the
>> flattening or transformation on the result of the aggregation
>>
>> The syntax of an UDA is:
>>
>> CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS]
>> aggregateName(type1, type2, …)
>> SFUNC accumulatorFunction
>> STYPE stateType
>> [FINALFUNC finalFunction]
>> INITCOND initCond;
>>
>>
>> The final return type will be the return type of the FINALFUNC and not
>> necessarily the stateType
>>
>> More details by reading my blog post on it: http://www.doanduyhai.com/
>> blog/?p=1876
>>
>> On Wed, Sep 26, 2018 at 3:58 PM, Riccardo Ferrari <ferra...@gmail.com>
>> wrote:
>>
>>> Hi users!
>>>
>>> Given my Cassandra version 3.0.x I don't have the famous GROUP BY
>>> operator available. So looking around I turned to UDAs.
>>>
>>> I'm aware all/most of the magic happens on the coordinator and the plan
>>> is to keep the data volume low to avoid too much pressure.
>>>
>>> Q1: How much is low volume. It's obvious the answer is depends but, has
>>> anyone some experience to share?
>>>
>>> Q2: Do I undestand correctly is does not support pagination?
>>>
>>> I need something as simple as extract `min`, `max`, `average` and
>>> `count` per group where I don't know the actual group - I can't fire a
>>> query per each group name. - so something like `SELECT my_uda(field1,
>>> field2) WHERE ...;`
>>> This leads to:
>>> - a function that tracks min, max and sum up count and average. The
>>> state is a tuple
>>> - a final function that computes the average.
>>> - the aggregate function that uses the previous two
>>> the result is something like
>>> | 'item': (min_value, max_value, avg_value, count) , 'item2': (...),
>>> ...|
>>> Q3: Is there a way to `flatten` or `explode` the result into multiple
>>> lines ?
>>> If Q3 answer is yes: Is there a way to create multiple columns out of
>>> the result:
>>> ||other_fileds | item | min | max | avg | count||
>>>
>>> BONUS: Are there altenative? Should I really take into account upgrading
>>> to 3.11.X ?
>>> Thanks!
>>>
>>
>>

Re: About UDF/UDA

Reply via email to