Re: Aggregation of Set Data Type

Joseph Wonesh Fri, 26 Oct 2018 07:15:42 -0700

I was able to make this work using the following UDF/UDA:

CREATE OR REPLACE FUNCTION agg_set_func(state tuple<int, set<bigint>>, val
set<bigint>) CALLED ON NULL INPUT RETURNS tuple<int, set<bigint>> LANGUAGE
java AS
$$
    if (val == null) {
    return state;
    }
    Set<Long> s = state.getSet(1, Long.class);
    s.addAll(val);
    state.setSet(1, s);
return state;
$$;


CREATE OR REPLACE FUNCTION agg_set_func_final(state tuple<int,
set<bigint>>) CALLED ON NULL INPUT RETURNS set<bigint> LANGUAGE java AS
$$
    return state.getSet(1, Long.class);
$$;

CREATE AGGREGATE agg_set(set<bigint>)
SFUNC agg_set_func
STYPE tuple<int, set<bigint>>
FINALFUNC agg_set_func_final
INITCOND (0,{});

On Thu, Oct 25, 2018 at 3:55 PM Joseph Wonesh <joseph.won...@sticknfind.com>
wrote:

> Thank you for your reply. I actually found your blog post regarding this
> topic and browsed through it, but it did not yield the answer I was looking
> for. In fact, it seems impossible to do what I wish to do without defining
> a UDA for this specific use case -- something that is not practical to do
> when all of my queries use 'group by'.
>
> For example, I ave a query like this:
>
> select sum(a), avg(a), min(a), max(a), MY_UDF(my_set_column) from my_table
> group by a;
>
> I would hope that using a UDF for my_set_column would allow me to combine
> all of the my_set_columns passed in via group by, but I cannot pass state
> to the UDF. A UDA can accept state, but that would require me rewriting the
> whole query to be:
>
> select MY_UDA(a, my_set_column) from my_table;
>
> Additionally, I would need a separate UDA for each of the different group
> by clauses. Is there no way around this? I would really like to be able to
> simply add a data column of type set<bigint> and then get all of the unique
> members in this set across an aggregation.
>
> On Tue, Oct 23, 2018 at 1:44 PM DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> You will need to use user defined aggregates for this
>>
>> Le 23 oct. 2018 16:46, "Joseph Wonesh" <joseph.won...@sticknfind.com> a
>> écrit :
>>
>>> Hello all,
>>>
>>>  I am trying to aggregate rows which each contain a column of
>>> Set<BIGINT>. I would like the result to contain the sum of all sets, where
>>> null would be equivalent to the empty set. I expected a query like: "select
>>> sum(my_set_column) from my_table group by my_key_column" to do this, but
>>> the set type is not supported by this aggregate. Does anyone know of a way
>>> to aggregate this using existing cassandra built-ins? Thanks!
>>>
>>> This message is private and confidential. If you have received message
>>> in error, please notify us and remove from your system.
>>
>>

-- 
This message is private and confidential. If you have received message in 
error, please notify us and remove from your system.

Re: Aggregation of Set Data Type

Reply via email to