I was able to make this work using the following UDF/UDA: CREATE OR REPLACE FUNCTION agg_set_func(state tuple<int, set<bigint>>, val set<bigint>) CALLED ON NULL INPUT RETURNS tuple<int, set<bigint>> LANGUAGE java AS $$ if (val == null) { return state; } Set<Long> s = state.getSet(1, Long.class); s.addAll(val); state.setSet(1, s); return state; $$;
CREATE OR REPLACE FUNCTION agg_set_func_final(state tuple<int, set<bigint>>) CALLED ON NULL INPUT RETURNS set<bigint> LANGUAGE java AS $$ return state.getSet(1, Long.class); $$; CREATE AGGREGATE agg_set(set<bigint>) SFUNC agg_set_func STYPE tuple<int, set<bigint>> FINALFUNC agg_set_func_final INITCOND (0,{}); On Thu, Oct 25, 2018 at 3:55 PM Joseph Wonesh <joseph.won...@sticknfind.com> wrote: > Thank you for your reply. I actually found your blog post regarding this > topic and browsed through it, but it did not yield the answer I was looking > for. In fact, it seems impossible to do what I wish to do without defining > a UDA for this specific use case -- something that is not practical to do > when all of my queries use 'group by'. > > For example, I ave a query like this: > > select sum(a), avg(a), min(a), max(a), MY_UDF(my_set_column) from my_table > group by a; > > I would hope that using a UDF for my_set_column would allow me to combine > all of the my_set_columns passed in via group by, but I cannot pass state > to the UDF. A UDA can accept state, but that would require me rewriting the > whole query to be: > > select MY_UDA(a, my_set_column) from my_table; > > Additionally, I would need a separate UDA for each of the different group > by clauses. Is there no way around this? I would really like to be able to > simply add a data column of type set<bigint> and then get all of the unique > members in this set across an aggregation. > > On Tue, Oct 23, 2018 at 1:44 PM DuyHai Doan <doanduy...@gmail.com> wrote: > >> You will need to use user defined aggregates for this >> >> Le 23 oct. 2018 16:46, "Joseph Wonesh" <joseph.won...@sticknfind.com> a >> écrit : >> >>> Hello all, >>> >>> I am trying to aggregate rows which each contain a column of >>> Set<BIGINT>. I would like the result to contain the sum of all sets, where >>> null would be equivalent to the empty set. I expected a query like: "select >>> sum(my_set_column) from my_table group by my_key_column" to do this, but >>> the set type is not supported by this aggregate. Does anyone know of a way >>> to aggregate this using existing cassandra built-ins? Thanks! >>> >>> This message is private and confidential. If you have received message >>> in error, please notify us and remove from your system. >> >> -- This message is private and confidential. If you have received message in error, please notify us and remove from your system.