Just for the sake of clarification, then, what is the use case for having UDFs in an UPDATE?
If they cannot read data from the data store, then all of the parameters to the UDF must be supplied by the client, correct? If the client has all the parameters, it could perform the equivalent of the UDF on the client side first and then send the result to the server, instead of pushing the computation onto the server. So I am curious what one is supposed to use a UDF in an UPDATE for.

A long-winded explanation of the use case I was poking at with UPDATE UDFs follows, for the morbidly curious.

That morbidly curious, huh?

The scenario is, roughly, that the application receives a set of data which is broken up over, say, four messages (A, B, C, D). However, the messages can arrive in any order, possibly with duplicates, and the data set is not complete until all four messages have been received. There are multiple message receivers in order to scale to the volume of incoming messages, so each of the four messages per data set could arrive at any receiver (in any chronological pattern), and each receiving station would then insert its partial data into Cassandra.

I looked at the Cassandra SET implementation, thinking that I could just add ‘A’, ‘B’, ‘C’, ‘D’ (or 1, 2, 3, 4) to a set with a secondary index, then periodically search for rows where the set had all elements, to spot which rows had a complete data set ready for processing. However, there does not appear to be an equality check for SETs. (Adding elements to a set is another place where UPDATE appears to allow the “x = x <operator> <data>” pattern, which added to my confusion about using a UDF in the UPDATE.)

So instead of using sets, the idea was to have a UDF perform a bitwise OR operation. Roughly:

    CREATE FUNCTION bitwise_or( a int, b int )
      CALLED ON NULL INPUT
      RETURNS int
      LANGUAGE java
      AS 'return Integer.valueOf((a == null ? 0 : a)|(b == null ? 0 : b));';

Then as each message segment came in, I had intended, roughly:

    UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,2), data2=… ;
    UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,1), data1=… ;
    UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,8), data4=… ;
    UPDATE MessageData SET messageComplete = bitwise_or(messageComplete,4), data3=… ;

Then, with a secondary index on ‘messageComplete’, periodically scrape out all rows where messageComplete was equal to 15. (At most sixteen unique values in the secondary index.) (And use a TTL to expire messages that did not eventually complete, plus the usual boilerplate infrastructure.)

This was based on my incorrect assumption about UPDATE UDFs. It looked like an optimal way to avoid having all the clients perform read-then-update patterns and worrying about clients stepping on each other's data, as well as to handle cases where duplicate messages were received by different receivers.

So it’s starting to look like I might need to use something else to perform the correlation between messages.

--Kim

From: Sylvain Lebresne <sylv...@datastax.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, March 11, 2016 at 00:35
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Using User Defined Functions in UPDATE queries

UDFs are usable in UPDATE statements, as actually trying them shows; it's just the documented grammar that needs fixing. But as far as doing something like:

    UPDATE test_table SET data=max_int(data,5) WHERE idx='abc';

this is indeed *not* supported and likely never will be.
One big pillar of C* design is that normal writes like this don't do a read-before-write, both for performance and because of consistency constraints, so we can't have an update depend on the previous value in any way. I'll note that this may make UDFs useless for you, and if so, I'm sorry, but you just can't use UDFs in C* for that; you'd have to do a manual read-before-write on the client side to achieve this.

For the sake of avoiding confusion, I will note that we do allow:

    UPDATE test_table SET c = c + 1 WHERE idx='abc';

if c is a counter, but that's a very special case. Counters have a completely separate path and implementation and do have a read-before-write (and are slower than normal updates as a result).
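
For anyone following the thread, here is a minimal Python sketch of the bit-mask bookkeeping done as a manual read-before-write on the client side. It is illustrative only: the in-memory `table` dict is a hypothetical stand-in for the MessageData table, and `record_segment` and `is_complete` are made-up helper names, not part of any Cassandra driver. A real client would issue a SELECT followed by an UPDATE in their place.

```python
# Hypothetical stand-in for the MessageData table: row key -> row dict.
# A real client would run a SELECT and then an UPDATE instead.
COMPLETE_MASK = 0b1111  # bits 1, 2, 4, 8 for the four message segments


def bitwise_or(a, b):
    """Mirror the null handling of the bitwise_or UDF from the thread."""
    return (0 if a is None else a) | (0 if b is None else b)


def record_segment(table, row_key, segment_bit, payload):
    """Client-side read-before-write: read the old mask, OR in the new bit."""
    row = table.setdefault(row_key, {"messageComplete": None})
    row["messageComplete"] = bitwise_or(row["messageComplete"], segment_bit)
    # Bits 1, 2, 4, 8 map to columns data1..data4, as in the thread.
    row["data%d" % segment_bit.bit_length()] = payload
    return row["messageComplete"]


def is_complete(row):
    """A data set is complete once all four segment bits are set."""
    return row["messageComplete"] == COMPLETE_MASK


table = {}
# Segments arrive out of order, with a duplicate of segment bit 2.
for bit in (2, 1, 2, 8):
    record_segment(table, "set-1", bit, "payload-%d" % bit)
print(is_complete(table["set-1"]))  # False: segment bit 4 is still missing

record_segment(table, "set-1", 4, "payload-4")
print(is_complete(table["set-1"]))  # True: the mask is now 15
```

Because OR is idempotent and order-independent, duplicates and reordering do not change the final mask. What this single-process sketch does not show is the race between concurrent receivers that read and then write the same row, which is exactly the concern raised above about clients stepping on each other's data.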