On Wed, Nov 30, 2016 at 10:53 PM, Cody Yancey <yan...@uber.com> wrote:
> This is not a bug, and in fact changing it would be a serious bug.
>
> False. Absolutely no consumer would be broken by a change to guarantee an
> identical time component that isn't broken already, for the simple reason
> that your code already has to handle that case, as it is in fact the
> majority case RIGHT NOW. Users can hit this bug, in production, because
> unit tests might not have experienced it! The time component should be the
> time that the command was processed by the coordinator node.
>
> would one expect a java/py/bash script that loops
>
> Individual Cassandra writes (which is what OP is referring to
> specifically) are not loops. They are in almost every case atomic
> operations that either succeed completely or fail completely. Allowing a
> single atomic operation to witness multiple times in these corner cases is
> not only surprising, as this thread demonstrates, it is also needlessly
> restrictive of what developers can use the database for, and provides NO
> BENEFIT.
>
> Calling now PRIOR to initiating multiple inserts is in most cases
> exactly what one does...the ONLY practice is to set the value before
> initiating the sequence of calls
>
> Also false. Cassandra does not have a way of doing this on the coordinator
> node rather than the client device, and as I already showed, the client
> device is the wrong place to do it in situations where guaranteeing bounded
> clock-skew actually makes a difference one way or the other.
>
> Thanks,
> Cody
>
> On Wed, Nov 30, 2016 at 8:02 PM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> This is not a bug, and in fact changing it would be a serious bug.
>>
>> What it is is a wonderful case of bad coding: would one expect a
>> java/py/bash script that loops on a bunch of read/execute/update calls where
>> each iteration calls time to return the same exact time for the duration of
>> the execution of the code? Whether the code runs for 5 seconds or 5 hours?
>>
>> Every call to a system call is unique, including within C*. Calling now
>> PRIOR to initiating multiple inserts is in most cases exactly what one does
>> to assure unique time stamps FOR THE BATCH OF INSERTS. To get a nearly
>> identical system time as would be the uuid of the row, one tries to call
>> time as close to just before the insert as possible. Then repeat.
>>
>> You have a logic issue in your code. If you want the same value for a set
>> of calls, the ONLY practice is to set the value before initiating the
>> sequence of calls.
>>
>> *Daemeon C.M. Reiydelle*
>> *USA (+1) 415.501.0198*
>> *London (+44) (0) 20 8144 9872*
>>
>> On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey <yan...@uber.com> wrote:
>>
>>> Getting the same TimeUUID values might be a major problem. Getting two
>>> different TimeUUIDs that at least have the same time component would not
>>> be a major problem, as this is the main case today. Getting different time
>>> components is actually the corner case, and it is a corner case that breaks
>>> Internet-of-Things applications. We can tightly control clock skew in our
>>> cluster. We most definitely CANNOT control clock skew on the thousands of
>>> sensors that write to our cluster.
>>>
>>> Thanks,
>>> Cody
>>>
>>> On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille <rwi...@fold3.com> wrote:
>>>
>>>> In my opinion, this is not broken and “fixing” it would break existing
>>>> code. Consider a batch that includes multiple inserts, each of which
>>>> inserts the value returned by now(). Getting the same UUID for each insert
>>>> would be a major problem.
>>>>
>>>> Cheers
>>>>
>>>> Robert
>>>>
>>>> On Nov 30, 2016, at 4:46 PM, Todd Fast <t...@digitalexistence.com>
>>>> wrote:
>>>>
>>>> FWIW I'd suggest opening a bug--this behavior is certainly quite
>>>> unexpected and more than just a documentation issue.
>>>> In general I can't imagine any desirable properties of the current
>>>> implementation, and there are likely a bunch of latent bugs sitting out
>>>> there, so it should be fixed.
>>>>
>>>> Todd
>>>>
>>>> On Wed, Nov 30, 2016 at 12:37 PM Terry Liu <t...@turnitin.com> wrote:
>>>>
>>>>> Sorry for my typo. Obviously, I meant:
>>>>> "It appears that a single query that calls Cassandra's `now()` time
>>>>> function *multiple times* may actually cause a query to write or
>>>>> return different times."
>>>>>
>>>>> Less of a surprise now that I understand more about the implementation,
>>>>> but I agree that more explicit documentation around when exactly the
>>>>> "execution" of each now() statement happens and what implications it has
>>>>> for the resulting timestamps would be helpful when running into this.
>>>>>
>>>>> Thanks for the quick responses!
>>>>>
>>>>> -Terry
>>>>>
>>>>> On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek <msval...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Every now() call in a statement is, under the hood, "replaced" with a
>>>>> newly generated UUID.
>>>>>
>>>>> It can happen that they belong to different milliseconds in time.
>>>>>
>>>>> If you need to have the same timestamps, you need to set them on the
>>>>> client side.
>>>>>
>>>>> @msvaljek <https://twitter.com/msvaljek>
>>>>>
>>>>> 2016-11-29 22:49 GMT+01:00 Terry Liu <t...@turnitin.com>:
>>>>>
>>>>> It appears that a single query that calls Cassandra's `now()` time
>>>>> function may actually cause a query to write or return different times.
>>>>>
>>>>> Is this the expected or defined behavior, and if so, why does it
>>>>> behave like this rather than evaluating `now()` once across an entire
>>>>> statement?
>>>>>
>>>>> This really affects UPDATE statements, but to test it more easily, you
>>>>> could try something like:
>>>>>
>>>>> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
>>>>> FROM keyspace.table
>>>>> LIMIT 100;
>>>>>
>>>>> If you run that a few times, you should eventually see that the
>>>>> timestamp returned moves onto the next millisecond mid-query.
>>>>>
>>>>> --
>>>>> *Software Engineer*
>>>>> Turnitin - http://www.turnitin.com
>>>>> t...@turnitin.com

Food for thought: Hive's UDFs introduced an annotation
@UDFType(deterministic = false)

http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/

The effect is that the query planner can see when such a UDF is in use and
determine the value once at the start of a very long query.
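
To make the distinction raised earlier in the thread concrete (distinct
UUIDs that still share one time component), here is a minimal Python sketch
using only the standard `uuid` module. It is not how Cassandra implements
now(); the helper names `uuid1_from` and `time_uuids_for_statement` are
hypothetical, and varying the clock sequence to keep the values distinct is
just one assumption about how it could be done.

```python
import uuid
from typing import List


def uuid1_from(timestamp_100ns: int, clock_seq: int, node: int) -> uuid.UUID:
    """Assemble a version-1 UUID from an explicit 60-bit timestamp.

    Hypothetical helper for illustration; it follows the RFC 4122 field layout.
    """
    time_low = timestamp_100ns & 0xFFFFFFFF
    time_mid = (timestamp_100ns >> 32) & 0xFFFF
    # Top 12 bits of the timestamp plus the version nibble (1).
    time_hi_version = ((timestamp_100ns >> 48) & 0x0FFF) | (1 << 12)
    clock_seq_low = clock_seq & 0xFF
    # Upper 6 bits of the clock sequence plus the RFC 4122 variant bits.
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def time_uuids_for_statement(count: int) -> List[uuid.UUID]:
    """Read the clock once, then derive `count` distinct UUIDs from it.

    Assumption: uniqueness comes from bumping the 14-bit clock sequence,
    so this sketch only supports up to 16384 values per statement.
    """
    if not 0 <= count <= 0x4000:
        raise ValueError("count must be between 0 and 16384")
    base = uuid.uuid1()  # the single clock read for the whole statement
    return [
        uuid1_from(base.time, (base.clock_seq + i) & 0x3FFF, base.node)
        for i in range(count)
    ]


ids = time_uuids_for_statement(3)
# All three values are different UUIDs, but ids[0].time == ids[1].time == ids[2].time.
```

With something along these lines, a batch of inserts would not collide on
identical UUIDs (Robert's concern), while every value in the statement would
still report the same time (Cody's request).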