On Wed, Nov 30, 2016 at 10:53 PM, Cody Yancey <yan...@uber.com> wrote:

>     This is not a bug, and in fact changing it would be a serious bug.
>
> False. Absolutely no consumer would be broken by a change to guarantee an
> identical time component that isn't broken already, for the simple reason
> that your code already has to handle that case, as it is in fact the majority
> case RIGHT NOW. Users can hit this bug in production, because their unit tests
> might never have experienced it! The time component should be the time that the
> command was processed by the coordinator node.
>
>      would one expect a java/py/bash script that loops
>
> Individual Cassandra writes (which is what the OP is referring to
> specifically) are not loops. They are in almost every case atomic
> operations that either succeed completely or fail completely. Allowing a
> single atomic operation to witness multiple times in these corner cases is
> not only surprising, as this thread demonstrates; it also needlessly
> restricts what developers can use the database for, and provides NO
> BENEFIT.
>
>     Calling now PRIOR to initiating multiple inserts is in most cases
> exactly what one does...the ONLY practice is to set the value before
> initiating the sequence of calls
>
> Also false. Cassandra does not have a way of doing this on the coordinator
> node rather than the client device, and as I already showed, the client
> device is the wrong place to do it in situations where guaranteeing bounded
> clock-skew actually makes a difference one way or the other.
>
> Thanks,
> Cody
>
>
>
> On Wed, Nov 30, 2016 at 8:02 PM, daemeon reiydelle <daeme...@gmail.com>
> wrote:
>
>> This is not a bug, and in fact changing it would be a serious bug.
>>
>> What it is is a wonderful case of bad coding: would one expect a
>> java/py/bash script that loops on a bunch of read/execute/update calls where
>> each iteration calls time to return the same exact time for the duration of
>> the execution of the code? Whether the code runs for 5 seconds or 5 hours?
>>
>> Every system call is unique, including within C*. Calling now
>> PRIOR to initiating multiple inserts is in most cases exactly what one does
>> to ensure unique timestamps FOR THE BATCH OF INSERTS. To get a system time
>> nearly identical to the one in the row's UUID, one calls time as close to
>> the insert as possible. Then repeat.
>>
>> You have a logic issue in your code. If you want the same value for a set
>> of calls, the ONLY practice is to set the value before initiating the
>> sequence of calls.
>>
>>
>>
>> *.......*
>>
>>
>>
>> *Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872*
>>
>> On Wed, Nov 30, 2016 at 6:16 PM, Cody Yancey <yan...@uber.com> wrote:
>>
>>> Getting the same TimeUUID values might be a major problem. Getting two
>>> different TimeUUIDs that at least have the same time component would not be
>>> a major problem, as this is the main case today. Getting different time components
>>> is actually the corner case, and it is a corner case that breaks
>>> Internet-of-Things applications. We can tightly control clock skew in our
>>> cluster. We most definitely CANNOT control clock skew on the thousands of
>>> sensors that write to our cluster.
>>>
>>> Thanks,
>>> Cody
>>>
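A minimal Python sketch of what Cody proposes, assuming the clock is read once per statement: every TimeUUID minted for that statement carries the same millisecond, while the clock-sequence field keeps the values distinct, which also answers the batch-uniqueness concern in Robert's message below. `time_uuid_at` and `uuid_to_unix_ms` are hypothetical helpers, not Cassandra functions, and the sketch is not meant to reflect how Cassandra's own generator builds TimeUUIDs.

```python
import time
import uuid

# 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def time_uuid_at(unix_ms: int, clock_seq: int, node: int) -> uuid.UUID:
    """Hypothetical helper: build a version-1 UUID whose timestamp is exactly
    unix_ms. Only clock_seq (14 bits) and node (48 bits) differ between
    UUIDs generated for the same statement."""
    ts = unix_ms * 10_000 + UUID_EPOCH_OFFSET  # 60-bit count of 100 ns ticks
    fields = (
        ts & 0xFFFFFFFF,          # time_low
        (ts >> 32) & 0xFFFF,      # time_mid
        (ts >> 48) & 0x0FFF,      # time_hi (version nibble set by constructor)
        (clock_seq >> 8) & 0x3F,  # clock_seq_hi (variant bits set by constructor)
        clock_seq & 0xFF,         # clock_seq_low
        node,
    )
    # version=1 makes the constructor set the version and RFC 4122 variant bits.
    return uuid.UUID(fields=fields, version=1)


def uuid_to_unix_ms(u: uuid.UUID) -> int:
    """Millisecond time component of a version-1 UUID."""
    return (u.time - UUID_EPOCH_OFFSET) // 10_000


# Read the clock once per statement (e.g. on the coordinator)...
statement_ms = time.time_ns() // 1_000_000
node = 0x0123456789AB  # hypothetical node identifier
# ...then mint one UUID per now() call, varying only the clock sequence.
ids = [time_uuid_at(statement_ms, seq, node) for seq in range(3)]

print(len(set(ids)))                                         # → 3
print({uuid_to_unix_ms(u) for u in ids} == {statement_ms})   # → True
```

Varying the clock sequence is only one option; the node field or the sub-millisecond ticks could be varied instead, as long as the millisecond part stays fixed.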
>>> On Wed, Nov 30, 2016 at 5:33 PM, Robert Wille <rwi...@fold3.com> wrote:
>>>
>>>> In my opinion, this is not broken and “fixing” it would break existing
>>>> code. Consider a batch that includes multiple inserts, each of which
>>>> inserts the value returned by now(). Getting the same UUID for each insert
>>>> would be a major problem.
>>>>
>>>> Cheers
>>>>
>>>> Robert
>>>>
>>>>
>>>> On Nov 30, 2016, at 4:46 PM, Todd Fast <t...@digitalexistence.com>
>>>> wrote:
>>>>
>>>> FWIW, I'd suggest opening a bug; this behavior is certainly quite
>>>> unexpected and more than just a documentation issue. In general I can't
>>>> imagine any desirable properties of the current implementation, and there
>>>> are likely a bunch of latent bugs sitting out there, so it should be fixed.
>>>>
>>>> Todd
>>>>
>>>> On Wed, Nov 30, 2016 at 12:37 PM Terry Liu <t...@turnitin.com> wrote:
>>>>
>>>>> Sorry for my typo. Obviously, I meant:
>>>>> "It appears that a single query that calls Cassandra's `now()` time
>>>>> function *multiple times *may actually cause a query to write or
>>>>> return different times."
>>>>>
>>>>> Less of a surprise now that I realize more about the implementation,
>>>>> but I agree that more explicit documentation around when exactly the
>>>>> "execution" of each now() call happens and what implications it has
>>>>> for the resulting timestamps would be helpful when running into this.
>>>>>
>>>>> Thanks for the quick responses!
>>>>>
>>>>> -Terry
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 29, 2016 at 2:45 PM, Marko Švaljek <msval...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Every now() call in a statement is, under the hood, "replaced" with a
>>>>> newly generated UUID.
>>>>>
>>>>> It can happen that they belong to different milliseconds.
>>>>>
>>>>> If you need the same timestamps, you need to set them on the client
>>>>> side.
>>>>>
>>>>>
>>>>> @msvaljek <https://twitter.com/msvaljek>
>>>>>
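Marko's client-side advice, sketched in Python under the assumption that the application generates one TimeUUID and binds it to every column that would otherwise call now(). The table, columns, and `unix_ms_from_time_uuid` helper are hypothetical; the helper only mirrors the millisecond value that CQL's toTimestamp() reports for a timeuuid.

```python
import uuid

# 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def unix_ms_from_time_uuid(u: uuid.UUID) -> int:
    """Milliseconds since the Unix epoch stored in a version-1 UUID."""
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    return (u.time - UUID_EPOCH_OFFSET) // 10_000


# Generate the TimeUUID once, on the client...
event_id = uuid.uuid1()
row_id = uuid.uuid4()  # hypothetical partition key

# ...and bind the same value everywhere the statement would have called now().
# Hypothetical prepared statement these values would be bound to:
#   UPDATE ks.events SET created = ?, last_seen = ? WHERE id = ?
bound_values = (event_id, event_id, row_id)

created_ms = unix_ms_from_time_uuid(bound_values[0])
last_seen_ms = unix_ms_from_time_uuid(bound_values[1])
print(created_ms == last_seen_ms)  # → True
```

As Cody points out elsewhere in the thread, this puts the time on the client's clock rather than the coordinator's, which matters when client clock skew is not under your control.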
>>>>> 2016-11-29 22:49 GMT+01:00 Terry Liu <t...@turnitin.com>:
>>>>>
>>>>> It appears that a single query that calls Cassandra's `now()` time
>>>>> function may actually cause a query to write or return different times.
>>>>>
>>>>> Is this the expected or defined behavior, and if so, why does it
>>>>> behave like this rather than evaluating `now()` once across an entire
>>>>> statement?
>>>>>
>>>>> This really affects UPDATE statements, but to test it more easily, you
>>>>> could try something like:
>>>>>
>>>>> SELECT toTimestamp(now()) as a, toTimestamp(now()) as b
>>>>> FROM keyspace.table
>>>>> LIMIT 100;
>>>>>
>>>>> If you run that a few times, you should eventually see that the
>>>>> timestamp returned moves onto the next millisecond mid-query.
>>>>>
>>>>> --
>>>>> *Software Engineer*
>>>>> Turnitin - http://www.turnitin.com
>>>>> t...@turnitin.com
>>>>>
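Why the two columns can disagree, as a deterministic Python simulation. It assumes, as Marko describes above, that each now() call reads the clock independently; the fake clock and its 30 µs step are made up so the millisecond rollover reproduces on every run instead of occasionally.

```python
from itertools import count


def make_fake_clock(start_us: int, step_us: int):
    """Hypothetical deterministic clock: returns microseconds and advances
    by step_us on every read, standing in for the real system clock."""
    ticks = count(start_us, step_us)
    return lambda: next(ticks)


def select_two_nows(clock, limit: int):
    """Mimic SELECT toTimestamp(now()) AS a, toTimestamp(now()) AS b ... LIMIT n:
    each now() reads the clock on its own, truncated to milliseconds."""
    rows = []
    for _ in range(limit):
        a = clock() // 1000
        b = clock() // 1000
        rows.append((a, b))
    return rows


# Start 20 us before a millisecond boundary so the first row straddles it.
rows = select_two_nows(make_fake_clock(start_us=999_980, step_us=30), limit=100)
mismatched = [(a, b) for a, b in rows if a != b]
print(rows[0])              # → (999, 1000)
print(len(mismatched) > 0)  # → True
```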
>>>>
>>>>
>>>
>>
>
Food for thought: Hive's UDFs introduced an annotation,
@UDFType(deterministic = false).

http://dmtolpeko.com/2014/10/15/invoking-stateful-udf-at-map-and-reduce-side-in-hive/

The effect is that the query planner can see when such a UDF is in use and
determine the value once at the start of a very long query.
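A small Python sketch of the planner behavior described above, assuming a function can be flagged so that its value is fixed once when the query starts. `runtime_constant` and `QueryContext` are hypothetical names for illustration; this is neither Hive's nor Cassandra's API.

```python
import time
import uuid
from typing import Any, Callable, Dict


def runtime_constant(fn: Callable) -> Callable:
    """Hypothetical marker, loosely modeled on the idea of Hive's @UDFType:
    flags a non-deterministic function whose value may be fixed per query."""
    fn.runtime_constant = True
    return fn


@runtime_constant
def now_ms() -> int:
    return time.time_ns() // 1_000_000


class QueryContext:
    """One instance per query. Flagged functions are evaluated once, up front;
    unflagged functions still run on every call."""

    def __init__(self, *functions: Callable) -> None:
        # "Planning" step: pin flagged functions before any row is processed.
        self._pinned: Dict[Callable, Any] = {
            fn: fn() for fn in functions if getattr(fn, "runtime_constant", False)
        }

    def call(self, fn: Callable) -> Any:
        if fn in self._pinned:
            return self._pinned[fn]
        return fn()


ctx = QueryContext(now_ms, uuid.uuid4)
# Every row and column of this "query" sees the same now_ms() value...
rows = [(ctx.call(now_ms), ctx.call(now_ms)) for _ in range(100)]
# ...while an unflagged function such as uuid.uuid4 still runs per call.
ids = [ctx.call(uuid.uuid4) for _ in range(3)]
```

Pinning only the flagged functions keeps per-call behavior where callers rely on it (unique IDs, as in Robert's batch example) while giving long-running statements a single, consistent "now".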
