[ 
https://issues.apache.org/jira/browse/FLINK-29091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lincoln lee updated FLINK-29091:
--------------------------------
    Description: 
RAND and RAND_INTEGER are declared as dynamic functions (isDynamicFunction 
returns true). Under that declaration they should be evaluated only once at 
query level (not per record) in batch mode; FLINK-21713 made a similar fix for 
temporal functions.

However, the current behavior is that of a fully non-deterministic function: 
it is evaluated per record in both batch and streaming mode. Breaking this 
existing behavior is not a good choice, and the determinism of the RAND 
function also differs across vendors:

[1] SQL Server: evaluated at query level, although it is treated as a 
non-deterministic function 
https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/deterministic-and-nondeterministic-functions?view=sql-server-ver16#built-in-function-determinism

[2] MySQL: evaluated at row level 
https://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand

[3] Oracle: evaluated at row level if no seed is specified, e.g., 
DBMS_RANDOM.normal, DBMS_RANDOM.value(1,10) 
https://docs.oracle.com/database/timesten-18.1/TTPLP/d_random.htm#TTPLP71231

So keeping the current behavior and changing the declaration of these two 
functions to non-deterministic avoids any impact on users and makes the 
semantics clear.
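
The difference between the two declarations can be illustrated with a small, 
self-contained sketch. This is not Flink or Calcite planner code: the class 
RandSemantics, the enum Mode and the method evaluate are hypothetical names 
introduced only for illustration, and java.util.Random stands in for the 
actual RAND implementation. It only shows what "evaluated once at query level" 
versus "evaluated per record" means for the values a query returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: hypothetical names, not part of Flink or Calcite.
class RandSemantics {

    /** The two evaluation strategies discussed in this issue. */
    enum Mode {
        DYNAMIC_ONCE_PER_QUERY,     // what a dynamic-function declaration implies
        NON_DETERMINISTIC_PER_ROW   // the behavior described as current
    }

    /** Simulates "SELECT RAND() FROM t" over a table with rowCount rows. */
    static List<Double> evaluate(Mode mode, int rowCount, long seed) {
        Random random = new Random(seed);
        List<Double> result = new ArrayList<>(rowCount);
        if (mode == Mode.DYNAMIC_ONCE_PER_QUERY) {
            // Evaluate once, then reuse the same value for every row.
            double once = random.nextDouble();
            for (int i = 0; i < rowCount; i++) {
                result.add(once);
            }
        } else {
            // Evaluate again for each row.
            for (int i = 0; i < rowCount; i++) {
                result.add(random.nextDouble());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(Mode.DYNAMIC_ONCE_PER_QUERY, 3, 42L));
        System.out.println(evaluate(Mode.NON_DETERMINISTIC_PER_ROW, 3, 42L));
    }
}
```

In the first mode every row carries the same value, which would change the 
results of existing batch jobs that rely on per-record randomness; the second 
mode matches the per-record behavior described above.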

  was:
RAND and RAND_INTEGER are dynamic functions (isDynamicFunction returns true), 
so they should be evaluated only once at query level (not per record) in batch 
mode; FLINK-21713 made a similar fix for temporal functions. Note that this is 
a breaking change for batch jobs.

Another option is to keep the current behavior and change the declaration of 
these two functions to non-deterministic, which minimizes the impact on users. 
The determinism of the RAND function without a seed parameter differs across 
vendors:

[1] SQL Server: evaluated at query level, although it is treated as a 
non-deterministic function 
https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/deterministic-and-nondeterministic-functions?view=sql-server-ver16#built-in-function-determinism

[2] MySQL: evaluated at row level 
https://dev.mysql.com/doc/refman/5.7/en/mathematical-functions.html#function_rand

[3] Oracle: evaluated at row level if no seed is specified, e.g., 
DBMS_RANDOM.normal, DBMS_RANDOM.value(1,10) 
https://docs.oracle.com/database/timesten-18.1/TTPLP/d_random.htm#TTPLP71231

 


> Fix the determinism declaration of the rand function to be consistent with 
> the current behavior
> -----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-29091
>                 URL: https://issues.apache.org/jira/browse/FLINK-29091
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>            Reporter: lincoln lee
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
