Hi,

You are right. We can make use of it to do a soft delete.
But there will be problems in other cases, for example, when retract messages
have to match the previous row by the whole row. I opened a jira[1] about
this problem. Thanks for bringing up this discussion.

[1] https://issues.apache.org/jira/browse/FLINK-10188

Best, Hequn
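
Since soft delete keeps coming up in this thread, here is a minimal,
self-contained sketch (plain Scala, no Flink dependencies) of how a sink
could turn retract messages into soft deletes keyed by article_id. The field
layout (article_id, PU, update_timestamp) follows the example quoted below;
the state map and helper names are my own assumptions, not Flink API.

```scala
// Sketch: handle (isAdd, row) messages from a retract stream.
// Instead of physically deleting on retract, keep the row, flip a
// deleted flag, and take the retract message's (newer) timestamp so a
// downstream ETL program can see that the row changed.
object SoftDeleteSketch {
  case class Row(articleId: String, pu: Long, updateTimestamp: String)
  case class Stored(row: Row, deleted: Boolean)

  // One entry per key; entries are never physically removed.
  val state = scala.collection.mutable.Map[String, Stored]()

  // msg is (isAdd, row): true = add/new row, false = retract row.
  def handle(msg: (Boolean, Row)): Unit = msg match {
    case (true, row) =>
      // Upsert by key: a new row for the same key overwrites the old one.
      state(row.articleId) = Stored(row, deleted = false)
    case (false, row) =>
      // Soft delete by key: keep the stored values but mark the row
      // deleted and bump update_timestamp to the retract message's time.
      state.get(row.articleId).foreach { s =>
        state(row.articleId) =
          Stored(s.row.copy(updateTimestamp = row.updateTimestamp), deleted = true)
      }
  }
}
```

Feeding it the add/retract pair from the example (add at ...10.286, retract
at ...10.386) leaves the row in state with deleted = true and the retract
message's newer timestamp, which is exactly the behavior Henry asks for.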

On Tue, Aug 21, 2018 at 12:34 PM, 徐涛 <happydexu...@gmail.com> wrote:

> Hi Hequn,
> Another question: in some cases I think updating the timestamp of the
> retract row is reasonable. For example, a user may not want a hard delete
> but a soft delete, so I write code so that when the retract row comes I
> only do the soft delete; but I want the update_timestamp to be different so
> the ETL program can know that this row has changed.
>
>
>     For example, if the value is updated from 1 to 2,
>
> previous row:  add (a, 1, 2018-08-20 20:18:10.286)
> retract row: delete (a, 1, 2018-08-20 20:18:10.386)
> new row: add (a, 2, 2018-08-20 20:18:10.486)
>
>
> On Aug 21, 2018, at 12:25 PM, Hequn Cheng <chenghe...@gmail.com> wrote:
>
> Hi Henry,
>
> You are right that, in MySQL, SYSDATE returns the time at which it
> executes, while LOCALTIMESTAMP returns a constant time that indicates the
> time at which the statement began to execute.
> But other database systems don't seem to have this constraint (correct me
> if I'm wrong). Sometimes we don't have to follow MySQL.
>
> Best, Hequn
>
> On Tue, Aug 21, 2018 at 10:21 AM, 徐涛 <happydexu...@gmail.com> wrote:
>
>> Hi Hequn,
>> Maybe I did not express it clearly. I mean that if only the
>> update_timestamp of the incremental data is updated, it is not enough.
>> The SQL expresses the idea that all the timestamps in the table are the
>> same, but actually each row in the table may be different. It is a bit
>> weird.
>>
>> Best, Henry
>>
>>
>>
>> On Aug 21, 2018, at 10:09 AM, Hequn Cheng <chenghe...@gmail.com> wrote:
>>
>> Hi Henry,
>>
>> If you upsert by key 'article_id', the result is correct, i.e., the
>> result is (a, 2, 2018-08-20 20:18:10.486). What do you think?
>>
>> Best, Hequn
>>
>>
>>
>> On Tue, Aug 21, 2018 at 9:44 AM, 徐涛 <happydexu...@gmail.com> wrote:
>>
>>> Hi Hequn,
>>> However, is it semantically correct? Because the SQL result is not
>>> equal to the bounded table.
>>>
>>>
>>> On Aug 20, 2018, at 8:34 PM, Hequn Cheng <chenghe...@gmail.com> wrote:
>>>
>>> Hi Henry,
>>>
>>> Both SQL queries output incrementally.
>>>
>>> However, there are some problems if you use a retract sink. You have to
>>> pay attention to the timestamp field, since its value is different each
>>> time. For example, if the value is updated from 1 to 2,
>>>
>>> previous row:  add (a, 1, 2018-08-20 20:18:10.286)
>>> retract row: delete (a, 1, 2018-08-20 20:18:10.386)
>>> new row: add (a, 2, 2018-08-20 20:18:10.486)
>>>
>>> the retract row is different from the previous row because of the
>>> timestamp field.
>>>
>>> Of course, this problem should be fixed later.
>>>
>>> Best, Hequn
>>>
>>> On Mon, Aug 20, 2018 at 6:43 PM, 徐涛 <happydexu...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>> Like the following code: if I use a retract stream, I think Flink is
>>>> able to know which item is modified (if praise has 10000 items now,
>>>> when one item comes into the stream, only a very small amount of data
>>>> is written to the sink):
>>>>
>>>>     var praiseAggr = tableEnv.sqlQuery(
>>>>       s"SELECT article_id, hll(uid) as PU FROM praise GROUP BY article_id")
>>>>
>>>>     tableEnv.registerTable("finalTable", praiseAggr)
>>>>
>>>>     tableEnv.sqlUpdate(s"INSERT INTO sinkTableName SELECT * FROM finalTable")
>>>>
>>>>
>>>> But if I use the following SQL, adding a dynamic timestamp field:
>>>>
>>>>     var praiseAggr = tableEnv.sqlQuery(
>>>>       s"SELECT article_id, hll(uid) as PU, LOCALTIMESTAMP as update_timestamp
>>>>          FROM praise GROUP BY article_id")
>>>>
>>>> Is the whole table flushed to the sink, or will only the incremental
>>>> values be flushed to the sink? Why?
>>>>
>>>> Thanks,
>>>> Henry
>>>>
>>>>
>>>
>>>
>>
>>
>
>
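
The mismatch discussed above (the problem tracked in FLINK-10188) can be
reproduced with a small plain-Scala sketch: when a retract message must
match the previous row by full-row equality, a per-emission LOCALTIMESTAMP
column makes the match fail, while matching by the upsert key still works.
No Flink dependencies; all names here are illustrative, not Flink API.

```scala
// Sketch: two ways a sink-side table could apply a retract message.
object RetractMatchSketch {
  case class Row(articleId: String, pu: Long, updateTimestamp: String)

  // Retract by whole-row equality: remove a row only if an identical
  // row exists. A differing timestamp column means nothing matches and
  // the stale row is left behind.
  def retractByRow(table: List[Row], retract: Row): List[Row] = {
    val i = table.indexOf(retract)
    if (i >= 0) table.patch(i, Nil, 1) else table
  }

  // Retract by key: remove whatever row currently holds that key,
  // regardless of the timestamp column. This is why upserting by
  // 'article_id' gives the correct result.
  def retractByKey(table: List[Row], retract: Row): List[Row] =
    table.filterNot(_.articleId == retract.articleId)
}
```

With the thread's example, the previous row carries 2018-08-20 20:18:10.286
but the retract row carries ...10.386: retractByRow leaves the table
unchanged (the stale row survives), while retractByKey removes it.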
