Re: [DISCUSS] Changing return type of window properties (WINDOW_START/END)

Dawid Wysakowicz Tue, 12 Nov 2024 03:47:17 -0800

Thank you for your reply @Leonard

> Firstly the example result is a little strange for me too, the print
> window_time looks incorrect, Could you post your entire example especially
> your session time zone?



You can modify any of the tests in WindowAggregateITCase[1], e.g.
testEventTimeTumbleWindow:

  @TestTemplate
  def testEventTimeTumbleWindow(): Unit = {
    val sql =
      """
        |SELECT
        |  `name`,
        |  window_start,
        |  window_end,
        |  window_time,
        |  COUNT(*),
        |  SUM(`bigdec`),
        |  MAX(`double`),
        |  MIN(`float`),
        |  COUNT(DISTINCT `string`),
        |  concat_distinct_agg(`string`)
        |FROM TABLE(
        |   TUMBLE(TABLE T1, DESCRIPTOR(rowtime), INTERVAL '5' SECOND))
        |GROUP BY `name`, window_start, window_end, window_time
      """.stripMargin

    val sink = new TestingAppendSink
    tEnv.sqlQuery(sql).toDataStream.addSink(sink)
    env.execute()
  }

and you get the misleading results for timestamp_ltz:


a,2020-10-10T00:00,2020-10-10T00:00:05,2020-10-09T16:00:04.999Z,4,11.10,5.0,1.0,2,Hi|Comment#1
a,2020-10-10T00:00:05,2020-10-10T00:00:10,2020-10-09T16:00:09.999Z,1,3.33,null,3.0,1,Comment#2
b,2020-10-10T00:00:05,2020-10-10T00:00:10,2020-10-09T16:00:09.999Z,2,6.66,6.0,3.0,2,Hello|Hi
b,2020-10-10T00:00:15,2020-10-10T00:00:20,2020-10-09T16:00:19.999Z,1,4.44,4.0,4.0,1,Hi
b,2020-10-10T00:00:30,2020-10-10T00:00:35,2020-10-09T16:00:34.999Z,1,3.33,3.0,3.0,1,Comment#3
null,2020-10-10T00:00:30,2020-10-10T00:00:35,2020-10-09T16:00:34.999Z,1,7.77,7.0,7.0,0,null
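
A minimal sketch of why the fourth column looks shifted against the second
and third, assuming a UTC+8 session time zone (Asia/Shanghai is my assumption
for illustration; it is not stated above). window_start and window_end print
as zone-less TIMESTAMP values, while window_time prints as a UTC instant. The
object and method names are hypothetical, and this is plain java.time, not
Flink code:

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

object WindowTimeRendering {
  // Assumption: the session zone is UTC+8; the thread does not confirm this.
  val assumedSessionZone: ZoneId = ZoneId.of("Asia/Shanghai")

  // window_time is the last millisecond of the window (window_end - 1 ms),
  // interpreted in the session zone and rendered as an instant.
  def windowTimeAsInstant(windowEnd: LocalDateTime): Instant =
    windowEnd.minusNanos(1000000L).atZone(assumedSessionZone).toInstant

  def main(args: Array[String]): Unit = {
    val windowEnd = LocalDateTime.parse("2020-10-10T00:00:05")
    // Prints 2020-10-09T16:00:04.999Z, matching the first result row.
    println(windowTimeAsInstant(windowEnd))
  }
}
```

Under that assumption the values agree with each other; only the rendering of
the two types differs, which is what makes the row hard to read.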

> We aims to address window correctness issue in DST timezone, there’re
> detailed explanation in CALCITE-4563.


Could you please explain that a bit more? I don't understand the problem.
From my point of view, the problem you're describing there originates
exactly from the fact that we mix up TIMESTAMP_LTZ with TIMESTAMP. The way
I see it, we want to put TIMESTAMP_LTZ values into windows of TIMESTAMP
type. TIMESTAMP_LTZ has Instant semantics, and as such I don't really
understand how DST comes into play there. An Instant clearly identifies a
point in time and thus should be nicely grouped into equal windows.

What you're describing in the linked JIRA, in my opinion, is that you have
a TIMESTAMP_LTZ time attribute (instant semantics), but you want to group
by wall clock semantics (TIMESTAMP). I think this should be achieved, if
necessary, by first casting the time attribute to TIMESTAMP and then
performing the grouping. The casting would already take care of the DST
shift.
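
To illustrate the distinction, here is a small sketch in plain java.time (not
Flink's actual window assigner; the object, the method names, the one-hour
window size and the Europe/Berlin zone are all assumptions chosen for
illustration). Bucketing by Instant yields equal-length windows across a DST
change, while converting to wall-clock time first folds the repeated hour
into one window:

```scala
import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.temporal.ChronoUnit

object InstantVsWallClockWindows {
  // Assumptions for illustration: a zone with DST and one-hour tumbling windows.
  val zone: ZoneId = ZoneId.of("Europe/Berlin")
  val windowMillis: Long = 60 * 60 * 1000L

  // Instant semantics (TIMESTAMP_LTZ): floor the epoch millis to the window size.
  def instantWindowStart(ts: Instant): Instant = {
    val millis = ts.toEpochMilli
    Instant.ofEpochMilli(millis - Math.floorMod(millis, windowMillis))
  }

  // Wall-clock semantics (TIMESTAMP): "cast" to local time first, then truncate.
  def wallClockWindowStart(ts: Instant): LocalDateTime =
    LocalDateTime.ofInstant(ts, zone).truncatedTo(ChronoUnit.HOURS)

  def main(args: Array[String]): Unit = {
    // DST ends in Berlin on 2024-10-27 at 01:00 UTC (03:00 CEST -> 02:00 CET).
    val beforeShift = Instant.parse("2024-10-27T00:30:00Z") // 02:30 CEST
    val afterShift  = Instant.parse("2024-10-27T01:30:00Z") // 02:30 CET

    // Two distinct one-hour windows: 00:00Z and 01:00Z.
    println(instantWindowStart(beforeShift) + " / " + instantWindowStart(afterShift))
    // The same wall-clock window, 2024-10-27T02:00, for both events.
    println(wallClockWindowStart(beforeShift) + " / " + wallClockWindowStart(afterShift))
  }
}
```

Both instants read 02:30 on the wall clock in Berlin that day, so the
cast-then-group variant puts them into the same 02:00 window, whereas the
instant variant keeps them one hour apart.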

I still believe that window_start, window_end and window_time should all
return the same type, derived from the type of the input time attribute.

Happy to hear your thoughts.

Best,
Dawid

On Fri, 8 Nov 2024 at 08:21, Leonard Xu <xbjt...@gmail.com> wrote:

> Thanks Dawid for bringing this ticket to dev mailing list and Timo’s
> kindly ping.
>
> Firstly the example result is a little strange for me too, the print
> window_time looks incorrect, Could you post your entire example especially
> your session time zone?
>
> Back to the window_start/end return type, both window TVF and legacy
> SqlGroupedWindowFunction share same return type TIMESTAMP which means
> timestamp literal, and it’s by design. We aims to address window
> correctness issue in DST timezone, there’re detailed explanation in
> CALCITE-4563.
>
>
> Best,
> Leonard
>
> [1]https://issues.apache.org/jira/browse/CALCITE-4563
>
>
>
> >> I wanted to bring your attention to FLINK-36665[1].
> >> I believe the current behaviour is confusing and I'd like to fix it.
> >> However, since window operations are a very important feature I'd like
> >> to gather feedback on to what extent we should keep backwards
> >> compatibility.
> >>    1. How should newly submitted queries behave? Are we fine with
> >>    changing the inference of these functions or would you prefer to
> >>    have a feature flag that would let us revert to the old inference
> >>    logic? My preference would be to simply change the inference. The
> >>    current behaviour is very confusing and I'd keep the behaviour for
> >>    restored queries (see 2.)
> >>    2. My plan for migrated queries (queries restored from a compiled
> >>    plan) is that they won't be impacted. They'll keep producing the
> >>    same results. We have the output types serialized in the compiled
> >>    plan which we can use to produce the same type as before.
> >> What do you think?
> >> Best,
> >> Dawid
> >> [1] https://issues.apache.org/jira/browse/FLINK-36665
> >
>
>
