Hi:
I'm new to Structured Streaming. Because the built-in API cannot perform
a count distinct over a window, I want to call dropDuplicates first and
then do the window count.
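Roughly, the pipeline I have in mind looks like the PySpark sketch below.
It is untested and only meant to show the shape of the query. The column
names (event_time, user_id), the function name and the two delay values
are placeholders I made up for this mail, and the two withWatermark()
calls are exactly what my questions further down are about.

```python
def build_distinct_window_count(events, window_size="1 hour", real_delay="10 minutes"):
    """Sketch: dedupe per window, then count rows per window.

    `events` is assumed to be a streaming DataFrame with the hypothetical
    columns `event_time` (timestamp) and `user_id`.
    """
    # Imported here so the sketch can be loaded without a Spark installation.
    from pyspark.sql import functions as F

    deduped = (
        events
        # First watermark: set to the window size so the dedup state can be cleared.
        .withWatermark("event_time", window_size)
        .withColumn("win", F.window("event_time", window_size))
        # Keep one row per (user, window), so the later count acts as a distinct count.
        .dropDuplicates(["user_id", "win"])
        .drop("win")
    )

    return (
        deduped
        # Second watermark: the "real" one. I do not know whether this call takes effect.
        .withWatermark("event_time", real_delay)
        .groupBy(F.window("event_time", window_size))
        .count()
    )
```

I would call it as build_distinct_window_count(events) on the DataFrame
returned by spark.readStream and then start the query with writeStream.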
But I have run into two questions along the way:

  1. Because this is a streaming computation, the deduplication state
     has to be cleared in time, which requires a watermark. Assuming my
     event time field is consistently increasing and I set the watermark
     to 1 hour, does that mean a record with an event time of 10 o'clock
     is only compared against the records from 9 o'clock to 10 o'clock,
     and the state for records before 9 o'clock is cleared?
  2. Because I am deduplicating per window, I set the watermark before
     the deduplication to the window size. But after the deduplication
     I need to call withWatermark() again to set the watermark to the
     real watermark. Will setting the watermark a second time take
     effect?
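
To make question 1 concrete, here is the behaviour I am assuming, written
as a toy model in plain Python. This is not Spark code. It updates the
watermark after every record and deduplicates on the key alone, which are
simplifications of my own, and whether Spark really clears state this way
is exactly what I am asking.

```python
from datetime import datetime, timedelta


def dedupe_with_watermark(events, delay):
    """Toy model: `events` is a list of (event_time, key) in arrival order."""
    state = {}             # key -> event time of the record kept for that key
    emitted = []           # records that pass deduplication
    max_event_time = None  # largest event time seen so far

    for event_time, key in events:
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        watermark = max_event_time - delay

        if event_time < watermark:
            continue  # older than the watermark: treated as too late

        if key not in state:
            state[key] = event_time
            emitted.append((event_time, key))

        # Clear state that has fallen behind the watermark.
        state = {k: t for k, t in state.items() if t >= watermark}

    return emitted, state


day = datetime(2020, 1, 1)
events = [
    (day.replace(hour=8), "a"),
    (day.replace(hour=9, minute=30), "b"),
    (day.replace(hour=10), "a"),
]
emitted, state = dedupe_with_watermark(events, timedelta(hours=1))
```

In this model the key "a" is emitted twice: by the time the 10 o'clock
record arrives, the state for the 8 o'clock record has already been
cleared, so the two are never compared.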

Thanks a lot!
