Hello Flink team!
 
How do I properly accumulate streaming data into Avro files, partitioned by the hour?
 
In my current implementation, data from the data stream is converted to a table and saved to an Avro file, similar to this:
 
    t_env.execute_sql("""
        CREATE TABLE mySink (
          id STRING,
          name STRING,
          -- `start` and `end` are reserved words in Flink SQL and must be
          -- escaped with backticks
          data_ranges ARRAY<ROW<`start` BIGINT, `end` BIGINT>>,
          meta ARRAY<ROW<name STRING, text STRING>>,
          current_hour INT
        ) PARTITIONED BY (current_hour) WITH (
          'connector' = 'filesystem',
          'format' = 'avro',
          'path' = '/opt/pyflink-walkthrough/output/table',
          'sink.rolling-policy.rollover-interval' = '1 hour',
          -- the time extractor only applies to the 'partition-time' commit
          -- trigger; with the 'process-time' trigger below it is ignored
          'partition.time-extractor.timestamp-pattern' = '$current_hour',
          'sink.partition-commit.delay' = '1 hour',
          'sink.partition-commit.trigger' = 'process-time',
          'sink.partition-commit.policy.kind' = 'success-file'
        )
    """)
Maybe this can be done better? (I'm not sure whether my version works correctly at all.)
