Hello Flink team!
How do I properly accumulate streaming data into Avro files partitioned by hour?
In my current implementation, data from the data stream is converted to a table and written out as Avro files.
Similar to this:
t_env.execute_sql("""
    CREATE TABLE mySink (
        id STRING,
        name STRING,
        -- start and end are reserved words in Flink SQL, so they must be backtick-quoted
        data_ranges ARRAY<ROW<`start` BIGINT, `end` BIGINT>>,
        meta ARRAY<ROW<name STRING, text STRING>>,
        current_hour INT
    ) PARTITIONED BY (current_hour) WITH (
        'connector' = 'filesystem',
        'format' = 'avro',
        'path' = '/opt/pyflink-walkthrough/output/table',
        'sink.rolling-policy.rollover-interval' = '1 hour',
        -- the time extractor is only consulted by the 'partition-time' commit trigger;
        -- with the 'process-time' trigger below it has no effect
        'partition.time-extractor.timestamp-pattern' = '$current_hour',
        'sink.partition-commit.delay' = '1 hour',
        'sink.partition-commit.trigger' = 'process-time',
        'sink.partition-commit.policy.kind' = 'success-file'
    )
""")
Maybe this can be done better? (I'm not sure whether it works correctly at all.)