[ 
https://issues.apache.org/jira/browse/FLINK-35606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-35606:
--------------------------------------
    Description: 
Follow-up test for https://issues.apache.org/jira/browse/FLINK-26050

The problem occurs when using RocksDB and specific queries/jobs (please see the 
ticket for the detailed description).

To test the solution, run the following query with RocksDB as a state backend:

 
{code:java}
INSERT INTO top_5_highest_view_time
SELECT *
FROM   (
                SELECT   *,
                         ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY view_time DESC) AS rownum
                FROM     (
                                  SELECT   window_start,
                                           window_end,
                                           product_id,
                                           SUM(view_time) AS view_time,
                                           COUNT(*)       AS cnt
                                  FROM     TABLE(TUMBLE(TABLE `shoe_clickstream`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES))
                                  GROUP BY window_start,
                                           window_end,
                                           product_id))
WHERE  rownum <= 5;{code}
 

With the feature disabled (default), the number of files in the RocksDB working 
directory (as well as in the checkpoint) should grow indefinitely.

 

With the feature enabled, the number of files should stay constant (as they should 
get merged with each other).
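
One way to observe this is to sample the number of SST files over time. The following is only a rough sketch, not part of the fix: it assumes the RocksDB files of interest carry the `.sst` extension and live somewhere below a directory you pass in, and the helper names `count_sst_files` and `sample_counts` are made up for illustration.

```python
import time
from pathlib import Path


def count_sst_files(root):
    """Count *.sst files anywhere below root (searched recursively)."""
    return sum(1 for p in Path(root).rglob("*.sst") if p.is_file())


def sample_counts(root, samples, interval_seconds):
    """Take `samples` file counts, sleeping `interval_seconds` between them."""
    counts = []
    for i in range(samples):
        counts.append(count_sst_files(root))
        if i < samples - 1:
            # No need to sleep after the last sample.
            time.sleep(interval_seconds)
    return counts
```

For example, `sample_counts("/path/to/rocksdb/working/dir", samples=10, interval_seconds=60)` (the path is a placeholder) returns ten counts taken a minute apart. With the feature disabled the counts are expected to keep rising; with it enabled they should level off.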

To enable the feature, set 
{code:java}
state.backend.rocksdb.manual-compaction.min-interval{code}
 to 1 minute, for example.
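
As a minimal sketch, assuming the options are set in the cluster configuration file, the setup could look like the fragment below. The `state.backend.type` line is an addition for completeness, and the one-minute value is only the example mentioned above.

```yaml
# Use RocksDB as the state backend (required for this test).
state.backend.type: rocksdb
# Setting a min-interval enables manual compaction; 1 min is just an example value.
state.backend.rocksdb.manual-compaction.min-interval: 1 min
```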

 

Please consult 
[https://github.com/apache/flink/blob/e7d7db3b6f87e53d9bace2a16cf95e5f7a79087a/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/sstmerge/RocksDBManualCompactionOptions.java#L29]
 for other options if necessary.

  was:
Follow up the test for https://issues.apache.org/jira/browse/FLINK-26050

The problem occurs when using RocksDB and specific queries/jobs (please see the 
ticket for the detailed description).

To test the solution, run the following query with RocksDB as a state backend:
```
INSERT INTO top_5_highest_view_time
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY view_time 
DESC) AS rownum
FROM (
SELECT window_start,
window_end,
product_id,
SUM(view_time) AS view_time,
COUNT(*) AS cnt
FROM TABLE(TUMBLE(TABLE `shoe_clickstream`, DESCRIPTOR($rowtime), INTERVAL '10' 
MINUTES))
GROUP BY window_start,
window_end,
product_id))
WHERE rownum <= 5;
```
With the feature disabled (default), the number of files in rocksdb working 
directory (as well as in the checkpoint) should grow indefinitely.

With feature enabled, the number of files should stays constant (as they should 
get merged with each other).
To enable the feature, set 
`state.backend.rocksdb.manual-compaction.min-interval` set to 1 minute for 
example.

Pplease make 
[https://github.com/apache/flink/blob/e7d7db3b6f87e53d9bace2a16cf95e5f7a79087a/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/sstmerge/RocksDBManualCompactionOptions.java#L29]
 for other options


> Release Testing Instructions: Verify FLINK-26050 Too many small sst files in 
> rocksdb state backend when using time window created in ascending order
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-35606
>                 URL: https://issues.apache.org/jira/browse/FLINK-35606
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / State Backends
>            Reporter: Rui Fan
>            Assignee: Roman Khachatryan
>            Priority: Blocker
>              Labels: release-testing
>             Fix For: 1.20.0
>
>
> Follow-up test for https://issues.apache.org/jira/browse/FLINK-26050
> The problem occurs when using RocksDB and specific queries/jobs (please see 
> the ticket for the detailed description).
> To test the solution, run the following query with RocksDB as a state backend:
>  
> {code:java}
> INSERT INTO top_5_highest_view_time
> SELECT *
> FROM   (
>                 SELECT   *,
>                          ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY view_time DESC) AS rownum
>                 FROM     (
>                                   SELECT   window_start,
>                                            window_end,
>                                            product_id,
>                                            SUM(view_time) AS view_time,
>                                            COUNT(*)       AS cnt
>                                   FROM     TABLE(TUMBLE(TABLE `shoe_clickstream`, DESCRIPTOR($rowtime), INTERVAL '10' MINUTES))
>                                   GROUP BY window_start,
>                                            window_end,
>                                            product_id))
> WHERE  rownum <= 5;{code}
>  
> With the feature disabled (default), the number of files in the RocksDB working 
> directory (as well as in the checkpoint) should grow indefinitely.
>  
> With the feature enabled, the number of files should stay constant (as they 
> should get merged with each other).
> To enable the feature, set 
> {code:java}
> state.backend.rocksdb.manual-compaction.min-interval{code}
>  to 1 minute, for example.
>  
> Please consult 
> [https://github.com/apache/flink/blob/e7d7db3b6f87e53d9bace2a16cf95e5f7a79087a/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/sstmerge/RocksDBManualCompactionOptions.java#L29]
>  for other options if necessary.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
