Copilot commented on code in PR #10149:
URL: https://github.com/apache/gravitino/pull/10149#discussion_r2877967778


##########
docs/manage-statistics-in-gravitino.md:
##########
@@ -307,16 +307,17 @@ For example, if you set an extra property `foo` to `bar` 
for Lance storage optio
 For Lance remote storage, you can refer to the document 
[here](https://lancedb.github.io/lance/usage/storage/).
 
 
-| Configuration item                                                   | 
Description                          | Default value                        | 
Required | Since version |
-|----------------------------------------------------------------------|--------------------------------------|--------------------------------------|----------|---------------|
-| `gravitino.stats.partition.storageOption.location`                   | The 
location of Lance files          | `${GRAVITINO_HOME}/data/lance`       | No    
   | 1.0.0         |
-| `gravitino.stats.partition.storageOption.maxRowsPerFile`             | The 
maximum rows per file            | `1000000`                            | No    
   | 1.0.0         |
-| `gravitino.stats.partition.storageOption.maxBytesPerFile`            | The 
maximum bytes per file           | `104857600`                          | No    
   | 1.0.0         |
-| `gravitino.stats.partition.storageOption.maxRowsPerGroup`            | The 
maximum rows per group           | `1000000`                            | No    
   | 1.0.0         |
-| `gravitino.stats.partition.storageOption.readBatchSize`              | The 
batch record number when reading | `10000`                              | No    
   | 1.0.0         |
-| `gravitino.stats.partition.storageOption.datasetCacheSize`           | size 
of dataset cache for Lance      | `0`, It means we don't use the cache | No     
  | 1.0.0         |
-| `gravitino.stats.partition.storageOption.metadataFileCacheSizeBytes` | The 
Lance's metadata file cache size | `102400`                             | No    
   | 1.0.0         |
-| `gravitino.stats.partition.storageOption.indexCacheSizeBytes`        | The 
Lance's index cache size         | `102400`                             | No    
   | 1.0.0         |
+| Configuration item                                                   | 
Description                                                | Default value      
                  | Required | Since version |
+|----------------------------------------------------------------------|------------------------------------------------------------|--------------------------------------|----------|---------------|
+| `gravitino.stats.partition.storageOption.location`                   | The 
location of Lance files                                | 
`${GRAVITINO_HOME}/data/lance`       | No       | 1.0.0         |
+| `gravitino.stats.partition.storageOption.maxRowsPerFile`             | The 
maximum rows per file                                  | `1000000`              
              | No       | 1.0.0         |
+| `gravitino.stats.partition.storageOption.maxBytesPerFile`            | The 
maximum bytes per file                                 | `104857600`            
              | No       | 1.0.0         |
+| `gravitino.stats.partition.storageOption.maxRowsPerGroup`            | The 
maximum rows per group                                 | `1000000`              
              | No       | 1.0.0         |
+| `gravitino.stats.partition.storageOption.readBatchSize`              | The 
batch record number when reading                       | `10000`                
              | No       | 1.0.0         |
+| `gravitino.stats.partition.storageOption.datasetCacheSize`           | size 
of dataset cache for Lance                            | `0`, It means we don't 
use the cache | No       | 1.0.0         |
+| `gravitino.stats.partition.storageOption.metadataFileCacheSizeBytes` | The 
Lance's metadata file cache size                       | `102400`               
              | No       | 1.0.0         |
+| `gravitino.stats.partition.storageOption.indexCacheSizeBytes`        | The 
Lance's index cache size                               | `102400`               
              | No       | 1.0.0         |
+| `gravitino.stats.partition.storageOption.maxStatisticsPerUpdate`     | 
Maximum number of statistics allowed per update operation  | `100`              
                  | No       | 1.0.0         |

Review Comment:
   The newly added `maxStatisticsPerUpdate` row lists `Since version` as 
`1.0.0`, but this option is introduced in this PR. Please update the `Since 
version` column to the version where this configuration is first available (the 
repo currently targets `1.2.0-SNAPSHOT`).
   ```suggestion
   | `gravitino.stats.partition.storageOption.maxStatisticsPerUpdate`     | 
Maximum number of statistics allowed per update operation  | `100`              
                  | No       | 1.2.0-SNAPSHOT |
   ```



##########
core/src/test/java/org/apache/gravitino/stats/storage/TestLancePartitionStatisticStorage.java:
##########
@@ -378,6 +378,55 @@ public void testLancePartitionStatisticStorageWithCache() 
throws Exception {
     storage.close();
   }
 
+  @Test
+  public void testExceedMaxStatisticsPerUpdateLimit() throws Exception {
+    PartitionStatisticStorageFactory factory = new 
LancePartitionStatisticStorageFactory();
+
+    String metalakeName = "metalake";
+    String catalogName = "catalog";
+    String schemaName = "schema";
+    String tableName = "table";
+
+    MetadataObject metadataObject =
+        MetadataObjects.of(
+            Lists.newArrayList(catalogName, schemaName, tableName), 
MetadataObject.Type.TABLE);
+
+    EntityStore entityStore = mock(EntityStore.class);
+    TableEntity tableEntity = mock(TableEntity.class);
+    when(entityStore.get(any(), any(), any())).thenReturn(tableEntity);
+    when(tableEntity.id()).thenReturn(101L);
+    FieldUtils.writeField(GravitinoEnv.getInstance(), "entityStore", 
entityStore, true);
+
+    String location = 
Files.createTempDirectory("lance_stats_exceed_test").toString();
+    Map<String, String> properties = Maps.newHashMap();
+    properties.put("location", location);
+    // Default limit is 100, generate 101 statistics to exceed the limit
+
+    LancePartitionStatisticStorage storage =
+        (LancePartitionStatisticStorage) factory.create(properties);
+
+    // Generate 101 statistics which exceeds the default limit of 100
+    int count = 101;
+    Map<MetadataObject, Map<String, Map<String, StatisticValue<?>>>> 
originData =
+        generateData(metadataObject, count, 1);
+    Map<MetadataObject, List<PartitionStatisticsUpdate>> statisticsToUpdate =
+        convertData(originData);
+
+    List<MetadataObjectStatisticsUpdate> objectUpdates = Lists.newArrayList();
+    for (Map.Entry<MetadataObject, List<PartitionStatisticsUpdate>> entry :
+        statisticsToUpdate.entrySet()) {
+      objectUpdates.add(MetadataObjectStatisticsUpdate.of(entry.getKey(), 
entry.getValue()));
+    }
+
+    IllegalArgumentException exception =
+        Assertions.assertThrows(
+            IllegalArgumentException.class,
+            () -> storage.updateStatistics(metalakeName, objectUpdates));
+    Assertions.assertTrue(exception.getMessage().contains("exceeds the maximum 
limit"));
+
+    storage.close();

Review Comment:
   This test creates a temporary directory and a 
`LancePartitionStatisticStorage` instance but doesn’t guarantee cleanup if an 
assertion fails. Please use try/finally (or try-with-resources) to always 
`close()` the storage and delete the created temp/Lance dataset directory (as 
the other tests in this class do) to avoid leaking temp files/resources across 
test runs.
   ```suggestion
       String location = null;
       LancePartitionStatisticStorage storage = null;
       try {
         location = 
Files.createTempDirectory("lance_stats_exceed_test").toString();
         Map<String, String> properties = Maps.newHashMap();
         properties.put("location", location);
         // Default limit is 100, generate 101 statistics to exceed the limit
   
         storage = (LancePartitionStatisticStorage) factory.create(properties);
   
         // Generate 101 statistics which exceeds the default limit of 100
         int count = 101;
         Map<MetadataObject, Map<String, Map<String, StatisticValue<?>>>> 
originData =
             generateData(metadataObject, count, 1);
         Map<MetadataObject, List<PartitionStatisticsUpdate>> 
statisticsToUpdate =
             convertData(originData);
   
         List<MetadataObjectStatisticsUpdate> objectUpdates = 
Lists.newArrayList();
         for (Map.Entry<MetadataObject, List<PartitionStatisticsUpdate>> entry :
             statisticsToUpdate.entrySet()) {
           objectUpdates.add(MetadataObjectStatisticsUpdate.of(entry.getKey(), 
entry.getValue()));
         }
   
         IllegalArgumentException exception =
             Assertions.assertThrows(
                 IllegalArgumentException.class,
                 () -> storage.updateStatistics(metalakeName, objectUpdates));
         Assertions.assertTrue(exception.getMessage().contains("exceeds the 
maximum limit"));
       } finally {
         if (storage != null) {
           storage.close();
         }
         if (location != null) {
           FileUtils.deleteDirectory(new File(location));
         }
       }
   ```



##########
core/src/main/java/org/apache/gravitino/stats/storage/LancePartitionStatisticStorage.java:
##########
@@ -177,6 +180,14 @@ public LancePartitionStatisticStorage(Map<String, String> 
properties) {
         indexCacheSize > 0,
         "Lance partition statistics storage indexCacheSizeBytes must be 
positive");
 
+    this.maxStatisticsPerUpdate =
+        Integer.parseInt(
+            properties.getOrDefault(
+                MAX_STATISTICS_PER_UPDATE, 
String.valueOf(DEFAULT_MAX_STATISTICS_PER_UPDATE)));
+    Preconditions.checkArgument(

Review Comment:
   New validation is added for `maxStatisticsPerUpdate`, but there is no test 
covering invalid configurations (e.g., `0`, negative values, or non-integer 
values). Please add a dedicated test that asserts the factory/storage creation 
fails with an appropriate exception/message when `maxStatisticsPerUpdate` is 
invalid, to match the project’s “every change must have a corresponding test” 
rule.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to