wgtmac commented on code in PR #3586:
URL: https://github.com/apache/parquet-java/pull/3586#discussion_r3315110625


##########
parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnValueCollector.java:
##########
@@ -45,6 +45,10 @@ class ColumnValueCollector {
   private SizeStatistics.Builder sizeStatisticsBuilder;
   private GeospatialStatistics.Builder geospatialStatisticsBuilder;
 
+  // track the required `num_nulls` field in DataPageHeaderV2
+  // 
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift
+  private int nullCount;

Review Comment:
   Could we keep this counter in ColumnWriterBase instead? nullCount is 
page-level metadata, like valueCount and pageRowCount, while 
ColumnValueCollector is otherwise about optional statistics and bloom filters. 
Keeping it with the other page counters would make the ownership clearer.



##########
parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnWriterBase.java:
##########
@@ -403,6 +404,7 @@ void writePage() {
   abstract void writePage(
       int rowCount,
       int valueCount,
+      int nullCount,

Review Comment:
   This adds a V2-only value to the shared writePage signature, so 
ColumnWriterV1 has to accept an unused parameter. Consider keeping nullCount on 
the V2 path only, either through a V2-specific helper or a protected getter 
from the base class.



##########
parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetWriter.java:
##########
@@ -858,4 +863,68 @@ public void testNoFlushAfterException() throws Exception {
     FileSystem fs = file.getFileSystem(conf);
     assertTrue(!fs.exists(file) || fs.getFileStatus(file).getLen() == 0);
   }
+
+  @Test
+  public void testV2PageNullCountWithStatisticsDisabled() throws Exception {
+    // Regression test: when using PARQUET_2_0 with statistics disabled on a 
nullable column,
+    // DataPageHeaderV2.num_nulls must still contain the correct null count 
(not -1).
+    MessageType schema = Types.buildMessage()
+        .required(INT32)
+        .named("id")
+        .optional(BINARY)
+        .as(stringType())
+        .named("value")
+        .named("test_schema");
+
+    File file = temp.newFile();
+    temp.delete();

Review Comment:
   This should delete only the placeholder file, not the TemporaryFolder root. 
file.delete() is enough to free the output path and avoids changing the test 
fixture lifecycle.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to