XiangpengHao commented on code in PR #15325:
URL: https://github.com/apache/datafusion/pull/15325#discussion_r2008190901

##########
datafusion/wasmtest/src/lib.rs:
##########
@@ -182,4 +182,29 @@ mod test {
         let task_ctx = ctx.task_ctx();
         let _ = collect(physical_plan, task_ctx).await.unwrap();
     }
+
+    #[wasm_bindgen_test(unsupported = tokio::test)]
+    async fn test_parquet_write() {
+        let schema = Arc::new(Schema::new(vec![
+            Field::new("id", DataType::Int32, false),
+            Field::new("value", DataType::Utf8, false),
+        ]));
+
+        let data: Vec<ArrayRef> = vec![
+            Arc::new(Int32Array::from(vec![1])),
+            Arc::new(StringArray::from(vec!["a"])),
+        ];
+
+        let batch = RecordBatch::try_new(schema.clone(), data).unwrap();
+        let mut buffer = Vec::new();
+        let mut writer = datafusion::parquet::arrow::ArrowWriter::try_new(
+            &mut buffer,
+            schema.clone(),
+            None,
+        )
+        .unwrap();
+
+        writer.write(&batch).unwrap();

Review Comment:
   I agree. The current code only tests the re-exported Parquet 
functionality and does not touch any DataFusion-related code. Ideally, we 
should test the end-to-end Parquet reading process.
   
   The process roughly looks like this:
   1. Create an [in-memory object_store](https://docs.rs/object_store/latest/object_store/memory/struct.InMemory.html),
 and put the Parquet data you generated into it.
   2. [Register the object_store](https://github.com/apache/datafusion/blob/3269f01b42021cfab181577d579b0544808b4fca/datafusion/core/src/execution/context/mod.rs#L494)
 with DataFusion, along with the path.
   3. Run a SQL query from the DataFusion side to see if the results can be 
read back.
   
   A loosely related test can be found here: 
https://github.com/XiangpengHao/parquet-viewer/blob/main/src/tests.rs#L9



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

