jeffreyssmith2nd commented on code in PR #10716:
URL: https://github.com/apache/datafusion/pull/10716#discussion_r1620812645
##########
datafusion/core/src/datasource/schema_adapter.rs:
##########
@@ -75,9 +75,16 @@ pub trait SchemaAdapter: Send + Sync {
/// Creates a `SchemaMapping` that can be used to cast or map the columns
/// from the file schema to the table schema.
-pub trait SchemaMapper: Send + Sync {
+pub trait SchemaMapper: Debug + Send + Sync {
/// Adapts a `RecordBatch` to match the `table_schema` using the stored
/// mapping and conversions.
fn map_batch(&self, batch: RecordBatch) -> datafusion_common::Result<RecordBatch>;
+
+ /// Adapts a `RecordBatch` that does not have all the columns (as defined in the schema).
Review Comment:
As I understand it, when `DatafusionArrowPredicate::evaluate` is called, the
`RecordBatch` only contains one column. If we use the `map_batch` function, it
indexes into the `RecordBatch` as if all the columns are there.
For example, if we have a schema with fields: `[{name: "value", type:
"Float64"},{name: "time", type: "Timestamp"}]`, then map_batch will try to
index at `1` for time, but the `RecordBatch` won't actually have that index
since only one column is passed in.
This may well be the wrong way to achieve what I wanted, but the intent was
to look the field up by name in the `table_schema` so that we avoid that
indexing problem.
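To illustrate the mismatch concretely, here is a minimal, dependency-free sketch. The `Batch` and `Column` types below are toy stand-ins, not the real arrow-rs `RecordBatch` API: positional indexing against the full table schema (as `map_batch` effectively does) fails when the batch carries only the single column the predicate needs, while a name-based lookup resolves whatever columns are actually present.

```rust
// Toy stand-ins for a schema'd batch; only column identity matters here.
#[derive(Debug, Clone)]
struct Column {
    name: String,
}

struct Batch {
    columns: Vec<Column>,
}

impl Batch {
    // Positional access, analogous to indexing by the table schema's
    // field offsets: breaks when the batch holds a subset of columns.
    fn column_by_index(&self, i: usize) -> Option<&Column> {
        self.columns.get(i)
    }

    // Name-based access, the approach proposed in the comment above.
    fn column_by_name(&self, name: &str) -> Option<&Column> {
        self.columns.iter().find(|c| c.name == name)
    }
}

fn main() {
    // Table schema is [value, time], but predicate evaluation passes a
    // batch containing only the "value" column.
    let batch = Batch {
        columns: vec![Column { name: "value".to_string() }],
    };

    // "time" sits at index 1 in the table schema, but that index does
    // not exist in this one-column batch.
    assert!(batch.column_by_index(1).is_none());

    // Looking up by name still finds the column that is present, and
    // cleanly reports the one that is not.
    assert!(batch.column_by_name("value").is_some());
    assert!(batch.column_by_name("time").is_none());
}
```

(With the real arrow-rs types, the same idea would use `Schema::index_of` / `RecordBatch::column_by_name` rather than these toy helpers.)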
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]