jeffreyssmith2nd commented on code in PR #10716:
URL: https://github.com/apache/datafusion/pull/10716#discussion_r1620812645
##########
datafusion/core/src/datasource/schema_adapter.rs:
##########
@@ -75,9 +75,16 @@ pub trait SchemaAdapter: Send + Sync {
/// Creates a `SchemaMapping` that can be used to cast or map the columns
/// from the file schema to the table schema.
-pub trait SchemaMapper: Send + Sync {
+pub trait SchemaMapper: Debug + Send + Sync {
/// Adapts a `RecordBatch` to match the `table_schema` using the stored
/// mapping and conversions.
fn map_batch(&self, batch: RecordBatch) -> datafusion_common::Result<RecordBatch>;
+
+ /// Adapts a `RecordBatch` that does not have all the columns (as defined in the schema).
Review Comment:
As I understand it, when `DatafusionArrowPredicate::evaluate` is called, the
`RecordBatch` only contains one column. If we use the `map_batch` function, it
indexes into the `RecordBatch` as if all the columns are there.
For example, if we have a schema with fields: `[{name: "value", type:
"Float64"},{name: "time", type: "Timestamp"}]`, then map_batch will try to
index at `1` for time, but the `RecordBatch` won't actually have that index
since only one column is passed in.
This may well be the wrong way to achieve what I wanted, but the intent was
to look the field up by name in the `table_schema` so that we avoid that
indexing problem.
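To illustrate the mismatch concretely, here is a minimal, dependency-free sketch. The `Batch` and `Column` types below are toy stand-ins, not the real arrow-rs `RecordBatch` API: positional indexing against the full table schema (as `map_batch` effectively does) fails when the batch carries only the single column the predicate needs, while a name-based lookup resolves whatever columns are actually present.

```rust
// Toy stand-ins for a schema'd batch; only column identity matters here.
#[derive(Debug, Clone)]
struct Column {
    name: String,
}

struct Batch {
    columns: Vec<Column>,
}

impl Batch {
    // Positional access, analogous to indexing by the table schema's
    // field offsets: breaks when the batch holds a subset of columns.
    fn column_by_index(&self, i: usize) -> Option<&Column> {
        self.columns.get(i)
    }

    // Name-based access, the approach proposed in the comment above.
    fn column_by_name(&self, name: &str) -> Option<&Column> {
        self.columns.iter().find(|c| c.name == name)
    }
}

fn main() {
    // Table schema is [value, time], but predicate evaluation passes a
    // batch containing only the "value" column.
    let batch = Batch {
        columns: vec![Column { name: "value".to_string() }],
    };

    // "time" sits at index 1 in the table schema, but that index does
    // not exist in this one-column batch.
    assert!(batch.column_by_index(1).is_none());

    // Looking up by name still finds the column that is present, and
    // cleanly reports the one that is not.
    assert!(batch.column_by_name("value").is_some());
    assert!(batch.column_by_name("time").is_none());
}
```

(With the real arrow-rs types, the same idea would use `Schema::index_of` / `RecordBatch::column_by_name` rather than these toy helpers.)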
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]