alamb commented on code in PR #20063:
URL: https://github.com/apache/datafusion/pull/20063#discussion_r2760898107
##########
datafusion/proto-common/src/from_proto/mod.rs:
##########
@@ -401,55 +406,83 @@ impl TryFrom<&protobuf::ScalarValue> for ScalarValue {
schema_ref.try_into()?
} else {
return Err(Error::General(
- "Invalid schema while deserializing ScalarValue::List"
+ "Invalid schema while deserializing nested ScalarValue"
.to_string(),
));
};
+ // IPC dictionary batch IDs are assigned when encoding the
schema, but our protobuf
Review Comment:
I feel like somehow I missing something key -- this seems like a pretty
massive overhead to create / decode a single value here.
That being said, it seems like the existing code is also doing the overhead,
so maybe it is fine for now 🤔
I wonder if we could pull this logic for creating the schema into its own
function to try and reduce the size of the overall method / make it easier to
understand
##########
datafusion/proto/tests/cases/roundtrip_physical_plan.rs:
##########
@@ -2564,3 +2564,22 @@ fn custom_proto_converter_intercepts() -> Result<()> {
Ok(())
}
+
+#[test]
Review Comment:
With the code change reverted, this test fails like
---- cases::roundtrip_physical_plan::roundtrip_call_null_scalar_struct_dict
stdout ----
Error: Plan("General error: Error encoding ScalarValue::List as IPC: Ipc
error: no dict id for field item")
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]