goldmedal commented on code in PR #11547:
URL: https://github.com/apache/datafusion/pull/11547#discussion_r1688090369
##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -1785,6 +1798,98 @@ fn from_substrait_literal(
}
}
}
+ Some(LiteralType::Map(m)) => {
+ // Each entry should start the name index from the same value,
then we increase it
+ // once at the end
+ let mut entry_name_idx = *name_idx;
+ let entries = m
+ .key_values
+ .iter()
+ .map(|kv| {
+ entry_name_idx = *name_idx;
+ let key_sv = from_substrait_literal(
+ kv.key.as_ref().unwrap(),
+ extensions,
+ dfs_names,
+ &mut entry_name_idx,
+ )?;
+ let value_sv = from_substrait_literal(
+ kv.value.as_ref().unwrap(),
+ extensions,
+ dfs_names,
+ &mut entry_name_idx,
+ )?;
+ ScalarStructBuilder::new()
Review Comment:
> this was the most high-level way of creating the map I could think of, lmk
if you have better ideas!
It looks good to me. If you intend to be more efficient, I suggest referring
to how `make_map_batch_internal` creates a `MapArray`.
https://github.com/apache/datafusion/blob/77311a5896272c7ed252d8cd53d48ec6ea7c0ccf/datafusion/functions-array/src/map.rs#L80
You need to partition the key and value pairs into two arrays, and build the
`MapArray` based on them. Maybe you can refer to `plan_make_map`.
https://github.com/apache/datafusion/blob/77311a5896272c7ed252d8cd53d48ec6ea7c0ccf/datafusion/functions-array/src/planner.rs#L104
However, I think it just some improvements. We don't need to do that in this
PR.
##########
datafusion/common/src/hash_utils.rs:
##########
@@ -692,6 +732,64 @@ mod tests {
assert_eq!(hashes[0], hashes[1]);
}
+ #[test]
+ // Tests actual values of hashes, which are different if forcing collisions
+ #[cfg(not(feature = "force_hash_collisions"))]
+ fn create_hashes_for_map_arrays() {
+ let mut builder =
+ MapBuilder::new(None, StringBuilder::new(), Int32Builder::new());
+ // Row 0
+ builder.keys().append_value("key1");
+ builder.keys().append_value("key2");
+ builder.values().append_value(10);
+ builder.values().append_value(11);
+ builder.append(true).unwrap();
+ // Row 1
+ builder.keys().append_value("key1");
+ builder.keys().append_value("key2");
+ builder.values().append_value(10);
+ builder.values().append_value(11);
+ builder.append(true).unwrap();
+ // Row 2
+ builder.keys().append_value("key1");
+ builder.keys().append_value("key2");
+ builder.values().append_value(10);
+ builder.values().append_value(12);
+ builder.append(true).unwrap();
+ // Row 3
+ builder.keys().append_value("key1");
+ builder.keys().append_value("key3");
+ builder.values().append_value(10);
+ builder.values().append_value(11);
+ builder.append(true).unwrap();
+ // Row 4
+ builder.keys().append_value("key1");
+ builder.values().append_value(10);
+ builder.append(true).unwrap();
+ // Row 5
+ builder.keys().append_value("key1");
+ builder.values().append_null();
+ builder.append(true).unwrap();
+ // Row 6
+ builder.append(true).unwrap();
+ // Row 7
+ builder.keys().append_value("key1");
+ builder.values().append_value(10);
+ builder.append(false).unwrap();
+
+ let array = Arc::new(builder.finish()) as ArrayRef;
+
+ let random_state = RandomState::with_seeds(0, 0, 0, 0);
+ let mut hashes = vec![0; array.len()];
+ create_hashes(&[array], &random_state, &mut hashes).unwrap();
+ assert_eq!(hashes[0], hashes[1]); // same value
+ assert_ne!(hashes[0], hashes[2]); // different key
+ assert_ne!(hashes[0], hashes[3]); // different value
Review Comment:
```suggestion
assert_ne!(hashes[0], hashes[2]); // different value
assert_ne!(hashes[0], hashes[3]); // different key
```
I guess the comments are wrong.
The difference between `Row 0` and `Row 2` is the value of `key2`: `{'key2':
11}` in Row 0 and `{'key2': 12}` in Row 2.
However, the difference between `Row 0` and `Row 3` is the key: `key2` in
Row 0 and `key3` in Row 3.
Did I say that correctly?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]