goldmedal commented on issue #11268: URL: https://github.com/apache/datafusion/issues/11268#issuecomment-2211734625
> btw, there is probably a downside for using MapBuilder, since we need to create different builder for different types (therefore many macros) which easily causes code bloat. We have similar issue while building array function before #7988. I'm not sure what is the best approach yet (probably current draft function is the best), worth to explore it. Besides the code bloat, I'm also concerned about the performance. Because `MapBuilder` doesn't provide batch append (I think it's not its main purpose), we can only append data entry by entry. I might expect it to have O(n) complexity. I did some benchmarks to compare the `MapBuilder` version and the `MapArray::from()` version. https://github.com/goldmedal/datafusion/blob/95198c17eddd0a708e61e66076a383d5671646b7/datafusion/functions/src/core/map.rs#L132 I haven't tried any optimizations for them. The results are as follows: ```rust // The first run is for warm-up Generating 1 random key-value pair Time elapsed in make_map() is: 449.737µs Time elapsed in make_map_batch() is: 129.096µs Generating 10 random key-value pairs Time elapsed in make_map() is: 24.932µs Time elapsed in make_map_batch() is: 18.007µs Generating 50 random key-value pairs Time elapsed in make_map() is: 34.587µs Time elapsed in make_map_batch() is: 17.037µs Generating 100 random key-value pairs Time elapsed in make_map() is: 66.02µs Time elapsed in make_map_batch() is: 19.027µs Generating 500 random key-value pairs Time elapsed in make_map() is: 586.476µs Time elapsed in make_map_batch() is: 63.832µs Generating 1000 random key-value pairs Time elapsed in make_map() is: 722.771µs Time elapsed in make_map_batch() is: 47.455µs ``` The scenario for this function may not be for large maps, but I think the `MapArray::from()` version always has better performance and won't cause code bloat. I prefer this version. What do you think? @alamb @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
