crepererum commented on issue #14256: URL: https://github.com/apache/datafusion/issues/14256#issuecomment-2624035825
I have another hunch, but still need to proof it: DataFusion uses a lot of ```rust fn f(param: impl Into<Type>, ...) -> ... { let param: Type = param.into(); ... } // call-site let x: OtherType = ...; f(x, ...); ``` This is mostly so that the caller can stick a lot of parameters into methods w/o needing to convert it to the right type. Sometimes it's also used for backwards compatible-ish changes[^1]. In Rust that `impl` constructs expands roughly to ```rust fn f<T>(param: T) -> ... where T: Into<Type> { let param: Type = param.into(); ... } ``` Since Rust generics are implemented using [monomorphization](https://rustc-dev-guide.rust-lang.org/backend/monomorph.html), this leads to the following issues: 1. **type replication:** The function including its body are compiled for every parameter `T`. And this also includes multiple copies of the LLVM IR and resulting machine code. Note that this includes the WHOLE body, not just the `let param: Type = param.into();` part[^2]. 2. **delayed compilation:** Since a crate that contains the aforementioned method cannot know all types `T` that you eventually gonna use, the compilation is done on the call site (instead the definition site). This however also means that you get even more copies of the function, because if you have a crate that defines the method and two crates that use it, the two user crates are compiled independently. My guess is that while this leads to "nice APIs", it leads to a lot of compilation overhead both within DataFusion and for its users. I would therefore suggest that we change methods to look like this: ```rust fn f(param: Type, ...) -> ... { ... } // call-site let x: OtherType = ...; f(x.into(), ...); ``` An intermediate solution that requires more code, has a slightly higher compilation overhead (due to more generics) but isn't as bad as the original version would be this: ```rust fn f(param: impl Into<Type>, ...) -> ... { f_impl(param.into(), ...) } // private helper method // may want to use `#[inline(never)]` here fn f_impl(param: Type, ...) -> ... { ... } // call-site let x: OtherType = ...; f(x, ...); ``` Technically this also applies to `impl IntoIterator<Item = Type>` and `impl Iterator<Item = Type>`, but there it's usually harder to fix. There we would have the following options, all of which are slower than the original one due to dynamic dispatch: ```rust // original version fn g(it: impl IntoIterator<Item = Type>) {} // unsized locals // requires https://doc.rust-lang.org/beta/unstable-book/language-features/unsized-locals.html fn g(it: dyn Iterator<Item = Type>) {} // box version fn g(it: Box<dyn Iterator<Item = Type>>) {} // pass mut ref fn g(it: &mut dyn Iterator<Item = Type>) {} ``` So I would leave the iterator cases out for now. [^1]: That's strictly speaking NOT true, changing `param: String` to `param: impl Into<String>` is NOT backwards compatible since it affects type inference on the call site. [^2]: This partial replication of code might one day be possible via [polymorphization](https://github.com/rust-lang/rust/issues/124962), but I don't expect this to happen within the next 3 years. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org