crepererum commented on issue #14256:
URL: https://github.com/apache/datafusion/issues/14256#issuecomment-2624035825

   I have another hunch, but still need to proof it: DataFusion uses a lot of
   
   ```rust
   fn f(param: impl Into<Type>, ...) -> ... {
       let param: Type = param.into();
       ...
   }
   
   // call-site
   let x: OtherType = ...;
   f(x, ...);
   ```
   
   This is mostly so that the caller can stick a lot of parameters into methods 
w/o needing to convert it to the right type. Sometimes it's also used for 
backwards compatible-ish changes[^1].
   
   In Rust that `impl` constructs expands roughly to
   
   ```rust
   fn f<T>(param: T) -> ...
   where
       T: Into<Type>
   {
       let param: Type = param.into();
       ...
   }
   ```
   
   Since Rust generics are implemented using 
[monomorphization](https://rustc-dev-guide.rust-lang.org/backend/monomorph.html),
 this leads to the following issues:
   
   1. **type replication:** The function including its body are compiled for 
every parameter `T`. And this also includes multiple copies of the LLVM IR and 
resulting machine code. Note that this includes the WHOLE body, not just the 
`let param: Type = param.into();` part[^2].
   2. **delayed compilation:** Since a crate that contains the aforementioned 
method cannot know all types `T` that you eventually gonna use, the compilation 
is done on the call site (instead the definition site). This however also means 
that you get even more copies of the function, because if you have a crate that 
defines the method and two crates that use it, the two user crates are compiled 
independently.
   
   My guess is that while this leads to "nice APIs", it leads to a lot of 
compilation overhead both within DataFusion and for its users. I would 
therefore suggest that we change methods to look like this:
   
   ```rust
   fn f(param: Type, ...) -> ... {
       ...
   }
   
   // call-site
   let x: OtherType = ...;
   f(x.into(), ...);
   ```
   
   An intermediate solution that requires more code, has a slightly higher 
compilation overhead (due to more generics) but isn't as bad as the original 
version would be this:
   
   ```rust
   fn f(param: impl Into<Type>, ...) -> ... {
       f_impl(param.into(), ...)
   }
   
   // private helper method
   // may want to use `#[inline(never)]` here
   fn f_impl(param: Type, ...) -> ... {
       ...
   }
   
   // call-site
   let x: OtherType = ...;
   f(x, ...);
   ```
   
   Technically this also applies to `impl IntoIterator<Item = Type>` and `impl 
Iterator<Item = Type>`, but there it's usually harder to fix. There we would 
have the following options, all of which are slower than the original one due 
to dynamic dispatch:
   
   ```rust
   // original version
   fn g(it: impl IntoIterator<Item = Type>) {}
   
   // unsized locals
   // requires 
https://doc.rust-lang.org/beta/unstable-book/language-features/unsized-locals.html
   fn g(it: dyn Iterator<Item = Type>) {}
   
   // box version
   fn g(it: Box<dyn Iterator<Item = Type>>) {}
   
   // pass mut ref
   fn g(it: &mut dyn Iterator<Item = Type>) {}
   ```
   
   So I would leave the iterator cases out for now.
   
   [^1]: That's strictly speaking NOT true, changing `param: String` to `param: 
impl Into<String>` is NOT backwards compatible since it affects type inference 
on the call site.
   [^2]: This partial replication of code might one day be possible via 
[polymorphization](https://github.com/rust-lang/rust/issues/124962), but I 
don't expect this to happen within the next 3 years.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to