grigorypas wrote:

> > Can you please elaborate what do you mean by it "changes the semantics of
> > alwaysinline"? I am introducing a new attribute flatten_deep both on clang
> > side and LLVM side. alwaysinline should still mean the same thing.
>
> You said patch 2 will update the alwaysinliner pass. `alwaysinline` has
> previously always inlined a function unless it was illegal to do so. You're
> now maybe not inlining depending on the `flatten_deep` attribute, which seems
> like a cost heuristic encoded in the IR to me.
>
> > To clarify, our primary use case at Meta is to completely flatten functions
> > by inlining the entire call tree. The max depth parameter is not intended
> > as a core part of the user workflow, but rather as a safeguard to prevent
> > issues if the call tree happens to be extremely deep.
>
> So you want to completely flatten functions but not completely flatten
> functions? What exactly is the use case of flattening these functions?

Thank you for the feedback! Let me clarify the design:

## `alwaysinline` Semantics Are Preserved

The `alwaysinline` semantics are **not** being changed. The original `alwaysinline` logic is applied first and takes precedence. The `flatten_deep` logic runs in the same pass but is applied at the end, after the standard `alwaysinline` processing. If a function has `alwaysinline`, it will be inlined according to the existing rules (unless it is illegal to do so), completely independent of any `flatten_deep` attributes.

You can see this in the suggested implementation here: [https://github.com/grigorypas/llvm-project/tree/full_flattening](https://github.com/grigorypas/llvm-project/tree/full_flattening)

## `flatten_deep` as a Natural Extension of `flatten`

`flatten_deep(N)` is a natural extension of the existing `flatten` attribute. While they differ in implementation, the motivation is similar:

- **`flatten`**: Inlines all immediate callsites (single level) - implemented in the frontend by marking direct calls with `alwaysinline`
- **`flatten_deep(N)`**: Inlines recursively/transitively up to N levels deep - requires backend support to propagate through the call tree

Importantly, **full/deep flattening cannot be achieved today with existing attributes**. You can't achieve transitive inlining across the entire call tree with current mechanisms.

## Max Depth as a Safeguard

The max depth parameter is not a cost heuristic - it's a safety limit:

- **Primary use case**: Complete flattening of the call tree (large N)
- **Max depth parameter**: A safeguard to prevent compile-time explosions with unexpectedly deep call trees

This is similar to other compiler safety limits (e.g., `-fconstexpr-depth=N`) - we want to flatten the entire call tree in normal cases, but need a circuit breaker for pathological edge cases.
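
As a rough illustration of the `flatten` versus `flatten_deep(N)` distinction described above, here is a minimal sketch of a three-level call chain. The function names (`leaf`, `mid`, `entry_flatten`, `entry_deep`, `self_check`) are hypothetical, and the `flatten_deep` source spelling shown in the comment is an assumption based on this proposal; it is not available in any released Clang, so it is left commented out.

```cpp
#include <cassert>

// Hypothetical three-level call chain: entry -> mid -> leaf.
static int leaf(int x) { return x + 1; }
static int mid(int x) { return leaf(x) * 2; }

// Existing attribute. As described above, Clang handles `flatten` by
// marking the direct call to mid() for inlining; the nested call to
// leaf() is left to the regular inliner heuristics.
__attribute__((flatten)) int entry_flatten(int x) { return mid(x) + 3; }

// Proposed attribute (assumed spelling, not in any released compiler):
// the intent is to inline mid() and, transitively, leaf(), stopping
// only if the call tree is deeper than the given limit.
//
// __attribute__((flatten_deep(3))) int entry_deep(int x) { return mid(x) + 3; }

// The attributes only influence inlining, never the computed result.
inline void self_check() { assert(entry_flatten(1) == 7); }
```

Because the attributes affect only code generation, the difference between the two shows up in the emitted IR or assembly rather than in program output.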

## Use Case

This feature is useful for performance-critical code where eliminating call overhead across the entire call tree is beneficial, such as:

- Deeply nested hot paths in performance-sensitive applications
- **PGO scenarios with stale profiles**: When adding new functions to hot paths, `flatten_deep(N)` may help where default bottom-up inlining decisions rely on incomplete or stale profile data

Does this clarification address your concerns?

https://github.com/llvm/llvm-project/pull/165777
_______________________________________________
cfe-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
