Mogball added inline comments.

================
Comment at: mlir/lib/Transforms/Utils/CommutativityUtils.cpp:329-353
+  assert(frontPosition >= 0 && frontPosition < bfsOfOperands.size() &&
+         "`frontPosition` should be valid");
+  unsigned positionOfOperandToShift;
+  bool foundOperandToShift = false;
+  for (auto &indexedBfsOfOperand : llvm::enumerate(bfsOfOperands)) {
+    std::unique_ptr<OperandBFS> &bfsOfOperand = indexedBfsOfOperand.value();
+    if (bfsOfOperand->isSorted)
----------------
srishti-pm wrote:
> Mogball wrote:
> > srishti-pm wrote:
> > > Mogball wrote:
> > > > There is no way you need this much code. A `std::swap` between the 
> > > > current operand and the first unsorted position should be enough.
> > > If I do a swap, the sorting will no longer be stable and I believe that 
> > > there was a discussion that concluded with the fact that "we want stable 
> > > sorting".
> > That's true, but shifting like this is very slow as well. At this point, 
> > you might want to give `std::stable_sort` with a custom comparator that 
> > does extra BFS iterations on demand a try.
> So, this is what I think:
> 
> The number of commutative operands is not expected to be huge. So, we can 
> afford to do shifting. In most cases, we wouldn't have to shift more than 1 
> or 2 positions. But, the custom comparator might cost us a lot, considering 
> that each BFS could potentially be very large, especially for deep learning 
> models. So, doing the BFS traversals again and again for the same operand, 
> even though caching will be involved, doesn't sound like a good idea to me.
> 
> What are your views?
Very rough estimate: on its own, this function is N. Finding the smallest key 
is N, and then finding all matching elements is N. This function is called for 
each operand that needs to be moved, but the number of such operands decreases. 
So the sort itself averages out to be 3N^2 iterations over the operand list.

Now for traversals, doing BFS on demand inside the comparator doesn't mean it 
has to restart every time. It would do extra iterations on top of existing 
iteration results only when needed to break ties. In your case, you do an extra 
iteration of BFS for all operands if the current smallest key is identical, not 
just for the ones needed. It's hard to estimate the number of iterations of 
BFS, but certainly it's more in your case. Using `std::stable_sort` would also 
bring the complexity down to N logN


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124750/new/

https://reviews.llvm.org/D124750

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to