Merged #77 into main.
Thanks everyone for the discussions. We have agreed on the design principles
and will continue to explore scheduling options. Let's keep the RFC open for
final comments until the end of this week.
Thank you very much for the comments, suggestions, and discussion; I'm quite happy with how the design evolved over the course of the discussions!
Thanks everyone for the very fruitful discussions! We indeed have a good path forward and are aligned on the principle that, for the end-to-end optimization, we will maintain function interface invariance and achieve graph-level layout optimization via a combination of local decisions, reconstruction…
cc @Hzfengsy @wrongtest-intellif, it would be great if you could also take a follow-up look.
Following up on this, I think we are in broad-strokes agreement that we can achieve our goals with block/fn attributes in the IR as well as builtin assume. As a result, my original blocker for the RFC has been resolved; it would still be great to work together to flesh out the details of the schedule primitives…
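For concreteness, a rough TVMScript sketch of what that combination could look like; the shapes (16 padded from 14), the `original_extent` function attribute, and the block names are illustrative and not taken from the RFC text:

```python
# Illustrative only: "original_extent" is a hypothetical function attribute,
# and T.assume is the builtin discussed above.
import tvm
from tvm.script import tir as T

@T.prim_func
def padded_mul(A: T.Buffer((16,), "float32"), B: T.Buffer((16,), "float32")):
    # Hypothetical function attribute recording the pre-padding extent.
    T.func_attr({"original_extent": 14})
    for i in range(16):
        with T.block("assume_padding"):
            vi = T.axis.spatial(16, i)
            # Assumption: the padded tail of A was filled with zeros by the
            # producer, so later simplification may rely on A[vi] == 0 there.
            T.evaluate(T.assume(vi < 14 or A[vi] == 0.0))
    for i in range(16):
        with T.block("compute"):
            vi = T.axis.spatial(16, i)
            B[vi] = A[vi] * 2.0
```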
> In general it is helpful to first keep schedule decisions local, e.g.
> introducing a caching stage (AC, BC in the example), then compose with another
> reflowing pass to bring the decision to consumers/producers.
My goal with the latest update wasn't to require global decisions, but to make local…
Thanks @Lunderberg for the update, I think we are moving in a positive direction on the overall IR design. Some additional feedback:
## Keep Schedule Decisions Local to PrimFunc then Compose
On schedule primitives, to be pragmatic, it would be helpful to have some of the cross-PrimFunc re-flowing…
These make sense, and agreed that the TIR->global feedback is important for enabling the layout reflow. Going back through the discussion, I think we're converging on agreement about what features are required, and the main question remaining is how to best provide annotations for non-local information…
> For this, would the layout re-flowing occur periodically during optimization?
This is a point where different variations of (some sort of search) algorithm will likely be necessary; our first step would be to allow the TIR level to give such feedback to the global level (via a probabilistic space…
> Our design principle at the TIR level is that ideally we start with one instance of the
> possibility, then use the probabilistic space of meta-schedule to represent
> multiple choices.
For this, would the layout re-flowing occur periodically during optimization? Otherwise, including transformations in the perf…
> In general, a PrimFunc's interface could only be changed when calls into the
> PrimFunc are also modified to remain compatible.
Agreed, that is what I originally intended to say.
> Is there a better term than "scheduling primitive" to describe layout
> transformations that impact input/output…
> Talking about “constraints”, it is also useful to talk about categories of
> them; roughly we can divide them into three categories.
I like this breakdown, and agree. In this categorization, what I've been calling "constraints" would be "assumptions". Double-checking in `builtin.h`, it looks…
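(For anyone following along without the C++ sources: assuming the builtin ends up registered under the name `tir.assume`, the same check can be done from Python. This is a hedged sketch, not a claim about the final op name.)

```python
# Hedged sketch: inspect the assume builtin from Python instead of builtin.h.
# Assumes the op is registered as "tir.assume"; Op.get raises if it is not.
import tvm

assume_op = tvm.ir.Op.get("tir.assume")
print(assume_op)                              # e.g. Op(tir.assume)
print(assume_op.get_attr("TCallEffectKind"))  # its registered effect kind
```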
Added some examples to build on top of @Lunderberg's example.
## Transformation
The main differences between annotation and special handling are:
- An annotation is not necessarily needed for the correctness of the program, but it may provide hints towards future optimizations.
- Without annotation, the pro…
> For example, we may introduce an explicit cache stage to add the padding, and
> mark this block for later processing.
Wouldn't that require a "remove entirely" annotation that was suggested against
[here](https://github.com/apache/tvm-rfcs/pull/77#issuecomment-1163019805)? I could see how we co…
> > For example, we may introduce an explicit cache stage to add the padding, and
> > mark this block for later processing.
>
> Wouldn't that require a "remove entirely" annotation that was suggested
> against
> [here](https://github.com/apache/tvm-rfcs/pull/77#issuecomment-1163019805)? I
> could…
Writing out some of my thoughts, to see if there's a way to express the constraints while only using existing TIR features. The main goals would be as follows (a sketch of the first goal is shown after the list).
1. Allow simplification of expressions based on the values present in the padding.
2. Allow local simplifications to take advantage of…
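As a hedged illustration of goal 1 (buffer names and the 14-vs-16 extents are made up): once the zero padding is stated with `T.assume`, a guard written against the original extent can be folded away, because both branches agree on the padded tail.

```python
# Hypothetical before/after for goal 1.  "before" guards the padded tail with
# a branch; given the assumption that the tail of A is zero, both branches
# produce 0 there, so the guard can be removed.
import tvm
from tvm.script import tir as T

@T.prim_func
def before(A: T.Buffer((16,), "float32"), B: T.Buffer((16,), "float32")):
    for i in range(16):
        with T.block("compute"):
            vi = T.axis.spatial(16, i)
            T.evaluate(T.assume(vi < 14 or A[vi] == 0.0))
            # Guard written against the original extent of 14.
            B[vi] = T.if_then_else(vi < 14, A[vi] * 2.0, T.float32(0))

@T.prim_func
def after(A: T.Buffer((16,), "float32"), B: T.Buffer((16,), "float32")):
    for i in range(16):
        with T.block("compute"):
            vi = T.axis.spatial(16, i)
            # With A[vi] == 0 assumed on the padded tail, the select folds.
            B[vi] = A[vi] * 2.0
```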
Indeed, if a buffer is used in an annotation value, that will change the semantics of a node; however, there are different ways to represent this, as long as it can be reconstructed later. For example, we may introduce an explicit cache stage to add the padding, and mark this block for later processing.
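A rough sketch of that flow with existing scheduling primitives; the annotation key `"explicit_padding"` is made up here, and the padding fill itself is elided, the point is only the cache stage plus the marker for later processing:

```python
# Hypothetical sketch: create an explicit staging copy of the input (to which
# the padding would be attached) and mark it so a later pass can recognise it.
import tvm
from tvm import tir
from tvm.script import tir as T

@T.prim_func
def compute(A: T.Buffer((16,), "float32"), B: T.Buffer((16,), "float32")):
    for i in range(16):
        with T.block("compute"):
            vi = T.axis.spatial(16, i)
            B[vi] = A[vi] * 2.0

sch = tir.Schedule(compute)
block = sch.get_block("compute")
# AC-style staging copy of the input buffer (read index 0) in global scope.
cache_block = sch.cache_read(block, 0, "global")
# Tag the staging block; downstream passes would key on this (made-up) name.
sch.annotate(cache_block, ann_key="explicit_padding", ann_val=1)
print(sch.mod.script())
```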
> It doesn't add additional semantics; the computation semantics stay the same,
> it is a hint to the graph compiler.
My apologies, I had meant the semantics of a node from the perspective of a TIR transformation, not the semantics from the perspective of the computation being described. For a…
> So long as the constraints can be statically searched for, this approach
> makes sense to me. I would be more concerned about adding additional
> semantics to existing nodes, such as an AttrStmt node.
It doesn't add additional semantics; the computation semantics stay the same, it is a hint to the graph compiler.
> Indeed it is important to avoid having a separate compute definition for each
> workload on a new target. In this particular case, all computation definitions
> would start with the original layout. Then there is a "schedule
> transformation" like transform layout which will generate the new st…
> I'm still a bit confused with this approach, specifically how one would avoid
> having a separate compute definition for each workload on a new target
Indeed it is important to avoid having a separate compute definition for each workload on a new target. In this particular case, all computation definitions would start with the original layout. Then there is a "schedule transformation" like transform layout which will generate the new st…
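A hedged sketch of that flow with the existing `transform_layout` primitive (extents, the index map, and the use of `pad_value` are illustrative, and `pad_value` assumes a sufficiently recent TVM): the compute definition stays written against the original layout, and the padded layout is introduced purely as a schedule transformation.

```python
# Hypothetical sketch: one compute definition in the original 14-element
# layout, with the padded [4, 4] layout introduced by the schedule.
import tvm
from tvm import tir
from tvm.script import tir as T

@T.prim_func
def original(A: T.Buffer((14,), "float32"), B: T.Buffer((14,), "float32")):
    for i in range(14):
        with T.block("compute"):
            vi = T.axis.spatial(14, i)
            B[vi] = A[vi] * 2.0

sch = tir.Schedule(original)
block = sch.get_block("compute")
# Rewrite the output buffer into a padded [4, 4] layout; pad_value states what
# the newly introduced elements hold.
sch.transform_layout(
    block,
    buffer=("write", 0),
    index_map=lambda i: [i // 4, i % 4],
    pad_value=0.0,
)
print(sch.mod.script())
```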
> Introducing changes to TIR would need some additional thought that deserves
> some extra consideration, due to the N*M complexity (where N is the number of
> TIR possibilities and M is the number of primitives to be supported) that
> needs to be handled in the implementation (by backend implementers and p…
Adding some additional discussion with @csullivan.
We agree that:
- There are different ways to encode layout and padding decisions:
  - E0: BufferConstraint (as an element in the IR)
  - E1: Composing a stage that transforms the layout (a loop that represents the mapping; see the sketch below)
- Non-local rewrites are…
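For concreteness, a hedged sketch of the E1 encoding (all names and extents illustrative): the layout/padding decision is materialized as an ordinary stage whose loop nest is the mapping, rather than as a constraint object attached to the buffer.

```python
# Hypothetical E1 encoding: the "transform" block is the loop that realises
# the mapping from the logical 14-element buffer A into the physical padded
# [4, 4] buffer A_phys, and the compute then reads the physical layout.
import tvm
from tvm.script import tir as T

@T.prim_func
def e1_encoding(A: T.Buffer((14,), "float32"), B: T.Buffer((4, 4), "float32")):
    A_phys = T.alloc_buffer((4, 4), "float32")
    for io, ii in T.grid(4, 4):
        with T.block("transform"):
            vio, vii = T.axis.remap("SS", [io, ii])
            # Mapping i -> (i // 4, i % 4); out-of-range elements get pad 0.
            A_phys[vio, vii] = T.if_then_else(
                vio * 4 + vii < 14, A[vio * 4 + vii], T.float32(0)
            )
    for io, ii in T.grid(4, 4):
        with T.block("compute"):
            vio, vii = T.axis.remap("SS", [io, ii])
            B[vio, vii] = A_phys[vio, vii] * 2.0
```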
Thanks @csullivan for providing the overview. I agree that the non-local approaches 2-4 are necessary. From the examples in this RFC I can also see how the components C0-C2 can be used to support these non-local approaches. C0 + C1 allow one to specify the constraints during scheduling, and propagate b…
Thanks for sharing the contextual pointers for the community, @vinx13. Agreed that the approaches discussed are both valid. I would actually like to argue the stronger point that they are complementary and only appear to be contrary because we are considering too narrow a scope.
It can be…
Thanks for the discussion. To provide more context, the A0 approach we discussed is TIR-Relax layout rewriting https://github.com/tlc-pack/relax/issues/162 (the general idea is to lift such transformations in TIR scheduling into the graph, and then cancel out redundant intermediate transformations…
Thanks for all the great discussions! It is so exciting that we will have a more powerful ability to handle things like padding and imperfect tiles.
Since our team relies on the code path of S-TIR, we are extremely interested in the story for S-TIR. I would very much appreciate it if we could have some d…
We discussed this at the June 6 [community meeting](https://discuss.tvm.apache.org/t/next-tvm-community-meeting-june-8-2022/12900). A significant chunk of the meeting was spent presenting the RFC, and we had about 15 minutes of discussion at the end.
I think there is more to be discussed here.
This is on the agenda for tomorrow's [community meeting](https://discuss.tvm.apache.org/t/next-tvm-community-meeting-june-8-2022/12900). Perhaps we could discuss it in higher bandwidth there?