On Thu, Nov 10, 2016 at 1:56 PM, Nanley Chery <nanleych...@gmail.com> wrote:
> On Thu, Nov 10, 2016 at 12:52:44PM -0800, Jason Ekstrand wrote: > > On Nov 10, 2016 11:44, "Nanley Chery" <nanleych...@gmail.com> wrote: > > > > > > On Wed, Nov 09, 2016 at 06:43:23PM -0800, Jason Ekstrand wrote: > > > > On Wed, Nov 9, 2016 at 5:09 PM, Nanley Chery <nanleych...@gmail.com> > > wrote: > > > > > > > > > On Sat, Oct 22, 2016 at 10:50:31AM -0700, Jason Ekstrand wrote: > > > > > > This series does some fairly major surgery on color attachment > > surface > > > > > > state allocation and fill-out in the Intel Vulkan driver. This > is > > in > > > > > > preparation for doing color compression, fast-clears, and > > HiZ-capable > > > > > input > > > > > > attachments. Naturally, as with everything else I've done in the > > last 2 > > > > > > months, it also involves some non-trivial blorp work. > > > > > > > > > > > > Let's start off at the beginning... For a variety of reasons, we > > can't > > > > > > really know 100% of the details of an attachment's surface state > at > > any > > > > > > other places than vkCmdBeginRenderPass and vkCmdNextSubpss. The > > same > > > > > > applies for depth buffers if you consider 3DSTATE_DEPTH_BUFFER > and > > > > > friends > > > > > > to be the depth and stencil buffer's "surface state". That's a > > fairly > > > > > > strong statement, but there are a couple of reasons for this: > > > > > > > > > > > > 1) In order for fast-clears to work, the surface state has to > > contain > > > > > the > > > > > > clear color. (This is it's own packet for HiZ but not for > > color.) > > > > > We > > > > > > don't know the clear value until BeginRenderPass. This > means we > > > > > can't > > > > > > fully fill out the surface state in vkCmdCreateImageView. > > > > > > > > > > > > 2) The Vulkan spec requires that you be able to call > > > > > vkBeginCommandBuffer > > > > > > on a secondary command buffer with > > USAGE_RENDER_PASS_CONTINUE_BIT set > > > > > > but with a null framebuffer. In this case, the secondary is > > supposed > > > > > > to inherit the framebuffer from the primary. (This is not > > something > > > > > we > > > > > > have properly implemented until now.) This means that > anything > > that > > > > > is > > > > > > callable from a render-pass-continuing secondary command > buffer > > has > > > > > to > > > > > > be able to operate without knowing any surface details that > > aren't > > > > > part > > > > > > of the VkRenderPass object. Basically, all you know is the > > Vulkan > > > > > > format (not the isl format) and the sample count. > > > > > > > > > > > > Between the two of those, about the only two entrypoints left at > > which we > > > > > > actually know surface details are vkCmdBeginRenderPass and > > > > > vkCmdNextSubpass > > > > > > so we have to figure out how to do everything there. As it turns > > out, > > > > > this > > > > > > works out surprisingly well. The format and the sample count > turn > > out to > > > > > > be exactly the data we actually need in order to do all of our > > pipeline > > > > > > programming. The only hard part is refactoring things so that it > > pulls > > > > > the > > > > > > data from the render pass instead of the framebuffer. There are > a > > number > > > > > > of places where we were grabbing the image view for an attachment > > because > > > > > > we either wanted to shove something into blorp or because we > wanted > > the > > > > > > format and we were lazy. > > > > > > > > > > > > The approach taken in this patch series is the following: > > > > > > > > > > > > 1) Instead of allocating render target surface states in > > > > > vkCreateImageView, > > > > > > we allocate them as part of render pass setup in > > > > > vkCmdBeginRenderPass. > > > > > > All of the surface states we will ever need (including a null > > surface > > > > > > state) are allocated up-front out of a single contiguous > block. > > > > > > > > > > > > 2) For secondary command buffers with > > USAGE_RENDER_PASS_CONTINUE_BIT > > > > > set, > > > > > > we allocate storage for all of the surface states but don't > > actually > > > > > > fill them out. In the secondary command buffer, all binding > > tables > > > > > > refer to these surface states rather than the ones in the > > primary. > > > > > > > > > > > > 3) A blorp entrypoint is added that performs a clear operation > > without > > > > > > touching the depth/stencil buffer state and with a color > > attachment > > > > > > binding table explicitly provided by the caller. This means > > that > > > > > even > > > > > > our blorp clears are using the surface states allocated in > > > > > > vkCmdBeginRenderPass. Unfortunately, this turned out to be > > more work > > > > > > than expected because I had to add vertex shader support to > > blorp > > > > > along > > > > > > the way. > > > > > > > > > > > > 4) Here's the tricky bit. When vkCmdExecuteCommands is called > > during a > > > > > > render pass, we use transform feedback (yeah, crazy) to copy > the > > > > > block > > > > > > of surface states from the primary into the secondary right > > before > > > > > > executing the secondary. > > > > > > > > > > Could we perform a CPU memcpy at this stage? > > > > > > > > > > > > > We can but only if the secondary is created with the > > SIMULTANEOUS_USE_BIT > > > > unset. If SIMULTANEOUS_USE_BIT is set, then it may be used in > multiple > > > > primaries at the same time. In this case, since we don't know the > order > > > > they will be submitted to the GPU, we have to do the copy at the last > > > > minute. It may be a bit more performant to do the CPU memcpy in that > > case. > > > > > > > > I didn't do that yet because I wanted to let it bake for a while with > > just > > > > the GPU memcpy. The CPU memcpy is guaranteed to be reliable but the > GPU > > > > memcpy isn't. By leaving it as just GPU memcpy, we'll get better > tests > > > > coverage of the crazy path. If we find that it's causing problems, > > doing a > > > > CPU copy is a nice little optimization to have in our back pockets. > > > > > > > > > > Got it. For this series, I think it would be helpful to print some sort > > > of warning that surface states may be invalid in aub dumps. > > > > We could, I suppose. It would print every time someone uses a secondary > > but that's probably OK as long as it's only in debug builds. > > > > > We don't need to do the following now, but if we wanted to, I think we > > > could always do a CPU memcpy. One way to accomplish this is to make > > > secondaries with both SIMULTANEOUS_USE_BIT and RENDER_PASS_CONTINUE_BIT > > > set use a contiguous batch. We could then emit this batch into the > > > primary and replace the surface state block at vkCmdExecuteCommands(). > > > > Replacing surface state blocks is much harder than it sound. You not > only > > have to allocate and copy but you also have to go in and patch up every > > STATE_BASE_ADDRESS and probably patch up all the binding tables as well > > because their address relative to the center of the surface state pool > and > > with it their address relative to the surface states had changed. > > > > By replace the surface state block, I mean copy primary->state.render_pass_ > states > into secondary->state.render_pass_states. That is what we're doing with > the GPU memcpy isn't it? (I'm intentionally omitting bo and offset > details here). > Yes but we can only do a CPU memcpy of SIMULTANEOUS_USE_BIT isn't set. If it is set, then the only way to do a CPU copy is if we allocate a new copy of render_pass_states for each primary it's used in. If we make a copy of render_pass_states, then we have to update all of the binding tables to point to it. This means we have to allocate new binding tables which means we have to patch STATE_BASE_ADDRESS and on down the rabbit hole you go. In other words, we can't just patch up the one part of it from the CPU. As soon as we make a new copy of one part, we're making a new copy of almost everything. > > So, yes, we *can* do it in the sense that is allowed by the api but it > will > > add substantial complexity to ExecuteCommands. > > > > > This matches the following note from the spec: > > > > > > On some implementations, not using the > > > VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT bit enables command > > > buffers to be patched in-place if needed, rather than creating a copy > > > of the command buffer. > > > > > > - Nanley >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev