On Wed, Nov 2, 2016 at 9:43 AM, Chris Wilson <ch...@chris-wilson.co.uk> wrote:
> On Tue, Nov 01, 2016 at 03:35:13PM -0700, Jason Ekstrand wrote: > > +static void > > +write_reloc(const struct anv_device *device, void *p, uint64_t v) > > +{ > > + unsigned reloc_size = 0; > > + if (device->info.gen >= 8) { > > + /* From the Broadwell PRM Vol. 2a, MI_LOAD_REGISTER_MEM::MemoryAd > dress: > > + * > > + * "This field specifies the address of the memory location > where the > > + * register value specified in the DWord above will read from. > The > > + * address specifies the DWord location of the data. Range = > > + * GraphicsVirtualAddress[63:2] for a DWord register > GraphicsAddress > > + * [63:48] are ignored by the HW and assumed to be in correct > > + * canonical form [63:48] == [47]." > > + */ > > const int shift = 63 - 47; /* != 8 :) */ > Drp... I'll use the const int. :-) > > + reloc_size = sizeof(uint64_t); > > + *(uint64_t *)p = (((int64_t)v) << 8) >> 8; > > + } else { > > + reloc_size = sizeof(uint32_t); > > + *(uint32_t *)p = v; > > + } > > + > > + if (!device->info.has_llc) > > + anv_clflush_range(p, reloc_size); > > +} > > + > > +static void > > +anv_reloc_list_apply(struct anv_reloc_list *list, > > + struct anv_device *device, struct anv_bo *bo) > > +{ > > + for (size_t i = 0; i < list->num_relocs; i++) { > > + void *p = bo->map + list->relocs[i].offset; > > + > > + struct anv_bo *target_bo = list->reloc_bos[i]; > > + write_reloc(device, p, target_bo->offset + list->relocs[i].delta); > > + list->relocs[i].presumed_offset = bo->offset; > > + } > > +} > > + > > +/** > > + * This function applies the relocation for a command buffer and writes > the > > + * actual addresses into the buffers as per what we were told by the > kernel on > > + * the previous execbuf2 call. This should be safe to do because, for > each > > + * relocated address, we have two cases: > > + * > > + * 1) The target BO is inactive (as seen by the kernel). In this > case, it is > > + * not in use by the GPU so updating the address is 100% ok. It > won't be > > + * in-use by the GPU (from our context) again until the next > execbuf2 > > + * happens. If the kernel decides to move it in the next execbuf2, > it > > + * will have to do the relocations itself, but that's ok because it > should > > + * have all of the information needed to do so. > > + * > > + * 2) The target BO is active (as seen by the kernel). In this case, > it > > + * hasn't moved since the last execbuffer2 call because GTT > shuffling > > + * *only* happens inside the execbuffer2 ioctl. Since the target BO > > + * hasn't moved, our anv_bo::offset exactly matches the BO's GTT > address > > + * and the relocated value we are writing into the BO will be the > same as > > + * the value that is already there. > > + * > > + * There is also a possibility that the target BO is active but the > exact > > + * RENDER_SURFACE_STATE object we are writing the relocation into > isn't in > > + * use. In this case, the address currently in the > RENDER_SURFACE_STATE > > + * may be stale but it's still safe to write the relocation because > that > > + * particular RENDER_SURFACE_STATE object isn't in-use by the GPU > and > > + * won't be until the next execbuf2 call. > > + * > > + * By doing relocations on the CPU, we can tell the kernel that it > doesn't > > + * need to bother. We want to do this because the surface state buffer > is > > + * used by every command buffer so, if the kernel does the relocations, > it > > + * will always be busy and the kernel will always stall. This is also > > + * probably the fastest mechanism for doing relocations since the > kernel would > > + * have to make a full copy of all the relocations lists. > > + */ > > +static void > > +relocate_cmd_buffer(struct anv_cmd_buffer *cmd_buffer) > > +{ > > + for (uint32_t i = 0; i < cmd_buffer->execbuf2.bo_count; i++) { > > + if (cmd_buffer->execbuf2.bos[i]->offset == (uint64_t)-1) > > + return; > > + } > > Note you still have to make sure the reloc.presumed_offset matches the > value used to compute the contents of the buffer. Old relocations will > retain their value from the previous pass, new allocations will have > presumed_offset = 0 and hopefully the content will be delta. (Otherwise > you run into trouble if the kernel places the buffer at 0, it will then > skip the relocations pointing to it.) > So, I *thought* we were doing this correctly, but I think the current code is a bit on the busted side. I'm going to be sending out a v3 that has some general relocation fixes as well as more comments (along the line of what you said above) > That deserves a comment (if only to contrast the two paths). > > > + anv_reloc_list_apply(&cmd_buffer->surface_relocs, > > + cmd_buffer->device, > > + &cmd_buffer->device->surface_s > tate_block_pool.bo); > > + > > + struct anv_batch_bo **bbo; > > + u_vector_foreach(bbo, &cmd_buffer->seen_bbos) { > > + anv_reloc_list_apply(&(*bbo)->relocs, > > + cmd_buffer->device, &(*bbo)->bo); > > + } > > + > > + for (uint32_t i = 0; i < cmd_buffer->execbuf2.bo_count; i++) { > > + struct anv_bo *bo = cmd_buffer->execbuf2.bos[i]; > > + > > + cmd_buffer->execbuf2.objects[i].offset = bo->offset; > > /* bo->offset = last execobj.offset */ > > *reloc.offset = bo->offset + delta; > reloc.presumed_offset = bo->offset; > execobj.offset = bo->offset; > > Looks good. > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev