On Thu, Oct 29, 2015 at 12:32 AM, Iago Toral <ito...@igalia.com> wrote: > On Wed, 2015-10-28 at 10:58 -0700, Kristian Høgsberg wrote: >> On Wed, Oct 28, 2015 at 10:01:40AM +0100, Samuel Iglesias Gonsálvez wrote: >> > There is no opinions about this issue or reviews of the proposed patch >> > after one week. >> > >> > This is just a reminder in case you have missed it :-) >> >> Thanks for the reminder! How about something like this instead? > > Yeah, that works too. I was a bit concerned that this same problem may > be affecting other places and this would only address it for > brw_send_indirect_message, but after a quick review we don't generally > need to hold pointers to previous instructions and the places where we > do, like in brw_ENDIF or brw_WHILE we are careful to create the > instructions we need before we look for pointers to others (which we do > using indices into the store anyway).
Yea, it's not a general fix, but I'm not sure how we'd do that. In your patch you were special casing the send instruction, which also only covers some cases. What I had in mind when I wrote this alternative fix was that: 1) it's a lot simpler and 2) it's local to the cause of the problem and 3) doesn't suggest that you can safely hold on to inst pointers (shows the opposite, in fact) > Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> > > I'll push this patch tomorrow if nobody else objects. Thanks, good find. > Thanks Kristian! > >> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c >> b/src/mesa/drivers/dri/i965/brw_eu_emit.c >> index ebd811f..cd5c726 100644 >> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c >> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c >> @@ -2511,12 +2511,20 @@ brw_send_indirect_message(struct brw_codegen *p, >> struct brw_reg desc) >> { >> const struct brw_device_info *devinfo = p->devinfo; >> - struct brw_inst *send, *setup; >> + struct brw_inst *send; >> + int setup; >> >> assert(desc.type == BRW_REGISTER_TYPE_UD); >> >> + /* We hold on to the setup instruction (the SEND in the direct case, the >> OR >> + * in the indirect case) by its index in the instruction store. The >> + * pointer returned by next_insn() may become invalid if emitting the >> SEND >> + * in the indirect case reallocs the store. >> + */ >> + >> if (desc.file == BRW_IMMEDIATE_VALUE) { >> - setup = send = next_insn(p, BRW_OPCODE_SEND); >> + setup = p->nr_insn; >> + send = next_insn(p, BRW_OPCODE_SEND); >> brw_set_src1(p, send, desc); >> >> } else { >> @@ -2531,7 +2539,8 @@ brw_send_indirect_message(struct brw_codegen *p, >> * caller can specify additional descriptor bits with the usual >> * brw_set_*_message() helper functions. >> */ >> - setup = brw_OR(p, addr, desc, brw_imm_ud(0)); >> + setup = p->nr_insn; >> + brw_OR(p, addr, desc, brw_imm_ud(0)); >> >> brw_pop_insn_state(p); >> >> @@ -2543,7 +2552,7 @@ brw_send_indirect_message(struct brw_codegen *p, >> brw_set_src0(p, send, retype(payload, BRW_REGISTER_TYPE_UD)); >> brw_inst_set_sfid(devinfo, send, sfid); >> >> - return setup; >> + return &p->store[setup]; >> } >> >> static struct brw_inst * >> >> >> > Sam >> > >> > On 21/10/15 12:23, Iago Toral wrote: >> > > Hi, >> > > >> > > The problem is with code like this (see brw_send_indirect_message): >> > > >> > > setup = brw_OR(p, addr, desc, brw_imm_ud(0)); >> > > send = next_insn(p, BRW_OPCODE_SEND); >> > > ... >> > > return setup; >> > > >> > > If next_insn triggers a realloc of the instruction store, then the setup >> > > instruction pointer is no longer valid. Notice that this can happen >> > > anywhere where we keep pointers to previous instructions before creating >> > > new ones (!) >> > > >> > > The patch from Samuel fixes this by special-casing this for SEND >> > > instructions only (since we know that the indirect versions can hit >> > > this, maybe there are more situations though). It does so by trying to >> > > make sure that we never realloc the store with a SEND instruction. For >> > > this, we realloc before we reach the end of the current store (32 >> > > instructions before the limit) as long as the instruction is not a SEND >> > > (so that if it is a SEND we still have up to 32 opportunities to do the >> > > realloc without a different instruction before running out of space in >> > > the store). >> > > >> > > Iago >> > > >> > > On Wed, 2015-10-21 at 09:02 +0200, Samuel Iglesias Gonsalvez wrote: >> > >> Hello, >> > >> >> > >> I have found several invalid memory accesses when running >> > >> dEQP-GLES31.functional.ssbo.* tests on i965 driver (and gen7+). That >> > >> invalid memory accesses were unluckily happening when generating the >> > >> assembly instructions for SSBO stores for different compute shaders. >> > >> >> > >> However it looks like this problem could happen to other shaders and >> > >> situations. Because of that, I am going to explain the problem here: >> > >> >> > >> When generating a untyped surface write/read, i965 driver will end up >> > >> calling brw_send_indirect_message() through >> > >> brw_send_indirect_surface_message(). At brw_send_indirect_message()'s >> > >> 'else' branch, the code generates a load of the indirect descriptor to >> > >> an address register using an OR instruction and it also generates a new >> > >> SEND instruction; if this case happens, the OR instruction is returned. >> > >> brw_send_indirect_surface_message() uses that OR instruction to set mlen >> > >> and rlen's descriptor bits later. >> > >> >> > >> Just to give more context, when generating instructions in fs/vec4 >> > >> generators, i965 driver uses pointers to elements in the 'store' table >> > >> inside struct brw_codegen. That table has an initial size of 1024 but, >> > >> when it's full, it is resized (doubling its size each time, >> > >> see brw_next_insn()). This resize operation ends up calling >> > >> realloc(). However the returned pointer by realloc() could be different >> > >> and the old allocated memory would be free'd as part of the process. >> > >> >> > >> Back to the issue, if the p->store's resize happens when we get the >> > >> pointer to the SEND instruction at brw_send_indirect_message()'s 'else' >> > >> branch, we could have the following problem: >> > >> >> > >> The realloc() returns a new pointer and *free'd* the old allocation, so >> > >> the pointer we previously saved for the OR instruction at >> > >> brw_send_indirect_surface_message() becomes invalid (because it is a >> > >> pointer of the old allocation). Then, we access to that invalid pointer >> > >> when setting up rlen/mlen bits at the end of >> > >> brw_send_indirect_surface_message() and we would have undefined results. >> > >> >> > >> This issue is quite unlikely to happen but it is reproducible on >> > >> ~120 dEQP-GLES31.functional.ssbo.* tests, basically because they have >> > >> the same shaders except the buffer variable's data type. Those tests >> > >> were failing intermittently at different rates but valgrind helped to >> > >> find what was happening. >> > >> >> > >> I would like to expose publicly the problem and analyse possible >> > >> solutions for it along with the community. For the time being, a patch >> > >> is proposed to mitigate this issue, which is sent as a reply to this >> > >> one. >> > >> >> > >> What this work-around patch does is: book enough room for one or more >> > >> consecutive SEND* instructions at the end of the p->store table in order >> > >> to avoid reallocating p->store in the aforementioned case. The 32 value >> > >> was chosen arbitrary because it has low impact in p->store >> > >> (its initial size is 1024) and makes this issue much less likely to >> > >> happen. We could tune this number to a less conservative value if >> > >> needed. If you want to test it, that patch should be applied on top of >> > >> this Curro's patch [0] as it fixes a lot of compute shader compilation >> > >> errors (~700 dEQP-GLES31.functional.ssbo.* tests). I have setup a branch >> > >> with both patches in [1]. >> > >> >> > >> Feel free to comment about other solutions, ideas, opinions, etc. >> > >> >> > >> Thanks, >> > >> >> > >> Sam >> > >> >> > >> [0] See attachment at >> > >> http://lists.freedesktop.org/archives/mesa-dev/2015-October/097183.html >> > >> [1] $ git clone -b dEQP-functional-ssbo-fixes-v1 \ >> > >> https://github.com/Igalia/mesa.git >> > >> >> > >> Samuel Iglesias Gonsalvez (1): >> > >> i965: book space at the end of p->store for SEND opcodes to avoid >> > >> invalid memory access >> > >> >> > >> src/mesa/drivers/dri/i965/brw_eu_emit.c | 12 +++++++++++- >> > >> 1 file changed, 11 insertions(+), 1 deletion(-) >> > >> >> > > >> > > >> > > >> > _______________________________________________ >> > mesa-dev mailing list >> > mesa-dev@lists.freedesktop.org >> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev >> > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev