Re: [Mesa-dev] abundance of branches in mesa.git
On Mon, Jun 22, 2015 at 5:36 AM, Ilia Mirkin wrote: > On Sun, Jun 21, 2015 at 11:33 PM, Michel Dänzer wrote: >> On 22.06.2015 00:31, Ilia Mirkin wrote: >>> On Sun, Jun 21, 2015 at 12:22 PM, Emil Velikov >>> wrote: On 20/06/15 10:01, Eirik Byrkjeflot Anonsen wrote: > Ilia Mirkin writes: > >> Hello, >> >> There are a *ton* of branches in the upstream mesa git. Here is a full >> list: >> > [...] >> is there >> any reason to keep these around with the exception of: >> >> master >> $version (i.e. 9.0, 10.0, mesa_7_7_branch, etc) > > Instead of outright deleting old branches, it would be possible to set > up an "archive" repository which mirrors all branches of the main > repository. And then delete "obsolete" branches only from the main > repository. Ideally, you would want a git hook to refuse to create a new > branch (in the main repository) if a branch by that name already exists > in the archive repository. Possibly with the exception that creating a > same-named branch on the same commit would be allowed. > > (And the same for tags, of course) > Personally I am fine with either approach - stay/nuke/move. But I'm thinking that having a mix of the two suggestions might be a nice middle ground. Write a script that nukes branches that are merged in master (check the top commit of the branch) and have an 'archive' repo that contains everything else (minus the stable branches). >> >> Sounds good to me, FWIW. >> >> >>> That still leaves a ton around, and curiously removes mesa_7_5 and mesa_7_6. >> >> I think the latter is expected, we were using a different branching >> model back in those days. >> >> >>>origin/amdgpu >> >> Note that this is a currently active branch, to be merged to master soon. > > Perhaps there's something I don't understand, but why is a feature > branch made available on the shared tree? In my view of things the > only branches on the shared mesa.git tree should be the version > branches. 
As you can see, a lot of feature branches are in the shared tree already, so there is a precedent. Sharing a branch among people in this way sometimes tends to be more convenient. The reason here is that it's the only mesa repository where most people from our team have commit access. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] i965: Don't consider uniform value locations in program uploads
On Thu, Jun 04, 2015 at 05:35:11PM -0700, Ben Widawsky wrote: > On Wed, Jun 03, 2015 at 09:32:55PM +0300, Pohjolainen, Topi wrote: > > On Wed, Jun 03, 2015 at 09:21:11PM +0300, Topi Pohjolainen wrote: > > > Shader programs are cached per stage (FS, VS, GS) using the > > > corresponding shader source identifier and compile time choices > > > as key. However, one not only stores the program binary but > > > a pair consisting of program binary and program data. The latter > > > represents the store of constants (such as uniforms) used by > > > the program. > > > > > > However, when programs are searched in the cache for reloading > > > only the program key representing the binary is considered > > > (see for example, brw_upload_wm_prog() and brw_search_cache()). > > > Hence, when programs are re-loaded from cache the first program > > > binary, program data pair is extracted without considering if > > > the program data matches the currently in use uniform storage > > > as well. > > > > > > My reasoning Why this actually works is because the key > > > contains the identifier of the corresponding gl_program that > > > represents the source code for the shader program. Hence, > > > two programs having identical source code still have unique > > > keys. > > > And therefore brw_try_upload_using_copy() never encounters > > > a case where a matching binary is found but the program data > > > doesn't match. > > > > In fact, thinking some more I think this is possible when the > > same, say fragment shader, is used with two different vertex > > shaders. This results into there being matching binaries but > > program data pointing to different storage. Looking at > > brw_upload_cache() I still can't see how failing > > brw_try_upload_using_copy() makes a difference. We only upload > > the program binary again (even though that is the part that > > actually matches). And then proceed the same way regardless > > of the result of brw_try_upload_using_copy(). 
The program data > > gets augmented with the key. > > > > But the point remains that when a program is reloaded through > > the brw_search_cache() only the key (and not the program data) > > is considered returning the first matching pair. > > > > I probably need to write a piglit test for this. > > > > > > > > My ultimate goal is to stop storing pointers to the individual > > > components of a uniform but to store only a pointer to the > > > "struct gl_uniform_storage" instead, and allow > > > gen6_upload_push_constants() to iterate over individual > > > components and array elements. This is needed to be able to > > > convert 32-bits floats to fp16 - otherwise there is only > > > pointer to 32-bits without knowing its type (int, float, etc) > > > let alone its target precision. > > > > > > No regression in jenkins. However, we talked about this with > > > Ken and this doesn't really tell much as piglit doesn't really > > > re-use shader sources during one execution. > > > > > > Signed-off-by: Topi Pohjolainen > > > CC: Kenneth Graunke > > > CC: Tapani P\344lli > > > --- > > > src/mesa/drivers/dri/i965/brw_program.c | 6 -- > > > 1 file changed, 6 deletions(-) > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_program.c > > > b/src/mesa/drivers/dri/i965/brw_program.c > > > index e5c0d3c..7f5fde8 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_program.c > > > +++ b/src/mesa/drivers/dri/i965/brw_program.c > > > @@ -576,12 +576,6 @@ brw_stage_prog_data_compare(const struct > > > brw_stage_prog_data *a, > > > if (memcmp(a, b, offsetof(struct brw_stage_prog_data, param))) > > >return false; > > > > > > - if (memcmp(a->param, b->param, a->nr_params * sizeof(void *))) > > > - return false; > > > - > > > - if (memcmp(a->pull_param, b->pull_param, a->nr_pull_params * > > > sizeof(void *))) > > > - return false; > > > - > > > return true; > > > } > > > > > I am looking at a lot of this code for the first time, and I have a kind of > wild > guess. 
> > The first time you upload a program, the program (kinda annoying that > brw_upload_item_data doesn't seem to actually do that). Malloc a pointer (tmp, > item->key), store the program and aux there. Set that pointer as the key. > > The aux data lives at key + key_size. > > Indeed search_cache() only checks the key. For WM it does contain the > urb_entry data that I think would change if number of uniforms differed. So > for > your example above with 2 VS sharing an FS, if the number of uniforms are the > same, then the program should be identical in the FS, right? Similarly for the > GS with input_varyings. I think generally this is the behavior you'd want. > > brw_try_upload_using_copy() seems correct to me as it does do the aux_compare > (and falls back to memcmp).
Well, I've been looking at this quite a bit now, and I'm still somewhat confused about what brw_upload_cache() tries to achieve with brw_try_upload_using_copy(). If you check brw_try_upload_using_copy() you
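To make the concern in this thread concrete, here is a minimal sketch, not the actual i965 cache code. It models a toy cache in which each entry holds a key plus auxiliary data (standing in for the program data), and contrasts a key-only lookup, which returns the first entry with a matching key, with a stricter lookup that also compares the auxiliary data. All names (toy_item, toy_cache, toy_add, toy_search_by_key, toy_search_by_key_and_aux) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy entry: a key plus auxiliary data (stand-in for prog_data). */
struct toy_item {
   const void *key;
   size_t key_size;
   const void *aux;
   size_t aux_size;
};

#define TOY_CACHE_CAPACITY 8

struct toy_cache {
   struct toy_item items[TOY_CACHE_CAPACITY];
   size_t count;
};

/* Append an entry; the toy cache only borrows the caller's pointers. */
static void
toy_add(struct toy_cache *cache, const void *key, size_t key_size,
        const void *aux, size_t aux_size)
{
   assert(cache->count < TOY_CACHE_CAPACITY);
   struct toy_item *item = &cache->items[cache->count++];
   item->key = key;
   item->key_size = key_size;
   item->aux = aux;
   item->aux_size = aux_size;
}

/* Key-only lookup: index of the first entry whose key matches, or -1. */
static int
toy_search_by_key(const struct toy_cache *cache,
                  const void *key, size_t key_size)
{
   for (size_t i = 0; i < cache->count; i++) {
      const struct toy_item *item = &cache->items[i];
      if (item->key_size == key_size &&
          memcmp(item->key, key, key_size) == 0)
         return (int)i;
   }
   return -1;
}

/* Stricter lookup: both the key and the auxiliary data must match. */
static int
toy_search_by_key_and_aux(const struct toy_cache *cache,
                          const void *key, size_t key_size,
                          const void *aux, size_t aux_size)
{
   for (size_t i = 0; i < cache->count; i++) {
      const struct toy_item *item = &cache->items[i];
      if (item->key_size == key_size &&
          memcmp(item->key, key, key_size) == 0 &&
          item->aux_size == aux_size &&
          memcmp(item->aux, aux, aux_size) == 0)
         return (int)i;
   }
   return -1;
}
```

With two entries that share a key but carry different auxiliary data, the key-only search always returns the first one, which is the situation the mails above are trying to reason about. Whether that can actually happen in the driver depends on what the real key contains.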
Re: [Mesa-dev] [RFC] i965: Don't consider uniform value locations in program uploads
On Mon, Jun 22, 2015 at 01:28:12PM +0300, Pohjolainen, Topi wrote: > On Thu, Jun 04, 2015 at 05:35:11PM -0700, Ben Widawsky wrote: > > On Wed, Jun 03, 2015 at 09:32:55PM +0300, Pohjolainen, Topi wrote: > > > On Wed, Jun 03, 2015 at 09:21:11PM +0300, Topi Pohjolainen wrote: > > > > Shader programs are cached per stage (FS, VS, GS) using the > > > > corresponding shader source identifier and compile time choices > > > > as key. However, one not only stores the program binary but > > > > a pair consisting of program binary and program data. The latter > > > > represents the store of constants (such as uniforms) used by > > > > the program. > > > > > > > > However, when programs are searched in the cache for reloading > > > > only the program key representing the binary is considered > > > > (see for example, brw_upload_wm_prog() and brw_search_cache()). > > > > Hence, when programs are re-loaded from cache the first program > > > > binary, program data pair is extracted without considering if > > > > the program data matches the currently in use uniform storage > > > > as well. > > > > > > > > My reasoning Why this actually works is because the key > > > > contains the identifier of the corresponding gl_program that > > > > represents the source code for the shader program. Hence, > > > > two programs having identical source code still have unique > > > > keys. > > > > And therefore brw_try_upload_using_copy() never encounters > > > > a case where a matching binary is found but the program data > > > > doesn't match. > > > > > > In fact, thinking some more I think this is possible when the > > > same, say fragment shader, is used with two different vertex > > > shaders. This results into there being matching binaries but > > > program data pointing to different storage. Looking at > > > brw_upload_cache() I still can't see how failing > > > brw_try_upload_using_copy() makes a difference. 
We only upload > > > the program binary again (even though that is the part that > > > actually matches). And then proceed the same way regardless > > > of the result of brw_try_upload_using_copy(). The program data > > > gets augmented with the key. > > > > > > But the point remains that when a program is reloaded through > > > the brw_search_cache() only the key (and not the program data) > > > is considered returning the first matching pair. > > > > > > I probably need to write a piglit test for this. > > > > > > > > > > > My ultimate goal is to stop storing pointers to the individual > > > > components of a uniform but to store only a pointer to the > > > > "struct gl_uniform_storage" instead, and allow > > > > gen6_upload_push_constants() to iterate over individual > > > > components and array elements. This is needed to be able to > > > > convert 32-bits floats to fp16 - otherwise there is only > > > > pointer to 32-bits without knowing its type (int, float, etc) > > > > let alone its target precision. > > > > > > > > No regression in jenkins. However, we talked about this with > > > > Ken and this doesn't really tell much as piglit doesn't really > > > > re-use shader sources during one execution. 
> > > > > > > > Signed-off-by: Topi Pohjolainen > > > > CC: Kenneth Graunke > > > > CC: Tapani P\344lli > > > > --- > > > > src/mesa/drivers/dri/i965/brw_program.c | 6 -- > > > > 1 file changed, 6 deletions(-) > > > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_program.c > > > > b/src/mesa/drivers/dri/i965/brw_program.c > > > > index e5c0d3c..7f5fde8 100644 > > > > --- a/src/mesa/drivers/dri/i965/brw_program.c > > > > +++ b/src/mesa/drivers/dri/i965/brw_program.c > > > > @@ -576,12 +576,6 @@ brw_stage_prog_data_compare(const struct > > > > brw_stage_prog_data *a, > > > > if (memcmp(a, b, offsetof(struct brw_stage_prog_data, param))) > > > >return false; > > > > > > > > - if (memcmp(a->param, b->param, a->nr_params * sizeof(void *))) > > > > - return false; > > > > - > > > > - if (memcmp(a->pull_param, b->pull_param, a->nr_pull_params * > > > > sizeof(void *))) > > > > - return false; > > > > - > > > > return true; > > > > } > > > > > > > > I am looking at a lot of this code for the first time, and I have a kind of > > wild > > guess. > > > > The first time you upload a program, the program (kinda annoying that > > brw_upload_item_data doesn't seem to actually do that). Malloc a pointer > > (tmp, > > item->key), store the program and aux there. Set that pointer as the key. > > > > The aux data lives at key + key_size. > > > > Indeed search_cache() only checks the key. For WM it does contain the > > urb_entry data that I think would change if number of uniforms differed. So > > for > > your example above with 2 VS sharing an FS, if the number of uniforms are > > the > > same, then the program should be identical in the FS, right? Similarly for > > the > > GS with input_varyings. I think generally this is the behavior you'd want. > > > > brw_try_upload_usi
Re: [Mesa-dev] [PATCH] tgsi: handle indirect sampler arrays. (v2)
Should there be some clamping somewhere to prevent crashes due to an out-of-bounds unit index? In any case, Reviewed-by: Roland Scheidegger
On 22.06.2015 at 05:18, Dave Airlie wrote: > This is required for ARB_gpu_shader5 support in softpipe. > > v2: add support to txd/txf/txq paths. > > Signed-off-by: Dave Airlie > --- > > src/gallium/auxiliary/tgsi/tgsi_exec.c | 42 > ++ > 1 file changed, 38 insertions(+), 4 deletions(-) > > diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c > b/src/gallium/auxiliary/tgsi/tgsi_exec.c > index fde99b9..44000ff 100644 > --- a/src/gallium/auxiliary/tgsi/tgsi_exec.c > +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c > @@ -1988,6 +1988,35 @@ fetch_assign_deriv_channel(struct tgsi_exec_machine > *mach, > derivs[1][3] = d.f[3]; > } > > +static uint > +fetch_sampler_unit(struct tgsi_exec_machine *mach, > + const struct tgsi_full_instruction *inst, > + uint sampler) > +{ > + uint unit; > + > + if (inst->Src[sampler].Register.Indirect) { > + const struct tgsi_full_src_register *reg = &inst->Src[sampler]; > + union tgsi_exec_channel indir_index, index2; > + > + index2.i[0] = > + index2.i[1] = > + index2.i[2] = > + index2.i[3] = reg->Indirect.Index; > + > + fetch_src_file_channel(mach, > + 0, > + reg->Indirect.File, > + reg->Indirect.Swizzle, > + &index2, > + &ZeroVec, > + &indir_index); > + unit = inst->Src[sampler].Register.Index + indir_index.i[0]; > + } else { > + unit = inst->Src[sampler].Register.Index; > + } > + return unit; > +} > > /* > * execute a texture instruction. 
> @@ -2001,14 +2030,15 @@ exec_tex(struct tgsi_exec_machine *mach, > const struct tgsi_full_instruction *inst, > uint modifier, uint sampler) > { > - const uint unit = inst->Src[sampler].Register.Index; > const union tgsi_exec_channel *args[5], *proj = NULL; > union tgsi_exec_channel r[5]; > enum tgsi_sampler_control control = tgsi_sampler_lod_none; > uint chan; > + uint unit; > int8_t offsets[3]; > int dim, shadow_ref, i; > > + unit = fetch_sampler_unit(mach, inst, sampler); > /* always fetch all 3 offsets, overkill but keeps code simple */ > fetch_texel_offsets(mach, inst, offsets); > > @@ -2107,12 +2137,13 @@ static void > exec_txd(struct tgsi_exec_machine *mach, > const struct tgsi_full_instruction *inst) > { > - const uint unit = inst->Src[3].Register.Index; > union tgsi_exec_channel r[4]; > float derivs[3][2][TGSI_QUAD_SIZE]; > uint chan; > + uint unit; > int8_t offsets[3]; > > + unit = fetch_sampler_unit(mach, inst, 3); > /* always fetch all 3 offsets, overkill but keeps code simple */ > fetch_texel_offsets(mach, inst, offsets); > > @@ -2214,14 +2245,15 @@ static void > exec_txf(struct tgsi_exec_machine *mach, > const struct tgsi_full_instruction *inst) > { > - const uint unit = inst->Src[1].Register.Index; > union tgsi_exec_channel r[4]; > uint chan; > + uint unit; > float rgba[TGSI_NUM_CHANNELS][TGSI_QUAD_SIZE]; > int j; > int8_t offsets[3]; > unsigned target; > > + unit = fetch_sampler_unit(mach, inst, 1); > /* always fetch all 3 offsets, overkill but keeps code simple */ > fetch_texel_offsets(mach, inst, offsets); > > @@ -2296,12 +2328,14 @@ static void > exec_txq(struct tgsi_exec_machine *mach, > const struct tgsi_full_instruction *inst) > { > - const uint unit = inst->Src[1].Register.Index; > int result[4]; > union tgsi_exec_channel r[4], src; > uint chan; > + uint unit; > int i,j; > > + unit = fetch_sampler_unit(mach, inst, 1); > + > fetch_source(mach, &src, &inst->Src[0], TGSI_CHAN_X, TGSI_EXEC_DATA_INT); > > /* XXX: This interface can't return 
per-pixel values */
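As an illustration of the clamping asked about at the top of this mail, here is a minimal sketch of how a base sampler index plus an indirect offset could be kept in range. This is not part of the patch. The helper name toy_clamp_sampler_unit and the max_units parameter are hypothetical; a real implementation would take the limit from whatever the executor considers the number of valid sampler units.

```c
#include <assert.h>

/*
 * Hypothetical helper: combine a base sampler index with an indirect
 * offset and clamp the result to [0, max_units - 1].
 */
static unsigned
toy_clamp_sampler_unit(int base_index, int indirect_offset, unsigned max_units)
{
   assert(max_units > 0);

   /* Use a wider type so the sum of two ints cannot overflow. */
   long long unit = (long long)base_index + (long long)indirect_offset;

   if (unit < 0)
      return 0;
   if (unit >= (long long)max_units)
      return max_units - 1;
   return (unsigned)unit;
}
```

Clamping silently maps a bad index to a valid unit instead of crashing. Whether that is preferable to skipping the fetch entirely is a design choice this sketch does not try to settle.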
Re: [Mesa-dev] [PATCH 1/2] i965/gen9: Implement Push Constant Buffer workaround
Ran multiple test cases multiple times that were introducing GPU hangs. Applying this patch fixed the GPU hang issues on SKL. Tested-by: Valtteri Rantala > -Original Message- > From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf > Of Anuj Phogat > Sent: Friday, June 19, 2015 4:27 AM > To: Widawsky, Benjamin > Cc: mesa-dev; Deak, Imre; Phogat, Anuj; Ben Widawsky > Subject: Re: [Mesa-dev] [PATCH 1/2] i965/gen9: Implement Push Constant > Buffer workaround > > On Wed, Jun 3, 2015 at 9:35 PM, Ben Widawsky > wrote: > > This implements a workaround (exact excerpt as a comment in the code). > > The docs specify [clearly, after you struggle for a while] that the > > offset isn't relative to state base. This actually makes sense. > > > > Buffer #0 is meant to be used for normal uniforms. > > Buffer #1 is typically used for gather constants when using RS. > > Buffer #1-#3 could be used to push a bunch of UBO data which would just be > > somewhere in memory, and not relative to the dynamic state. > > > > NOTE: I've moved away from the ternary operator for the new gen9 > conditions. > > Admittedly it's probably not great to do this, but I really want to > > fix this all up in the subsequent patch and doing it here makes that > > diff a lot nicer. I want to split out the gen8/9 code to make the > > function a bit more readable, but to keep this easily cherry-pickable > > I am doing this fix first. If we decide not to merge the cleanup patch then > > I can > revisit this. > > > > Anuj ran this on his SKL and said there were no fixes on regressions. > > There is some hope it fixes BXT issues. 
> > > > Cc: Imre Deak > > Cc: Neil Roberts > > Cc: Anuj Phogat > > Signed-off-by: Ben Widawsky > > --- > > src/mesa/drivers/dri/i965/gen7_vs_state.c | 48 > > ++- > > 1 file changed, 41 insertions(+), 7 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/gen7_vs_state.c > > b/src/mesa/drivers/dri/i965/gen7_vs_state.c > > index 278b3ec..4b17d06 100644 > > --- a/src/mesa/drivers/dri/i965/gen7_vs_state.c > > +++ b/src/mesa/drivers/dri/i965/gen7_vs_state.c > > @@ -43,18 +43,52 @@ gen7_upload_constant_state(struct brw_context > *brw, > > int dwords = brw->gen >= 8 ? 11 : 7; > > BEGIN_BATCH(dwords); > > OUT_BATCH(opcode << 16 | (dwords - 2)); > > - OUT_BATCH(active ? stage_state->push_const_size : 0); > > - OUT_BATCH(0); > > + > > + /* Workaround for SKL+ (we use option #2 until we have a need for more > > +* constant buffers). This comes from the documentation for > 3DSTATE_CONSTANT_* > > +* > > +* The driver must ensure The following case does not occur without a > > flush > > +* to the 3D engine: 3DSTATE_CONSTANT_* with buffer 3 read length equal > to > > +* zero committed followed by a 3DSTATE_CONSTANT_* with buffer 0 read > length > > +* not equal to zero committed. Possible ways to avoid this condition > > +* include: > > +* 1. always force buffer 3 to have a non zero read length > > +* 2. always force buffer 0 to a zero read length > > +*/ > > + if (brw->gen >= 9 && active) { > > + OUT_BATCH(0); > > + OUT_BATCH(stage_state->push_const_size); > > + } else { > > + OUT_BATCH(active ? stage_state->push_const_size : 0); > > + OUT_BATCH(0); > > + } > > /* Pointer to the constant buffer. Covered by the set of state flags > > * from gen6_prepare_wm_contants > > */ > > - OUT_BATCH(active ? 
(stage_state->push_const_offset | mocs) : 0); > > - OUT_BATCH(0); > > - OUT_BATCH(0); > > - OUT_BATCH(0); > > - if (brw->gen >= 8) { > > + if (brw->gen >= 9 && active) { > > + OUT_BATCH(0); > > + OUT_BATCH(0); > > + OUT_BATCH(0); > > + OUT_BATCH(0); > > + /* XXX: When using buffers other than 0, you need to specify the > > + * graphics virtual address regardless of INSPM/debug bits > INSTPM > > + */ > > + OUT_RELOC64(brw->batch.bo, I915_GEM_DOMAIN_RENDER, 0, > > + stage_state->push_const_offset); > >OUT_BATCH(0); > >OUT_BATCH(0); > > + } else if (brw->gen>= 8) { > > + OUT_BATCH(active ? (stage_state->push_const_offset | mocs) : 0); > > + OUT_BATCH(0); > > + OUT_BATCH(0); > > + OUT_BATCH(0); > > + OUT_BATCH(0); > > + OUT_BATCH(0); > > + OUT_BATCH(0); > > + OUT_BATCH(0); > > + } else { > > + OUT_BATCH(active ? (stage_state->push_const_offset | mocs) : 0); > > + OUT_BATCH(0); > >OUT_BATCH(0); > >OUT_BATCH(0); > > } > > -- > > 2.4.2 > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > Verified with the spec. LGTM. > > Reviewed-by: Anuj Phogat > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listi
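The workaround text quoted in the patch can be modelled with a small sketch. It is an illustration under stated assumptions, not the driver code. A command is reduced to four read lengths, one per constant buffer; toy_hits_forbidden_case() encodes the sequence the documentation excerpt warns about (buffer 3 read length zero, followed by buffer 0 read length non-zero), and toy_build_cmd() applies option #2 by never giving buffer 0 a read length. The names and the alt_slot parameter are hypothetical; which buffer the real patch uses is determined by the dword layout in the diff above, not by this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NUM_CONSTANT_BUFFERS 4

/* Hypothetical model of one 3DSTATE_CONSTANT_* command: read lengths only. */
struct toy_constant_cmd {
   unsigned read_length[TOY_NUM_CONSTANT_BUFFERS];
};

/*
 * The case the quoted documentation says must not occur without a flush:
 * buffer 3 read length == 0 in one command, followed by
 * buffer 0 read length != 0 in the next.
 */
static bool
toy_hits_forbidden_case(const struct toy_constant_cmd *prev,
                        const struct toy_constant_cmd *next)
{
   return prev->read_length[3] == 0 && next->read_length[0] != 0;
}

/*
 * Option #2 from the quoted text: when the workaround is needed, keep
 * buffer 0 at a zero read length and place the push constants in another
 * slot. alt_slot is an assumption of this sketch.
 */
static struct toy_constant_cmd
toy_build_cmd(bool needs_workaround, bool active,
              unsigned push_const_size, unsigned alt_slot)
{
   struct toy_constant_cmd cmd = { { 0, 0, 0, 0 } };

   assert(alt_slot >= 1 && alt_slot < TOY_NUM_CONSTANT_BUFFERS);

   if (!active)
      return cmd;

   if (needs_workaround)
      cmd.read_length[alt_slot] = push_const_size;
   else
      cmd.read_length[0] = push_const_size;

   return cmd;
}
```

With needs_workaround set, no pair of commands built this way can hit the forbidden case, because buffer 0 never gets a non-zero read length.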
Re: [Mesa-dev] abundance of branches in mesa.git
On Mon, Jun 22, 2015 at 12:23:54PM +0200, Marek Olšák wrote: > On Mon, Jun 22, 2015 at 5:36 AM, Ilia Mirkin wrote: > > On Sun, Jun 21, 2015 at 11:33 PM, Michel Dänzer wrote: > >> On 22.06.2015 00:31, Ilia Mirkin wrote: > >>> On Sun, Jun 21, 2015 at 12:22 PM, Emil Velikov > >>> wrote: > On 20/06/15 10:01, Eirik Byrkjeflot Anonsen wrote: > > Ilia Mirkin writes: > > > >> Hello, > >> > >> There are a *ton* of branches in the upstream mesa git. Here is a full > >> list: > >> > > [...] > >> is there > >> any reason to keep these around with the exception of: > >> > >> master > >> $version (i.e. 9.0, 10.0, mesa_7_7_branch, etc) > > > > Instead of outright deleting old branches, it would be possible to set > > up an "archive" repository which mirrors all branches of the main > > repository. And then delete "obsolete" branches only from the main > > repository. Ideally, you would want a git hook to refuse to create a new > > branch (in the main repository) if a branch by that name already exists > > in the archive repository. Possibly with the exception that creating a > > same-named branch on the same commit would be allowed. > > > > (And the same for tags, of course) > > > Personally I am fine with either approach - stay/nuke/move. But I'm > thinking that having a mix of the two suggestions might be a nice middle > ground. > > Write a script that nukes branches that are merged in master (check the > top commit of the branch) and have an 'archive' repo that contains > everything else (minus the stable branches). > >> > >> Sounds good to me, FWIW. > >> > >> > >>> That still leaves a ton around, and curiously removes mesa_7_5 and > >>> mesa_7_6. > >> > >> I think the latter is expected, we were using a different branching > >> model back in those days. > >> > >> > >>>origin/amdgpu > >> > >> Note that this is a currently active branch, to be merged to master soon. 
> > > > Perhaps there's something I don't understand, but why is a feature > > branch made available on the shared tree? In my view of things the > > only branches on the shared mesa.git tree should be the version > > branches. > > As you can see, a lot of feature branches are in the shared tree > already, so there is a precedent. Sharing a branch among people in > this way sometimes tends to be more convenient. > > The reason here is that it's the only mesa repository where most > people from our team have commit access. > Also, the shared git tree supports https access, which means it is accessible when behind a firewall. -Tom > Marek
Re: [Mesa-dev] abundance of branches in mesa.git
On Mon, Jun 22, 2015 at 9:39 AM, Tom Stellard wrote: > On Mon, Jun 22, 2015 at 12:23:54PM +0200, Marek Olšák wrote: >> On Mon, Jun 22, 2015 at 5:36 AM, Ilia Mirkin wrote: >> > On Sun, Jun 21, 2015 at 11:33 PM, Michel Dänzer wrote: >> >> On 22.06.2015 00:31, Ilia Mirkin wrote: >> >>> On Sun, Jun 21, 2015 at 12:22 PM, Emil Velikov >> >>> wrote: >> On 20/06/15 10:01, Eirik Byrkjeflot Anonsen wrote: >> > Ilia Mirkin writes: >> > >> >> Hello, >> >> >> >> There are a *ton* of branches in the upstream mesa git. Here is a >> >> full list: >> >> >> > [...] >> >> is there >> >> any reason to keep these around with the exception of: >> >> >> >> master >> >> $version (i.e. 9.0, 10.0, mesa_7_7_branch, etc) >> > >> > Instead of outright deleting old branches, it would be possible to set >> > up an "archive" repository which mirrors all branches of the main >> > repository. And then delete "obsolete" branches only from the main >> > repository. Ideally, you would want a git hook to refuse to create a >> > new >> > branch (in the main repository) if a branch by that name already exists >> > in the archive repository. Possibly with the exception that creating a >> > same-named branch on the same commit would be allowed. >> > >> > (And the same for tags, of course) >> > >> Personally I am fine with either approach - stay/nuke/move. But I'm >> thinking that having a mix of the two suggestions might be a nice middle >> ground. >> >> Write a script that nukes branches that are merged in master (check the >> top commit of the branch) and have an 'archive' repo that contains >> everything else (minus the stable branches). >> >> >> >> Sounds good to me, FWIW. >> >> >> >> >> >>> That still leaves a ton around, and curiously removes mesa_7_5 and >> >>> mesa_7_6. >> >> >> >> I think the latter is expected, we were using a different branching >> >> model back in those days. >> >> >> >> >> >>>origin/amdgpu >> >> >> >> Note that this is a currently active branch, to be merged to master soon. 
>> > >> > Perhaps there's something I don't understand, but why is a feature >> > branch made available on the shared tree? In my view of things the >> > only branches on the shared mesa.git tree should be the version >> > branches. >> >> As you can see, a lot of feature branches are in the shared tree >> already, so there is a precedent. Sharing a branch among people in >> this way sometimes tends to be more convenient. >> >> The reason here is that it's the only mesa repository where most >> people from our team have commit access. >> > > Also, the shared git tree supports https access, which means it is > accessible when behind a firewall. OK, well if that's the prevailing attitude, then I'm on a fool's errand, and I'll just drop this. -ilia
Re: [Mesa-dev] [PATCH 00/11] glapi fixes - build whole of mesa with
On 19/06/15 23:09, Emil Velikov wrote: On 19 June 2015 at 21:26, Jose Fonseca wrote: On 19/06/15 20:56, Emil Velikov wrote: Hi all, A lovely series inspired (more like 'was awaken to send these out') by Pal Rohár, who was having issues when building xlib-libgl (plus the now enabled gles*) So here, we teach the final two static glapi users about shared-glapi, plus some related fixes. After this is done we can finally start transitioning to shared-only glapi, with some more details as mentioned in one of the patches: XXX: With this one done, we can finally transition with enforcing shared-glapi, and - link the dri modules against libglapi.so, add --no-undefined to the LDFLAGS - drop the dlopen(libglapi.so/libGL.so, RTLD_GLOBAL) workarounds in the loaders - libGL, libEGL and libgbm. - start killing off/cleaning up the dispatch ? The caveats: 1) up to what stage do we care about static libraries - libgl (either dri or xlib based) - osmesa - libEGL 2) how about other platforms (scons) ? - currently the scons uses static glapi, - would we need the dlopen(...) on windows ? Hope everyone is excited about this one as I am :-) Maybe I missed the context of this changes, but why this matters or is an improvement? If one goes the extra mile (which this series doesn't) - one configure option less, substantial some code de-duplication and consistent use of the code amongst all components provided. This way any improvements/cleanups made to the shared glapi will be available to osmesa/xlib-libgl. I'm perfectly happy with removing the configure option. And I understand the benefits of unified code paths, but I believe that for this particular case, the difference in requirements really demands the separate code paths. In summary, having the ability of using a shared glapi sounds great, but forcing shared glapi everywhere, sounds a bad idea. I'm suspecting that people might be keen on the following idea - use static glapi for osmesa/xlib-libgl and shared one everywhere else? 
Yes, that sounds reasonable to me. (Needs libgl-gdi too.) I fear that this will lead to further separation/bit-rot between the different implementations, but it seems like the best compromise. I don't have a strong preference between: a) using the same source code for both static/shared glapi (switched by a pre-processor define), or b) sharing only the interface but having separate shared/static glapi implementations. I'm actually not that familiar with that code. Either way, we can have two glapi build targets (a shared-glapi and a static-glapi) side-by-side, so that there are no more source-wide configure flags. I believe a lot of the complexity of that code comes from assembly. I wonder if it's really justified nowadays (and even if it is, whether it would be better served with GNU C assembly.) Furthermore, I believe on Windows we don't use any assembly, so if we split shared/static glapi source code, we could probably abandon assembly from the static-glapi. Jose
[Mesa-dev] [PATCH] mesa: use _mesa_lookup_enum_by_nr() in print_array()
Print GL_FLOAT, etc. instead of hex value. --- src/mesa/main/varray.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/main/varray.c b/src/mesa/main/varray.c index 7389037..ebdd9ea 100644 --- a/src/mesa/main/varray.c +++ b/src/mesa/main/varray.c @@ -2309,10 +2309,10 @@ print_array(const char *name, GLint index, const struct gl_client_array *array) fprintf(stderr, " %s[%d]: ", name, index); else fprintf(stderr, " %s: ", name); - fprintf(stderr, "Ptr=%p, Type=0x%x, Size=%d, ElemSize=%u, Stride=%d, Buffer=%u(Size %lu)\n", - array->Ptr, array->Type, array->Size, - array->_ElementSize, array->StrideB, - array->BufferObj->Name, (unsigned long) array->BufferObj->Size); + fprintf(stderr, "Ptr=%p, Type=%s, Size=%d, ElemSize=%u, Stride=%d, Buffer=%u(Size %lu)\n", + array->Ptr, _mesa_lookup_enum_by_nr(array->Type), array->Size, + array->_ElementSize, array->StrideB, array->BufferObj->Name, + (unsigned long) array->BufferObj->Size); } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
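For readers unfamiliar with the idea behind the patch, here is a minimal sketch of an enum-to-name lookup for debug output. It is not Mesa's _mesa_lookup_enum_by_nr(); the table and the function toy_enum_to_string are hypothetical and cover only a handful of data-type enums, using their standard OpenGL values. Unknown values fall back to a hex string written into a caller-provided buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical value/name pair for a small lookup table. */
struct toy_enum_name {
   unsigned value;
   const char *name;
};

/* A few OpenGL data-type enums with their standard values. */
static const struct toy_enum_name toy_type_names[] = {
   { 0x1400, "GL_BYTE" },
   { 0x1401, "GL_UNSIGNED_BYTE" },
   { 0x1402, "GL_SHORT" },
   { 0x1403, "GL_UNSIGNED_SHORT" },
   { 0x1404, "GL_INT" },
   { 0x1405, "GL_UNSIGNED_INT" },
   { 0x1406, "GL_FLOAT" },
   { 0x140A, "GL_DOUBLE" },
};

/*
 * Return the symbolic name for a known value. For unknown values, format
 * the number as hex into buf and return buf.
 */
static const char *
toy_enum_to_string(unsigned value, char *buf, size_t buf_size)
{
   assert(buf != NULL && buf_size > 0);

   for (size_t i = 0; i < sizeof toy_type_names / sizeof toy_type_names[0]; i++) {
      if (toy_type_names[i].value == value)
         return toy_type_names[i].name;
   }

   snprintf(buf, buf_size, "0x%x", value);
   return buf;
}
```

A debug print can then use a %s conversion with the returned string where it previously used 0x%x with the raw value, which is the same substitution the patch makes in print_array().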
Re: [Mesa-dev] [PATCH 5/6] mesa: don't rebind constant buffers after every state change if GS is active
Hi, On 06/18/2015 03:17 PM, Emil Velikov wrote: Strange I was under the impression that there are apps that make use of GS, albeit not too many. So far I haven't seen any e.g. Steam game using GS, but Unreal Engine 4 demos: https://wiki.unrealengine.com/Linux_Demos use them. Of the 4 demos I checked, all compiled at least one geometry shader, Vehicle Game demo compiled three. I didn't check what they use them for, in total they compile hundred(s) of shaders. - Eero On the perf side - I was thinking about the hardware (i.e. regardless if the driver does extra state-tracking or not) - would there be the optimisation mentioned, would there be a "stall" in the pipeline, due to the "new" values being flushed/fetched/etc. Now that I think about it, only a few of the HW guys may know the answer on this one, so don't bother with this. Thanks Emil On 16 June 2015 at 20:56, Marek Olšák wrote: There are probably 0 apps using GS, so the answer is 0. The hardware doesn't ignore anything. It only does what it's told to do. The radeonsi driver doesn't check if the state change is redundant or not. Marek On Tue, Jun 16, 2015 at 10:13 PM, Emil Velikov wrote: Hi Marek, Out of curiosity: Any rough idea of how much of a perf. improvement this might bring ? Would the hardware ignore the newly (re)bound const. bufs, when the values are unchanged ? Thanks Emil
Re: [Mesa-dev] [PATCH] mesa: use _mesa_lookup_enum_by_nr() in print_array()
Reviewed-by: Ilia Mirkin

On Mon, Jun 22, 2015 at 10:33 AM, Brian Paul wrote:
> Print GL_FLOAT, etc. instead of hex value.
> ---
>  src/mesa/main/varray.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/main/varray.c b/src/mesa/main/varray.c
> index 7389037..ebdd9ea 100644
> --- a/src/mesa/main/varray.c
> +++ b/src/mesa/main/varray.c
> @@ -2309,10 +2309,10 @@ print_array(const char *name, GLint index, const struct gl_client_array *array)
>        fprintf(stderr, " %s[%d]: ", name, index);
>     else
>        fprintf(stderr, " %s: ", name);
> -   fprintf(stderr, "Ptr=%p, Type=0x%x, Size=%d, ElemSize=%u, Stride=%d, Buffer=%u(Size %lu)\n",
> -           array->Ptr, array->Type, array->Size,
> -           array->_ElementSize, array->StrideB,
> -           array->BufferObj->Name, (unsigned long) array->BufferObj->Size);
> +   fprintf(stderr, "Ptr=%p, Type=%s, Size=%d, ElemSize=%u, Stride=%d, Buffer=%u(Size %lu)\n",
> +           array->Ptr, _mesa_lookup_enum_by_nr(array->Type), array->Size,
> +           array->_ElementSize, array->StrideB, array->BufferObj->Name,
> +           (unsigned long) array->BufferObj->Size);
>  }
>
>
> --
> 1.9.1
Re: [Mesa-dev] abundance of branches in mesa.git
On 22.06.2015 15:41, Ilia Mirkin wrote: On Mon, Jun 22, 2015 at 9:39 AM, Tom Stellard wrote: On Mon, Jun 22, 2015 at 12:23:54PM +0200, Marek Olšák wrote: On Mon, Jun 22, 2015 at 5:36 AM, Ilia Mirkin wrote: On Sun, Jun 21, 2015 at 11:33 PM, Michel Dänzer wrote: On 22.06.2015 00:31, Ilia Mirkin wrote: On Sun, Jun 21, 2015 at 12:22 PM, Emil Velikov wrote: On 20/06/15 10:01, Eirik Byrkjeflot Anonsen wrote: Ilia Mirkin writes: Hello, There are a *ton* of branches in the upstream mesa git. Here is a full list: [...] is there any reason to keep these around with the exception of: master $version (i.e. 9.0, 10.0, mesa_7_7_branch, etc) Instead of outright deleting old branches, it would be possible to set up an "archive" repository which mirrors all branches of the main repository. And then delete "obsolete" branches only from the main repository. Ideally, you would want a git hook to refuse to create a new branch (in the main repository) if a branch by that name already exists in the archive repository. Possibly with the exception that creating a same-named branch on the same commit would be allowed. (And the same for tags, of course) Personally I am fine with either approach - stay/nuke/move. But I'm thinking that having a mix of the two suggestions might be a nice middle ground. Write a script that nukes branches that are merged in master (check the top commit of the branch) and have an 'archive' repo that contains everything else (minus the stable branches). Sounds good to me, FWIW. That still leaves a ton around, and curiously removes mesa_7_5 and mesa_7_6. I think the latter is expected, we were using a different branching model back in those days. origin/amdgpu Note that this is a currently active branch, to be merged to master soon. Perhaps there's something I don't understand, but why is a feature branch made available on the shared tree? In my view of things the only branches on the shared mesa.git tree should be the version branches. 
As you can see, a lot of feature branches are in the shared tree already, so there is a precedent. Sharing a branch among people in this way sometimes tends to be more convenient.

The reason here is that it's the only mesa repository where most people from our team have commit access.

Also, the shared git tree supports https access, which means it is accessible when behind a firewall.

OK, well if that's the prevailing attitude, then I'm on a fool's errand, and I'll just drop this.

I still think it would be a good idea to archive the branches after a while, cause the current status is rather confusing if you search for something specific.

Regards,
Christian.

-ilia
Re: [Mesa-dev] [PATCH 1/2] glsl: Specify the shader stage in linker errors due to too many in/outputs.
This patch is

Reviewed-by: Ian Romanick

On 06/19/2015 06:08 AM, Jose Fonseca wrote:
> ---
>  src/glsl/link_varyings.cpp | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
> index 7b2d4bd..278a778 100644
> --- a/src/glsl/link_varyings.cpp
> +++ b/src/glsl/link_varyings.cpp
> @@ -1540,13 +1540,15 @@ check_against_output_limit(struct gl_context *ctx,
>     const unsigned output_components = output_vectors * 4;
>     if (output_components > max_output_components) {
>        if (ctx->API == API_OPENGLES2 || prog->IsES)
> -         linker_error(prog, "shader uses too many output vectors "
> +         linker_error(prog, "%s shader uses too many output vectors "
>                        "(%u > %u)\n",
> +                      _mesa_shader_stage_to_string(producer->Stage),
>                        output_vectors,
>                        max_output_components / 4);
>        else
> -         linker_error(prog, "shader uses too many output components "
> +         linker_error(prog, "%s shader uses too many output components "
>                        "(%u > %u)\n",
> +                      _mesa_shader_stage_to_string(producer->Stage),
>                        output_components,
>                        max_output_components);
>
> @@ -1579,13 +1581,15 @@ check_against_input_limit(struct gl_context *ctx,
>     const unsigned input_components = input_vectors * 4;
>     if (input_components > max_input_components) {
>        if (ctx->API == API_OPENGLES2 || prog->IsES)
> -         linker_error(prog, "shader uses too many input vectors "
> +         linker_error(prog, "%s shader uses too many input vectors "
>                        "(%u > %u)\n",
> +                      _mesa_shader_stage_to_string(consumer->Stage),
>                        input_vectors,
>                        max_input_components / 4);
>        else
> -         linker_error(prog, "shader uses too many input components "
> +         linker_error(prog, "%s shader uses too many input components "
>                        "(%u > %u)\n",
> +                      _mesa_shader_stage_to_string(consumer->Stage),
>                        input_components,
>                        max_input_components);
>
>
Re: [Mesa-dev] [PATCH 1/2] glsl: handle conversions to double when comparing param matches
This seems believable... is there a piglit test?

On 06/17/2015 12:15 PM, Ilia Mirkin wrote:
> This allows mod(int, int) to become selected as float mod when doubles
> are supported.
>
> Signed-off-by: Ilia Mirkin
> Cc: "10.6"
> ---
>  src/glsl/ir_function.cpp | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/src/glsl/ir_function.cpp b/src/glsl/ir_function.cpp
> index 2b2643c..1319443 100644
> --- a/src/glsl/ir_function.cpp
> +++ b/src/glsl/ir_function.cpp
> @@ -148,9 +148,11 @@ get_parameter_match_type(const ir_variable *param,
>     if (from_type == to_type)
>        return PARAMETER_EXACT_MATCH;
>
> -   /* XXX: When ARB_gpu_shader_fp64 support is added, check for float->double,
> -    * and int/uint->double conversions
> -    */
> +   if (to_type->base_type == GLSL_TYPE_DOUBLE) {
> +      if (from_type->base_type == GLSL_TYPE_FLOAT)
> +         return PARAMETER_FLOAT_TO_DOUBLE;
> +      return PARAMETER_INT_TO_DOUBLE;
> +   }
>
>     if (to_type->base_type == GLSL_TYPE_FLOAT)
>        return PARAMETER_INT_TO_FLOAT;
>
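To make the mod(int, int) remark concrete, here is a rough, self-contained sketch of the classification in the patched hunk. The base_type enum and classify_conversion() are simplified, hypothetical stand-ins; PARAMETER_OTHER_CONVERSION and the ordering of the enum (lower value meaning a better match) are assumptions about code outside the quoted hunk. The sketch also assumes it is only called for conversions that GLSL permits implicitly.

```c
/* Simplified stand-ins for the GLSL base types touched by the patch. */
enum base_type { TYPE_UINT, TYPE_INT, TYPE_FLOAT, TYPE_DOUBLE };

/* Assumed ordering: a lower value is a better match. */
enum parameter_match {
   PARAMETER_EXACT_MATCH,
   PARAMETER_FLOAT_TO_DOUBLE,
   PARAMETER_INT_TO_FLOAT,
   PARAMETER_INT_TO_DOUBLE,
   PARAMETER_OTHER_CONVERSION   /* assumed catch-all, not in the hunk */
};

/* Mirrors the order of checks in the patched hunk: exact match first,
 * then conversions to double, then int/uint to float. */
static enum parameter_match
classify_conversion(enum base_type from, enum base_type to)
{
   if (from == to)
      return PARAMETER_EXACT_MATCH;

   if (to == TYPE_DOUBLE) {
      if (from == TYPE_FLOAT)
         return PARAMETER_FLOAT_TO_DOUBLE;
      return PARAMETER_INT_TO_DOUBLE;
   }

   if (to == TYPE_FLOAT)
      return PARAMETER_INT_TO_FLOAT;

   return PARAMETER_OTHER_CONVERSION;
}
```

Under that assumed ordering, an int argument scores better against a float parameter than against a double one, so a call like mod(int, int) would resolve to the float overload even when a double overload exists.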
Re: [Mesa-dev] [PATCH 3/5] glcpp: Allow arithmetic integer expressions in #line
First, sorry for the late answer, I somehow missed your replies (I was not in CC).

On Tue, 2015-06-09 at 10:59 -0700, Ian Romanick wrote:
> On 06/09/2015 10:40 AM, Carl Worth wrote:
> > On Tue, Jun 09 2015, Ian Romanick wrote:
> >>> From section 3.4 ("Preprocessor") of the GLSL ES 3.00 specification:
> >>> "#line must have, after macro substitution, one of the following forms:
> >>> #line line
> >>> #line line source-string-number
> >>> where line and source-string-number are constant integral
> >>> expressions."
> > ...
> >>> From section 4.3.3 ("Constant Expressions") of the same specification:
> >>> "A constant integral expression is a constant expression that evaluates
> >>> to a scalar signed or unsigned integer."
> >
> > Yes. That's an extremely unfortunate piece of the specification.
> >
> > This, together with unary operators introduces inherent ambiguity into
> > the grammar. Just think about things like:

I forgot to mention in the patch's commit message that, because of the ambiguity of the grammar, I made the assumption that one (or more) blanks that are not part of an expression between parentheses act as parameter separators. Then, for the examples mentioned in the thread the output would be:

#line 2-1+5 -> #line 6
#line 2 -1+5 -> #line 2 4
#line 2-1 +5 -> #line 1 5
#line 2-1+5 3 -> #line 6 3
#line 2 -1+5 3 -> compilation error
#line 2-1 +5 3 -> compilation error
#line 3 +3 -> #line 3 3
#line 3 (+3) -> #line 3 3

And for the parentheses the behavior is:

#line (2 -1)+5 -> #line 6
#line 3 (4+1)-1 -> #line 3 4
#line (3) ((4+1) -1) -> #line 3 4
#line 3 (4+1) -1 -> compilation error

> The spec was supposed to get updated to say that parsing is greedy, so
> we at least know what those should do. I say "supposed to" instead of
> "was" because I don't know for sure that it was updated.

I am afraid that greedy parsing is not what this patch implements, as

#line 3 +3 4

will not be evaluated as

#line 6 4

but will raise an error instead.

> > But I'll also take a look at this patch. Thanks for bringing it to my
> > attention, Ian.
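The separator rule described above can be sketched in a few lines of C. This is only an illustration of the stated behavior, not the glcpp implementation (which is a flex/bison grammar): eval_line_args() and its helpers are hypothetical names, only + and - (binary and unary) and parentheses are handled, and anything else is treated as an error.

```c
#include <ctype.h>
#include <stddef.h>

typedef struct {
   const char *cur;   /* next unread character */
   const char *end;   /* one past the parameter's last character */
   int ok;            /* cleared on any syntax error */
} parser;

static int is_blank(char c) { return c == ' ' || c == '\t'; }

static void skip_blanks(parser *p)
{
   while (p->cur < p->end && is_blank(*p->cur))
      p->cur++;
}

static long parse_expr(parser *p);

/* Parses a number, a unary +/- operand, or a parenthesized expression. */
static long parse_operand(parser *p)
{
   skip_blanks(p);
   if (p->cur < p->end && (*p->cur == '-' || *p->cur == '+')) {
      int negate = *p->cur++ == '-';
      long v = parse_operand(p);
      return negate ? -v : v;
   }
   if (p->cur < p->end && *p->cur == '(') {
      p->cur++;
      long v = parse_expr(p);
      skip_blanks(p);
      if (p->cur < p->end && *p->cur == ')')
         p->cur++;
      else
         p->ok = 0;
      return v;
   }
   if (p->cur >= p->end || !isdigit((unsigned char)*p->cur)) {
      p->ok = 0;
      return 0;
   }
   long v = 0;
   while (p->cur < p->end && isdigit((unsigned char)*p->cur))
      v = v * 10 + (*p->cur++ - '0');
   return v;
}

/* Parses a left-associative chain of + and - operations. */
static long parse_expr(parser *p)
{
   long v = parse_operand(p);
   for (;;) {
      skip_blanks(p);
      if (p->cur < p->end && *p->cur == '+') {
         p->cur++;
         v += parse_operand(p);
      } else if (p->cur < p->end && *p->cur == '-') {
         p->cur++;
         v -= parse_operand(p);
      } else {
         return v;
      }
   }
}

/* Splits the text after "#line" at blanks outside parentheses and
 * evaluates each piece. Returns the parameter count (1 or 2), or -1
 * for a syntax error or a third parameter. */
static int eval_line_args(const char *args, long out[2])
{
   int n = 0, depth = 0;
   const char *s = args;

   while (*s) {
      while (is_blank(*s))
         s++;
      if (!*s)
         break;
      const char *start = s;
      while (*s && (depth > 0 || !is_blank(*s))) {
         if (*s == '(')
            depth++;
         else if (*s == ')' && depth > 0)
            depth--;
         s++;
      }
      if (n == 2)
         return -1;
      parser p = { start, s, 1 };
      long v = parse_expr(&p);
      if (!p.ok || p.cur != p.end)
         return -1;
      out[n++] = v;
   }
   return n > 0 ? n : -1;
}
```

In this sketch "3 +3 4" splits into three pieces and is rejected, whereas a greedy parser would first consume "3 +3" as a single expression. That difference is exactly the point under discussion.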
Re: [Mesa-dev] abundance of branches in mesa.git
On Mon, Jun 22, 2015 at 11:30 AM, Christian König wrote: > On 22.06.2015 15:41, Ilia Mirkin wrote: >> >> On Mon, Jun 22, 2015 at 9:39 AM, Tom Stellard wrote: >>> >>> On Mon, Jun 22, 2015 at 12:23:54PM +0200, Marek Olšák wrote: On Mon, Jun 22, 2015 at 5:36 AM, Ilia Mirkin wrote: > > On Sun, Jun 21, 2015 at 11:33 PM, Michel Dänzer > wrote: >> >> On 22.06.2015 00:31, Ilia Mirkin wrote: >>> >>> On Sun, Jun 21, 2015 at 12:22 PM, Emil Velikov >>> wrote: On 20/06/15 10:01, Eirik Byrkjeflot Anonsen wrote: > > Ilia Mirkin writes: > >> Hello, >> >> There are a *ton* of branches in the upstream mesa git. Here is a >> full list: >> > [...] >> >> is there >> any reason to keep these around with the exception of: >> >> master >> $version (i.e. 9.0, 10.0, mesa_7_7_branch, etc) > > Instead of outright deleting old branches, it would be possible to > set > up an "archive" repository which mirrors all branches of the main > repository. And then delete "obsolete" branches only from the main > repository. Ideally, you would want a git hook to refuse to create > a new > branch (in the main repository) if a branch by that name already > exists > in the archive repository. Possibly with the exception that > creating a > same-named branch on the same commit would be allowed. > > (And the same for tags, of course) > Personally I am fine with either approach - stay/nuke/move. But I'm thinking that having a mix of the two suggestions might be a nice middle ground. Write a script that nukes branches that are merged in master (check the top commit of the branch) and have an 'archive' repo that contains everything else (minus the stable branches). >> >> Sounds good to me, FWIW. >> >> >>> That still leaves a ton around, and curiously removes mesa_7_5 and >>> mesa_7_6. >> >> I think the latter is expected, we were using a different branching >> model back in those days. >> >> >>> origin/amdgpu >> >> Note that this is a currently active branch, to be merged to master >> soon. 
>>>>> Perhaps there's something I don't understand, but why is a feature
>>>>> branch made available on the shared tree? In my view of things the
>>>>> only branches on the shared mesa.git tree should be the version
>>>>> branches.
>>>>
>>>> As you can see, a lot of feature branches are in the shared tree
>>>> already, so there is a precedent. Sharing a branch among people in
>>>> this way sometimes tends to be more convenient.
>>>>
>>>> The reason here is that it's the only mesa repository where most
>>>> people from our team have commit access.
>>>
>>> Also, the shared git tree supports https access, which means it is
>>> accessible when behind a firewall.
>>
>> OK, well if that's the prevailing attitude, then I'm on a fool's
>> errand, and I'll just drop this.
>
> I still think it would be a good idea to archive the branches after a while,
> cause the current status is rather confusing if you search for something
> specific.

Yeah, but if the policy is "create random branches whenever you feel like on the upstream mesa tree", then this same current situation will happen again, so it's not really worth fixing now (at least for me). I'm not aware of any other major project with this sort of branching policy, but I guess there's always a first!

I don't really see why you wouldn't just use a shared tree in someone's ~/foo, chgrp'd to mesa or some other convenient group, or how https plays into things, but I'm sure there's some reason for it. [Or why those amdgpu patches are on a branch in the first place rather than in master.] If the final state isn't a tree with a policy of not adding (non-release) branches, I don't think I'm particularly interested in doing the legwork.

Cheers,

-ilia
Re: [Mesa-dev] [PATCH 1/2] glsl: handle conversions to double when comparing param matches
http://patchwork.freedesktop.org/patch/52138/

I've already pushed this patch btw, Chris gave me a r-b over IRC. But it seems I neglected to push the piglit patch, my bad.

On Mon, Jun 22, 2015 at 11:35 AM, Ian Romanick wrote:
> This seems believable... is there a piglit test?
>
> On 06/17/2015 12:15 PM, Ilia Mirkin wrote:
>> This allows mod(int, int) to become selected as float mod when doubles
>> are supported.
>>
>> Signed-off-by: Ilia Mirkin
>> Cc: "10.6"
>> ---
>>  src/glsl/ir_function.cpp | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/glsl/ir_function.cpp b/src/glsl/ir_function.cpp
>> index 2b2643c..1319443 100644
>> --- a/src/glsl/ir_function.cpp
>> +++ b/src/glsl/ir_function.cpp
>> @@ -148,9 +148,11 @@ get_parameter_match_type(const ir_variable *param,
>>     if (from_type == to_type)
>>        return PARAMETER_EXACT_MATCH;
>>
>> -   /* XXX: When ARB_gpu_shader_fp64 support is added, check for float->double,
>> -    * and int/uint->double conversions
>> -    */
>> +   if (to_type->base_type == GLSL_TYPE_DOUBLE) {
>> +      if (from_type->base_type == GLSL_TYPE_FLOAT)
>> +         return PARAMETER_FLOAT_TO_DOUBLE;
>> +      return PARAMETER_INT_TO_DOUBLE;
>> +   }
>>
>>     if (to_type->base_type == GLSL_TYPE_FLOAT)
>>        return PARAMETER_INT_TO_FLOAT;
>>
>
Re: [Mesa-dev] ARB_arrays_of_arrays GLSL ES
Hi,

On 06/20/2015 03:32 PM, Timothy Arceri wrote:
> The restrictions in ES make the extension easier to implement so I
> thought I'd try to get this stuff reviewed and committed before
> finishing up the full extension. The bits that I'm still working on
> for the desktop version are AoA inputs, outputs, and interface blocks.
>
> The only thing I know is definitely missing in this series for ES is
> support for indirect indexing of samplers, but that didn't seem like
> something that should hold up the series.
>
> Once the SSBO series lands (with a patch that restricts unsized
> arrays) then all the AoA ES conformance tests will pass.
>
> There are already a bunch of piglit tests in git but I've just sent a
> series with all the patches still waiting review here:
> http://lists.freedesktop.org/archives/piglit/2015-June/016312.html
>
> I haven't made a patch marking this as done yet because currently the
> i965 backend takes a very long time trying to optimise some of the
> conformance tests. They still pass but they are taking 15-minutes+
> just to compile so this really needs to be sorted out first.
>
> If someone with more knowledge in this area than me wants to take a
> look at this I would be grateful for being pointed in the right
> direction.

Are there individual shaders whose compilation takes several minutes?

Do you have any perf [1] or valgrind [2] tool output for compiling the slowest one?

- Eero

[1]
# perf record -a
^C
# perf report -n
-> text output

[2]
$ valgrind --tool=callgrind
$ kcachegrind
-> callgraphs, callee maps etc
Re: [Mesa-dev] [PATCH 2/2] glsl: Fix counting of varyings.
On 06/19/2015 06:08 AM, Jose Fonseca wrote:
> When input and output varyings started to be counted separately (commit
> 42305fb5) the is_varying_var function wasn't updated to return true for
> output varyings or input varyings for stages other than the fragment
> shader), effectively making the varying limit to never be checked.

Without SSO, counting the varying inputs used by, say, the fragment shader, should be sufficient. With SSO, it's more difficult.

> With this change, color, texture coord, and generic varyings are not
> counted, but others are ignored. It is assumed the hardware will handle
> special varyings internally (ie, optimistic rather than pessimistic), to
> avoid causing regressions where things were working somehow.
>
> This fixes `glsl-max-varyings --exceed-limits` with softpipe/llvmpipe,
> which was asserting because we were getting varyings beyond
> VARYING_SLOT_MAX in st_glsl_to_tgsi.cpp.
>
> It also prevents the assertion failure with
> https://bugs.freedesktop.org/show_bug.cgi?id=90539 but the tests still
> fails due to the link error.
>
> This change also adds a few assertions to catch this sort of errors
> earlier, and potentially prevent buffer overflows in the future (no
> buffer overflow was detected here though).
>
> However, this change causes several tests to regress:
>
> spec/glsl-1.10/execution/varying-packing/simple ivec3 array
> spec/glsl-1.10/execution/varying-packing/simple ivec3 separate
> spec/glsl-1.10/execution/varying-packing/simple uvec3 array
> spec/glsl-1.10/execution/varying-packing/simple uvec3 separate

Wait... so the ivec3 and uvec3 tests fail, but the vec3 test passes?
> spec/arb_gpu_shader_fp64/varying-packing/simple dmat3 array > spec/glsl-1.50/execution/geometry/max-input-components > spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd > > spec/glsl-1.50/execution/variable-indexing/vs-output-array-vec4-index-wr-before-gs > > But this all seem to be issues either in the way we count varyings > (e.g., geometry inputs get counted multiple times) or in the tests > themselves, or limitations in the varying packer, and deserve attention > on their own right. Do you have a feeling for which tests are which sorts of problems? I'd like to run this through GLES3 conformance before it gets pushed. I'm not too worried about the geometry shader issues, but the ivec / uvec tests seem more problematic. > --- > src/glsl/link_varyings.cpp | 70 > -- > src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 + > 2 files changed, 58 insertions(+), 14 deletions(-) > > diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp > index 278a778..7649720 100644 > --- a/src/glsl/link_varyings.cpp > +++ b/src/glsl/link_varyings.cpp > @@ -190,6 +190,8 @@ cross_validate_outputs_to_inputs(struct gl_shader_program > *prog, >*/ > const unsigned idx = var->data.location - VARYING_SLOT_VAR0; > > + assert(idx < MAX_VARYING); > + > if (explicit_locations[idx] != NULL) { > linker_error(prog, > "%s shader has multiple outputs explicitly " > @@ -1031,25 +1033,63 @@ varying_matches::match_comparator(const void > *x_generic, const void *y_generic) > /** > * Is the given variable a varying variable to be counted against the > * limit in ctx->Const.MaxVarying? > - * This includes variables such as texcoords, colors and generic > - * varyings, but excludes variables such as gl_FrontFacing and gl_FragCoord. > + * > + * OpenGL specification states: Please use the canonical format. * Section A.B (Foo Bar) of the OpenGL X.Y Whichever Profile spec * says: That enables later readers to more easily find the text in the spec. 
Also, the language changes from time to time. > + * > + * Each output variable component used as either a vertex shader output or > + * fragment shader input counts against this limit, except for the > components > + * of gl_Position. A program containing only a vertex and fragment shader This bit about gl_Position is tricky... I believe this language has changed more than once in the spec. It's also the reason the varying limit has changed from 64 components to 60 components. I don't think that affects this patch... it's just a thing I thought was worth pointing out. > + * that accesses more than this limit's worth of components of outputs may > + * fail to link, unless device-dependent optimizations are able to make the > + * program fit within available hardware resources. > + * > */ > static bool > var_counts_against_varying_limit(gl_shader_stage stage, const ir_variable > *var) > { > - /* Only fragment shaders will take a varying variable as an input */ > - if (stage == MESA_SHADER_FRAGMENT && > - var->data.mode == ir_var_shader_in) { > - switch (var->data.location) { > - case VARYING_SLOT_POS: > - case VARYING_SLOT_FACE: > - ca
[Mesa-dev] [Bug 91044] piglit spec/egl_khr_create_context/valid debug flag gles* fail
https://bugs.freedesktop.org/show_bug.cgi?id=91044

--- Comment #1 from Emil Velikov ---
Based on the patch date (17 July 2012) and the extension revision history I'd say that things were changed/nuked in Version 12 or later, with Version 15 being the prime suspect.

As Intel is a Khronos member, you should have access to the SVN repo/history for the exact details. I'd assume that it would be the better option.
Re: [Mesa-dev] abundance of branches in mesa.git
I will happily remove the branch after the kernel driver lands. I also wonder why all Mesa developers can force-push branches in Mesa but not libdrm. Marek On Mon, Jun 22, 2015 at 5:39 PM, Ilia Mirkin wrote: > On Mon, Jun 22, 2015 at 11:30 AM, Christian König > wrote: >> On 22.06.2015 15:41, Ilia Mirkin wrote: >>> >>> On Mon, Jun 22, 2015 at 9:39 AM, Tom Stellard wrote: On Mon, Jun 22, 2015 at 12:23:54PM +0200, Marek Olšák wrote: > > On Mon, Jun 22, 2015 at 5:36 AM, Ilia Mirkin > wrote: >> >> On Sun, Jun 21, 2015 at 11:33 PM, Michel Dänzer >> wrote: >>> >>> On 22.06.2015 00:31, Ilia Mirkin wrote: On Sun, Jun 21, 2015 at 12:22 PM, Emil Velikov wrote: > > On 20/06/15 10:01, Eirik Byrkjeflot Anonsen wrote: >> >> Ilia Mirkin writes: >> >>> Hello, >>> >>> There are a *ton* of branches in the upstream mesa git. Here is a >>> full list: >>> >> [...] >>> >>> is there >>> any reason to keep these around with the exception of: >>> >>> master >>> $version (i.e. 9.0, 10.0, mesa_7_7_branch, etc) >> >> Instead of outright deleting old branches, it would be possible to >> set >> up an "archive" repository which mirrors all branches of the main >> repository. And then delete "obsolete" branches only from the main >> repository. Ideally, you would want a git hook to refuse to create >> a new >> branch (in the main repository) if a branch by that name already >> exists >> in the archive repository. Possibly with the exception that >> creating a >> same-named branch on the same commit would be allowed. >> >> (And the same for tags, of course) >> > Personally I am fine with either approach - stay/nuke/move. But I'm > thinking that having a mix of the two suggestions might be a nice > middle > ground. > > Write a script that nukes branches that are merged in master (check > the > top commit of the branch) and have an 'archive' repo that contains > everything else (minus the stable branches). >>> >>> Sounds good to me, FWIW. 
>>> >>> That still leaves a ton around, and curiously removes mesa_7_5 and mesa_7_6. >>> >>> I think the latter is expected, we were using a different branching >>> model back in those days. >>> >>> origin/amdgpu >>> >>> Note that this is a currently active branch, to be merged to master >>> soon. >> >> Perhaps there's something I don't understand, but why is a feature >> branch made available on the shared tree? In my view of things the >> only branches on the shared mesa.git tree should be the version >> branches. > > As you can see, a lot of feature branches are in the shared tree > already, so there is a precedent. Sharing a branch among people in > this way sometimes tends to be more convenient. > > The reason here is that it's the only mesa repository where most > people from our team have commit access. > Also, the shared git tree supports https access, which means it is accessible when behind a firewall. >>> >>> OK, well if that's the prevailing attitude, then I'm on a fool's >>> errand, and I'll just drop this. >> >> >> I still think it would be a good idea to archive the branches after a while, >> cause the current status is rather confusing if you search for something >> specifc. > > Yeah, but if the policy is "create random branches whenever you feel > like on the upstream mesa tree", then this same current situation will > happen again, so it's not really worth fixing now (at least for me). > I'm not aware of any other major project with this sort of branching > policy, but I guess there's always a first! > > I don't really see why you wouldn't just use a shared tree in > someone's ~/foo, chgrp'd to mesa or some other convenient group, or > how https plays into things, but I'm sure there's some reason for it. > [Or why those amdgpu patches are on a branch in the first place rather > than in master.] If the final state isn't a tree with a policy of not > adding (non-release) branches, I don't think I'm particularly > interested in doing the legwork. 
> > Cheers, > > -ilia > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] abundance of branches in mesa.git
On 06/22/2015 10:40 AM, Marek Olšák wrote: > I will happily remove the branch after the kernel driver lands. > > I also wonder why all Mesa developers can force-push branches in Mesa > but not libdrm. That's probably just historical. We probably ought to restrict that on Mesa as well. It sounds like you guys have some requirements for a shared repo. It seems like a repo on fd.o could work. I think you'd just need a "amddevs" group and make the repo group rwx. I thought fd.o GIT did https (maybe just SSH?). > Marek > > On Mon, Jun 22, 2015 at 5:39 PM, Ilia Mirkin wrote: >> On Mon, Jun 22, 2015 at 11:30 AM, Christian König >> wrote: >>> On 22.06.2015 15:41, Ilia Mirkin wrote: On Mon, Jun 22, 2015 at 9:39 AM, Tom Stellard wrote: > > On Mon, Jun 22, 2015 at 12:23:54PM +0200, Marek Olšák wrote: >> >> On Mon, Jun 22, 2015 at 5:36 AM, Ilia Mirkin >> wrote: >>> >>> On Sun, Jun 21, 2015 at 11:33 PM, Michel Dänzer >>> wrote: On 22.06.2015 00:31, Ilia Mirkin wrote: > > On Sun, Jun 21, 2015 at 12:22 PM, Emil Velikov > wrote: >> >> On 20/06/15 10:01, Eirik Byrkjeflot Anonsen wrote: >>> >>> Ilia Mirkin writes: >>> Hello, There are a *ton* of branches in the upstream mesa git. Here is a full list: >>> [...] is there any reason to keep these around with the exception of: master $version (i.e. 9.0, 10.0, mesa_7_7_branch, etc) >>> >>> Instead of outright deleting old branches, it would be possible to >>> set >>> up an "archive" repository which mirrors all branches of the main >>> repository. And then delete "obsolete" branches only from the main >>> repository. Ideally, you would want a git hook to refuse to create >>> a new >>> branch (in the main repository) if a branch by that name already >>> exists >>> in the archive repository. Possibly with the exception that >>> creating a >>> same-named branch on the same commit would be allowed. >>> >>> (And the same for tags, of course) >>> >> Personally I am fine with either approach - stay/nuke/move. 
But I'm >> thinking that having a mix of the two suggestions might be a nice >> middle >> ground. >> >> Write a script that nukes branches that are merged in master (check >> the >> top commit of the branch) and have an 'archive' repo that contains >> everything else (minus the stable branches). Sounds good to me, FWIW. > That still leaves a ton around, and curiously removes mesa_7_5 and > mesa_7_6. I think the latter is expected, we were using a different branching model back in those days. > origin/amdgpu Note that this is a currently active branch, to be merged to master soon. >>> >>> Perhaps there's something I don't understand, but why is a feature >>> branch made available on the shared tree? In my view of things the >>> only branches on the shared mesa.git tree should be the version >>> branches. >> >> As you can see, a lot of feature branches are in the shared tree >> already, so there is a precedent. Sharing a branch among people in >> this way sometimes tends to be more convenient. >> >> The reason here is that it's the only mesa repository where most >> people from our team have commit access. >> > Also, the shared git tree supports https access, which means it is > accessible when behind a firewall. OK, well if that's the prevailing attitude, then I'm on a fool's errand, and I'll just drop this. >>> >>> >>> I still think it would be a good idea to archive the branches after a while, >>> cause the current status is rather confusing if you search for something >>> specifc. >> >> Yeah, but if the policy is "create random branches whenever you feel >> like on the upstream mesa tree", then this same current situation will >> happen again, so it's not really worth fixing now (at least for me). >> I'm not aware of any other major project with this sort of branching >> policy, but I guess there's always a first! 
>> >> I don't really see why you wouldn't just use a shared tree in >> someone's ~/foo, chgrp'd to mesa or some other convenient group, or >> how https plays into things, but I'm sure there's some reason for it. >> [Or why those amdgpu patches are on a branch in the first place rather >> than in master.] If the final state isn't a tree with a policy of not >> adding (non-release) branches, I don't think I'm particularly >> interested in doing the legwork. >> >> Cheers, >> >> -ilia >> __
Re: [Mesa-dev] [PATCH 11/11] android: egl: do not link against libglapi
Niiice, thank you.

For most drivers - gallium, i965 - this is implemented, leaving nouveau_vieux, radeon, r200 and i915. Of these, i915 does work with EGL, while nouveau_vieux dies miserably (missing __DRI_IMAGE v7 iirc). How well do radeon/r200 fare?

So as a nice starter task one can modify EGL to use flush_with_flags and fall back to glFlush. Hmm... seems perfect for Google Code-In (junior GSoC). The application for mentoring org. is around October, perhaps we can give it a bash :-)

You did bring up a very nice topic though... up to when are we going to support every loader/dri module combination out there?

Emil

On 21 June 2015 at 10:22, Marek Olšák wrote:
> FWIW, flushing can be done through
> flush_with_flags(__DRI2_FLUSH_CONTEXT), so glFlush shouldn't be
> needed, but some drivers don't implement flush_with_flags and I've
> heard libEGL and libGL need to support DRI drivers from older Mesas too.
>
> Marek
>
> On Fri, Jun 19, 2015 at 9:56 PM, Emil Velikov wrote:
>> The only reason we touch glapi is to dlopen it to:
>> - make sure that the unresolved _glapi* symbols in the dri modules are
>> provided.
>> - fetch glFlush() and use it at various stages in the dri2 driver.
>>
>> XXX: If anyone has suggestions why the latter is required (or can
>> recommend any reading material) I'm all ears.
>>
>> Cc: Chih-Wei Huang
>> Cc: Eric Anholt
>> Signed-off-by: Emil Velikov
>> ---
>>  src/egl/main/Android.mk | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/src/egl/main/Android.mk b/src/egl/main/Android.mk
>> index 8f687e9..0ba7295 100644
>> --- a/src/egl/main/Android.mk
>> +++ b/src/egl/main/Android.mk
>> @@ -44,7 +44,6 @@ LOCAL_CFLAGS := \
>>  	-D_EGL_OS_UNIX=1
>>
>>  LOCAL_SHARED_LIBRARIES := \
>> -	libglapi \
>>  	libdl \
>>  	libhardware \
>>  	liblog \
>> --
>> 2.4.2
[Mesa-dev] [PATCH] i965/fs: Fix ir_txs in emit_texture_gen4_simd16().
We were not emitting the LOD, which led to message lengths of 1 instead
of 3.  Setting has_lod makes us emit the LOD, but I had to make changes
to avoid emitting the non-existent coordinate as well.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91022
Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Kenneth Graunke
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 4770838..12253e4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -247,7 +247,7 @@ fs_visitor::emit_texture_gen4_simd16(ir_texture_opcode op, fs_reg dst,
                                      uint32_t sampler)
 {
    fs_reg message(MRF, 2, BRW_REGISTER_TYPE_F, dispatch_width);
-   bool has_lod = op == ir_txl || op == ir_txb || op == ir_txf;
+   bool has_lod = op == ir_txl || op == ir_txb || op == ir_txf || op == ir_txs;
 
    if (has_lod && shadow_c.file != BAD_FILE)
       no16("TXB and TXL with shadow comparison unsupported in SIMD16.");
@@ -264,14 +264,15 @@ fs_visitor::emit_texture_gen4_simd16(ir_texture_opcode op, fs_reg dst,
    fs_reg msg_end = offset(message, vector_elements);
 
    /* Messages other than sample and ld require all three components */
-   if (has_lod || shadow_c.file != BAD_FILE) {
+   if (vector_elements > 0 && (has_lod || shadow_c.file != BAD_FILE)) {
       for (int i = vector_elements; i < 3; i++) {
          bld.MOV(offset(message, i), fs_reg(0.0f));
       }
+      msg_end = offset(message, 3);
    }
 
    if (has_lod) {
-      fs_reg msg_lod = retype(offset(message, 3), op == ir_txf ?
+      fs_reg msg_lod = retype(msg_end, op == ir_txf ?
                               BRW_REGISTER_TYPE_UD : BRW_REGISTER_TYPE_F);
       bld.MOV(msg_lod, lod);
       msg_end = offset(msg_lod, 1);
-- 
1.7.10.4
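The effect of this patch on the message length can be checked with a small model. The Python sketch below is not driver code: it only mirrors the control flow visible in the diff, and it assumes that the message carries one header register and that each SIMD16 payload component occupies two registers (an assumption chosen because it reproduces the lengths of 1 and 3 quoted in the commit message). The shadow-comparison path is left out, and the function and parameter names are invented for the illustration.

```python
HEADER_REGS = 1          # assumed: one message header register
REGS_PER_COMPONENT = 2   # assumed: a SIMD16 float component spans two registers


def message_length(op, vector_elements, patched):
    """Model the register count sent for a gen4 SIMD16 texture message."""
    lod_ops = {"txl", "txb", "txf"}
    if patched:
        lod_ops.add("txs")           # the patch adds ir_txs to has_lod
    has_lod = op in lod_ops

    end = vector_elements            # msg_end = offset(message, vector_elements)
    if patched:
        if vector_elements > 0 and has_lod:
            end = 3                  # coordinates zero-padded up to three
        if has_lod:
            end += 1                 # LOD follows whatever was emitted so far
    else:
        if has_lod:
            end = 4                  # LOD always written at component 3
    return HEADER_REGS + REGS_PER_COMPONENT * end
```

Under these assumptions, a txs message (no coordinates) goes from 1 register before the patch to 3 after it, while a txl message with two coordinates is unchanged at 9.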
Re: [Mesa-dev] abundance of branches in mesa.git
It's not so important now that the amdgpu driver is about to be merged. Speaking of other branches, I think removing the old feature branches is a good idea. Marek On Mon, Jun 22, 2015 at 8:02 PM, Ian Romanick wrote: > On 06/22/2015 10:40 AM, Marek Olšák wrote: >> I will happily remove the branch after the kernel driver lands. >> >> I also wonder why all Mesa developers can force-push branches in Mesa >> but not libdrm. > > That's probably just historical. We probably ought to restrict that on > Mesa as well. > > It sounds like you guys have some requirements for a shared repo. It > seems like a repo on fd.o could work. I think you'd just need a > "amddevs" group and make the repo group rwx. I thought fd.o GIT did > https (maybe just SSH?). > >> Marek >> >> On Mon, Jun 22, 2015 at 5:39 PM, Ilia Mirkin wrote: >>> On Mon, Jun 22, 2015 at 11:30 AM, Christian König >>> wrote: On 22.06.2015 15:41, Ilia Mirkin wrote: > > On Mon, Jun 22, 2015 at 9:39 AM, Tom Stellard wrote: >> >> On Mon, Jun 22, 2015 at 12:23:54PM +0200, Marek Olšák wrote: >>> >>> On Mon, Jun 22, 2015 at 5:36 AM, Ilia Mirkin >>> wrote: On Sun, Jun 21, 2015 at 11:33 PM, Michel Dänzer wrote: > > On 22.06.2015 00:31, Ilia Mirkin wrote: >> >> On Sun, Jun 21, 2015 at 12:22 PM, Emil Velikov >> wrote: >>> >>> On 20/06/15 10:01, Eirik Byrkjeflot Anonsen wrote: Ilia Mirkin writes: > Hello, > > There are a *ton* of branches in the upstream mesa git. Here is a > full list: > [...] > > is there > any reason to keep these around with the exception of: > > master > $version (i.e. 9.0, 10.0, mesa_7_7_branch, etc) Instead of outright deleting old branches, it would be possible to set up an "archive" repository which mirrors all branches of the main repository. And then delete "obsolete" branches only from the main repository. Ideally, you would want a git hook to refuse to create a new branch (in the main repository) if a branch by that name already exists in the archive repository. 
Possibly with the exception that creating a same-named branch on the same commit would be allowed. (And the same for tags, of course) >>> Personally I am fine with either approach - stay/nuke/move. But I'm >>> thinking that having a mix of the two suggestions might be a nice >>> middle >>> ground. >>> >>> Write a script that nukes branches that are merged in master (check >>> the >>> top commit of the branch) and have an 'archive' repo that contains >>> everything else (minus the stable branches). > > Sounds good to me, FWIW. > > >> That still leaves a ton around, and curiously removes mesa_7_5 and >> mesa_7_6. > > I think the latter is expected, we were using a different branching > model back in those days. > > >> origin/amdgpu > > Note that this is a currently active branch, to be merged to master > soon. Perhaps there's something I don't understand, but why is a feature branch made available on the shared tree? In my view of things the only branches on the shared mesa.git tree should be the version branches. >>> >>> As you can see, a lot of feature branches are in the shared tree >>> already, so there is a precedent. Sharing a branch among people in >>> this way sometimes tends to be more convenient. >>> >>> The reason here is that it's the only mesa repository where most >>> people from our team have commit access. >>> >> Also, the shared git tree supports https access, which means it is >> accessible when behind a firewall. > > OK, well if that's the prevailing attitude, then I'm on a fool's > errand, and I'll just drop this. I still think it would be a good idea to archive the branches after a while, cause the current status is rather confusing if you search for something specifc. >>> >>> Yeah, but if the policy is "create random branches whenever you feel >>> like on the upstream mesa tree", then this same current situation will >>> happen again, so it's not really worth fixing now (at least for me). 
>>> I'm not aware of any other major project with this sort of branching >>> policy, but I guess there's always a first! >>> >>> I don't really see why you wouldn't just use a shared tree in >>> someone's ~/foo, chgrp'd to mesa or some other convenient group, or
Re: [Mesa-dev] [PATCH 00/11] glapi fixes - build whole of mesa with
On 06/22/2015 07:01 AM, Jose Fonseca wrote: > On 19/06/15 23:09, Emil Velikov wrote: >> On 19 June 2015 at 21:26, Jose Fonseca wrote: >>> On 19/06/15 20:56, Emil Velikov wrote: Hi all, A lovely series inspired (more like 'was awaken to send these out') by Pal Rohár, who was having issues when building xlib-libgl (plus the now enabled gles*) So here, we teach the final two static glapi users about shared-glapi, plus some related fixes. After this is done we can finally start transitioning to shared-only glapi, with some more details as mentioned in one of the patches: XXX: With this one done, we can finally transition with enforcing shared-glapi, and - link the dri modules against libglapi.so, add --no-undefined to the LDFLAGS - drop the dlopen(libglapi.so/libGL.so, RTLD_GLOBAL) workarounds in the loaders - libGL, libEGL and libgbm. - start killing off/cleaning up the dispatch ? The caveats: 1) up to what stage do we care about static libraries - libgl (either dri or xlib based) - osmesa - libEGL 2) how about other platforms (scons) ? - currently the scons uses static glapi, - would we need the dlopen(...) on windows ? Hope everyone is excited about this one as I am :-) >>> >>> >>> Maybe I missed the context of this changes, but why this matters or >>> is an >>> improvement? >>> >> If one goes the extra mile (which this series doesn't) - one configure >> option less, substantial some code de-duplication and consistent use >> of the code amongst all components provided. This way any >> improvements/cleanups made to the shared glapi will be available to >> osmesa/xlib-libgl. > > I'm perfectly happy with removing the configure option. > > And I understand the benefits of unified code paths, but I believe that > for this particular case, the difference in requirements really demands > the separate code paths. > >>> In summary, having the ability of using a shared glapi sounds great, but >>> forcing shared glapi everywhere, sounds a bad idea. 
>>>
>> I'm suspecting that people might be keen on the following idea - use
>> static glapi for osmesa/xlib-libgl and shared one everywhere else?
>
> Yes, that sounds reasonable for me. (Needs libgl-gdi too.)
>
>>
>> I fear that this will lead to further separation/bit-rot between the
>> different implementations, but it seems like the best compromise.
>
> I don't feel strongly between: a) using the same source code for both
> static/shared glapi (switched by a pre-processor define), or b) only
> share the interface but have shared/static glapi implementations. I'm
> actually not that familiar with that code.
>
>
> Either way, we can have two glapi build targets (a shared-glapi and a
> static-glapi) side-by-side, so that there are no more source-wide
> configure flags.
>
>
> I believe a lot of the complexity of that code comes from assembly. I
> wonder if it's really justified nowadays (and even if it is, whether it
> would be better served with GNU C assembly.) Furthermore, I believe on
> Windows we don't use any assembly, so if we split shared/static glapi
> source code, we could probably abandon assembly from the static-glapi.

It comes from the intersection of the assembly and the myriad threading options. Having TLS and shared-glapi as the only "option" for DRI builds would be terrific. We have a couple of workloads that, especially on Atom CPUs, are sensitive to any added overhead. My recollection is that GCC does not generate the code you want for the dispatch functions.

I feel like we keep coming around to the loader/driver interface needing some significant work. I certainly have a bunch of ideas for how things could be improved. I'll start working on a proposal.

> Jose
[Mesa-dev] [RFC] Compatibility between old dri modules and new loaders, and vice versa
Hi all,

As kindly hinted by Marek, we currently have a wide selection of supported dri <> loader combinations.

Although we like to think that things never break, we have to admit that not many of us test every possible combination of dri modules and loaders, and the chances get smaller as the time gap (age) between the two increases. As such I would like to ask if we're interested in gradually deprecating combinations as the gap grows beyond X years.

The rough idea that I have in mind is:
 - Check for obsolete extensions (and requirements for such), both in the dri modules and the loaders (including the xserver).
 - Add some WARN messages ("You're using an old loader/DRI module. Update to XXX or later") when such a code path is hit.
 - After X mesa releases, we remove the dri extension from the module(s) and bump the requirement(s) in the loader(s).

And now the more important question: why?
 - Very rarely tested and not actively supported - if it works it works, we only cover one stable branch.
 - Having a quick look at the "if extension && extension.version >= y" maze does leave most of us speechless.
 - Will allow us to start removing a few of the nasty quirks/hacks that we currently have lying around.

Worth mentioning:
 - The deprecation period will be based on the longest time frame set by LTS versions of distros. For example, if Debian A ships X and mesa 3 years apart, while Ubuntu's gap is ~2.5 and RedHat's ~2.8, we'll stick with 3 years.
 - libGL dri1 support... it's been almost four years since the removal of the dri1 modules. Since then the only activity that I've noticed has been by Connor Behan on the r128 front, although it seems that he has covered the ddx and is just looking at the kernel side of things. Should we consider mesa X (10.6 ?) as the last one that supports such old modules in its libGL and give it a much needed cleanup?

How would people feel about this - do we have any strong ack/nack about the idea? Are there many people/companies that support distros where the xserver <> mesa gap is over, say, 2 years?

Looking forward to any feedback,
Emil
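As a rough illustration of the rule proposed in this RFC (take the longest loader <> module gap that any LTS distro ships, and warn once a combination exceeds it), here is a small Python sketch. It is not Mesa code: the function names are made up for the example, the distro figures are the hypothetical ones from the message, and "XXX" is the placeholder used in the proposed warning text.

```python
def deprecation_window(gaps_in_years):
    """Longest loader <> dri module age gap shipped by any LTS distro."""
    return max(gaps_in_years.values())


def check_combination(loader_year, module_year, window):
    """Return None if the pair is within the window, else a warning string."""
    gap = abs(loader_year - module_year)
    if gap <= window:
        return None
    # Name whichever side is the older one in the warning.
    older = "loader" if loader_year < module_year else "DRI module"
    return f"You're using an old {older}. Update to XXX or later"


# Hypothetical figures from the message: Debian 3, Ubuntu ~2.5, RedHat ~2.8.
window = deprecation_window({"Debian": 3.0, "Ubuntu": 2.5, "RedHat": 2.8})
```

With these figures the window is 3 years, so a 2015 loader paired with a 2013 module would pass silently, while a 2010 loader paired with a 2015 module would trigger the warning.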
[Mesa-dev] Building Mesa/LLVMpipe on Windows
Hi everyone, I spent some time building Mesa/llvmpipe on Windows and created a Python script that implements all the required steps (downloading/extracting all prerequisites and sources, configuring and building LLVM and Mesa). The script is available at: https://github.com/florianlink/MesaOnWindows I hope it helps some people struggling with the build details on Windows! If you are interested, feel free to incorporate it into Mesa, I placed the script into the public domain. Best regards, Florian P.S. Is there any reason why there are no prebuilt Mesa opengl32.dll files available on the web? I considered putting a current dll onto Github as well, are there any reasons why I should not do that? ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radeon: Advertise correct GL_SAMPLES_PASSED value.
From: Ian Romanick

Commit b765119c changed the default value of all the counter bits to
64.  However, older hardware only has 32 counter bits.

This has only been build-tested.  We don't have any tests that verify
the advertised value against implementation behavior, so I don't know
what additional testing could be done.

NOTE: It appears that many Gallium drivers (at least r300 and i915g)
have the same problem, but I don't see a way for the state-tracker to
determine the counter size.

Signed-off-by: Ian Romanick
Cc: Marek Olšák
Cc: Alex Deucher
---
 .../drivers/dri/radeon/radeon_common_context.c | 23 ++
 1 file changed, 23 insertions(+)

diff --git a/src/mesa/drivers/dri/radeon/radeon_common_context.c b/src/mesa/drivers/dri/radeon/radeon_common_context.c
index 9699dcb..3d0ceda 100644
--- a/src/mesa/drivers/dri/radeon/radeon_common_context.c
+++ b/src/mesa/drivers/dri/radeon/radeon_common_context.c
@@ -194,6 +194,29 @@ GLboolean radeonInitContext(radeonContextPtr radeon,
 
 	radeon_init_dma(radeon);
 
+	/* _mesa_initialize_context calls _mesa_init_queryobj which
+	 * initializes all of the counter sizes to 64.  The counters on r100
+	 * and r200 are only 32-bits for occlusion queries.  Those are the
+	 * only counters, so set the other sizes to zero.
+	 */
+	radeon->glCtx.Const.QueryCounterBits.SamplesPassed = 32;
+
+	radeon->glCtx.Const.QueryCounterBits.TimeElapsed = 0;
+	radeon->glCtx.Const.QueryCounterBits.Timestamp = 0;
+	radeon->glCtx.Const.QueryCounterBits.PrimitivesGenerated = 0;
+	radeon->glCtx.Const.QueryCounterBits.PrimitivesWritten = 0;
+	radeon->glCtx.Const.QueryCounterBits.VerticesSubmitted = 0;
+	radeon->glCtx.Const.QueryCounterBits.PrimitivesSubmitted = 0;
+	radeon->glCtx.Const.QueryCounterBits.VsInvocations = 0;
+	radeon->glCtx.Const.QueryCounterBits.TessPatches = 0;
+	radeon->glCtx.Const.QueryCounterBits.TessInvocations = 0;
+	radeon->glCtx.Const.QueryCounterBits.GsInvocations = 0;
+	radeon->glCtx.Const.QueryCounterBits.GsPrimitives = 0;
+	radeon->glCtx.Const.QueryCounterBits.FsInvocations = 0;
+	radeon->glCtx.Const.QueryCounterBits.ComputeInvocations = 0;
+	radeon->glCtx.Const.QueryCounterBits.ClInPrimitives = 0;
+	radeon->glCtx.Const.QueryCounterBits.ClOutPrimitives = 0;
+
 	return GL_TRUE;
 }
 
-- 
2.1.0
Re: [Mesa-dev] [PATCH 11/11] android: egl: do not link against libglapi
Yes, I think we need to support every loader/driver combination, but I'm not sure. Ian, please how much do we care about compatibility between loaders (libGL, libEGL) and DRI drivers? Thanks, Marek On Mon, Jun 22, 2015 at 8:04 PM, Emil Velikov wrote: > Niiice, thank you. For most drivers - gallium, i965 this is > implemented, leaving nouveau_vieux, radeon, r200 and i915. From these > i915 does work with EGL, while nouveau_vieux dies miserably (missing > __DRI_IMAGE v7 iirc). How well does radeon/r200 fair ? > > So as a nice starter task one can, modify EGL to use flush_with_flags > and fall-back do glFlush. Hmm... seems perfect for Google Code-In > (junior GSoC). The application for mentoring org. is around October, > perhaps we can give it a bash :-) > > You did bring a very nice topic though... up-to when are we going to > support every loader/dri module combination out there ? > > Emil > > On 21 June 2015 at 10:22, Marek Olšák wrote: >> FWIW, flushing can be done through >> flush_with_flags(__DRI2_FLUSH_CONTEXT), so glFlush shouldn't be >> needed, but some drivers don't implement flush_with_flags and I've >> heard libEGL and libGL need to support DRI drivers from older Mesas too. >> >> Marek >> >> On Fri, Jun 19, 2015 at 9:56 PM, Emil Velikov >> wrote: >>> The only reason we touch glapi is to dlopen it to: >>> - make sure that the unresolved _glapi* symbols in the dri modules are >>> provided. >>> - fetch glFlush() and use it at various stages in the dri2 driver. >>> >>> XXX: If anyone has suggestions why the latter is required (or can >>> recommend any reading material) I'm all ears. 
>>> >>> Cc: Chih-Wei Huang >>> Cc: Eric Anholt >>> Signed-off-by: Emil Velikov >>> --- >>> src/egl/main/Android.mk | 1 - >>> 1 file changed, 1 deletion(-) >>> >>> diff --git a/src/egl/main/Android.mk b/src/egl/main/Android.mk >>> index 8f687e9..0ba7295 100644 >>> --- a/src/egl/main/Android.mk >>> +++ b/src/egl/main/Android.mk >>> @@ -44,7 +44,6 @@ LOCAL_CFLAGS := \ >>> -D_EGL_OS_UNIX=1 >>> >>> LOCAL_SHARED_LIBRARIES := \ >>> - libglapi \ >>> libdl \ >>> libhardware \ >>> liblog \ >>> -- >>> 2.4.2 >>> >>> ___ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/11] glapi fixes - build whole of mesa with
On 22 June 2015 at 15:01, Jose Fonseca wrote: > On 19/06/15 23:09, Emil Velikov wrote: >> >> On 19 June 2015 at 21:26, Jose Fonseca wrote: >>> >>> On 19/06/15 20:56, Emil Velikov wrote: Hi all, A lovely series inspired (more like 'was awaken to send these out') by Pal Rohár, who was having issues when building xlib-libgl (plus the now enabled gles*) So here, we teach the final two static glapi users about shared-glapi, plus some related fixes. After this is done we can finally start transitioning to shared-only glapi, with some more details as mentioned in one of the patches: XXX: With this one done, we can finally transition with enforcing shared-glapi, and - link the dri modules against libglapi.so, add --no-undefined to the LDFLAGS - drop the dlopen(libglapi.so/libGL.so, RTLD_GLOBAL) workarounds in the loaders - libGL, libEGL and libgbm. - start killing off/cleaning up the dispatch ? The caveats: 1) up to what stage do we care about static libraries - libgl (either dri or xlib based) - osmesa - libEGL 2) how about other platforms (scons) ? - currently the scons uses static glapi, - would we need the dlopen(...) on windows ? Hope everyone is excited about this one as I am :-) >>> >>> >>> >>> Maybe I missed the context of this changes, but why this matters or is an >>> improvement? >>> >> If one goes the extra mile (which this series doesn't) - one configure >> option less, substantial some code de-duplication and consistent use >> of the code amongst all components provided. This way any >> improvements/cleanups made to the shared glapi will be available to >> osmesa/xlib-libgl. > > > I'm perfectly happy with removing the configure option. > > And I understand the benefits of unified code paths, but I believe that for > this particular case, the difference in requirements really demands the > separate code paths. > >>> In summary, having the ability of using a shared glapi sounds great, but >>> forcing shared glapi everywhere, sounds a bad idea. 
>>>
>> I'm suspecting that people might be keen on the following idea - use
>> static glapi for osmesa/xlib-libgl and shared one everywhere else?
>
>
> Yes, that sounds reasonable for me. (Needs libgl-gdi too.)
>
Indeed. Everything gdi is built only via scons so we'll touch it only if needed.

>>
>> I fear that this will lead to further separation/bit-rot between the
>> different implementations, but it seems like the best compromise.
>
>
> I don't feel strongly between: a) using the same source code for both
> static/shared glapi (switched by a pre-processor define), or b) only share
> the interface but have shared/static glapi implementations. I'm actually
> not that familiar with that code.
>
>
> Either way, we can have two glapi build targets (a shared-glapi and a
> static-glapi) side-by-side, so that there are no more source-wide
> configure flags.
>
In theory it should be fine, in practice... I'm rather cautious as mapi is the most convoluted part in mesa, and with the "subdir-objects" option being toggled soon things may go (albeit unlikely) subtly haywire.

>
> I believe a lot of the complexity of that code comes from assembly. I
> wonder if it's really justified nowadays (and even if it is, whether it
> would be better served with GNU C assembly.) Furthermore, I believe on
> Windows we don't use any assembly, so if we split shared/static glapi
> source code, we could probably abandon assembly from the static-glapi.
>
I'm not 100% sure but I'd suspect that Cygwin might use it when combined with swrast_dri. Don't know what others use - iirc some of the BSD folks are moving over to llvm. That aside, there is a massive amount of #ifdef spaghetti, apart from the assembly code.

Can I have your ack/nack on the idea of having shared-glapi available for xlib-libgl (patches 2, 3 and 4), until we have both glapis built in parallel? As mentioned originally, we currently fail to build if one enables gles* and xlib-libgl, and adding another hack in configure.ac feels like flogging a dead horse.

-Emil
Re: [Mesa-dev] [RFC] Compatibility between old dri modules and new loaders, and vice versa
>
> As kindly hinted by Marek, currently we do have a wide selection of
> supported dri <> loader combinations.
>
> Although we like to think that things never break, we have to admit
> that not many of us test every possible combination of dri modules
> and loaders. With the chances getting smaller as the time gap (age)
> between the two increases. As such I would like to ask if we're
> interested in gradually deprecating as the gap grows beyond X years.
>
> The rough idea that I have in my mind is:
> - Check for obsolete extensions (requirements for such) - both in the
> dri modules and the loaders (including the xserver).
> - Add some WARN messages ("You're using an old loader/DRI module.
> Update to XXX or later") when such code path is hit.
> - After X mesa releases, we remove the dri extension from the
> module(s) and bump the requirement(s) in the loader(s).
>
> And now the more important question why ?
> - Very rarely tested and not actively supported - if it works it
> works, we only cover one stable branch.
> - Having a quick look at the "if extension && extension.version
> >= y" maze does leave most of us speechless.
> - Will allow us to start removing a few of the nasty quirks/hacks
> that we currently have lying around.
>
> Worth mentioning:
> - Deprecation period will be based on the longest time frame set by
> LTS versions of distros. For example if Debian A ships X and mesa 3
> years apart, while Ubuntu's gap is ~2.5 and RedHat's ~2.8, we'll stick
> with 3 years.
> - libGL dri1 support... it's been almost four years since the removal
> of the dri1 modules. Since then the only activity that I've noticed has
> been by Connor Behan on the r128 front. Although it seems that he has
> covered the ddx and is just looking at the kernel side of things. Should
> we consider mesa X (10.6 ?) as the last one that supports such old
> modules in its libGL and give it a much needed cleanup ?
>
>
> How would people feel about this - do we have any strong ack/nack
> about the idea ? Are there many people/companies that support distros
> where the xserver <> mesa gap is over, say 2 years ?

We still ship 7.11 based dri1 drivers in RHEL6, and there is still a chance of us rebasing to a newer Mesa there, depending on schedules.

ajax might have a different opinion on how likely that is, but that would be at least another year from now where we'd want DRI1 to work.

Dave.
Re: [Mesa-dev] Building Mesa/LLVMpipe on Windows
On 22/06/15 19:40, Florian Link wrote:
> Hi everyone,
>
> I spent some time building Mesa/llvmpipe on Windows and created a
> Python script that implements all the required steps
> (downloading/extracting all prerequisites and sources, configuring and
> building LLVM and Mesa).
>
> The script is available at:
>
> https://github.com/florianlink/MesaOnWindows

Given you're building for MSVC, you could avoid MinGW by using http://winflexbison.sourceforge.net/ .

BTW, I've been playing with AppVeyor for building Mesa with MSVC. You can see the build logs at https://ci.appveyor.com/project/jrfonseca/mesa

It doesn't build everything -- it uses pre-compiled LLVM binaries --, and it also leverages a lot of software that is pre-installed in AppVeyor build images.

> I hope it helps some people struggling with the build details on Windows!
> If you are interested, feel free to incorporate it into Mesa,

Maybe this sort of script wouldn't be a bad idea indeed.

> I placed the script into the public domain.

Didn't know about unlicense.org . Interesting. A bit off-topic, but I actually have been considering public domain for future personal pet projects, because when

> Best regards,
> Florian
>
> P.S. Is there any reason why there are no prebuilt Mesa opengl32.dll
> files available on the web? I considered putting a current dll onto
> Github as well, are there any reasons why I should not do that?

No particular reason other than nobody could be bothered. Mesa doesn't ship compiled binaries for any OS, not just Windows.

Personally I don't have the time to prepare binaries. If this ever was to happen it would have to be fully automated via something like AppVeyor (MSVC) or Travis-CI (mingw cross-compilers).

I also worry about people just downloading opengl32.dll without understanding what they are doing, running into all sorts of trouble, and flooding us with bug reports / support requests.

Jose
Re: [Mesa-dev] [PATCH 03/14] mesa: Fix conditions to test signed, unsigned integer format
On Sun, Jun 21, 2015 at 11:25 PM, Iago Toral wrote: > On Fri, 2015-06-19 at 13:32 -0700, Anuj Phogat wrote: >> On Thu, Jun 18, 2015 at 11:41 PM, Iago Toral wrote: >> > On Thu, 2015-06-18 at 09:19 -0700, Anuj Phogat wrote: >> >> On Thu, Jun 18, 2015 at 7:09 AM, Iago Toral wrote: >> >> > On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote: >> >> >> Signed-off-by: Anuj Phogat >> >> >> Cc: >> >> >> --- >> >> >> src/mesa/main/readpix.c | 2 ++ >> >> >> 1 file changed, 2 insertions(+) >> >> >> >> >> >> diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c >> >> >> index caa2648..a9416ef 100644 >> >> >> --- a/src/mesa/main/readpix.c >> >> >> +++ b/src/mesa/main/readpix.c >> >> >> @@ -160,10 +160,12 @@ _mesa_readpixels_needs_slow_path(const struct >> >> >> gl_context *ctx, GLenum format, >> >> >>srcType = _mesa_get_format_datatype(rb->Format); >> >> >> >> >> >>if ((srcType == GL_INT && >> >> >> + _mesa_is_enum_format_integer(format) && >> >> >> (type == GL_UNSIGNED_INT || >> >> >> type == GL_UNSIGNED_SHORT || >> >> >> type == GL_UNSIGNED_BYTE)) || >> >> >>(srcType == GL_UNSIGNED_INT && >> >> >> + _mesa_is_enum_format_integer(format) && >> >> >> (type == GL_INT || >> >> >> type == GL_SHORT || >> >> >> type == GL_BYTE))) { >> >> > >> >> > As far as I understand this code we are trying to see if we can use >> >> > memcpy to directly copy the contents of the framebuffer to the >> >> > destination buffer. In that case, as long as the src/dst types have >> >> > different sign we can't just use memcpy, right? In fact it looks like we >> >> > might need to expand the checks to include the cases where srcType is >> >> > GL_(UNSIGNED_)SHORT and GL_(UNSIGNED_)BYTE as well. >> >> > >> >> srcType returned by _mesa_get_format_datatype() is one of: >> >> GL_UNSIGNED_NORMALIZED >> >> GL_SIGNED_NORMALIZED >> >> GL_UNSIGNED_INT >> >> GL_INT >> >> GL_FLOAT >> >> So, the suggested checks for srcType are not required. 
>> > >> > Oh, right, although I think that does not invalidate my point: can we >> > memcpy from a GL_UNSIGNED_NORMALIZED to a format with type GL_FLOAT or >> > GL_SIGNED_NORMALIZED? It does not look like these checks here are >> > thorough. >> > >> Helper function _mesa_need_signed_unsigned_int_conversion() is >> meant to do the checks only for integer formats. May be add another >> function to do the missing checks for other formats? > > I have no concerns about the _mesa_need_signed_unsigned_int_conversion > function that you add in a later patch for your PBO work, my concern is > related to the fact that you are assuming that the checks that you need > in the PBO path are the same that we have in > _mesa_readpixels_needs_slow_path, so you make both the same when I think > they are trying to address different things. > > In your PBO code, you can't handle signed/unsigned integer conversions, > so you need to detect that and fall back to another path. That should be > fine I guess and the function _mesa_need_signed_unsigned_int_conversion > does what you need, so no problems there. > > However, in _mesa_readpixels_needs_slow_path I think we don't want to > just do integer checking. The purpose of the function is to tell whether > we can use memcpy to copy pixels from the framebuffer to the dst, and if > we have types with different signs, *whether they are integer or not*, > we can't, so limiting the check only to integer types does not look > right to me. The key aspect here is that what this function needs to > check is not specific to integer types, even if the current code only > seems to check things when the framebuffer has an integer format. > >> > In any case, that's beyond the point of your patch. Talking specifically >> > about your patch: can we memcpy, for example, from a _signed_ integer >> > format like MESA_FORMAT_R_SINT8 to an _unsigned_ format (integer or >> > not)? I don't think we can, in which case your patch would not look >> > correct to me. 
>> >
>> Reading integer format to a non integer format is not allowed in
>> glReadPixels. That's why those cases are not relevant here and
>> we just check for integer formats. From ext_texture_integer:
>> "INVALID_OPERATION is generated by ReadPixels if <format> is
>> an integer format and the color buffer is not an integer format, or
>> if <format> is not an integer format and the color buffer is an
>> integer format."
>
> Right, that was not a good example, but forget about integer types, what
> if the framebuffer is something like MESA_FORMAT_R8G8B8A8_UNORM and our
> dst format/type is GL_RGBA/GL_BYTE? These are not integer types but we
> can't memcpy anyway because the framebuffer is unsigned and the dst is
> signed so a conversion is needed.
>
> Of course, the current code in this function only cares about the
> framebuffer being an integer format, but for the reasons I explain
> above, I think that is wrong in this case, I think
Re: [Mesa-dev] [PATCH] radeon: Advertise correct GL_SAMPLES_PASSED value.
Reviewed-by: Marek Olšák For Gallium, a new PIPE_CAP or new get_xxx_param function will be needed. Marek On Mon, Jun 22, 2015 at 8:41 PM, Ian Romanick wrote: > From: Ian Romanick > > Commit b765119c changed the default value of all the counter bits to > 64. However, older hardware only has 32 counter bits. > > This has only been build-tested. We don't have any tests that verify > the advertised value against implementation behavior, so I don't know > what additional testing could be done. > > NOTE: It appears that many Gallium drivers (at least r300 and i915g) > have the same problem, but I don't see a way for the state-tracker to > determine the counter size. > > Signed-off-by: Ian Romanick > Cc: Marek Olšák > Cc: Alex Deucher > --- > .../drivers/dri/radeon/radeon_common_context.c | 23 > ++ > 1 file changed, 23 insertions(+) > > diff --git a/src/mesa/drivers/dri/radeon/radeon_common_context.c > b/src/mesa/drivers/dri/radeon/radeon_common_context.c > index 9699dcb..3d0ceda 100644 > --- a/src/mesa/drivers/dri/radeon/radeon_common_context.c > +++ b/src/mesa/drivers/dri/radeon/radeon_common_context.c > @@ -194,6 +194,29 @@ GLboolean radeonInitContext(radeonContextPtr radeon, > > radeon_init_dma(radeon); > > +/* _mesa_initialize_context calls _mesa_init_queryobj which > + * initializes all of the counter sizes to 64. The counters on r100 > + * and r200 are only 32-bits for occlusion queries. Those are the > + * only counters, so set the other sizes to zero. 
> + */ > +radeon->glCtx.Const.QueryCounterBits.SamplesPassed = 32; > + > +radeon->glCtx.Const.QueryCounterBits.TimeElapsed = 0; > +radeon->glCtx.Const.QueryCounterBits.Timestamp = 0; > +radeon->glCtx.Const.QueryCounterBits.PrimitivesGenerated = 0; > +radeon->glCtx.Const.QueryCounterBits.PrimitivesWritten = 0; > +radeon->glCtx.Const.QueryCounterBits.VerticesSubmitted = 0; > +radeon->glCtx.Const.QueryCounterBits.PrimitivesSubmitted = 0; > +radeon->glCtx.Const.QueryCounterBits.VsInvocations = 0; > +radeon->glCtx.Const.QueryCounterBits.TessPatches = 0; > +radeon->glCtx.Const.QueryCounterBits.TessInvocations = 0; > +radeon->glCtx.Const.QueryCounterBits.GsInvocations = 0; > +radeon->glCtx.Const.QueryCounterBits.GsPrimitives = 0; > +radeon->glCtx.Const.QueryCounterBits.FsInvocations = 0; > +radeon->glCtx.Const.QueryCounterBits.ComputeInvocations = 0; > +radeon->glCtx.Const.QueryCounterBits.ClInPrimitives = 0; > +radeon->glCtx.Const.QueryCounterBits.ClOutPrimitives = 0; > + > return GL_TRUE; > } > > -- > 2.1.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
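As an illustration of why the advertised width matters, here is a small sketch in C of what an application might do with the value it reads back through GL_QUERY_COUNTER_BITS. The helpers `counter_max` and `counter_delta` are hypothetical and not part of Mesa or GL; they only show that wrap handling depends on the advertised number of bits.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value a counter with the advertised number of bits can hold.
 * bits == 0 means the query target is not supported. */
static uint64_t
counter_max(unsigned bits)
{
   assert(bits <= 64);
   if (bits == 0)
      return 0;
   return bits == 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* Difference between two raw samples of a counter that may have wrapped
 * once in between, given the advertised width. */
static uint64_t
counter_delta(uint64_t begin, uint64_t end, unsigned bits)
{
   return (end - begin) & counter_max(bits);
}
```

With a hardware counter that is really 32 bits wide, `counter_delta(0xfffffff0, 0x10, 32)` yields 0x20, while the same samples interpreted with an advertised width of 64 yield a huge bogus value.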
Re: [Mesa-dev] [Mesa-stable] [PATCH] egl/x11: Remove duplicate call to dri2_x11_add_configs_for_visuals
On Thu 18 Jun 2015, Emil Velikov wrote: > Hi Boyan, > > On 13 June 2015 at 08:33, Boyan Ding wrote: > > The call to dri2_x11_add_configs_for_visuals (previously > > dri2_add_configs_for_visuals) was moved downwards in commit f8c5b8a1, > > but appeared again in its original position after its rename in > > d019cd81. Remove it. > > > I believe you're bang on the spot here. The latter commit mentions > only about the renaming, so it seems that the hunk got back in as the > patch was rebased. Adding Chad to the Cc list, just in case we've > missed something :-) > > Fwiw the patch is > Reviewed-by: Emil Velikov Reviewed-by: Chad Versace
[Mesa-dev] [RFC PATCH 8/8] nv50: enable GL_AMD_performance_monitor
This exposes a group of global performance counters that enables GL_AMD_performance_monitor. All piglit tests are okay. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nv50/nv50_query.c | 35 ++ src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.h | 6 + 3 files changed, 42 insertions(+) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c b/src/gallium/drivers/nouveau/nv50/nv50_query.c index 062d427..6638e82 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_query.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c @@ -1566,6 +1566,7 @@ nv50_screen_get_driver_query_info(struct pipe_screen *pscreen, info->name = cfg->event->name; info->query_type = NV50_HW_PM_QUERY(id); + info->group_id = NV50_HW_PM_QUERY_GROUP; info->max_value.u64 = (cfg->event->display == NV50_HW_PM_EVENT_DISPLAY_RATIO) ? 100 : 0; return 1; @@ -1576,6 +1577,40 @@ nv50_screen_get_driver_query_info(struct pipe_screen *pscreen, return 0; } +int +nv50_screen_get_driver_query_group_info(struct pipe_screen *pscreen, +unsigned id, +struct pipe_driver_query_group_info *info) +{ + struct nv50_screen *screen = nv50_screen(pscreen); + int count = 0; + + // TODO: Check DRM version when nvif will be merged in libdrm! + if (screen->base.perfmon) { + count++; /* NV50_HW_PM_QUERY_GROUP */ + } + + if (!info) + return count; + + if (id == NV50_HW_PM_QUERY_GROUP) { + if (screen->base.perfmon) { + info->name = "Global performance counters"; + info->type = PIPE_DRIVER_QUERY_GROUP_TYPE_GPU; + info->num_queries = NV50_HW_PM_QUERY_COUNT; + info->max_active_queries = 1; /* TODO: get rid of this limitation! 
*/ + return 1; + } + } + + /* user asked for info about non-existing query group */ + info->name = "this_is_not_the_query_group_you_are_looking_for"; + info->max_active_queries = 0; + info->num_queries = 0; + info->type = 0; + return 0; +} + void nv50_init_query_functions(struct nv50_context *nv50) { diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c index f07798e..dfe20c9 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c @@ -746,6 +746,7 @@ nv50_screen_create(struct nouveau_device *dev) pscreen->get_shader_param = nv50_screen_get_shader_param; pscreen->get_paramf = nv50_screen_get_paramf; pscreen->get_driver_query_info = nv50_screen_get_driver_query_info; + pscreen->get_driver_query_group_info = nv50_screen_get_driver_query_group_info; nv50_screen_init_resource_functions(pscreen); diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h b/src/gallium/drivers/nouveau/nv50/nv50_screen.h index 69127c0..807ae0e 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h @@ -114,6 +114,9 @@ nv50_screen(struct pipe_screen *screen) return (struct nv50_screen *)screen; } +/* Hardware global performance counters groups. */ +#define NV50_HW_PM_QUERY_GROUP 0 + /* Hardware global performance counters. */ #define NV50_HW_PM_QUERY_COUNT 24 #define NV50_HW_PM_QUERY(i)(PIPE_QUERY_DRIVER_SPECIFIC + (i)) @@ -146,6 +149,9 @@ nv50_screen(struct pipe_screen *screen) int nv50_screen_get_driver_query_info(struct pipe_screen *, unsigned, struct pipe_driver_query_info *); +int nv50_screen_get_driver_query_group_info(struct pipe_screen *, unsigned, +struct pipe_driver_query_group_info *); + boolean nv50_blitter_create(struct nv50_screen *); void nv50_blitter_destroy(struct nv50_screen *); -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
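The contract of the group callback in this patch can be sketched in isolation as follows. This is a simplified stand-in, not the Gallium interface: `struct group_info`, `get_group_info` and the `has_perfmon` flag are hypothetical names, and only the field names and constants are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct pipe_driver_query_group_info (subset of fields). */
struct group_info {
   const char *name;
   unsigned max_active_queries;
   unsigned num_queries;
};

#define HW_PM_QUERY_GROUP 0
#define HW_PM_QUERY_COUNT 24

/* Returns the number of groups when info is NULL. Otherwise fills info
 * for group `id` and returns 1, or returns 0 for an unknown id. */
static int
get_group_info(int has_perfmon, unsigned id, struct group_info *info)
{
   int count = has_perfmon ? 1 : 0;

   if (!info)
      return count;

   if (has_perfmon && id == HW_PM_QUERY_GROUP) {
      info->name = "Global performance counters";
      info->num_queries = HW_PM_QUERY_COUNT;
      info->max_active_queries = 1;
      return 1;
   }

   /* Unknown group: leave the caller with well-defined empty values. */
   info->name = "unknown";
   info->num_queries = 0;
   info->max_active_queries = 0;
   return 0;
}

/* Self-check of the three cases: count, known id, unknown id. */
static void
check_group_info(void)
{
   struct group_info info;

   assert(get_group_info(1, 0, NULL) == 1);
   assert(get_group_info(1, HW_PM_QUERY_GROUP, &info) == 1);
   assert(info.num_queries == 24 && info.max_active_queries == 1);
   assert(get_group_info(1, 5, &info) == 0);
   assert(info.num_queries == 0);
}
```

Filling the structure even on failure mirrors what the patch does, so a caller that ignores the return value never reads uninitialized fields.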
[Mesa-dev] [RFC PATCH 5/8] nv50: prevent NULL pointer dereference with pipe_query functions
This may happen when nv50_query_create() fails to create a new query. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nv50/nv50_query.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c b/src/gallium/drivers/nouveau/nv50/nv50_query.c index 55fcac8..1162110 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_query.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c @@ -96,6 +96,9 @@ nv50_query_allocate(struct nv50_context *nv50, struct nv50_query *q, int size) static void nv50_query_destroy(struct pipe_context *pipe, struct pipe_query *pq) { + if (!pq) + return; + nv50_query_allocate(nv50_context(pipe), nv50_query(pq), 0); nouveau_fence_ref(NULL, &nv50_query(pq)->fence); FREE(nv50_query(pq)); @@ -152,6 +155,9 @@ nv50_query_begin(struct pipe_context *pipe, struct pipe_query *pq) struct nouveau_pushbuf *push = nv50->base.pushbuf; struct nv50_query *q = nv50_query(pq); + if (!pq) + return FALSE; + /* For occlusion queries we have to change the storage, because a previous * query might set the initial render conition to FALSE even *after* we re- * initialized it to TRUE. 
@@ -218,6 +224,9 @@ nv50_query_end(struct pipe_context *pipe, struct pipe_query *pq) struct nouveau_pushbuf *push = nv50->base.pushbuf; struct nv50_query *q = nv50_query(pq); + if (!pq) + return; + q->state = NV50_QUERY_STATE_ENDED; switch (q->type) { @@ -294,9 +303,12 @@ nv50_query_result(struct pipe_context *pipe, struct pipe_query *pq, uint64_t *res64 = (uint64_t *)result; uint32_t *res32 = (uint32_t *)result; boolean *res8 = (boolean *)result; - uint64_t *data64 = (uint64_t *)q->data; + uint64_t *data64; int i; + if (!pq) + return FALSE; + if (q->state != NV50_QUERY_STATE_READY) nv50_query_update(q); @@ -314,6 +326,7 @@ nv50_query_result(struct pipe_context *pipe, struct pipe_query *pq, } q->state = NV50_QUERY_STATE_READY; + data64 = (uint64_t *)q->data; switch (q->type) { case PIPE_QUERY_GPU_FINISHED: res8[0] = TRUE; -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
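The reason the last hunk moves the `data64` assignment can be shown with a reduced example. The sketch below uses a hypothetical `struct query` and `query_result` in place of the driver types; the only point is the ordering of the NULL check and the first member access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nv50_query. */
struct query {
   uint64_t *data;
   int state;
};

/* Converting a NULL handle to another pointer type is harmless, but any
 * member access must come after the check. Initializing data64 from
 * q->data at its declaration would dereference q before the guard. */
static bool
query_result(struct query *q, uint64_t *result)
{
   const uint64_t *data64;

   if (!q)
      return false;

   data64 = q->data;   /* safe: q is known to be non-NULL here */
   *result = data64[0];
   return true;
}

/* Self-check covering both the NULL and the valid handle. */
static void
check_query_result(void)
{
   uint64_t value = 42, out = 0;
   struct query q = { &value, 0 };

   assert(!query_result(NULL, &out));
   assert(out == 0);
   assert(query_result(&q, &out));
   assert(out == 42);
}
```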
[Mesa-dev] [RFC PATCH 2/8] nv50: allocate a software object class
This will allow to monitor global performance counters through the command stream of the GPU instead of using ioctls. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nv50/nv50_screen.c | 11 +++ src/gallium/drivers/nouveau/nv50/nv50_screen.h | 1 + src/gallium/drivers/nouveau/nv50/nv50_winsys.h | 1 + 3 files changed, 13 insertions(+) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c index 6583a35..c985344 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c @@ -367,6 +367,7 @@ nv50_screen_destroy(struct pipe_screen *pscreen) nouveau_object_del(&screen->eng2d); nouveau_object_del(&screen->m2mf); nouveau_object_del(&screen->sync); + nouveau_object_del(&screen->sw); nouveau_screen_fini(&screen->base); @@ -437,6 +438,9 @@ nv50_screen_init_hwctx(struct nv50_screen *screen) BEGIN_NV04(push, SUBC_3D(NV01_SUBCHAN_OBJECT), 1); PUSH_DATA (push, screen->tesla->handle); + BEGIN_NV04(push, SUBC_SW(NV01_SUBCHAN_OBJECT), 1); + PUSH_DATA (push, screen->sw->handle); + BEGIN_NV04(push, NV50_3D(COND_MODE), 1); PUSH_DATA (push, NV50_3D_COND_MODE_ALWAYS); @@ -768,6 +772,13 @@ nv50_screen_create(struct nouveau_device *dev) goto fail; } + ret = nouveau_object_new(chan, 0xbeef506e, 0x506e, +NULL, 0, &screen->sw); + if (ret) { + NOUVEAU_ERR("Failed to allocate SW object: %d\n", ret); + goto fail; + } + ret = nouveau_object_new(chan, 0xbeef5039, NV50_M2MF_CLASS, NULL, 0, &screen->m2mf); if (ret) { diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h b/src/gallium/drivers/nouveau/nv50/nv50_screen.h index 881051b..69fdfdb 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h @@ -93,6 +93,7 @@ struct nv50_screen { struct nouveau_object *tesla; struct nouveau_object *eng2d; struct nouveau_object *m2mf; + struct nouveau_object *sw; }; static INLINE struct nv50_screen * diff --git 
a/src/gallium/drivers/nouveau/nv50/nv50_winsys.h b/src/gallium/drivers/nouveau/nv50/nv50_winsys.h index e8578c8..5cb33ef 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_winsys.h +++ b/src/gallium/drivers/nouveau/nv50/nv50_winsys.h @@ -60,6 +60,7 @@ PUSH_REFN(struct nouveau_pushbuf *push, struct nouveau_bo *bo, uint32_t flags) #define SUBC_COMPUTE(m) 6, (m) #define NV50_COMPUTE(n) SUBC_COMPUTE(NV50_COMPUTE_##n) +#define SUBC_SW(m) 7, (m) static INLINE uint32_t NV50_FIFO_PKHDR(int subc, int mthd, unsigned size) -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
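For readers unfamiliar with the `SUBC_*` macros: each one expands to two arguments, the subchannel number and the method, so it can be dropped into a call that expects both. The sketch below shows this with a stand-in `fifo_pkhdr` function. The header layout used here (size shifted by 18, subchannel by 13, method in the low bits) is an assumption about the NV04-style packet header, not something stated in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Expands to two arguments: subchannel 7 and the method offset. */
#define SUBC_SW(m) 7, (m)

/* Assumed NV04-style header layout: size from bit 18 up, subchannel
 * from bit 13 up, method in the low bits. */
static uint32_t
fifo_pkhdr(int subc, int mthd, unsigned size)
{
   return ((uint32_t)size << 18) | ((uint32_t)subc << 13) | (uint32_t)mthd;
}

/* Self-check: one data word sent to method 0x0190 on the SW subchannel. */
static void
check_pkhdr(void)
{
   assert(fifo_pkhdr(SUBC_SW(0x0190), 1) == 0x4e190);
}
```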
[Mesa-dev] [RFC PATCH 3/8] nv50: allocate and map a notifier buffer object for PM
This notifier buffer object will be used to read back global performance counters results written by the kernel. For each domain, we will store the handle of the perfdom object, an array of 4 counters and the number of cycles. Like the Gallium's HUD, we keep a list of busy queries in a ring in order to prevent stalls when reading queries. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nv50/nv50_screen.c | 29 ++ src/gallium/drivers/nouveau/nv50/nv50_screen.h | 6 ++ 2 files changed, 35 insertions(+) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c index c985344..3a99cc8 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c @@ -368,6 +368,7 @@ nv50_screen_destroy(struct pipe_screen *pscreen) nouveau_object_del(&screen->m2mf); nouveau_object_del(&screen->sync); nouveau_object_del(&screen->sw); + nouveau_object_del(&screen->query); nouveau_screen_fini(&screen->base); @@ -699,9 +700,11 @@ nv50_screen_create(struct nouveau_device *dev) struct nv50_screen *screen; struct pipe_screen *pscreen; struct nouveau_object *chan; + struct nv04_fifo *fifo; uint64_t value; uint32_t tesla_class; unsigned stack_size; + uint32_t length; int ret; screen = CALLOC_STRUCT(nv50_screen); @@ -727,6 +730,7 @@ nv50_screen_create(struct nouveau_device *dev) screen->base.pushbuf->rsvd_kick = 5; chan = screen->base.channel; + fifo = chan->data; pscreen->destroy = nv50_screen_destroy; pscreen->context_create = nv50_create; @@ -772,6 +776,23 @@ nv50_screen_create(struct nouveau_device *dev) goto fail; } + /* Compute size (in bytes) of the notifier buffer object which is used +* in order to read back global performance counters results written +* by the kernel. For each domain, we store the handle of the perfdom +* object, an array of 4 counters and the number of cycles. 
Like for +* the Gallium's HUD, we keep a list of busy queries in a ring in order +* to prevent stalls when reading queries. */ + length = (1 + (NV50_HW_PM_RING_BUFFER_NUM_DOMAINS * 6) * + NV50_HW_PM_RING_BUFFER_MAX_QUERIES) * 4; + + ret = nouveau_object_new(chan, 0xbeef0302, NOUVEAU_NOTIFIER_CLASS, +&(struct nv04_notify){ .length = length }, +sizeof(struct nv04_notify), &screen->query); + if (ret) { + NOUVEAU_ERR("Failed to allocate notifier object for PM: %d\n", ret); + goto fail; + } + ret = nouveau_object_new(chan, 0xbeef506e, 0x506e, NULL, 0, &screen->sw); if (ret) { @@ -845,6 +866,14 @@ nv50_screen_create(struct nouveau_device *dev) nouveau_heap_init(&screen->gp_code_heap, 0, 1 << NV50_CODE_BO_SIZE_LOG2); nouveau_heap_init(&screen->fp_code_heap, 0, 1 << NV50_CODE_BO_SIZE_LOG2); + ret = nouveau_bo_wrap(screen->base.device, fifo->notify, &screen->notify_bo); + if (ret == 0) + nouveau_bo_map(screen->notify_bo, 0, screen->base.client); + if (ret) { + NOUVEAU_ERR("Failed to map notifier object for PM: %d\n", ret); + goto fail; + } + nouveau_getparam(dev, NOUVEAU_GETPARAM_GRAPH_UNITS, &value); screen->TPs = util_bitcount(value & 0x); diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h b/src/gallium/drivers/nouveau/nv50/nv50_screen.h index 69fdfdb..71a5247 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h @@ -59,6 +59,7 @@ struct nv50_screen { struct nouveau_bo *txc; /* TIC (offset 0) and TSC (65536) */ struct nouveau_bo *stack_bo; struct nouveau_bo *tls_bo; + struct nouveau_bo *notify_bo; unsigned TPs; unsigned MPsInTP; @@ -89,6 +90,7 @@ struct nv50_screen { } fence; struct nouveau_object *sync; + struct nouveau_object *query; struct nouveau_object *tesla; struct nouveau_object *eng2d; @@ -96,6 +98,10 @@ struct nv50_screen { struct nouveau_object *sw; }; +/* Parameters of the ring buffer used to read back global PM counters. 
*/ +#define NV50_HW_PM_RING_BUFFER_NUM_DOMAINS 8 +#define NV50_HW_PM_RING_BUFFER_MAX_QUERIES 9 /* HUD_NUM_QUERIES + 1 */ + static INLINE struct nv50_screen * nv50_screen(struct pipe_screen *screen) { -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
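The size computation in this patch can be checked by hand. The sketch below repeats the arithmetic with the two constants from the patch. The name `WORDS_PER_DOMAIN` is introduced here for illustration, and the patch does not say what the single extra leading word holds.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_DOMAINS      8  /* NV50_HW_PM_RING_BUFFER_NUM_DOMAINS */
#define MAX_QUERIES      9  /* NV50_HW_PM_RING_BUFFER_MAX_QUERIES */
#define WORDS_PER_DOMAIN 6  /* perfdom handle + 4 counters + cycle count */

/* Same expression as in the patch: one leading word plus six 32-bit
 * words per domain per query slot, converted to bytes. */
static uint32_t
notifier_length_bytes(void)
{
   uint32_t words = 1 + (NUM_DOMAINS * WORDS_PER_DOMAIN) * MAX_QUERIES;

   return words * 4;
}

/* Self-check: (1 + 48 * 9) * 4 = 433 * 4 bytes. */
static void
check_length(void)
{
   assert(notifier_length_bytes() == 1732);
}
```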
[Mesa-dev] [RFC PATCH 7/8] nv50: expose global performance counters to the HUD
Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nv50/nv50_query.c | 41 ++ src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 + src/gallium/drivers/nouveau/nv50/nv50_screen.h | 3 ++ 3 files changed, 45 insertions(+) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c b/src/gallium/drivers/nouveau/nv50/nv50_query.c index b9d2914..062d427 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_query.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c @@ -1535,6 +1535,47 @@ nv50_hw_pm_query_result(struct nv50_context *nv50, struct nv50_query *q, return TRUE; } +int +nv50_screen_get_driver_query_info(struct pipe_screen *pscreen, + unsigned id, + struct pipe_driver_query_info *info) +{ + struct nv50_screen *screen = nv50_screen(pscreen); + int count = 0; + + // TODO: Check DRM version when nvif will be merged in libdrm! + if (screen->base.perfmon) { + nv50_identify_events(screen); + count += NV50_HW_PM_QUERY_COUNT; + } + + if (!info) + return count; + + /* Init default values. */ + info->name = "this_is_not_the_query_you_are_looking_for"; + info->query_type = 0xdeadd01d; + info->type = PIPE_DRIVER_QUERY_TYPE_UINT64; + info->max_value.u64 = 0; + info->group_id = -1; + + if (id < count) { + if (screen->base.perfmon) { + const struct nv50_hw_pm_query_cfg *cfg = +nv50_hw_pm_query_get_cfg(screen, NV50_HW_PM_QUERY(id)); + + info->name = cfg->event->name; + info->query_type = NV50_HW_PM_QUERY(id); + info->max_value.u64 = +(cfg->event->display == NV50_HW_PM_EVENT_DISPLAY_RATIO) ? 100 : 0; + return 1; + } + } + + /* User asked for info about non-existing query. 
*/ + return 0; +} + void nv50_init_query_functions(struct nv50_context *nv50) { diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c index 53817c0..f07798e 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c @@ -745,6 +745,7 @@ nv50_screen_create(struct nouveau_device *dev) pscreen->get_param = nv50_screen_get_param; pscreen->get_shader_param = nv50_screen_get_shader_param; pscreen->get_paramf = nv50_screen_get_paramf; + pscreen->get_driver_query_info = nv50_screen_get_driver_query_info; nv50_screen_init_resource_functions(pscreen); diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.h b/src/gallium/drivers/nouveau/nv50/nv50_screen.h index 0449659..69127c0 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.h +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.h @@ -143,6 +143,9 @@ nv50_screen(struct pipe_screen *screen) #define NV50_HW_PM_QUERY_TEX_CACHE_HIT 22 #define NV50_HW_PM_QUERY_TEX_WAITS_FOR_FB 23 +int nv50_screen_get_driver_query_info(struct pipe_screen *, unsigned, + struct pipe_driver_query_info *); + boolean nv50_blitter_create(struct nv50_screen *); void nv50_blitter_destroy(struct nv50_screen *); -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
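From the caller's side, a callback with this shape is used in two phases: first with a NULL `info` to learn how many queries exist, then once per id. The following sketch shows that loop against a toy driver. All names here (`struct query_info`, `toy_get_query_info`, `enumerate_queries`) are hypothetical, and the base value 256 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for struct pipe_driver_query_info. */
struct query_info {
   const char *name;
   unsigned query_type;
};

typedef int (*get_query_info_fn)(unsigned id, struct query_info *info);

/* Toy driver exposing two queries, following the same contract as the
 * patch: NULL info returns the count, otherwise 1 on success, 0 if not. */
static int
toy_get_query_info(unsigned id, struct query_info *info)
{
   static const char *names[] = { "toy_counter_a", "toy_counter_b" };
   const unsigned count = 2;

   if (!info)
      return (int)count;
   if (id >= count)
      return 0;

   info->name = names[id];
   info->query_type = 256 + id;
   return 1;
}

/* Caller side: ask for the count, then fetch each entry. Returns the
 * number of entries written to out. */
static int
enumerate_queries(get_query_info_fn get, struct query_info *out, int max)
{
   int count = get(0, NULL);
   int n = 0;

   for (int i = 0; i < count && n < max; i++) {
      if (get((unsigned)i, &out[n]))
         n++;
   }
   return n;
}

/* Self-check of the two-phase enumeration. */
static void
check_enumeration(void)
{
   struct query_info out[4];

   assert(enumerate_queries(toy_get_query_info, out, 4) == 2);
   assert(out[0].query_type == 256);
   assert(out[1].query_type == 257);
}
```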
[Mesa-dev] [RFC PATCH 6/8] nv50: add support for compute/graphics global performance counters
This commit adds support for both compute and graphics global performance counters which have been reverse engineered with CUPTI (Linux) and PerfKit (Windows). Currently, only one query type can be monitored at the same time because the Gallium's HUD doesn't fit pretty well. This will be improved later. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nv50/nv50_query.c | 1057 +++- src/gallium/drivers/nouveau/nv50/nv50_screen.h | 35 + 2 files changed, 1087 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c b/src/gallium/drivers/nouveau/nv50/nv50_query.c index 1162110..b9d2914 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_query.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c @@ -27,6 +27,8 @@ #include "nv50/nv50_context.h" #include "nv_object.xml.h" +#include "nouveau_perfmon.h" + #define NV50_QUERY_STATE_READY 0 #define NV50_QUERY_STATE_ACTIVE 1 #define NV50_QUERY_STATE_ENDED 2 @@ -51,10 +53,25 @@ struct nv50_query { boolean is64bit; struct nouveau_mm_allocation *mm; struct nouveau_fence *fence; + struct nouveau_object *perfdom; }; #define NV50_QUERY_ALLOC_SPACE 256 +#ifdef DEBUG +static void nv50_hw_pm_dump_perfdom(struct nvif_perfdom_v0 *args); +#endif + +static boolean +nv50_hw_pm_query_create(struct nv50_context *, struct nv50_query *); +static void +nv50_hw_pm_query_destroy(struct nv50_context *, struct nv50_query *); +static boolean +nv50_hw_pm_query_begin(struct nv50_context *, struct nv50_query *); +static void nv50_hw_pm_query_end(struct nv50_context *, struct nv50_query *); +static boolean nv50_hw_pm_query_result(struct nv50_context *, +struct nv50_query *, boolean, void *); + static INLINE struct nv50_query * nv50_query(struct pipe_query *pipe) { @@ -96,12 +113,18 @@ nv50_query_allocate(struct nv50_context *nv50, struct nv50_query *q, int size) static void nv50_query_destroy(struct pipe_context *pipe, struct pipe_query *pq) { + struct nv50_context *nv50 = nv50_context(pipe); + struct 
nv50_query *q = nv50_query(pq); + if (!pq) return; - nv50_query_allocate(nv50_context(pipe), nv50_query(pq), 0); - nouveau_fence_ref(NULL, &nv50_query(pq)->fence); - FREE(nv50_query(pq)); + if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST)) + nv50_hw_pm_query_destroy(nv50, q); + + nv50_query_allocate(nv50, q, 0); + nouveau_fence_ref(NULL, &q->fence); + FREE(q); } static struct pipe_query * @@ -130,6 +153,11 @@ nv50_query_create(struct pipe_context *pipe, unsigned type, unsigned index) q->data -= 32 / sizeof(*q->data); /* we advance before query_begin ! */ } + if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST)) { + if (!nv50_hw_pm_query_create(nv50, q)) + return NULL; + } + return (struct pipe_query *)q; } @@ -154,6 +182,7 @@ nv50_query_begin(struct pipe_context *pipe, struct pipe_query *pq) struct nv50_context *nv50 = nv50_context(pipe); struct nouveau_pushbuf *push = nv50->base.pushbuf; struct nv50_query *q = nv50_query(pq); + boolean ret = TRUE; if (!pq) return FALSE; @@ -211,10 +240,13 @@ nv50_query_begin(struct pipe_context *pipe, struct pipe_query *pq) nv50_query_get(push, q, 0x10, 0x5002); break; default: + if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST)) { + ret = nv50_hw_pm_query_begin(nv50, q); + } break; } q->state = NV50_QUERY_STATE_ACTIVE; - return true; + return ret; } static void @@ -274,7 +306,9 @@ nv50_query_end(struct pipe_context *pipe, struct pipe_query *pq) q->state = NV50_QUERY_STATE_READY; break; default: - assert(0); + if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST)) { + nv50_hw_pm_query_end(nv50, q); + } break; } @@ -309,6 +343,10 @@ nv50_query_result(struct pipe_context *pipe, struct pipe_query *pq, if (!pq) return FALSE; + if ((q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST)) { + return nv50_hw_pm_query_result(nv50, q, wait, result); + } + if (q->state != NV50_QUERY_STATE_READY) nv50_query_update(q); @@ -488,6 +526,1015 @@ 
nva0_so_target_save_offset(struct pipe_context *pipe, nv50_query_end(pipe, targ->pq); } +/* === HARDWARE GLOBAL PERFORMANCE COUNTERS for NV50 === */ + +struct nv50_hw_pm_source_cfg +{ + const char *name; + uint64_t value; +}; + +struct nv50_hw_pm_signal_cfg +{ + const char *name; + const struct nv50_hw_pm_source_cfg src[8]; +}; + +struct nv50_hw_pm_counter_cfg +{ + uint16_t logic_op; + const struct nv50_hw_pm_signal_cfg sig[4]; +}; + +enum nv50_hw_pm_query_display +{ + NV50_HW_PM_EVENT_DISPLAY_RAW, + NV50_HW_PM_EVENT_DISPLAY_RATIO, +}; + +enum nv50_hw_pm_query_count +{ + NV50_HW_PM_EVENT_COUNT_SIMPLE, + NV50_H
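The range test `q->type >= NV50_HW_PM_QUERY(0) && q->type <= NV50_HW_PM_QUERY_LAST` is repeated in every entry point of this patch. As a possible simplification, it could live in one helper. The sketch below is self-contained and makes two assumptions that are labeled in the code: the numeric base of driver-specific queries, and the definition of the "last" macro, which is not visible in the quoted part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed base value, standing in for PIPE_QUERY_DRIVER_SPECIFIC. */
#define QUERY_DRIVER_SPECIFIC 256
#define HW_PM_QUERY_COUNT     24
#define HW_PM_QUERY(i)        (QUERY_DRIVER_SPECIFIC + (i))
/* Assumed definition: the last valid id is COUNT - 1. */
#define HW_PM_QUERY_LAST      HW_PM_QUERY(HW_PM_QUERY_COUNT - 1)

/* Single place for the range test used by create/destroy/begin/end/result. */
static bool
is_hw_pm_query(unsigned type)
{
   return type >= HW_PM_QUERY(0) && type <= HW_PM_QUERY_LAST;
}

/* Self-check of both boundaries. */
static void
check_range(void)
{
   assert(is_hw_pm_query(HW_PM_QUERY(0)));
   assert(is_hw_pm_query(HW_PM_QUERY(23)));
   assert(!is_hw_pm_query(HW_PM_QUERY(24)));
   assert(!is_hw_pm_query(0));
}
```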
[Mesa-dev] [RFC PATCH 0/8] nv50: expose global performance counters
Hello there, This series exposes NVIDIA's global performance counters for Tesla through Gallium's HUD and the GL_AMD_performance_monitor extension. It adds support for 24 hardware events which have been reverse engineered with PerfKit (Windows) and CUPTI (Linux). These hardware events will allow developers to profile OpenGL applications. To reduce latency and improve accuracy, these global performance counters are tied to the command stream of the GPU using a set of software methods instead of ioctls. Results are then written by the kernel to a mapped notifier buffer object, which allows userspace to read them back. However, the libdrm branch which implements the new nvif interface exposed by Nouveau, and the software methods interface, are not upstream yet. I hope this will be done in the next few days. The code of this series can be found here: http://cgit.freedesktop.org/~hakzsam/mesa/log/?h=nouveau_perfmon The libdrm branch can be found here: http://cgit.freedesktop.org/~hakzsam/drm/log/?h=nouveau_perfmon The code of the software methods interface can be found here (last two commits): http://cgit.freedesktop.org/~hakzsam/nouveau/log/?h=nouveau_perfmon Another series which exposes global performance counters for Fermi and Kepler will be submitted once I have received enough reviews for this one. Feel free to review. Thanks, Samuel.
Samuel Pitoiset (8):
  nouveau: implement the nvif hardware performance counters interface
  nv50: allocate a software object class
  nv50: allocate and map a notifier buffer object for PM
  nv50: configure the ring buffer for reading back PM counters
  nv50: prevent NULL pointer dereference with pipe_query functions
  nv50: add support for compute/graphics global performance counters
  nv50: expose global performance counters to the HUD
  nv50: enable GL_AMD_performance_monitor

 src/gallium/drivers/nouveau/Makefile.sources   |    2 +
 src/gallium/drivers/nouveau/nouveau_perfmon.c  |  302 +++
 src/gallium/drivers/nouveau/nouveau_perfmon.h  |   59 ++
 src/gallium/drivers/nouveau/nouveau_screen.c   |    5 +
 src/gallium/drivers/nouveau/nouveau_screen.h   |    1 +
 src/gallium/drivers/nouveau/nv50/nv50_query.c  | 1148 +++-
 src/gallium/drivers/nouveau/nv50/nv50_screen.c |   49 +
 src/gallium/drivers/nouveau/nv50/nv50_screen.h |   51 ++
 src/gallium/drivers/nouveau/nv50/nv50_winsys.h |    1 +
 9 files changed, 1612 insertions(+), 6 deletions(-)
 create mode 100644 src/gallium/drivers/nouveau/nouveau_perfmon.c
 create mode 100644 src/gallium/drivers/nouveau/nouveau_perfmon.h

-- 2.4.4
[Mesa-dev] [RFC PATCH 4/8] nv50: configure the ring buffer for reading back PM counters
To write data at the right offset, the kernel has to know some parameters of this ring buffer, like the number of domains and the maximum number of queries. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nv50/nv50_screen.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_screen.c b/src/gallium/drivers/nouveau/nv50/nv50_screen.c index 3a99cc8..53817c0 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_screen.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_screen.c @@ -441,6 +441,13 @@ nv50_screen_init_hwctx(struct nv50_screen *screen) BEGIN_NV04(push, SUBC_SW(NV01_SUBCHAN_OBJECT), 1); PUSH_DATA (push, screen->sw->handle); + BEGIN_NV04(push, SUBC_SW(0x0190), 1); + PUSH_DATA (push, screen->query->handle); + // XXX: Maybe add a check for DRM version here ? + BEGIN_NV04(push, SUBC_SW(0x0600), 1); + PUSH_DATA (push, NV50_HW_PM_RING_BUFFER_MAX_QUERIES); + BEGIN_NV04(push, SUBC_SW(0x0604), 1); + PUSH_DATA (push, NV50_HW_PM_RING_BUFFER_NUM_DOMAINS); BEGIN_NV04(push, NV50_3D(COND_MODE), 1); PUSH_DATA (push, NV50_3D_COND_MODE_ALWAYS); -- 2.4.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
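To illustrate why the kernel needs both parameters, here is a sketch of how a write offset could be derived from them. The layout is purely hypothetical (one leading word, then query slots, each holding one six-word record per domain). The real kernel-side layout is not shown in this series, so `slot_offset_bytes` should be read as an assumption that merely agrees with the buffer size computed in patch 3/8.

```c
#include <assert.h>
#include <stdint.h>

#define WORDS_PER_DOMAIN 6  /* perfdom handle + 4 counters + cycle count */

/* Hypothetical layout: one leading 32-bit word, then max_queries slots,
 * each made of num_domains records of six 32-bit words. */
static uint32_t
slot_offset_bytes(uint32_t slot, uint32_t domain, uint32_t num_domains)
{
   assert(domain < num_domains);
   return (1 + (slot * num_domains + domain) * WORDS_PER_DOMAIN) * 4;
}

/* Self-check with 8 domains and 9 query slots: the last record must end
 * exactly at the 1732-byte buffer length. */
static void
check_offsets(void)
{
   assert(slot_offset_bytes(0, 0, 8) == 4);
   assert(slot_offset_bytes(8, 7, 8) == 1708);
   assert(slot_offset_bytes(8, 7, 8) + WORDS_PER_DOMAIN * 4 == 1732);
}
```

Under this assumed layout, a wrong domain count on either side would shift every record after the first one, which is why both values are pushed through the software methods.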
[Mesa-dev] [RFC PATCH 1/8] nouveau: implement the nvif hardware performance counters interface
This commit implements the base interface for hardware performance counters that will be shared between nv50 and nvc0 drivers. TODO: Bump libdrm version of mesa when nvif will be merged. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/Makefile.sources | 2 + src/gallium/drivers/nouveau/nouveau_perfmon.c | 302 ++ src/gallium/drivers/nouveau/nouveau_perfmon.h | 59 + src/gallium/drivers/nouveau/nouveau_screen.c | 5 + src/gallium/drivers/nouveau/nouveau_screen.h | 1 + 5 files changed, 369 insertions(+) create mode 100644 src/gallium/drivers/nouveau/nouveau_perfmon.c create mode 100644 src/gallium/drivers/nouveau/nouveau_perfmon.h diff --git a/src/gallium/drivers/nouveau/Makefile.sources b/src/gallium/drivers/nouveau/Makefile.sources index 3fae3bc..3da0bdc 100644 --- a/src/gallium/drivers/nouveau/Makefile.sources +++ b/src/gallium/drivers/nouveau/Makefile.sources @@ -10,6 +10,8 @@ C_SOURCES := \ nouveau_heap.h \ nouveau_mm.c \ nouveau_mm.h \ + nouveau_perfmon.c \ + nouveau_perfmon.h \ nouveau_screen.c \ nouveau_screen.h \ nouveau_statebuf.h \ diff --git a/src/gallium/drivers/nouveau/nouveau_perfmon.c b/src/gallium/drivers/nouveau/nouveau_perfmon.c new file mode 100644 index 000..3798612 --- /dev/null +++ b/src/gallium/drivers/nouveau/nouveau_perfmon.c @@ -0,0 +1,302 @@ +/* + * Copyright 2015 Samuel Pitoiset + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ + +#include + +#include "util/u_memory.h" + +#include "nouveau_debug.h" +#include "nouveau_winsys.h" +#include "nouveau_perfmon.h" + +static int +nouveau_perfmon_query_sources(struct nouveau_perfmon *pm, + struct nouveau_perfmon_dom *dom, + struct nouveau_perfmon_sig *sig) +{ + struct nvif_perfmon_query_source_v0 args = {}; + + args.domain = dom->id; + args.signal = sig->signal; + do { + uint8_t prev_iter = args.iter; + struct nouveau_perfmon_src *src; + int ret; + + ret = nouveau_object_mthd(pm->object, NVIF_PERFMON_V0_QUERY_SOURCE, + &args, sizeof(args)); + if (ret) + return ret; + + if (prev_iter) { + args.iter = prev_iter; + ret = nouveau_object_mthd(pm->object, NVIF_PERFMON_V0_QUERY_SOURCE, + &args, sizeof(args)); + if (ret) + return ret; + + src = CALLOC_STRUCT(nouveau_perfmon_src); + if (!src) + return -ENOMEM; + +#if 0 + debug_printf("id = %d\n", args.source); + debug_printf("name = %s\n", args.name); + debug_printf("mask = %08x\n", args.mask); + debug_printf("\n"); +#endif + + src->id = args.source; + strncpy(src->name, args.name, sizeof(src->name)); + list_addtail(&src->head, &sig->sources); + } + } while (args.iter != 0xff); + + return 0; +} + +static int +nouveau_perfmon_query_signals(struct nouveau_perfmon *pm, + struct nouveau_perfmon_dom *dom) +{ + struct nvif_perfmon_query_signal_v0 args = {}; + + args.domain = dom->id; + do { + uint16_t prev_iter = args.iter; + struct nouveau_perfmon_sig *sig; + int ret; + + ret = nouveau_object_mthd(pm->object, 
NVIF_PERFMON_V0_QUERY_SIGNAL, +&args, sizeof(args)); + if (ret) + return ret; + + if (prev_iter) { + args.iter = prev_iter; + ret = nouveau_object_mthd(pm->object, NVIF_PERFMON_V0_QUERY_SIGNAL, +
Re: [Mesa-dev] [Nouveau] [RFC PATCH 5/8] nv50: prevent NULL pointer dereference with pipe_query functions
If query_create fails, why would any of these functions get called? On Mon, Jun 22, 2015 at 4:53 PM, Samuel Pitoiset wrote: > This may happen when nv50_query_create() fails to create a new query. > > Signed-off-by: Samuel Pitoiset > --- > src/gallium/drivers/nouveau/nv50/nv50_query.c | 15 ++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c > b/src/gallium/drivers/nouveau/nv50/nv50_query.c > index 55fcac8..1162110 100644 > --- a/src/gallium/drivers/nouveau/nv50/nv50_query.c > +++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c > @@ -96,6 +96,9 @@ nv50_query_allocate(struct nv50_context *nv50, struct > nv50_query *q, int size) > static void > nv50_query_destroy(struct pipe_context *pipe, struct pipe_query *pq) > { > + if (!pq) > + return; > + > nv50_query_allocate(nv50_context(pipe), nv50_query(pq), 0); > nouveau_fence_ref(NULL, &nv50_query(pq)->fence); > FREE(nv50_query(pq)); > @@ -152,6 +155,9 @@ nv50_query_begin(struct pipe_context *pipe, struct > pipe_query *pq) > struct nouveau_pushbuf *push = nv50->base.pushbuf; > struct nv50_query *q = nv50_query(pq); > > + if (!pq) > + return FALSE; > + > /* For occlusion queries we have to change the storage, because a previous > * query might set the initial render conition to FALSE even *after* we > re- > * initialized it to TRUE. 
> @@ -218,6 +224,9 @@ nv50_query_end(struct pipe_context *pipe, struct > pipe_query *pq) > struct nouveau_pushbuf *push = nv50->base.pushbuf; > struct nv50_query *q = nv50_query(pq); > > + if (!pq) > + return; > + > q->state = NV50_QUERY_STATE_ENDED; > > switch (q->type) { > @@ -294,9 +303,12 @@ nv50_query_result(struct pipe_context *pipe, struct > pipe_query *pq, > uint64_t *res64 = (uint64_t *)result; > uint32_t *res32 = (uint32_t *)result; > boolean *res8 = (boolean *)result; > - uint64_t *data64 = (uint64_t *)q->data; > + uint64_t *data64; > int i; > > + if (!pq) > + return FALSE; > + > if (q->state != NV50_QUERY_STATE_READY) >nv50_query_update(q); > > @@ -314,6 +326,7 @@ nv50_query_result(struct pipe_context *pipe, struct > pipe_query *pq, > } > q->state = NV50_QUERY_STATE_READY; > > + data64 = (uint64_t *)q->data; > switch (q->type) { > case PIPE_QUERY_GPU_FINISHED: >res8[0] = TRUE; > -- > 2.4.4 > > ___ > Nouveau mailing list > nouv...@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/nouveau ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Nouveau] [RFC PATCH 5/8] nv50: prevent NULL pointer dereference with pipe_query functions
On 06/22/2015 10:52 PM, Ilia Mirkin wrote: If query_create fails, why would any of these functions get called? Because the HUD doesn't check if query_create() fails and it calls other pipe_query functions with NULL pointer instead of a valid query object. On Mon, Jun 22, 2015 at 4:53 PM, Samuel Pitoiset wrote: This may happen when nv50_query_create() fails to create a new query. Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nv50/nv50_query.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_query.c b/src/gallium/drivers/nouveau/nv50/nv50_query.c index 55fcac8..1162110 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_query.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_query.c @@ -96,6 +96,9 @@ nv50_query_allocate(struct nv50_context *nv50, struct nv50_query *q, int size) static void nv50_query_destroy(struct pipe_context *pipe, struct pipe_query *pq) { + if (!pq) + return; + nv50_query_allocate(nv50_context(pipe), nv50_query(pq), 0); nouveau_fence_ref(NULL, &nv50_query(pq)->fence); FREE(nv50_query(pq)); @@ -152,6 +155,9 @@ nv50_query_begin(struct pipe_context *pipe, struct pipe_query *pq) struct nouveau_pushbuf *push = nv50->base.pushbuf; struct nv50_query *q = nv50_query(pq); + if (!pq) + return FALSE; + /* For occlusion queries we have to change the storage, because a previous * query might set the initial render conition to FALSE even *after* we re- * initialized it to TRUE. 
@@ -218,6 +224,9 @@ nv50_query_end(struct pipe_context *pipe, struct pipe_query *pq) struct nouveau_pushbuf *push = nv50->base.pushbuf; struct nv50_query *q = nv50_query(pq); + if (!pq) + return; + q->state = NV50_QUERY_STATE_ENDED; switch (q->type) { @@ -294,9 +303,12 @@ nv50_query_result(struct pipe_context *pipe, struct pipe_query *pq, uint64_t *res64 = (uint64_t *)result; uint32_t *res32 = (uint32_t *)result; boolean *res8 = (boolean *)result; - uint64_t *data64 = (uint64_t *)q->data; + uint64_t *data64; int i; + if (!pq) + return FALSE; + if (q->state != NV50_QUERY_STATE_READY) nv50_query_update(q); @@ -314,6 +326,7 @@ nv50_query_result(struct pipe_context *pipe, struct pipe_query *pq, } q->state = NV50_QUERY_STATE_READY; + data64 = (uint64_t *)q->data; switch (q->type) { case PIPE_QUERY_GPU_FINISHED: res8[0] = TRUE; -- 2.4.4 ___ Nouveau mailing list nouv...@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
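For readers following the thread, here is a minimal sketch of the driver-side guard being discussed: entry points that tolerate a NULL query because the caller did not check the result of creation. Every name in it (`toy_query`, `toy_query_create`, `toy_query_begin`, `toy_query_destroy`) is a hypothetical stand-in invented for illustration, not the real gallium or nv50 interface.

```cpp
#include <cstdlib>

// Hypothetical stand-in for a driver query object; not the real nv50_query.
struct toy_query {
   int type;
   bool active;
};

// Creation may fail and return nullptr, here for a negative (invalid) type.
static toy_query *toy_query_create(int type)
{
   if (type < 0)
      return nullptr;
   toy_query *q = static_cast<toy_query *>(std::calloc(1, sizeof(*q)));
   if (q)
      q->type = type;
   return q;
}

// Driver-side guard in the spirit of the patch: a NULL query is rejected
// before any member is dereferenced.
static bool toy_query_begin(toy_query *q)
{
   if (!q)
      return false;
   q->active = true;
   return true;
}

// Destroying a NULL query is a no-op.
static void toy_query_destroy(toy_query *q)
{
   if (!q)
      return;
   std::free(q);
}
```

The alternative raised in the question above is to make the caller check the return value of creation once, in which case none of the per-function guards would be needed.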
[Mesa-dev] [PATCH] i965: Don't count NIR instructions for shader-db.
Matt, Jason, and I haven't found this useful in a long time.

Signed-off-by: Kenneth Graunke
---
 src/mesa/drivers/dri/i965/brw_nir.c | 31 ---
 1 file changed, 31 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c
index c13708a..dffb8ab 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -57,28 +57,6 @@ nir_optimize(nir_shader *nir)
    } while (progress);
 }

-static bool
-count_nir_instrs_in_block(nir_block *block, void *state)
-{
-   int *count = (int *) state;
-   nir_foreach_instr(block, instr) {
-      *count = *count + 1;
-   }
-   return true;
-}
-
-static int
-count_nir_instrs(nir_shader *nir)
-{
-   int count = 0;
-   nir_foreach_overload(nir, overload) {
-      if (!overload->impl)
-         continue;
-      nir_foreach_block(overload->impl, count_nir_instrs_in_block, &count);
-   }
-   return count;
-}
-
 nir_shader *
 brw_create_nir(struct brw_context *brw,
                const struct gl_shader_program *shader_prog,
@@ -178,15 +156,6 @@ brw_create_nir(struct brw_context *brw,
       nir_print_shader(nir, stderr);
    }

-   static GLuint msg_id = 0;
-   _mesa_gl_debug(&brw->ctx, &msg_id,
-                  MESA_DEBUG_SOURCE_SHADER_COMPILER,
-                  MESA_DEBUG_TYPE_OTHER,
-                  MESA_DEBUG_SEVERITY_NOTIFICATION,
-                  "%s NIR shader: %d inst\n",
-                  _mesa_shader_stage_to_abbrev(stage),
-                  count_nir_instrs(nir));
-
    nir_convert_from_ssa(nir);
    nir_validate_shader(nir);

--
1.7.10.4
Re: [Mesa-dev] [PATCH] i965: Don't count NIR instructions for shader-db.
Reviewed-by: Matt Turner
Re: [Mesa-dev] [PATCH v2 26/82] glsl: Don't do copy propagation on buffer variables
24-26 once again make me wonder if these optimizations *can* be used
with SSBOs, based on the same ext spec wording I referenced before:

"The ability to write to buffer objects creates the potential for
multiple independent shader invocations to read and write the same
underlying memory. The same issue exists with the
ARB_shader_image_load_store extension provided in OpenGL 4.2, which can
write to texture objects and buffers. In both cases, the specification
makes few guarantees related to the relative order of memory reads and
writes performed by the shader invocations."

In these patches "other threads" were specifically mentioned. Did these
patches also prevent bad things from happening in generated code? (As
mentioned for patch 23.)

-Jordan

On 2015-06-03 00:01:16, Iago Toral Quiroga wrote:
> Since the backing storage for these is shared we cannot ensure that the
> value won't change by writes from other threads.
> ---
>  src/glsl/opt_copy_propagation.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/glsl/opt_copy_propagation.cpp
> b/src/glsl/opt_copy_propagation.cpp
> index 806027b..f206995 100644
> --- a/src/glsl/opt_copy_propagation.cpp
> +++ b/src/glsl/opt_copy_propagation.cpp
> @@ -330,7 +330,7 @@ ir_copy_propagation_visitor::add_copy(ir_assignment *ir)
>         */
>        ir->condition = new(ralloc_parent(ir)) ir_constant(false);
>        this->progress = true;
> -   } else {
> +   } else if (lhs_var->data.mode != ir_var_shader_storage) {
>        entry = new(this->acp) acp_entry(lhs_var, rhs_var);
>        this->acp->push_tail(entry);
>     }
> --
> 1.9.1
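As a rough illustration of what the quoted one-line change does, the sketch below models a copy table that refuses to record a copy whose destination lives in shader storage, so later reads of that variable are never replaced by a remembered value. The types and names (`toy_variable`, `toy_copy_table`, `toy_var_shader_storage`) are hypothetical simplifications made up for this note and are not the GLSL IR classes.

```cpp
#include <vector>

// Hypothetical, simplified variable modes; not the real ir_variable_mode enum.
enum toy_var_mode { toy_var_auto, toy_var_shader_storage };

struct toy_variable {
   const char *name;
   toy_var_mode mode;
};

// One "available copy": lhs is known to hold the same value as rhs.
struct toy_copy {
   const toy_variable *lhs;
   const toy_variable *rhs;
};

struct toy_copy_table {
   std::vector<toy_copy> entries;

   // Record "lhs = rhs" unless lhs lives in shared buffer storage, whose
   // contents may be changed by another invocation at any time.
   bool add_copy(const toy_variable *lhs, const toy_variable *rhs)
   {
      if (lhs->mode == toy_var_shader_storage)
         return false;
      entries.push_back({lhs, rhs});
      return true;
   }

   // Return the variable to read instead of var, or var itself if no copy
   // is recorded for it.
   const toy_variable *resolve(const toy_variable *var) const
   {
      for (const toy_copy &c : entries)
         if (c.lhs == var)
            return c.rhs;
      return var;
   }
};
```

Whether skipping the entry is actually required by the spec wording quoted above is exactly the open question in this reply; the sketch only shows the mechanism.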
Re: [Mesa-dev] [PATCH 2/2] draw/gallivm: add invocation ID support for llvmpipe.
For the series: Reviewed-by: Roland Scheidegger Am 22.06.2015 um 06:01 schrieb Dave Airlie: > From: Dave Airlie > > This extends the draw code to add support for invocations. > > Signed-off-by: Dave Airlie > --- > src/gallium/auxiliary/draw/draw_gs.c| 3 ++- > src/gallium/auxiliary/draw/draw_llvm.c | 5 - > src/gallium/auxiliary/draw/draw_llvm.h | 3 ++- > src/gallium/auxiliary/gallivm/lp_bld_tgsi.h | 1 + > src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 5 + > 5 files changed, 14 insertions(+), 3 deletions(-) > > diff --git a/src/gallium/auxiliary/draw/draw_gs.c > b/src/gallium/auxiliary/draw/draw_gs.c > index 755e527..a1564f9 100644 > --- a/src/gallium/auxiliary/draw/draw_gs.c > +++ b/src/gallium/auxiliary/draw/draw_gs.c > @@ -391,7 +391,8 @@ llvm_gs_run(struct draw_geometry_shader *shader, >(struct vertex_header*)input, >input_primitives, >shader->draw->instance_id, > - shader->llvm_prim_ids); > + shader->llvm_prim_ids, > + shader->invocation_id); > > return ret; > } > diff --git a/src/gallium/auxiliary/draw/draw_llvm.c > b/src/gallium/auxiliary/draw/draw_llvm.c > index 9629a8a..90a31bc 100644 > --- a/src/gallium/auxiliary/draw/draw_llvm.c > +++ b/src/gallium/auxiliary/draw/draw_llvm.c > @@ -2069,7 +2069,7 @@ draw_gs_llvm_generate(struct draw_llvm *llvm, > struct gallivm_state *gallivm = variant->gallivm; > LLVMContextRef context = gallivm->context; > LLVMTypeRef int32_type = LLVMInt32TypeInContext(context); > - LLVMTypeRef arg_types[6]; > + LLVMTypeRef arg_types[7]; > LLVMTypeRef func_type; > LLVMValueRef variant_func; > LLVMValueRef context_ptr; > @@ -2105,6 +2105,7 @@ draw_gs_llvm_generate(struct draw_llvm *llvm, > arg_types[4] = int32_type; /* instance_id */ > arg_types[5] = LLVMPointerType( >LLVMVectorType(int32_type, vector_length), 0); /* prim_id_ptr */ > + arg_types[6] = int32_type; > > func_type = LLVMFunctionType(int32_type, arg_types, Elements(arg_types), > 0); > > @@ -2125,6 +2126,7 @@ draw_gs_llvm_generate(struct draw_llvm *llvm, > num_prims = 
LLVMGetParam(variant_func, 3); > system_values.instance_id = LLVMGetParam(variant_func, 4); > prim_id_ptr = LLVMGetParam(variant_func, 5); > + system_values.invocation_id = LLVMGetParam(variant_func, 6); > > lp_build_name(context_ptr, "context"); > lp_build_name(input_array, "input"); > @@ -2132,6 +2134,7 @@ draw_gs_llvm_generate(struct draw_llvm *llvm, > lp_build_name(num_prims, "num_prims"); > lp_build_name(system_values.instance_id, "instance_id"); > lp_build_name(prim_id_ptr, "prim_id_ptr"); > + lp_build_name(system_values.invocation_id, "invocation_id"); > > variant->context_ptr = context_ptr; > variant->io_ptr = io_ptr; > diff --git a/src/gallium/auxiliary/draw/draw_llvm.h > b/src/gallium/auxiliary/draw/draw_llvm.h > index 9565fc6..d48ed72 100644 > --- a/src/gallium/auxiliary/draw/draw_llvm.h > +++ b/src/gallium/auxiliary/draw/draw_llvm.h > @@ -298,7 +298,8 @@ typedef int > struct vertex_header *output, > unsigned num_prims, > unsigned instance_id, > -int *prim_ids); > +int *prim_ids, > +unsigned invocation_id); > > struct draw_llvm_variant_key > { > diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h > b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h > index 3f76b79..967373c 100644 > --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h > +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h > @@ -165,6 +165,7 @@ struct lp_bld_tgsi_system_values { > LLVMValueRef vertex_id_nobase; > LLVMValueRef prim_id; > LLVMValueRef basevertex; > + LLVMValueRef invocation_id; > }; > > > diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c > b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c > index 092bd18..268379e 100644 > --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c > +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c > @@ -1532,6 +1532,11 @@ emit_fetch_system_value( >atype = TGSI_TYPE_UNSIGNED; >break; > > + case TGSI_SEMANTIC_INVOCATIONID: > + res = lp_build_broadcast_scalar(&bld_base->uint_bld, > bld->system_values.invocation_id); > + atype = 
TGSI_TYPE_UNSIGNED; > + break; > + > default: >assert(!"unexpected semantic in emit_fetch_system_value"); >res = bld_base->base.zero; > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 27/82] mesa: Add new IR node ir_ssbo_store
Reviewed-by: Jordan Justen On 2015-06-03 00:01:17, Iago Toral Quiroga wrote: > Shader storage buffer objects (SSBO) require special handling: when we > detect writes to any channel of a shader buffer variable we need to > emit the corresponding write to memory. We will later add a lowering pass > that detects these writes and injects ir_ssbo_store nodes in the IR so > drivers can generate code for the memory writes. > --- > src/glsl/ir.h | 38 > ++ > src/glsl/ir_hierarchical_visitor.cpp | 18 > src/glsl/ir_hierarchical_visitor.h | 2 ++ > src/glsl/ir_hv_accept.cpp | 23 > src/glsl/ir_print_visitor.cpp | 12 > src/glsl/ir_print_visitor.h| 1 + > src/glsl/ir_rvalue_visitor.cpp | 21 ++ > src/glsl/ir_rvalue_visitor.h | 3 ++ > src/glsl/ir_visitor.h | 2 ++ > src/glsl/nir/glsl_to_nir.cpp | 7 + > src/mesa/drivers/dri/i965/brw_vec4.h | 1 + > src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 6 > src/mesa/program/ir_to_mesa.cpp| 7 + > src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 7 + > 14 files changed, 148 insertions(+) > > diff --git a/src/glsl/ir.h b/src/glsl/ir.h > index 1118732..2a0b28c 100644 > --- a/src/glsl/ir.h > +++ b/src/glsl/ir.h > @@ -78,6 +78,7 @@ enum ir_node_type { > ir_type_discard, > ir_type_emit_vertex, > ir_type_end_primitive, > + ir_type_ssbo_store, > ir_type_max, /**< maximum ir_type enum number, for validation */ > ir_type_unset = ir_type_max > }; > @@ -2407,6 +2408,43 @@ public: > ir_rvalue *stream; > }; > > +/** > + * IR instruction to write to a shader storage buffer object (SSBO) > + */ > +class ir_ssbo_store : public ir_instruction { > +public: > + ir_ssbo_store(ir_rvalue *block, ir_rvalue *offset, ir_rvalue *val, > + unsigned write_mask) > + : ir_instruction(ir_type_ssbo_store), > +block(block), offset(offset), val(val), write_mask(write_mask) > + { > + assert(block); > + assert(offset); > + assert(val); > + assert(write_mask != 0); > + } > + > + virtual void accept(ir_visitor *v) > + { > + v->visit(this); > + } > + > + virtual ir_ssbo_store *clone(void 
*mem_ctx, struct hash_table *ht) const > + { > + return new(mem_ctx) ir_ssbo_store(this->block->clone(mem_ctx, ht), > +this->offset->clone(mem_ctx, ht), > +this->val->clone(mem_ctx, ht), > +this->write_mask); > + } > + > + virtual ir_visitor_status accept(ir_hierarchical_visitor *); > + > + ir_rvalue *block; > + ir_rvalue *offset; > + ir_rvalue *val; > + unsigned write_mask; > +}; > + > /*@}*/ > > /** > diff --git a/src/glsl/ir_hierarchical_visitor.cpp > b/src/glsl/ir_hierarchical_visitor.cpp > index adb6294..1aa5cc0 100644 > --- a/src/glsl/ir_hierarchical_visitor.cpp > +++ b/src/glsl/ir_hierarchical_visitor.cpp > @@ -349,6 +349,24 @@ ir_hierarchical_visitor::visit_leave(ir_end_primitive > *ir) > return visit_continue; > } > > +ir_visitor_status > +ir_hierarchical_visitor::visit_enter(ir_ssbo_store *ir) > +{ > + if (this->callback_enter != NULL) > + this->callback_enter(ir, this->data_enter); > + > + return visit_continue; > +} > + > +ir_visitor_status > +ir_hierarchical_visitor::visit_leave(ir_ssbo_store *ir) > +{ > + if (this->callback_leave != NULL) > + this->callback_leave(ir, this->data_leave); > + > + return visit_continue; > +} > + > void > ir_hierarchical_visitor::run(exec_list *instructions) > { > diff --git a/src/glsl/ir_hierarchical_visitor.h > b/src/glsl/ir_hierarchical_visitor.h > index faa52fd..49dc37e 100644 > --- a/src/glsl/ir_hierarchical_visitor.h > +++ b/src/glsl/ir_hierarchical_visitor.h > @@ -139,6 +139,8 @@ public: > virtual ir_visitor_status visit_leave(class ir_emit_vertex *); > virtual ir_visitor_status visit_enter(class ir_end_primitive *); > virtual ir_visitor_status visit_leave(class ir_end_primitive *); > + virtual ir_visitor_status visit_enter(class ir_ssbo_store *); > + virtual ir_visitor_status visit_leave(class ir_ssbo_store *); > /*@}*/ > > > diff --git a/src/glsl/ir_hv_accept.cpp b/src/glsl/ir_hv_accept.cpp > index be5b3ea..500ce4b 100644 > --- a/src/glsl/ir_hv_accept.cpp > +++ b/src/glsl/ir_hv_accept.cpp > @@ -429,3 +429,26 @@ 
ir_end_primitive::accept(ir_hierarchical_visitor *v) > > return (s == visit_stop) ? s : v->visit_leave(this); > } > + > + > +ir_visitor_status > +ir_ssbo_store::accept(ir_hierarchical_visitor *v) > +{ > + ir_visitor_status s = v->visit_enter(this); > + if (s != visit_continue) > + return (s == visit_continue_with_parent) ? visit_continue : s; > + >
Re: [Mesa-dev] [PATCH 2/5] i965/gen9: Plugin the code for selecting YF/YS tiling on skl+
On Wed, Jun 10, 2015 at 03:30:47PM -0700, Anuj Phogat wrote:
> Buffers with Yf/Ys tiling end up using meta upload / download
> paths or the blitter for cases where they used tiled_memcpy paths
> in case of Y tiling. This has exposed some bugs in meta path. To
> avoid any piglit regressions on SKL this patch keeps the Yf/Ys
> tiling disabled at the moment.
>
> V3: Make brw_miptree_choose_tr_mode() actually choose TRMODE. (Ben)
>     Few cosmetic changes.
> V4: Get rid of brw_miptree_choose_tr_mode().
>     Take care of all tile resource modes {Yf, Ys, none} for all
>     generations at one place.
>
> Signed-off-by: Anuj Phogat
> Cc: Ben Widawsky
> ---
>  src/mesa/drivers/dri/i965/brw_tex_layout.c | 97 --
>  1 file changed, 79 insertions(+), 18 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> index b9ac4cf..c0ef5cc 100644
> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> @@ -807,27 +807,88 @@ brw_miptree_layout(struct brw_context *brw,
>                     enum intel_miptree_tiling_mode requested,
>                     struct intel_mipmap_tree *mt)
>  {
> -   mt->tr_mode = INTEL_MIPTREE_TRMODE_NONE;
> +   const unsigned bpp = mt->cpp * 8;
> +   const bool is_tr_mode_yf_ys_allowed =
> +      brw->gen >= 9 &&
> +      !for_bo &&
> +      !mt->compressed &&
> +      /* Enable YF/YS tiling only for color surfaces because depth and
> +       * stencil surfaces are not supported in blitter using fast copy
> +       * blit and meta PBO upload, download paths. No other paths
> +       * currently support Yf/Ys tiled surfaces.
> +       * FIXME: Remove this restriction once we have a tiled_memcpy()
> +       * path to do depth/stencil data upload/download to Yf/Ys tiled
> +       * surfaces.
> +       */

I think it's more readable to move this comment above the variable
declaration. Up to you though. Also I think "FINISHME" is the more
appropriate classification for this type of thing.

> +      _mesa_is_format_color_format(mt->format) &&
> +      (requested == INTEL_MIPTREE_TILING_Y ||
> +       requested == INTEL_MIPTREE_TILING_ANY) &&

This is where my tiling flags would have helped a bit since you should be
able to do flags & Y_TILED :P

> +      (bpp && is_power_of_two(bpp)) &&
> +      /* FIXME: To avoid piglit regressions keep the Yf/Ys tiling
> +       * disabled at the moment.
> +       */
> +      false;

Also, "FINISHME"

>
> -   intel_miptree_set_alignment(brw, mt);
> -   intel_miptree_set_total_width_height(brw, mt);
> +   /* Lower index (Yf) is the higher priority mode */
> +   const uint32_t tr_mode[3] = {INTEL_MIPTREE_TRMODE_YF,
> +                                INTEL_MIPTREE_TRMODE_YS,
> +                                INTEL_MIPTREE_TRMODE_NONE};
> +   int i = is_tr_mode_yf_ys_allowed ? 0 : ARRAY_SIZE(tr_mode) - 1;
>
> -   if (!mt->total_width || !mt->total_height) {
> -      intel_miptree_release(&mt);
> -      return;
> -   }
> +   while (i < ARRAY_SIZE(tr_mode)) {
> +      if (brw->gen < 9)
> +         assert(tr_mode[i] == INTEL_MIPTREE_TRMODE_NONE);
> +      else
> +         assert(tr_mode[i] == INTEL_MIPTREE_TRMODE_YF ||
> +                tr_mode[i] == INTEL_MIPTREE_TRMODE_YS ||
> +                tr_mode[i] == INTEL_MIPTREE_TRMODE_NONE);
>
> -   /* On Gen9+ the alignment values are expressed in multiples of the block
> -    * size
> -    */
> -   if (brw->gen >= 9) {
> -      unsigned int i, j;
> -      _mesa_get_format_block_size(mt->format, &i, &j);
> -      mt->align_w /= i;
> -      mt->align_h /= j;
> -   }
> +      mt->tr_mode = tr_mode[i];
> +      intel_miptree_set_alignment(brw, mt);
> +      intel_miptree_set_total_width_height(brw, mt);
>
> -   if (!for_bo)
> -      mt->tiling = brw_miptree_choose_tiling(brw, requested, mt);
> +      if (!mt->total_width || !mt->total_height) {
> +         intel_miptree_release(&mt);
> +         return;
> +      }
> +
> +      /* On Gen9+ the alignment values are expressed in multiples of the
> +       * block size.
> +       */
> +      if (brw->gen >= 9) {
> +         unsigned int i, j;
> +         _mesa_get_format_block_size(mt->format, &i, &j);
> +         mt->align_w /= i;
> +         mt->align_h /= j;
> +      }

Can we just combine this alignment calculation into
intel_miptree_set_alignment()?

> +
> +      if (!for_bo)
> +         mt->tiling = brw_miptree_choose_tiling(brw, requested, mt);

Perhaps (fwiw, I prefer break instead of returning within a loop, but
that's just me)?

/* If there is already a BO, we cannot affect tiling modes */
if (for_bo)
   break;

mt->tiling = brw_miptree_choose_tiling(brw, requested, mt);
if (is_tr_mode_yf_ys_allowed) {
   ...
}

This sort of reflects how I felt earlier about pushing the YF/YS decision
into choose tiling. The code is heading in that direction though, so I am
content.

> +
> +      if (is_tr_mode_yf_ys_allowed
[Mesa-dev] [PATCH 3/3] i965: Initialize backend_shader::mem_ctx in its constructor.
We were initializing it in each subclass's constructor for some reason.
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 4 +---
 src/mesa/drivers/dri/i965/brw_shader.cpp | 2 ++
 src/mesa/drivers/dri/i965/brw_shader.h | 1 +
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 +--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 4770838..dc992dd 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1984,13 +1984,11 @@ fs_visitor::fs_visitor(struct brw_context *brw,
                        struct gl_shader_program *shader_prog,
                        struct gl_program *prog,
                        unsigned dispatch_width)
-   : backend_shader(brw, shader_prog, prog, prog_data, stage),
+   : backend_shader(brw, mem_ctx, shader_prog, prog, prog_data, stage),
      key(key), prog_data(prog_data),
      dispatch_width(dispatch_width), promoted_constants(0),
      bld(fs_builder(this, dispatch_width).at_end())
 {
-   this->mem_ctx = mem_ctx;
-
    switch (stage) {
    case MESA_SHADER_FRAGMENT:
       key_tex = &((const brw_wm_prog_key *) key)->tex;
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 545ec26..7a26939 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -757,6 +757,7 @@ brw_abs_immediate(enum brw_reg_type type, struct brw_reg *reg)
 }

 backend_shader::backend_shader(struct brw_context *brw,
+                               void *mem_ctx,
                                struct gl_shader_program *shader_prog,
                                struct gl_program *prog,
                                struct brw_stage_prog_data *stage_prog_data,
@@ -769,6 +770,7 @@ backend_shader::backend_shader(struct brw_context *brw,
      shader_prog(shader_prog),
      prog(prog),
      stage_prog_data(stage_prog_data),
+     mem_ctx(mem_ctx),
      cfg(NULL),
      stage(stage)
 {
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h
index da01d2f..e647749 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -215,6 +215,7 @@ class backend_shader {
 protected:

    backend_shader(struct brw_context *brw,
+                  void *mem_ctx,
                   struct gl_shader_program *shader_prog,
                   struct gl_program *prog,
                   struct brw_stage_prog_data *stage_prog_data,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 0a76bde..669f769 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -3691,7 +3691,7 @@ vec4_visitor::vec4_visitor(struct brw_context *brw,
                            shader_time_shader_type st_base,
                            shader_time_shader_type st_written,
                            shader_time_shader_type st_reset)
-   : backend_shader(brw, shader_prog, prog, &prog_data->base, stage),
+   : backend_shader(brw, mem_ctx, shader_prog, prog, &prog_data->base, stage),
      c(c),
      key(key),
      prog_data(prog_data),
@@ -3704,7 +3704,6 @@ vec4_visitor::vec4_visitor(struct brw_context *brw,
      st_written(st_written),
      st_reset(st_reset)
 {
-   this->mem_ctx = mem_ctx;
    this->failed = false;

    this->base_ir = NULL;
--
2.3.6
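The shape of this change, reduced to a minimal sketch: the base class receives the value through its own constructor and sets the member in its initializer list, so derived classes only forward the argument. The class names below (`toy_backend`, `toy_fs_visitor`) and the reduced parameter lists are hypothetical, chosen only to keep the example small; they are not the real i965 classes.

```cpp
// Hypothetical miniature of the pattern; not the real backend_shader classes.
class toy_backend {
protected:
   // The base class takes mem_ctx and initializes its own member.
   toy_backend(void *mem_ctx, int stage)
      : mem_ctx(mem_ctx), stage(stage)
   {
   }

public:
   void *mem_ctx;
   int stage;
};

class toy_fs_visitor : public toy_backend {
public:
   toy_fs_visitor(void *mem_ctx, unsigned dispatch_width)
      : toy_backend(mem_ctx, 0), dispatch_width(dispatch_width)
   {
      // No "this->mem_ctx = mem_ctx;" is needed here: the member is already
      // set by the time this body runs.
   }

   unsigned dispatch_width;
};
```

One practical effect of doing it this way is that a newly added subclass cannot forget the assignment, since the base constructor will not compile without the argument.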
[Mesa-dev] [PATCH 1/3] i965/cfg: Assert that cur_do/while/if pointers are non-NULL.
Coverity sees that the functions immediately below the new assertions
dereference these pointers, but is unaware that an ENDIF always follows
an IF, etc.
---
 src/mesa/drivers/dri/i965/brw_cfg.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp b/src/mesa/drivers/dri/i965/brw_cfg.cpp
index 39c419b..f1f230e 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.cpp
+++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp
@@ -231,6 +231,7 @@ cfg_t::cfg_t(exec_list *instructions)
          if (cur_else) {
             cur_else->add_successor(mem_ctx, cur_endif);
          } else {
+            assert(cur_if != NULL);
             cur_if->add_successor(mem_ctx, cur_endif);
          }

@@ -299,6 +300,7 @@ cfg_t::cfg_t(exec_list *instructions)
          inst->exec_node::remove();
          cur->instructions.push_tail(inst);

+         assert(cur_do != NULL && cur_while != NULL);
          cur->add_successor(mem_ctx, cur_do);
          set_next_block(&cur, cur_while, ip);

--
2.3.6
[Mesa-dev] [PATCH 2/3] i965: Assert that the GL primitive isn't out of range.
Coverity sees the if (mode >= BRW_PRIM_OFFSET (128)) test and assumes
that the else-branch might execute for mode up to 127, which would be
out of bounds.
---
 src/mesa/drivers/dri/i965/brw_draw.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index a7164db..b91597a 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -92,8 +92,10 @@ get_hw_prim_for_gl_prim(int mode)
 {
    if (mode >= BRW_PRIM_OFFSET)
       return mode - BRW_PRIM_OFFSET;
-   else
+   else {
+      assert(mode < ARRAY_SIZE(prim_to_hw_prim));
       return prim_to_hw_prim[mode];
+   }
 }


--
2.3.6
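A small standalone sketch of the same idea, for anyone who wants to try it outside the driver: the assertion spells out an invariant that a static analyzer cannot derive from the range test alone. The table contents, the offset value, and the names (`toy_prim_to_hw_prim`, `TOY_PRIM_OFFSET`, `toy_get_hw_prim`) are made up for illustration and do not match the real i965 table.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical values for illustration only.
static const int TOY_PRIM_OFFSET = 128;
static const int toy_prim_to_hw_prim[] = {10, 11, 12, 13};
#define TOY_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static int toy_get_hw_prim(int mode)
{
   // Values at or above the offset already encode a hardware primitive.
   if (mode >= TOY_PRIM_OFFSET)
      return mode - TOY_PRIM_OFFSET;

   // State the invariant the analyzer cannot infer: any smaller mode is a
   // valid index into the table, not just "something below 128".
   assert(mode >= 0 &&
          static_cast<std::size_t>(mode) < TOY_ARRAY_SIZE(toy_prim_to_hw_prim));
   return toy_prim_to_hw_prim[mode];
}
```

Note that in a build with NDEBUG defined the assertion compiles away, so it documents the invariant rather than enforcing it at run time.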
Re: [Mesa-dev] [PATCH 3/5] i965: Make a helper function intel_miptree_release_levels()
I am shocked this is the only place we do this... On Wed, Jun 10, 2015 at 03:30:48PM -0700, Anuj Phogat wrote: > Signed-off-by: Anuj Phogat > Cc: Ben Widawsky > --- > src/mesa/drivers/dri/i965/brw_tex_layout.c | 17 - > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c > b/src/mesa/drivers/dri/i965/brw_tex_layout.c > index c0ef5cc..c185e41 100644 > --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c > +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c > @@ -801,6 +801,17 @@ intel_miptree_set_alignment(struct brw_context *brw, > } > } > > +static void > +intel_miptree_release_levels(struct intel_mipmap_tree *mt) > +{ > + unsigned int level = 0; > + > + for (level = mt->first_level; level <= mt->last_level; level++) { > + free(mt->level[level].slice); > + mt->level[level].slice = NULL; > + } > +} > + > void > brw_miptree_layout(struct brw_context *brw, > bool for_bo, > @@ -866,7 +877,6 @@ brw_miptree_layout(struct brw_context *brw, > mt->tiling = brw_miptree_choose_tiling(brw, requested, mt); > >if (is_tr_mode_yf_ys_allowed) { > - unsigned int level = 0; > assert(brw->gen >= 9); > > if (mt->tiling == I915_TILING_Y || > @@ -883,10 +893,7 @@ brw_miptree_layout(struct brw_context *brw, > /* Failed to use selected tr_mode. Free up the memory allocated >* for miptree levels in intel_miptree_total_width_height(). >*/ > - for (level = mt->first_level; level <= mt->last_level; level++) { > -free(mt->level[level].slice); > -mt->level[level].slice = NULL; > - } > + intel_miptree_release_levels(mt); >} >i++; > } > -- > 1.9.3 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/5] i965: Make a helper function intel_miptree_can_use_tr_mode()
1-4 (with/without changes) are: Reviewed-by: Ben Widawsky On Wed, Jun 10, 2015 at 03:30:49PM -0700, Anuj Phogat wrote: > Signed-off-by: Anuj Phogat > Cc: Ben Widawsky > --- > src/mesa/drivers/dri/i965/brw_tex_layout.c | 30 > +++--- > 1 file changed, 19 insertions(+), 11 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c > b/src/mesa/drivers/dri/i965/brw_tex_layout.c > index c185e41..39c6a39 100644 > --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c > +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c > @@ -812,6 +812,23 @@ intel_miptree_release_levels(struct intel_mipmap_tree > *mt) > } > } > > +static bool > +intel_miptree_can_use_tr_mode(const struct intel_mipmap_tree *mt) > +{ > + if (mt->tiling == I915_TILING_Y || > + mt->tiling == (I915_TILING_Y | I915_TILING_X) || > + mt->tr_mode == INTEL_MIPTREE_TRMODE_NONE) { > + /* FIXME: Don't allow YS tiling at the moment. Using 64KB tiling > + * for small textures might result in to memory wastage. Revisit > + * this condition when we have more information about the specific > + * cases where using YS over YF will be useful. > + */ > + if (mt->tr_mode != INTEL_MIPTREE_TRMODE_YS) > + return true; > + } > + return false; > +} > + > void > brw_miptree_layout(struct brw_context *brw, > bool for_bo, > @@ -879,17 +896,8 @@ brw_miptree_layout(struct brw_context *brw, >if (is_tr_mode_yf_ys_allowed) { > assert(brw->gen >= 9); > > - if (mt->tiling == I915_TILING_Y || > - mt->tiling == (I915_TILING_Y | I915_TILING_X) || > - mt->tr_mode == INTEL_MIPTREE_TRMODE_NONE) { > -/* FIXME: Don't allow YS tiling at the moment. Using 64KB tiling > - * for small textures might result in to memory wastage. Revisit > - * this condition when we have more information about the > specific > - * cases where using YS over YF will be useful. > - */ > -if (mt->tr_mode != INTEL_MIPTREE_TRMODE_YS) > - return; > - } > + if (intel_miptree_can_use_tr_mode(mt)) > +return; > /* Failed to use selected tr_mode. 
Free up the memory allocated >* for miptree levels in intel_miptree_total_width_height(). >*/ > -- > 1.9.3 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs: Don't mess up stride for uniform integer multiplication.
If the stride is 0, the source is a uniform and we should not modify the
stride.

Cc: "10.6"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91047
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 20
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 5563c5a..903624c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3196,10 +3196,16 @@ fs_visitor::lower_integer_multiplication()
                src1_1_w.fixed_hw_reg.dw1.ud >>= 16;
             } else {
                src1_0_w.type = BRW_REGISTER_TYPE_UW;
-               src1_0_w.stride = 2;
+               if (src1_0_w.stride != 0) {
+                  assert(src1_0_w.stride == 1);
+                  src1_0_w.stride = 2;
+               }

                src1_1_w.type = BRW_REGISTER_TYPE_UW;
-               src1_1_w.stride = 2;
+               if (src1_1_w.stride != 0) {
+                  assert(src1_1_w.stride == 1);
+                  src1_1_w.stride = 2;
+               }
                src1_1_w.subreg_offset += type_sz(BRW_REGISTER_TYPE_UW);
             }
             ibld.MUL(low, inst->src[0], src1_0_w);
@@ -3209,10 +3215,16 @@ fs_visitor::lower_integer_multiplication()
             fs_reg src0_1_w = inst->src[0];

             src0_0_w.type = BRW_REGISTER_TYPE_UW;
-            src0_0_w.stride = 2;
+            if (src0_0_w.stride != 0) {
+               assert(src0_0_w.stride == 1);
+               src0_0_w.stride = 2;
+            }

             src0_1_w.type = BRW_REGISTER_TYPE_UW;
-            src0_1_w.stride = 2;
+            if (src0_1_w.stride != 0) {
+               assert(src0_1_w.stride == 1);
+               src0_1_w.stride = 2;
+            }
             src0_1_w.subreg_offset += type_sz(BRW_REGISTER_TYPE_UW);

             ibld.MUL(low, src0_0_w, inst->src[1]);
--
2.3.6
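To make the stride rule concrete, here is a minimal sketch of reinterpreting a 32-bit source as 16-bit words: a packed source (stride 1) has to skip every other word, so its stride becomes 2, while a uniform (stride 0) must keep reading the same element for every channel. The `toy_reg` struct and `toy_as_uw` helper are hypothetical and far simpler than the real `fs_reg`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, stripped-down register description; not the real fs_reg.
struct toy_reg {
   unsigned stride;         // in units of the element type; 0 means uniform
   unsigned subreg_offset;  // in bytes
};

// View a 32-bit source as 16-bit words; word is 0 for the low half and 1
// for the high half of each dword.
static toy_reg toy_as_uw(toy_reg src, unsigned word)
{
   // A stride of 0 replicates one value to every channel and must stay 0.
   // A packed stride of one dword becomes a stride of two words.
   if (src.stride != 0) {
      assert(src.stride == 1);
      src.stride = 2;
   }
   src.subreg_offset += word * static_cast<unsigned>(sizeof(std::uint16_t));
   return src;
}
```

With the unconditional `stride = 2` from before the patch, the uniform case would start walking through memory that belongs to neighbouring values, which is presumably what the referenced bug report observed.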
[Mesa-dev] [PATCH] mesa: Delete unused ICEIL().
Can't find any uses of it in git history.
---
Strangely, when it was moved to its current location in commit 27558a1,
it was moved from mmath.h... which seems to have been lost from git's
history. Searching further, git log --grep mmath.h shows that various
commit messages mention modifying mmath.h and none of the commits
actually do.

 src/mesa/main/imports.h | 32 --------------------------------
 1 file changed, 32 deletions(-)

diff --git a/src/mesa/main/imports.h b/src/mesa/main/imports.h
index c4d917e..9ffe3de 100644
--- a/src/mesa/main/imports.h
+++ b/src/mesa/main/imports.h
@@ -230,38 +230,6 @@ static inline int IFLOOR(float f)
 }
 
 
-/** Return (as an integer) ceiling of float */
-static inline int ICEIL(float f)
-{
-#if defined(USE_X86_ASM) && defined(__GNUC__) && defined(__i386__)
-   /*
-    * IEEE ceil for computers that round to nearest or even.
-    * 'f' must be between -4194304 and 4194303.
-    * This ceil operation is done by "(iround(f + .5) + iround(f - .5) + 1) >> 1",
-    * but uses some IEEE specific tricks for better speed.
-    * Contributed by Josh Vanderhoof
-    */
-   int ai, bi;
-   double af, bf;
-   af = (3 << 22) + 0.5 + (double)f;
-   bf = (3 << 22) + 0.5 - (double)f;
-   /* GCC generates an extra fstp/fld without this. */
-   __asm__ ("fstps %0" : "=m" (ai) : "t" (af) : "st");
-   __asm__ ("fstps %0" : "=m" (bi) : "t" (bf) : "st");
-   return (ai - bi + 1) >> 1;
-#else
-   int ai, bi;
-   double af, bf;
-   fi_type u;
-   af = (3 << 22) + 0.5 + (double)f;
-   bf = (3 << 22) + 0.5 - (double)f;
-   u.f = (float) af; ai = u.i;
-   u.f = (float) bf; bi = u.i;
-   return (ai - bi + 1) >> 1;
-#endif
-}
-
-
 /**
  * Is x a power of two?
  */
-- 
2.3.6
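For reference, the result the deleted helper computed can be obtained without x87 or bit-pattern tricks. The sketch below is an assumed portable equivalent, not code from Mesa, and `iceil_portable` is a hypothetical name. It is only meaningful while `f` fits in an `int`.

```c
/* Ceiling of a float as an int. The cast truncates toward zero, so only
 * values that lie above their truncated integer need to be bumped by one. */
static inline int
iceil_portable(float f)
{
   int i = (int)f;
   return (f > (float)i) ? i + 1 : i;
}
```

Negative inputs need no special case because truncation toward zero already rounds them up.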
Re: [Mesa-dev] [PATCH] mesa: Delete unused ICEIL().
Reviewed-by: Jordan Justen On 2015-06-22 14:58:27, Matt Turner wrote: > Can't find any uses of it in git history. > --- > Strangely, when it was moved to its current location in commit 27558a1, > it was moved from mmath.h... which seems to have been lost from git's > history. Searching further git log --grep mmath.h shows that various > commit messages mention modifying mmath.h and none of the commits > actually do. > > src/mesa/main/imports.h | 32 > 1 file changed, 32 deletions(-) > > diff --git a/src/mesa/main/imports.h b/src/mesa/main/imports.h > index c4d917e..9ffe3de 100644 > --- a/src/mesa/main/imports.h > +++ b/src/mesa/main/imports.h > @@ -230,38 +230,6 @@ static inline int IFLOOR(float f) > } > > > -/** Return (as an integer) ceiling of float */ > -static inline int ICEIL(float f) > -{ > -#if defined(USE_X86_ASM) && defined(__GNUC__) && defined(__i386__) > - /* > -* IEEE ceil for computers that round to nearest or even. > -* 'f' must be between -4194304 and 4194303. > -* This ceil operation is done by "(iround(f + .5) + iround(f - .5) + 1) > >> 1", > -* but uses some IEEE specific tricks for better speed. > -* Contributed by Josh Vanderhoof > -*/ > - int ai, bi; > - double af, bf; > - af = (3 << 22) + 0.5 + (double)f; > - bf = (3 << 22) + 0.5 - (double)f; > - /* GCC generates an extra fstp/fld without this. */ > - __asm__ ("fstps %0" : "=m" (ai) : "t" (af) : "st"); > - __asm__ ("fstps %0" : "=m" (bi) : "t" (bf) : "st"); > - return (ai - bi + 1) >> 1; > -#else > - int ai, bi; > - double af, bf; > - fi_type u; > - af = (3 << 22) + 0.5 + (double)f; > - bf = (3 << 22) + 0.5 - (double)f; > - u.f = (float) af; ai = u.i; > - u.f = (float) bf; bi = u.i; > - return (ai - bi + 1) >> 1; > -#endif > -} > - > - > /** > * Is x a power of two? 
> */ > -- > 2.3.6
Re: [Mesa-dev] [RFC] Compatibility between old dri modules and new loaders, and vice versa
On 06/22/2015 11:54 AM, Dave Airlie wrote:
>>
>> As kindly hinted by Marek, currently we do have a wide selection of
>> supported dri <> loader combinations.
>>
>> Although we like to think that things never break, we have to admit
>> that not many of us test every possible combination of dri modules
>> and loaders, with the chances getting smaller as the time gap (age)
>> between the two increases. As such I would like to ask if we're
>> interested in gradually deprecating support as the gap grows beyond
>> X years.
>>
>> The rough idea that I have in my mind is:
>>  - Check for obsolete extensions (requirements for such) - both in the
>> dri modules and the loaders (including the xserver).
>>  - Add some WARN messages ("You're using an old loader/DRI module.
>> Update to XXX or later") when such a code path is hit.
>>  - After X mesa releases, we remove the dri extension from the
>> module(s) and bump the requirement(s) in the loader(s).
>>
>> And now the more important question: why ?
>>  - Very rarely tested and not actively supported - if it works it
>> works, we only cover one stable branch.
>>  - Having a quick look at the "if extension && extension.version
>> >= y" maze does leave most of us speechless.
>>  - Will allow us to start removing a few of the nasty quirks/hacks
>> that we currently have lying around.
>>
>> Worth mentioning:
>>  - Deprecation period will be based on the longest time frame set by
>> LTS versions of distros. For example if Debian A ships X and mesa 3
>> years apart, while Ubuntu is at ~2.5 and RedHat at ~2.8, we'll stick
>> with 3 years.
>>  - libGL dri1 support... it's been almost four years since the removal
>> of the dri1 modules. Since then the only activity that I've noticed is
>> by Connor Behan on the r128 front. Although it seems that he has covered
>> the ddx and is just looking at the kernel side of things. Should we
>> consider mesa X (10.6 ?) as the last one that supports such old
>> modules in its libGL and give it a much needed cleanup ?
>>
>>
>> How would people feel about this - do we have any strong ack/nack
>> about the idea ? Are there many people/companies that support distros
>> where the xserver <> mesa gap is over, say 2 years ?
>
> We still ship 7.11 based dri1 drivers in RHEL6, and there is still a
> chance of us rebasing to newer Mesa in that depending on schedules.
>
> ajax might have a different opinion, on how likely that is, but
> that would be at least another year from now where we'd want DRI1
> to work.

A time line would be good. I think it will take a fair amount of time
to get a new loader<>driver interface in order. If we can't change
anything for two years, then there's not a lot of point to thinking
about it now. If it's a year or less away, that's a different story.

The other possibility would be for RHEL to ship more than one libGL...
one for DRI1 drivers and one for everything else. I don't know how
horrible that would be.

> Dave.
Re: [Mesa-dev] [RFC] Compatibility between old dri modules and new loaders, and vice versa
On 23 June 2015 at 08:16, Ian Romanick wrote:
> On 06/22/2015 11:54 AM, Dave Airlie wrote:
[...]
>> We still ship 7.11 based dri1 drivers in RHEL6, and there is still a
>> chance of us rebasing to newer Mesa in that depending on schedules.
>>
>> ajax might have a different opinion, on how likely that is, but
>> that would be at least another year from now where we'd want DRI1
>> to work.
>
> A time line would be good. I think it will take a fair amount of time
> to get a new loader<>driver interface in order. If we can't change
> anything for two years, then there's not a lot of point to thinking
> about it now. If it's a year or less away, that's a different story.
>
> The other possibility would be for RHEL to ship more than one libGL...
> one for DRI1 drivers and one for everything else. I don't know how
> horrible that would be.

That would be worse than impossible. It's bad enough that nvidia
overwrites libGL; I don't want us to do it to ourselves as well :-)

Dave.
Re: [Mesa-dev] [PATCH] i965/fs: Don't mess up stride for uniform integer multiplication.
On Monday, June 22, 2015 02:58:36 PM Matt Turner wrote: > If the stride is 0, the source is a uniform and we should not modify the > stride. > > Cc: "10.6" > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91047 > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 20 > 1 file changed, 16 insertions(+), 4 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index 5563c5a..903624c 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -3196,10 +3196,16 @@ fs_visitor::lower_integer_multiplication() > src1_1_w.fixed_hw_reg.dw1.ud >>= 16; > } else { > src1_0_w.type = BRW_REGISTER_TYPE_UW; > - src1_0_w.stride = 2; > + if (src1_0_w.stride != 0) { > + assert(src1_0_w.stride == 1); > + src1_0_w.stride = 2; > + } > > src1_1_w.type = BRW_REGISTER_TYPE_UW; > - src1_1_w.stride = 2; > + if (src1_1_w.stride != 0) { > + assert(src1_1_w.stride == 1); > + src1_1_w.stride = 2; > + } > src1_1_w.subreg_offset += type_sz(BRW_REGISTER_TYPE_UW); > } > ibld.MUL(low, inst->src[0], src1_0_w); > @@ -3209,10 +3215,16 @@ fs_visitor::lower_integer_multiplication() > fs_reg src0_1_w = inst->src[0]; > > src0_0_w.type = BRW_REGISTER_TYPE_UW; > -src0_0_w.stride = 2; > +if (src0_0_w.stride != 0) { > + assert(src0_0_w.stride == 1); > + src0_0_w.stride = 2; > +} > > src0_1_w.type = BRW_REGISTER_TYPE_UW; > -src0_1_w.stride = 2; > +if (src0_1_w.stride != 0) { > + assert(src0_1_w.stride == 1); > + src0_1_w.stride = 2; > +} > src0_1_w.subreg_offset += type_sz(BRW_REGISTER_TYPE_UW); > > ibld.MUL(low, src0_0_w, inst->src[1]); > Whoops. Yeah, this makes sense. Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] i965/cfg: Assert that cur_do/while/if pointers are non-NULL.
Series Reviewed-by: Jordan Justen

On 2015-06-22 14:56:06, Matt Turner wrote:
> Coverity sees that the functions immediately below the new assertions
> dereference these pointers, but is unaware that an ENDIF always follows
> an IF, etc.
> ---
>  src/mesa/drivers/dri/i965/brw_cfg.cpp | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp b/src/mesa/drivers/dri/i965/brw_cfg.cpp
> index 39c419b..f1f230e 100644
> --- a/src/mesa/drivers/dri/i965/brw_cfg.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp
> @@ -231,6 +231,7 @@ cfg_t::cfg_t(exec_list *instructions)
>           if (cur_else) {
>              cur_else->add_successor(mem_ctx, cur_endif);
>           } else {
> +            assert(cur_if != NULL);
>              cur_if->add_successor(mem_ctx, cur_endif);
>           }
>
> @@ -299,6 +300,7 @@ cfg_t::cfg_t(exec_list *instructions)
>           inst->exec_node::remove();
>           cur->instructions.push_tail(inst);
>
> +         assert(cur_do != NULL && cur_while != NULL);
>           cur->add_successor(mem_ctx, cur_do);
>           set_next_block(&cur, cur_while, ip);
>
> --
> 2.3.6
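The pattern in the quoted patch, stating an invariant the static analyzer cannot infer right before the dereference, can be shown in isolation. This is a sketch under assumptions: `block_sketch`, `add_successor_sketch` and `close_if_sketch` are hypothetical stand-ins and do not mirror the real cfg_t or bblock_t interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CFG basic block. */
struct block_sketch {
   int num_successors;
};

static void
add_successor_sketch(struct block_sketch *from)
{
   from->num_successors++;   /* dereferences 'from' */
}

/* Models the ENDIF case: link either the ELSE block or, if there is
 * none, the IF block to whatever follows the ENDIF. */
static void
close_if_sketch(struct block_sketch *cur_if, struct block_sketch *cur_else)
{
   if (cur_else) {
      add_successor_sketch(cur_else);
   } else {
      /* Invariant the analyzer cannot see: an ENDIF always follows an IF. */
      assert(cur_if != NULL);
      add_successor_sketch(cur_if);
   }
}
```

The assertion costs nothing in release builds compiled with NDEBUG, and it documents the IF/ENDIF pairing for human readers as well.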
[Mesa-dev] [PATCH v3 06/18] mesa/glformats: recognize ASTC formats as compressed
From: Nanley Chery Reviewed-by: Anuj Phogat Signed-off-by: Nanley Chery --- src/mesa/main/glformats.c | 29 + 1 file changed, 29 insertions(+) diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c index ac69fab..e7363b5 100644 --- a/src/mesa/main/glformats.c +++ b/src/mesa/main/glformats.c @@ -1262,6 +1262,35 @@ _mesa_is_compressed_format(const struct gl_context *ctx, GLenum format) case GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT: return _mesa_is_desktop_gl(ctx) && ctx->Extensions.ARB_texture_compression_bptc; + case GL_COMPRESSED_RGBA_ASTC_4x4_KHR: + case GL_COMPRESSED_RGBA_ASTC_5x4_KHR: + case GL_COMPRESSED_RGBA_ASTC_5x5_KHR: + case GL_COMPRESSED_RGBA_ASTC_6x5_KHR: + case GL_COMPRESSED_RGBA_ASTC_6x6_KHR: + case GL_COMPRESSED_RGBA_ASTC_8x5_KHR: + case GL_COMPRESSED_RGBA_ASTC_8x6_KHR: + case GL_COMPRESSED_RGBA_ASTC_8x8_KHR: + case GL_COMPRESSED_RGBA_ASTC_10x5_KHR: + case GL_COMPRESSED_RGBA_ASTC_10x6_KHR: + case GL_COMPRESSED_RGBA_ASTC_10x8_KHR: + case GL_COMPRESSED_RGBA_ASTC_10x10_KHR: + case GL_COMPRESSED_RGBA_ASTC_12x10_KHR: + case GL_COMPRESSED_RGBA_ASTC_12x12_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_5x4_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_5x5_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_6x5_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_6x6_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_8x5_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_8x6_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_8x8_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x5_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x6_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x8_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x10_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x10_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR: + return ctx->Extensions.KHR_texture_compression_astc_ldr; case GL_PALETTE4_RGB8_OES: case GL_PALETTE4_RGBA8_OES: case GL_PALETTE4_R5_G6_B5_OES: -- 2.4.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org 
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
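The patch above lists every ASTC enum as its own case label. Because the 2D ASTC enums occupy two contiguous ranges in the KHR_texture_compression_astc_hdr specification, the same membership test can also be written as a range check. This is only a sketch of that alternative: `is_astc_2d_format_sketch` is a hypothetical name, and the extension gating done by the real function (ctx->Extensions.KHR_texture_compression_astc_ldr) is left out.

```c
#include <stdbool.h>

/* First and last enum of each 2D ASTC range, as published in the
 * KHR_texture_compression_astc_hdr specification. */
#define GL_COMPRESSED_RGBA_ASTC_4x4_KHR            0x93B0
#define GL_COMPRESSED_RGBA_ASTC_12x12_KHR          0x93BD
#define GL_COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR    0x93D0
#define GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR  0x93DD

/* Hypothetical helper: true for any of the 28 2D ASTC formats. */
static bool
is_astc_2d_format_sketch(unsigned format)
{
   return (format >= GL_COMPRESSED_RGBA_ASTC_4x4_KHR &&
           format <= GL_COMPRESSED_RGBA_ASTC_12x12_KHR) ||
          (format >= GL_COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR &&
           format <= GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR);
}
```

The explicit case list in the patch is easier to grep for and matches the style of the surrounding switch, which is a reasonable argument for keeping it.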
[Mesa-dev] [PATCH v3 03/18] mesa: disable online compression for ASTC formats
From: Nanley Chery Reviewed-by: Anuj Phogat Signed-off-by: Nanley Chery --- src/mesa/main/texcompress.c | 22 ++ src/mesa/main/teximage.c| 28 2 files changed, 50 insertions(+) diff --git a/src/mesa/main/texcompress.c b/src/mesa/main/texcompress.c index 0fd1a36..1654fc6 100644 --- a/src/mesa/main/texcompress.c +++ b/src/mesa/main/texcompress.c @@ -229,6 +229,28 @@ _mesa_gl_compressed_format_base_format(GLenum format) *what GL_NUM_COMPRESSED_TEXTURE_FORMATS and *GL_COMPRESSED_TEXTURE_FORMATS return." * + * The KHR_texture_compression_astc_hdr spec says: + * + *"Interactions with OpenGL 4.2 + * + *OpenGL 4.2 supports the feature that compressed textures can be + *compressed online, by passing the compressed texture format enum as + *the internal format when uploading a texture using TexImage1D, + *TexImage2D or TexImage3D (see Section 3.9.3, Texture Image + *Specification, subsection Encoding of Special Internal Formats). + * + *Due to the complexity of the ASTC compression algorithm, it is not + *usually suitable for online use, and therefore ASTC support will be + *limited to pre-compressed textures only. Where on-device compression + *is required, a domain-specific limited compressor will typically + *be used, and this is therefore not suitable for implementation in + *the driver. + * + *In particular, the ASTC format specifiers will not be added to + *Table 3.14, and thus will not be accepted by the TexImage*D + *functions, and will not be returned by the (already deprecated) + *COMPRESSED_TEXTURE_FORMATS query." + * * There is no formal spec for GL_ATI_texture_compression_3dc. 
Since the * formats added by this extension are luminance-alpha formats, it is * reasonable to expect them to follow the same rules as diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c index 3d85615..86ef407 100644 --- a/src/mesa/main/teximage.c +++ b/src/mesa/main/teximage.c @@ -1778,6 +1778,34 @@ compressedteximage_only_format(const struct gl_context *ctx, GLenum format) case GL_PALETTE8_R5_G6_B5_OES: case GL_PALETTE8_RGBA4_OES: case GL_PALETTE8_RGB5_A1_OES: + case GL_COMPRESSED_RGBA_ASTC_4x4_KHR: + case GL_COMPRESSED_RGBA_ASTC_5x4_KHR: + case GL_COMPRESSED_RGBA_ASTC_5x5_KHR: + case GL_COMPRESSED_RGBA_ASTC_6x5_KHR: + case GL_COMPRESSED_RGBA_ASTC_6x6_KHR: + case GL_COMPRESSED_RGBA_ASTC_8x5_KHR: + case GL_COMPRESSED_RGBA_ASTC_8x6_KHR: + case GL_COMPRESSED_RGBA_ASTC_8x8_KHR: + case GL_COMPRESSED_RGBA_ASTC_10x5_KHR: + case GL_COMPRESSED_RGBA_ASTC_10x6_KHR: + case GL_COMPRESSED_RGBA_ASTC_10x8_KHR: + case GL_COMPRESSED_RGBA_ASTC_10x10_KHR: + case GL_COMPRESSED_RGBA_ASTC_12x10_KHR: + case GL_COMPRESSED_RGBA_ASTC_12x12_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_5x4_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_5x5_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_6x5_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_6x6_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_8x5_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_8x6_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_8x8_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x5_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x6_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x8_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x10_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x10_KHR: + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR: return GL_TRUE; default: return GL_FALSE; -- 2.4.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v3 02/18] glapi: add support for KHR_texture_compression_astc_ldr
From: Nanley Chery v2: correct the spelling of the sRGB variants. remove spaces around "=" when setting the enum value. Reviewed-by: Anuj Phogat Signed-off-by: Nanley Chery --- .../glapi/gen/KHR_texture_compression_astc.xml | 40 ++ src/mapi/glapi/gen/Makefile.am | 1 + src/mapi/glapi/gen/gl_API.xml | 2 +- 3 files changed, 42 insertions(+), 1 deletion(-) create mode 100644 src/mapi/glapi/gen/KHR_texture_compression_astc.xml diff --git a/src/mapi/glapi/gen/KHR_texture_compression_astc.xml b/src/mapi/glapi/gen/KHR_texture_compression_astc.xml new file mode 100644 index 000..7b5864d --- /dev/null +++ b/src/mapi/glapi/gen/KHR_texture_compression_astc.xml @@ -0,0 +1,40 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am index 5b163b0..53edab5 100644 --- a/src/mapi/glapi/gen/Makefile.am +++ b/src/mapi/glapi/gen/Makefile.am @@ -187,6 +187,7 @@ API_XML = \ INTEL_performance_query.xml \ KHR_debug.xml \ KHR_context_flush_control.xml \ + KHR_texture_compression_astc.xml \ NV_conditional_render.xml \ NV_primitive_restart.xml \ NV_texture_barrier.xml \ diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml index 2f33075..8df58a3 100644 --- a/src/mapi/glapi/gen/gl_API.xml +++ b/src/mapi/glapi/gen/gl_API.xml @@ -8162,7 +8162,7 @@ http://www.w3.org/2001/XInclude"/> - +http://www.w3.org/2001/XInclude"/> http://www.w3.org/2001/XInclude"/> -- 2.4.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v3 04/18] mesa: return bool instead of GLboolean in compressedteximage_only_format()
From: Nanley Chery

In agreement with the coding style, functions that aren't directly
visible to the GL API should prefer the use of bool over GLboolean.

Suggested-by: Ian Romanick
Signed-off-by: Nanley Chery
---
 src/mesa/main/teximage.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index 86ef407..0e0488a 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -1763,7 +1763,7 @@ _mesa_test_proxy_teximage(struct gl_context *ctx, GLenum target, GLint level,
 /**
  * Return true if the format is only valid for glCompressedTexImage.
  */
-static GLboolean
+static bool
 compressedteximage_only_format(const struct gl_context *ctx, GLenum format)
 {
    switch (format) {
@@ -1806,9 +1806,9 @@ compressedteximage_only_format(const struct gl_context *ctx, GLenum format)
    case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x10_KHR:
    case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x10_KHR:
    case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR:
-      return GL_TRUE;
+      return true;
    default:
-      return GL_FALSE;
+      return false;
    }
 }
 
-- 
2.4.2
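The convention the commit message refers to can be illustrated with a small sketch: internal helpers return bool, and the conversion to GLboolean happens only at the API boundary. Both function names below are hypothetical and the format check is reduced to a single ASTC enum for brevity.

```c
#include <stdbool.h>

typedef unsigned char GLboolean;   /* same underlying type as in the GL headers */
#define GL_TRUE  1
#define GL_FALSE 0

#define GL_COMPRESSED_RGBA_ASTC_4x4_KHR 0x93B0

/* Internal helper (hypothetical): never exposed through the GL API,
 * so it uses the C99 bool type. */
static bool
only_compressed_upload_sketch(unsigned format)
{
   return format == GL_COMPRESSED_RGBA_ASTC_4x4_KHR;
}

/* API-facing wrapper (hypothetical): converts to GLboolean at the boundary. */
static GLboolean
api_query_sketch(unsigned format)
{
   return only_compressed_upload_sketch(format) ? GL_TRUE : GL_FALSE;
}
```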
[Mesa-dev] [PATCH v3 07/18] mesa/texcompress: enable translation between MESA and GL ASTC formats
From: Nanley Chery Reviewed-by: Anuj Phogat Signed-off-by: Nanley Chery --- src/mesa/main/texcompress.c | 114 1 file changed, 114 insertions(+) diff --git a/src/mesa/main/texcompress.c b/src/mesa/main/texcompress.c index 1654fc6..203a065 100644 --- a/src/mesa/main/texcompress.c +++ b/src/mesa/main/texcompress.c @@ -471,6 +471,63 @@ _mesa_glenum_to_compressed_format(GLenum format) case GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT: return MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT; + case GL_COMPRESSED_RGBA_ASTC_4x4_KHR: + return MESA_FORMAT_ASTC_4x4_RGBA; + case GL_COMPRESSED_RGBA_ASTC_5x4_KHR: + return MESA_FORMAT_ASTC_5x4_RGBA; + case GL_COMPRESSED_RGBA_ASTC_5x5_KHR: + return MESA_FORMAT_ASTC_5x5_RGBA; + case GL_COMPRESSED_RGBA_ASTC_6x5_KHR: + return MESA_FORMAT_ASTC_6x5_RGBA; + case GL_COMPRESSED_RGBA_ASTC_6x6_KHR: + return MESA_FORMAT_ASTC_6x6_RGBA; + case GL_COMPRESSED_RGBA_ASTC_8x5_KHR: + return MESA_FORMAT_ASTC_8x5_RGBA; + case GL_COMPRESSED_RGBA_ASTC_8x6_KHR: + return MESA_FORMAT_ASTC_8x6_RGBA; + case GL_COMPRESSED_RGBA_ASTC_8x8_KHR: + return MESA_FORMAT_ASTC_8x8_RGBA; + case GL_COMPRESSED_RGBA_ASTC_10x5_KHR: + return MESA_FORMAT_ASTC_10x5_RGBA; + case GL_COMPRESSED_RGBA_ASTC_10x6_KHR: + return MESA_FORMAT_ASTC_10x6_RGBA; + case GL_COMPRESSED_RGBA_ASTC_10x8_KHR: + return MESA_FORMAT_ASTC_10x8_RGBA; + case GL_COMPRESSED_RGBA_ASTC_10x10_KHR: + return MESA_FORMAT_ASTC_10x10_RGBA; + case GL_COMPRESSED_RGBA_ASTC_12x10_KHR: + return MESA_FORMAT_ASTC_12x10_RGBA; + case GL_COMPRESSED_RGBA_ASTC_12x12_KHR: + return MESA_FORMAT_ASTC_12x12_RGBA; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR: + return MESA_FORMAT_ASTC_4x4_SRGB8_ALPHA8; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_5x4_KHR: + return MESA_FORMAT_ASTC_5x4_SRGB8_ALPHA8; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_5x5_KHR: + return MESA_FORMAT_ASTC_5x5_SRGB8_ALPHA8; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_6x5_KHR: + return MESA_FORMAT_ASTC_6x5_SRGB8_ALPHA8; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_6x6_KHR: + return 
MESA_FORMAT_ASTC_6x6_SRGB8_ALPHA8; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_8x5_KHR: + return MESA_FORMAT_ASTC_8x5_SRGB8_ALPHA8; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_8x6_KHR: + return MESA_FORMAT_ASTC_8x6_SRGB8_ALPHA8; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_8x8_KHR: + return MESA_FORMAT_ASTC_8x8_SRGB8_ALPHA8; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x5_KHR: + return MESA_FORMAT_ASTC_10x5_SRGB8_ALPHA8; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x6_KHR: + return MESA_FORMAT_ASTC_10x6_SRGB8_ALPHA8; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x8_KHR: + return MESA_FORMAT_ASTC_10x8_SRGB8_ALPHA8; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x10_KHR: + return MESA_FORMAT_ASTC_10x10_SRGB8_ALPHA8; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x10_KHR: + return MESA_FORMAT_ASTC_12x10_SRGB8_ALPHA8; + case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR: + return MESA_FORMAT_ASTC_12x12_SRGB8_ALPHA8; + default: return MESA_FORMAT_NONE; } @@ -561,6 +618,63 @@ _mesa_compressed_format_to_glenum(struct gl_context *ctx, mesa_format mesaFormat case MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT: return GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT; + case MESA_FORMAT_ASTC_4x4_RGBA: + return GL_COMPRESSED_RGBA_ASTC_4x4_KHR; + case MESA_FORMAT_ASTC_5x4_RGBA: + return GL_COMPRESSED_RGBA_ASTC_5x4_KHR; + case MESA_FORMAT_ASTC_5x5_RGBA: + return GL_COMPRESSED_RGBA_ASTC_5x5_KHR; + case MESA_FORMAT_ASTC_6x5_RGBA: + return GL_COMPRESSED_RGBA_ASTC_6x5_KHR; + case MESA_FORMAT_ASTC_6x6_RGBA: + return GL_COMPRESSED_RGBA_ASTC_6x6_KHR; + case MESA_FORMAT_ASTC_8x5_RGBA: + return GL_COMPRESSED_RGBA_ASTC_8x5_KHR; + case MESA_FORMAT_ASTC_8x6_RGBA: + return GL_COMPRESSED_RGBA_ASTC_8x6_KHR; + case MESA_FORMAT_ASTC_8x8_RGBA: + return GL_COMPRESSED_RGBA_ASTC_8x8_KHR; + case MESA_FORMAT_ASTC_10x5_RGBA: + return GL_COMPRESSED_RGBA_ASTC_10x5_KHR; + case MESA_FORMAT_ASTC_10x6_RGBA: + return GL_COMPRESSED_RGBA_ASTC_10x6_KHR; + case MESA_FORMAT_ASTC_10x8_RGBA: + return GL_COMPRESSED_RGBA_ASTC_10x8_KHR; + case MESA_FORMAT_ASTC_10x10_RGBA: + return 
GL_COMPRESSED_RGBA_ASTC_10x10_KHR; + case MESA_FORMAT_ASTC_12x10_RGBA: + return GL_COMPRESSED_RGBA_ASTC_12x10_KHR; + case MESA_FORMAT_ASTC_12x12_RGBA: + return GL_COMPRESSED_RGBA_ASTC_12x12_KHR; + case MESA_FORMAT_ASTC_4x4_SRGB8_ALPHA8: + return GL_COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR; + case MESA_FORMAT_ASTC_5x4_SRGB8_ALPHA8: + return GL_COMPRESSED_SRGB8_ALPHA8_ASTC_5x4_KHR; + case MESA_FORMAT_ASTC_5x5_SRGB8_ALPHA8: + return GL_COMPRESSED_SRGB8_ALPHA8_ASTC_5x5_KHR; + case MESA_FORMAT_ASTC_6x5_SRGB8_ALPHA8: + return GL_COMPRESSED_SRGB8_ALPHA8_ASTC_6x5_KHR; + case MESA_FORMAT_ASTC_6x6_SRGB8_ALPHA8: +
[Mesa-dev] [PATCH v3 05/18] mesa: add ASTC extensions to the extensions table
From: Nanley Chery

v2: alphabetize the extensions.
    remove OES ASTC extension.

Reviewed-by: Anuj Phogat
Signed-off-by: Nanley Chery
---
 src/mesa/main/extensions.c | 2 ++
 src/mesa/main/mtypes.h     | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 4176a69..adbeecc 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -337,6 +337,8 @@ static const struct extension extension_table[] = {
    /* KHR extensions */
    { "GL_KHR_debug",                        o(dummy_true),                       GL,       2012 },
    { "GL_KHR_context_flush_control",        o(dummy_true),                       GL | ES2, 2014 },
+   { "GL_KHR_texture_compression_astc_hdr", o(KHR_texture_compression_astc_hdr), GL | ES2, 2012 },
+   { "GL_KHR_texture_compression_astc_ldr", o(KHR_texture_compression_astc_ldr), GL | ES2, 2012 },
 
    /* Vendor extensions */
    { "GL_3DFX_texture_compression_FXT1",    o(TDFX_texture_compression_FXT1),    GL,       1999 },
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index 983b9dc..6a5d15f 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3772,6 +3772,8 @@ struct gl_extensions
    GLboolean ATI_fragment_shader;
    GLboolean ATI_separate_stencil;
    GLboolean INTEL_performance_query;
+   GLboolean KHR_texture_compression_astc_hdr;
+   GLboolean KHR_texture_compression_astc_ldr;
    GLboolean MESA_pack_invert;
    GLboolean MESA_ycbcr_texture;
    GLboolean NV_conditional_render;
-- 
2.4.2
[Mesa-dev] [PATCH v3 09/18] mesa/formats: store whether or not a format is sRGB in gl_format_info
From: Nanley Chery

v2: remove extra newline.
v3: use bool instead of GLboolean.

Reviewed-by: Anuj Phogat
Signed-off-by: Nanley Chery
---
 src/mesa/main/format_info.py |  2 ++
 src/mesa/main/formats.c      | 28 ++++------------------------
 2 files changed, 6 insertions(+), 24 deletions(-)

diff --git a/src/mesa/main/format_info.py b/src/mesa/main/format_info.py
index 40104a2..8134e8e 100644
--- a/src/mesa/main/format_info.py
+++ b/src/mesa/main/format_info.py
@@ -191,6 +191,8 @@ for fmat in formats:
    bits = [ get_channel_bits(fmat, name) for name in ['l', 'i', 'z', 's']]
    print ' {0},'.format(', '.join(map(str, bits)))
 
+   print ' {0:d},'.format(fmat.colorspace == 'srgb')
+
    print ' {0}, {1}, {2},'.format(fmat.block_width, fmat.block_height,
                                   int(fmat.block_size() / 8))
 
diff --git a/src/mesa/main/formats.c b/src/mesa/main/formats.c
index 745fd8c..1f5a2b9 100644
--- a/src/mesa/main/formats.c
+++ b/src/mesa/main/formats.c
@@ -65,6 +65,8 @@ struct gl_format_info
    GLubyte DepthBits;
    GLubyte StencilBits;
 
+   bool IsSRGBFormat;
+
    /**
     * To describe compressed formats.  If not compressed, Width=Height=1.
     */
@@ -553,30 +555,8 @@ _mesa_is_format_color_format(mesa_format format)
 GLenum
 _mesa_get_format_color_encoding(mesa_format format)
 {
-   /* XXX this info should be encoded in gl_format_info */
-   switch (format) {
-   case MESA_FORMAT_BGR_SRGB8:
-   case MESA_FORMAT_A8B8G8R8_SRGB:
-   case MESA_FORMAT_B8G8R8A8_SRGB:
-   case MESA_FORMAT_A8R8G8B8_SRGB:
-   case MESA_FORMAT_R8G8B8A8_SRGB:
-   case MESA_FORMAT_L_SRGB8:
-   case MESA_FORMAT_L8A8_SRGB:
-   case MESA_FORMAT_A8L8_SRGB:
-   case MESA_FORMAT_SRGB_DXT1:
-   case MESA_FORMAT_SRGBA_DXT1:
-   case MESA_FORMAT_SRGBA_DXT3:
-   case MESA_FORMAT_SRGBA_DXT5:
-   case MESA_FORMAT_R8G8B8X8_SRGB:
-   case MESA_FORMAT_ETC2_SRGB8:
-   case MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC:
-   case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1:
-   case MESA_FORMAT_B8G8R8X8_SRGB:
-   case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM:
-      return GL_SRGB;
-   default:
-      return GL_LINEAR;
-   }
+   const struct gl_format_info *info = _mesa_get_format_info(format);
+   return info->IsSRGBFormat ? GL_SRGB : GL_LINEAR;
 }
 
 
-- 
2.4.2
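The change above replaces a hand-maintained switch with a per-format flag stored in the generated format table. A self-contained sketch of that table-driven approach follows. All types and names here are hypothetical miniatures, not the real gl_format_info layout, and the table is written by hand, whereas Mesa generates its table from formats.csv.

```c
#include <stdbool.h>

/* Hypothetical miniature of a format enum and its info table. */
enum fmt_sketch {
   FMT_SKETCH_RGBA8,
   FMT_SKETCH_RGBA8_SRGB,
   FMT_SKETCH_COUNT
};

enum encoding_sketch {
   ENCODING_SKETCH_LINEAR,
   ENCODING_SKETCH_SRGB
};

struct fmt_info_sketch {
   const char *name;
   bool is_srgb;   /* plays the role of IsSRGBFormat */
};

/* One row per format, so a new format cannot be forgotten in a
 * separate switch statement. */
static const struct fmt_info_sketch fmt_table_sketch[FMT_SKETCH_COUNT] = {
   [FMT_SKETCH_RGBA8]      = { "RGBA8",      false },
   [FMT_SKETCH_RGBA8_SRGB] = { "RGBA8_SRGB", true  },
};

static enum encoding_sketch
color_encoding_sketch(enum fmt_sketch format)
{
   return fmt_table_sketch[format].is_srgb ? ENCODING_SKETCH_SRGB
                                           : ENCODING_SKETCH_LINEAR;
}
```

The practical benefit for this series is that the 14 new sRGB ASTC formats pick up the right colour encoding from their table rows without 14 extra case labels.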
[Mesa-dev] [PATCH v3 00/18] Enable support for 2D ASTC (LDR and HDR modes) in SKL
From: Nanley Chery

This patch series enables support for the
KHR_texture_compression_astc_{ldr,hdr} extensions on Skylake machines.

This revision includes developer suggestions and fixes rendering issues
on previously untested systems. The sRGB issues were fixed and determined
to be unrelated to this patchset.

The Piglit tests for this extension can be found here:
cgit.freedesktop.org/~nchery/piglit

Nanley Chery (18):
  mesa/formats: define the 2D ASTC formats
  glapi: add support for KHR_texture_compression_astc_ldr
  mesa: disable online compression for ASTC formats
  mesa: return bool instead of GLboolean in
    compressedteximage_only_format()
  mesa: add ASTC extensions to the extensions table
  mesa/glformats: recognize ASTC formats as compressed
  mesa/texcompress: enable translation between MESA and GL ASTC formats
  mesa/teximage: return the base internal format of the ASTC formats
  mesa/formats: store whether or not a format is sRGB in gl_format_info
  i965/surface_formats: add support for 2D ASTC surface formats
  mesa/macros: add power-of-two assertions for alignment macros
  mesa/macros: move ALIGN_NPOT to macros.h
  i965: use ALIGN_NPOT for setting ASTC mipmap layouts
  i965: correct mt->align_h for 2D textures on Skylake
  i965: change the meaning of cpp for compressed textures
  i965: enable ASTC support for Skylake
  i965: refactor miptree alignment calculation code
  swrast: add a new macro, FETCH_COMPRESSED

 .../glapi/gen/KHR_texture_compression_astc.xml  |  40 +++
 src/mapi/glapi/gen/Makefile.am                  |   1 +
 src/mapi/glapi/gen/gl_API.xml                   |   2 +-
 src/mesa/drivers/dri/i965/brw_defines.h         |  32 +++
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp        |   2 +-
 src/mesa/drivers/dri/i965/brw_surface_formats.c |  80 ++
 src/mesa/drivers/dri/i965/brw_tex_layout.c      | 105
 src/mesa/drivers/dri/i965/intel_copy_image.c    |  19 +-
 src/mesa/drivers/dri/i965/intel_extensions.c    |   5 +
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c   |  15 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h   |   2 +-
 src/mesa/drivers/dri/i965/intel_upload.c        |   6 -
 src/mesa/main/extensions.c                      |   2 +
 src/mesa/main/format_info.py                    |   5 +
 src/mesa/main/formats.c                         | 158 ++--
 src/mesa/main/formats.csv                       |  31 +++
 src/mesa/main/formats.h                         |  30 +++
 src/mesa/main/glformats.c                       |  29 +++
 src/mesa/main/macros.h                          |  22 +-
 src/mesa/main/mtypes.h                          |   2 +
 src/mesa/main/texcompress.c                     | 136 +++
 src/mesa/main/teximage.c                        |  70 +-
 src/mesa/swrast/s_texfetch.c                    | 269 ++---
 23 files changed, 736 insertions(+), 327 deletions(-)
 create mode 100644 src/mapi/glapi/gen/KHR_texture_compression_astc.xml

-- 
2.4.2
[Mesa-dev] [PATCH v3 01/18] mesa/formats: define the 2D ASTC formats
From: Nanley Chery Includes definition of the formats, updates to functions likely to be used, as well as changes necessary for compilation. Reviewed-by: Anuj Phogat Signed-off-by: Nanley Chery --- src/mesa/main/format_info.py | 3 + src/mesa/main/formats.c | 130 +++ src/mesa/main/formats.csv| 31 +++ src/mesa/main/formats.h | 30 ++ src/mesa/swrast/s_texfetch.c | 32 ++- 5 files changed, 225 insertions(+), 1 deletion(-) diff --git a/src/mesa/main/format_info.py b/src/mesa/main/format_info.py index 3bae57e..40104a2 100644 --- a/src/mesa/main/format_info.py +++ b/src/mesa/main/format_info.py @@ -130,6 +130,9 @@ def get_channel_bits(fmat, chan_name): elif fmat.layout == 'bptc': bits = 16 if fmat.name.endswith('_FLOAT') else 8 return bits if fmat.has_channel(chan_name) else 0 + elif fmat.layout == 'astc': + bits = 16 if fmat.name.endswith('_RGBA') else 8 + return bits if fmat.has_channel(chan_name) else 0 else: assert False else: diff --git a/src/mesa/main/formats.c b/src/mesa/main/formats.c index baeb1bf..745fd8c 100644 --- a/src/mesa/main/formats.c +++ b/src/mesa/main/formats.c @@ -667,6 +667,48 @@ _mesa_get_srgb_format_linear(mesa_format format) case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM: format = MESA_FORMAT_BPTC_RGBA_UNORM; break; + case MESA_FORMAT_ASTC_4x4_SRGB8_ALPHA8: + format = MESA_FORMAT_ASTC_4x4_RGBA; + break; + case MESA_FORMAT_ASTC_5x4_SRGB8_ALPHA8: + format = MESA_FORMAT_ASTC_5x4_RGBA; + break; + case MESA_FORMAT_ASTC_5x5_SRGB8_ALPHA8: + format = MESA_FORMAT_ASTC_5x5_RGBA; + break; + case MESA_FORMAT_ASTC_6x5_SRGB8_ALPHA8: + format = MESA_FORMAT_ASTC_6x5_RGBA; + break; + case MESA_FORMAT_ASTC_6x6_SRGB8_ALPHA8: + format = MESA_FORMAT_ASTC_6x6_RGBA; + break; + case MESA_FORMAT_ASTC_8x5_SRGB8_ALPHA8: + format = MESA_FORMAT_ASTC_8x5_RGBA; + break; + case MESA_FORMAT_ASTC_8x6_SRGB8_ALPHA8: + format = MESA_FORMAT_ASTC_8x6_RGBA; + break; + case MESA_FORMAT_ASTC_8x8_SRGB8_ALPHA8: + format = MESA_FORMAT_ASTC_8x8_RGBA; + break; + case MESA_FORMAT_ASTC_10x5_SRGB8_ALPHA8: 
+ format = MESA_FORMAT_ASTC_10x5_RGBA; + break; + case MESA_FORMAT_ASTC_10x6_SRGB8_ALPHA8: + format = MESA_FORMAT_ASTC_10x6_RGBA; + break; + case MESA_FORMAT_ASTC_10x8_SRGB8_ALPHA8: + format = MESA_FORMAT_ASTC_10x8_RGBA; + break; + case MESA_FORMAT_ASTC_10x10_SRGB8_ALPHA8: + format = MESA_FORMAT_ASTC_10x10_RGBA; + break; + case MESA_FORMAT_ASTC_12x10_SRGB8_ALPHA8: + format = MESA_FORMAT_ASTC_12x10_RGBA; + break; + case MESA_FORMAT_ASTC_12x12_SRGB8_ALPHA8: + format = MESA_FORMAT_ASTC_12x12_RGBA; + break; case MESA_FORMAT_B8G8R8X8_SRGB: format = MESA_FORMAT_B8G8R8X8_UNORM; break; @@ -741,6 +783,36 @@ _mesa_get_uncompressed_format(mesa_format format) case MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT: case MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT: return MESA_FORMAT_RGB_FLOAT32; + case MESA_FORMAT_ASTC_4x4_RGBA: + case MESA_FORMAT_ASTC_5x4_RGBA: + case MESA_FORMAT_ASTC_5x5_RGBA: + case MESA_FORMAT_ASTC_6x5_RGBA: + case MESA_FORMAT_ASTC_6x6_RGBA: + case MESA_FORMAT_ASTC_8x5_RGBA: + case MESA_FORMAT_ASTC_8x6_RGBA: + case MESA_FORMAT_ASTC_8x8_RGBA: + case MESA_FORMAT_ASTC_10x5_RGBA: + case MESA_FORMAT_ASTC_10x6_RGBA: + case MESA_FORMAT_ASTC_10x8_RGBA: + case MESA_FORMAT_ASTC_10x10_RGBA: + case MESA_FORMAT_ASTC_12x10_RGBA: + case MESA_FORMAT_ASTC_12x12_RGBA: + return MESA_FORMAT_RGBA_FLOAT16; + case MESA_FORMAT_ASTC_4x4_SRGB8_ALPHA8: + case MESA_FORMAT_ASTC_5x4_SRGB8_ALPHA8: + case MESA_FORMAT_ASTC_5x5_SRGB8_ALPHA8: + case MESA_FORMAT_ASTC_6x5_SRGB8_ALPHA8: + case MESA_FORMAT_ASTC_6x6_SRGB8_ALPHA8: + case MESA_FORMAT_ASTC_8x5_SRGB8_ALPHA8: + case MESA_FORMAT_ASTC_8x6_SRGB8_ALPHA8: + case MESA_FORMAT_ASTC_8x8_SRGB8_ALPHA8: + case MESA_FORMAT_ASTC_10x5_SRGB8_ALPHA8: + case MESA_FORMAT_ASTC_10x6_SRGB8_ALPHA8: + case MESA_FORMAT_ASTC_10x8_SRGB8_ALPHA8: + case MESA_FORMAT_ASTC_10x10_SRGB8_ALPHA8: + case MESA_FORMAT_ASTC_12x10_SRGB8_ALPHA8: + case MESA_FORMAT_ASTC_12x12_SRGB8_ALPHA8: + return MESA_FORMAT_A8B8G8R8_SRGB; default: #ifdef DEBUG assert(!_mesa_is_format_compressed(format)); @@ 
-1253,6 +1325,34 @@ _mesa_format_to_type_and_comps(mesa_format format, case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM: case MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT: case MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT: + case MESA_FORMAT_ASTC_4x4_RGBA: + case MESA_FORMAT_ASTC_5x4_RGBA: + case MESA_FORMAT_ASTC_5x5_RGBA: + case MESA_FORMAT_ASTC_6x5_RGBA: + case MESA_FORMAT_ASTC_6x6_RGBA: + case MESA_FORMAT_ASTC_8x5_RGBA: + case MESA_FORMAT_ASTC_8x6_RGBA: + case MESA_FORMAT_ASTC_8x8_RGBA: + case MESA_FORMAT_ASTC_10x5_RGBA: + case MESA
[Mesa-dev] [PATCH v3 14/18] i965: correct mt->align_h for 2D textures on Skylake
From: Nanley Chery

In agreement with commit 4ab8d59a23, vertical alignment values are equal
to four times the block height on Gen9+.

v2: add newlines to separate declarations, statements, and comments.

Reviewed-by: Anuj Phogat
Reviewed-by: Neil Roberts
Signed-off-by: Nanley Chery
---
 src/mesa/drivers/dri/i965/brw_tex_layout.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c
index 4007697..ade2940 100644
--- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
+++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
@@ -270,9 +270,14 @@ intel_vertical_texture_alignment_unit(struct brw_context *brw,
     * Where "*" means either VALIGN_2 or VALIGN_4 depending on the setting of
     * the SURFACE_STATE "Surface Vertical Alignment" field.
     */
-   if (_mesa_is_format_compressed(mt->format))
-      /* See comment above for the horizontal alignment */
-      return brw->gen >= 9 ? 16 : 4;
+   if (_mesa_is_format_compressed(mt->format)) {
+      unsigned int i, j;
+
+      _mesa_get_format_block_size(mt->format, &i, &j);
+
+      /* See comment above for the horizontal alignment */
+      return brw->gen >= 9 ? j * 4 : 4;
+   }
 
    if (mt->format == MESA_FORMAT_S_UINT8)
       return brw->gen >= 7 ? 8 : 4;
--
2.4.2
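For illustration only, the rule described in this commit message (four
times the block height on Gen9+, a fixed 4 before that) can be written as
a standalone C function. This is a minimal sketch: compressed_valign() and
its parameters are hypothetical stand-ins for brw->gen and the block
height returned by _mesa_get_format_block_size(); they are not Mesa API.

```c
#include <assert.h>

/* Hypothetical helper, not part of Mesa: vertical alignment unit for a
 * compressed texture, given the hardware generation and the block height
 * of the texture format.
 */
static inline unsigned
compressed_valign(int gen, unsigned block_height)
{
   assert(block_height > 0);

   /* Gen9+ aligns to four block rows; older generations use 4. */
   return gen >= 9 ? block_height * 4 : 4;
}
```

With a block height of 4 the Gen9 result is 16, which matches the constant
the patch removes; with a 12x10 ASTC block it becomes 40, which the old
constant could not express.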
[Mesa-dev] [PATCH v3 17/18] i965: refactor miptree alignment calculation code
From: Nanley Chery Remove redundant checks and comments by grouping our calculations for align_w and align_h wherever possible. v2: reintroduce brw. don't include functional changes. don't adjust function parameters or create a new function. Signed-off-by: Nanley Chery --- src/mesa/drivers/dri/i965/brw_tex_layout.c | 85 +++--- 1 file changed, 30 insertions(+), 55 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index 840a069..493ed4f 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -123,12 +123,6 @@ intel_horizontal_texture_alignment_unit(struct brw_context *brw, return 16; /** -* From the "Alignment Unit Size" section of various specs, namely: -* - Gen3 Spec: "Memory Data Formats" Volume, Section 1.20.1.4 -* - i965 and G45 PRMs: Volume 1, Section 6.17.3.4. -* - Ironlake and Sandybridge PRMs: Volume 1, Part 1, Section 7.18.3.4 -* - BSpec (for Ivybridge and slight variations in separate stencil) -* * +--+ * || alignment unit width ("i") | * | Surface Property |-| @@ -146,32 +140,6 @@ intel_horizontal_texture_alignment_unit(struct brw_context *brw, * On IVB+, non-special cases can be overridden by setting the SURFACE_STATE * "Surface Horizontal Alignment" field to HALIGN_4 or HALIGN_8. */ -if (_mesa_is_format_compressed(mt->format)) { - /* The hardware alignment requirements for compressed textures -* happen to match the block boundaries. -*/ - unsigned int i, j; - _mesa_get_format_block_size(mt->format, &i, &j); - - /* On Gen9+ we can pick our own alignment for compressed textures but it - * has to be a multiple of the block size. 
The minimum alignment we can - * pick is 4 so we effectively have to align to 4 times the block - * size - */ - if (brw->gen >= 9) - return i * 4; - else - return i; -} - - if (mt->format == MESA_FORMAT_S_UINT8) - return 8; - - if (brw->gen >= 9 && mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE) { - uint32_t align = tr_mode_horizontal_texture_alignment(brw, mt); - /* XY_FAST_COPY_BLT doesn't support horizontal alignment < 32. */ - return align < 32 ? 32 : align; - } if (brw->gen >= 7 && mt->format == MESA_FORMAT_Z_UNORM16) return 8; @@ -248,12 +216,6 @@ intel_vertical_texture_alignment_unit(struct brw_context *brw, const struct intel_mipmap_tree *mt) { /** -* From the "Alignment Unit Size" section of various specs, namely: -* - Gen3 Spec: "Memory Data Formats" Volume, Section 1.20.1.4 -* - i965 and G45 PRMs: Volume 1, Section 6.17.3.4. -* - Ironlake and Sandybridge PRMs: Volume 1, Part 1, Section 7.18.3.4 -* - BSpec (for Ivybridge and slight variations in separate stencil) -* * +--+ * || alignment unit height ("j") | * | Surface Property |-| @@ -270,23 +232,6 @@ intel_vertical_texture_alignment_unit(struct brw_context *brw, * Where "*" means either VALIGN_2 or VALIGN_4 depending on the setting of * the SURFACE_STATE "Surface Vertical Alignment" field. */ -if (_mesa_is_format_compressed(mt->format)) { - unsigned int i, j; - - _mesa_get_format_block_size(mt->format, &i, &j); - - /* See comment above for the horizontal alignment */ - return brw->gen >= 9 ? j * 4 : 4; -} - - if (mt->format == MESA_FORMAT_S_UINT8) - return brw->gen >= 7 ? 8 : 4; - - if (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE) { - uint32_t align = tr_mode_vertical_texture_alignment(brw, mt); - /* XY_FAST_COPY_BLT doesn't support vertical alignment < 64 */ - return align < 64 ? 64 : align; - } /* Broadwell only supports VALIGN of 4, 8, and 16. The BSpec says 4 * should always be used, except for stencil buffers, which should be 8. 
@@ -780,6 +725,13 @@ brw_miptree_layout(struct brw_context *brw, mt->tr_mode = INTEL_MIPTREE_TRMODE_NONE; + /** +* From the "Alignment Unit Size" section of various specs, namely: +* - Gen3 Spec: "Memory Data Formats" Volume, Section 1.20.1.4 +* - i965 and G45 PRMs: Volume 1, Section 6.17.3.4. +* - Ironlake and Sandybridge PRMs: Volume 1, Part 1, Section 7.18.3.4 +* - BSpec (for Ivybridge and slight variations in separate stencil) +*/ if (brw->gen == 6 && mt->array_layout == ALL_SLICES_AT_EACH_LOD) { const GLenum base_format = _mesa_
[Mesa-dev] [PATCH v3 11/18] mesa/macros: add power-of-two assertions for alignment macros
From: Nanley Chery ALIGN and ROUND_DOWN_TO both require that the alignment value passed into the macro be a power of two in the comments. Using software assertions verifies this to be the case. v2: use static inline functions instead of gcc-specific statement expressions. Signed-off-by: Nanley Chery --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 2 +- src/mesa/main/macros.h | 16 +--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 59081ea..1a57784 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -134,7 +134,7 @@ fs_visitor::nir_setup_outputs(nir_shader *shader) : var->type->vector_elements; if (stage == MESA_SHADER_VERTEX) { - for (int i = 0; i < ALIGN(type_size(var->type), 4) / 4; i++) { + for (unsigned int i = 0; i < ALIGN(type_size(var->type), 4) / 4; i++) { int output = var->data.location + i; this->outputs[output] = offset(reg, 4 * i); this->output_components[output] = vector_elements; diff --git a/src/mesa/main/macros.h b/src/mesa/main/macros.h index 0608650..4a640ad 100644 --- a/src/mesa/main/macros.h +++ b/src/mesa/main/macros.h @@ -684,7 +684,7 @@ minify(unsigned value, unsigned levels) * Note that this considers 0 a power of two. 
*/ static inline bool -is_power_of_two(unsigned value) +is_power_of_two(uintptr_t value) { return (value & (value - 1)) == 0; } @@ -700,7 +700,12 @@ is_power_of_two(unsigned value) * * \sa ROUND_DOWN_TO() */ -#define ALIGN(value, alignment) (((value) + (alignment) - 1) & ~((alignment) - 1)) +static inline uintptr_t +ALIGN(uintptr_t value, uintptr_t alignment) +{ + assert(is_power_of_two(alignment)); + return (((value) + (alignment) - 1) & ~((alignment) - 1)); +} /** * Align a value down to an alignment value @@ -713,7 +718,12 @@ is_power_of_two(unsigned value) * * \sa ALIGN() */ -#define ROUND_DOWN_TO(value, alignment) ((value) & ~(alignment - 1)) +static inline uintptr_t +ROUND_DOWN_TO(uintptr_t value, uintptr_t alignment) +{ + assert(is_power_of_two(alignment)); + return ((value) & ~(alignment - 1)); +} /** Cross product of two 3-element vectors */ -- 2.4.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
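As a rough standalone sketch of what the patch above does, the following C
code reproduces the power-of-two check and the two rounding helpers as
inline functions. The names is_pot(), align_up() and round_down() are
hypothetical renames chosen so the code compiles outside Mesa; the
arithmetic is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same test as in the patch. Note that 0 also passes it. */
static inline bool
is_pot(uintptr_t value)
{
   return (value & (value - 1)) == 0;
}

/* Round value up to the next multiple of a power-of-two alignment. */
static inline uintptr_t
align_up(uintptr_t value, uintptr_t alignment)
{
   assert(is_pot(alignment));
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Round value down to the previous multiple of a power-of-two alignment. */
static inline uintptr_t
round_down(uintptr_t value, uintptr_t alignment)
{
   assert(is_pot(alignment));
   return value & ~(alignment - 1);
}
```

For example, align_up(13, 4) yields 16 and round_down(13, 4) yields 12,
while an alignment of 12 would trip the assertion in a build with
assertions enabled. Because the function form returns uintptr_t, a
comparison against a signed loop counter becomes a signed/unsigned
comparison, which is presumably why the patch also changes the counter in
brw_fs_nir.cpp to unsigned int.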
[Mesa-dev] [PATCH v3 08/18] mesa/teximage: return the base internal format of the ASTC formats
From: Nanley Chery

This is necessary to initialize the gl_texture_image struct.

From the KHR_texture_compression_astc_ldr spec:

   "Added to Section 3.8.6, Compressed Texture Images

    Add the tokens specified above to Table 3.16, Compressed Internal
    Formats. In all cases, the base internal format will be RGBA. The
    encoding allows images to be encoded with fewer channels, but this is
    always presented as RGBA to the sampler."

Reviewed-by: Anuj Phogat
Signed-off-by: Nanley Chery
---
 src/mesa/main/teximage.c | 36
 1 file changed, 36 insertions(+)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index 0e0488a..8de0c11 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -565,6 +565,42 @@ _mesa_base_tex_format( struct gl_context *ctx, GLint internalFormat )
       }
    }
 
+   if (ctx->Extensions.KHR_texture_compression_astc_ldr) {
+      switch (internalFormat) {
+      case GL_COMPRESSED_RGBA_ASTC_4x4_KHR:
+      case GL_COMPRESSED_RGBA_ASTC_5x4_KHR:
+      case GL_COMPRESSED_RGBA_ASTC_5x5_KHR:
+      case GL_COMPRESSED_RGBA_ASTC_6x5_KHR:
+      case GL_COMPRESSED_RGBA_ASTC_6x6_KHR:
+      case GL_COMPRESSED_RGBA_ASTC_8x5_KHR:
+      case GL_COMPRESSED_RGBA_ASTC_8x6_KHR:
+      case GL_COMPRESSED_RGBA_ASTC_8x8_KHR:
+      case GL_COMPRESSED_RGBA_ASTC_10x5_KHR:
+      case GL_COMPRESSED_RGBA_ASTC_10x6_KHR:
+      case GL_COMPRESSED_RGBA_ASTC_10x8_KHR:
+      case GL_COMPRESSED_RGBA_ASTC_10x10_KHR:
+      case GL_COMPRESSED_RGBA_ASTC_12x10_KHR:
+      case GL_COMPRESSED_RGBA_ASTC_12x12_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_5x4_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_5x5_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_6x5_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_6x6_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_8x5_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_8x6_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_8x8_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x5_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x6_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x8_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_10x10_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x10_KHR:
+      case GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR:
+         return GL_RGBA;
+      default:
+         ; /* fallthrough */
+      }
+   }
+
    if (_mesa_is_gles3(ctx) || ctx->Extensions.ARB_ES3_compatibility) {
       switch (internalFormat) {
       case GL_COMPRESSED_RGB8_ETC2:
--
2.4.2
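The spec text quoted above says every 2D ASTC internal format has RGBA as
its base format. A compact way to illustrate that, outside Mesa, is the
sketch below. It deliberately uses a range check on the enum values
instead of the explicit switch in the patch; that is a different technique
chosen for brevity, not what the patch does. astc_base_format() is a
hypothetical name, -1 is an arbitrary "not an ASTC format" result, and the
enum values are the ones published in the Khronos registry.

```c
#include <assert.h>

/* Enum values from the Khronos registry; the 14 block sizes of each
 * group are numbered consecutively from 4x4 to 12x12.
 */
#define GL_RGBA                                   0x1908
#define GL_COMPRESSED_RGBA_ASTC_4x4_KHR           0x93B0
#define GL_COMPRESSED_RGBA_ASTC_12x12_KHR         0x93BD
#define GL_COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR   0x93D0
#define GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR 0x93DD

/* Hypothetical helper, not Mesa API: returns GL_RGBA for any 2D ASTC
 * internal format (linear or sRGB) and -1 for everything else.
 */
static inline int
astc_base_format(int internal_format)
{
   if ((internal_format >= GL_COMPRESSED_RGBA_ASTC_4x4_KHR &&
        internal_format <= GL_COMPRESSED_RGBA_ASTC_12x12_KHR) ||
       (internal_format >= GL_COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR &&
        internal_format <= GL_COMPRESSED_SRGB8_ALPHA8_ASTC_12x12_KHR))
      return GL_RGBA;

   return -1;
}
```

The explicit switch in the patch is longer but easier to grep for and does
not rely on the enum values staying contiguous, which may be why it was
preferred.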
[Mesa-dev] [PATCH v3 16/18] i965: enable ASTC support for Skylake
From: Nanley Chery

v2: remove OES ASTC extension reference.

Signed-off-by: Nanley Chery
---
 src/mesa/drivers/dri/i965/intel_extensions.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index 365b4b8..cc793e5 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -354,6 +354,11 @@ intelInitExtensions(struct gl_context *ctx)
       ctx->Extensions.ARB_stencil_texturing = true;
    }
 
+   if (brw->gen >= 9) {
+      ctx->Extensions.KHR_texture_compression_astc_ldr = true;
+      ctx->Extensions.KHR_texture_compression_astc_hdr = true;
+   }
+
    if (ctx->API == API_OPENGL_CORE)
       ctx->Extensions.ARB_base_instance = true;
    if (ctx->API != API_OPENGL_CORE)
--
2.4.2
[Mesa-dev] [PATCH v3 12/18] mesa/macros: move ALIGN_NPOT to macros.h
From: Nanley Chery

Aligning with a non-power-of-two number is a general task that can be
used in various places. This commit is required for the next one.

Signed-off-by: Nanley Chery
---
 src/mesa/drivers/dri/i965/intel_upload.c | 6 --
 src/mesa/main/macros.h                   | 6 ++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_upload.c b/src/mesa/drivers/dri/i965/intel_upload.c
index 870aabc..deaae6c 100644
--- a/src/mesa/drivers/dri/i965/intel_upload.c
+++ b/src/mesa/drivers/dri/i965/intel_upload.c
@@ -44,12 +44,6 @@
 
 #define INTEL_UPLOAD_SIZE (64*1024)
 
-/**
- * Like ALIGN(), but works with a non-power-of-two alignment.
- */
-#define ALIGN_NPOT(value, alignment) \
-   (((value) + (alignment) - 1) / (alignment) * (alignment))
-
 void
 intel_upload_finish(struct brw_context *brw)
 {
diff --git a/src/mesa/main/macros.h b/src/mesa/main/macros.h
index 4a640ad..4a08130 100644
--- a/src/mesa/main/macros.h
+++ b/src/mesa/main/macros.h
@@ -708,6 +708,12 @@ ALIGN(uintptr_t value, uintptr_t alignment)
 }
 
 /**
+ * Like ALIGN(), but works with a non-power-of-two alignment.
+ */
+#define ALIGN_NPOT(value, alignment) \
+   (((value) + (alignment) - 1) / (alignment) * (alignment))
+
+/**
  * Align a value down to an alignment value
  *
  * If \c value is not already aligned to the requested alignment value, it
--
2.4.2
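To see why a separate macro is needed, the sketch below puts ALIGN_NPOT
(copied from the patch) next to the mask-based rounding used by the
power-of-two ALIGN. ALIGN_POT is a hypothetical name for that mask form,
used here only to keep the two apart; the helper functions are likewise
illustrative and assume a nonzero alignment.

```c
#include <assert.h>

/* From the patch: divide-and-multiply form, valid for any nonzero
 * alignment.
 */
#define ALIGN_NPOT(value, alignment) \
   (((value) + (alignment) - 1) / (alignment) * (alignment))

/* Hypothetical name for the mask form used by ALIGN: only correct when
 * the alignment is a power of two.
 */
#define ALIGN_POT(value, alignment) \
   (((value) + (alignment) - 1) & ~((alignment) - 1))

/* Wrappers so the two forms can be compared on the same inputs. */
static inline unsigned
align_npot(unsigned value, unsigned alignment)
{
   assert(alignment != 0);
   return ALIGN_NPOT(value, alignment);
}

static inline unsigned
align_pot(unsigned value, unsigned alignment)
{
   assert(alignment != 0);
   return ALIGN_POT(value, alignment);
}
```

For an alignment of 16 both forms round 100 up to 112. For an alignment of
12, align_npot(100, 12) gives 108, whereas align_pot(100, 12) gives 100,
which is not a multiple of 12 at all.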
[Mesa-dev] [PATCH v3 18/18] swrast: add a new macro, FETCH_COMPRESSED
From: Nanley Chery This patch creates a new macro, FETCH_COMPRESSED - similar in nature to the other FETCH_* macros. This reduces repetition in the code that deals with compressed textures. Reviewed-by: Anuj Phogat Signed-off-by: Nanley Chery --- src/mesa/swrast/s_texfetch.c | 239 --- 1 file changed, 41 insertions(+), 198 deletions(-) diff --git a/src/mesa/swrast/s_texfetch.c b/src/mesa/swrast/s_texfetch.c index 14e5293..92a4a37 100644 --- a/src/mesa/swrast/s_texfetch.c +++ b/src/mesa/swrast/s_texfetch.c @@ -116,6 +116,14 @@ static void fetch_null_texelf( const struct swrast_texture_image *texImage, NULL \ } +#define FETCH_COMPRESSED(NAME) \ + {\ + MESA_FORMAT_ ## NAME, \ + fetch_compressed, \ + fetch_compressed, \ + fetch_compressed \ + } + /** * Table to map MESA_FORMAT_ to texel fetch/store funcs. */ @@ -344,214 +352,49 @@ texfetch_funcs[] = FETCH_NULL(RGBX_SINT32), /* DXT compressed formats */ - { - MESA_FORMAT_RGB_DXT1, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_RGBA_DXT1, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_RGBA_DXT3, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_RGBA_DXT5, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, + FETCH_COMPRESSED(RGB_DXT1), + FETCH_COMPRESSED(RGBA_DXT1), + FETCH_COMPRESSED(RGBA_DXT3), + FETCH_COMPRESSED(RGBA_DXT5), /* DXT sRGB compressed formats */ - { - MESA_FORMAT_SRGB_DXT1, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_SRGBA_DXT1, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_SRGBA_DXT3, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_SRGBA_DXT5, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, + FETCH_COMPRESSED(SRGB_DXT1), + FETCH_COMPRESSED(SRGBA_DXT1), + FETCH_COMPRESSED(SRGBA_DXT3), + FETCH_COMPRESSED(SRGBA_DXT5), /* FXT1 compressed formats */ - { - MESA_FORMAT_RGB_FXT1, 
- fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_RGBA_FXT1, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, + FETCH_COMPRESSED(RGB_FXT1), + FETCH_COMPRESSED(RGBA_FXT1), /* RGTC compressed formats */ - { - MESA_FORMAT_R_RGTC1_UNORM, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_R_RGTC1_SNORM, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_RG_RGTC2_UNORM, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_RG_RGTC2_SNORM, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, + FETCH_COMPRESSED(R_RGTC1_UNORM), + FETCH_COMPRESSED(R_RGTC1_SNORM), + FETCH_COMPRESSED(RG_RGTC2_UNORM), + FETCH_COMPRESSED(RG_RGTC2_SNORM), /* LATC1/2 compressed formats */ - { - MESA_FORMAT_L_LATC1_UNORM, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_L_LATC1_SNORM, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_LA_LATC2_UNORM, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_LA_LATC2_SNORM, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, + FETCH_COMPRESSED(L_LATC1_UNORM), + FETCH_COMPRESSED(L_LATC1_SNORM), + FETCH_COMPRESSED(LA_LATC2_UNORM), + FETCH_COMPRESSED(LA_LATC2_SNORM), /* ETC1/2 compressed formats */ - { - MESA_FORMAT_ETC1_RGB8, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_ETC2_RGB8, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_ETC2_SRGB8, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_ETC2_RGBA8_EAC, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_ETC2_R11_EAC, - fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_ETC2_RG11_EAC, - 
fetch_compressed, - fetch_compressed, - fetch_compressed - }, - { - MESA_FORMAT_ETC2_SIGNED_R11_EAC, - fetch_compressed, - fetch_compressed, -
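The pattern the patch introduces, a macro that expands to one table
initializer and pastes the format name onto a common prefix, can be tried
in isolation with the sketch below. Everything except the shape of
FETCH_COMPRESSED is an assumption made for the example: the FMT_ prefix,
enum fmt, struct fetch_entry, the fetch_func signature and the stub
fetch_compressed() are hypothetical stand-ins for the swrast types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few mesa_format values. */
enum fmt { FMT_RGB_DXT1, FMT_RGBA_DXT1, FMT_ETC1_RGB8 };

/* Hypothetical fetch callback signature. */
typedef void (*fetch_func)(int i, int j, int k, float *texel);

/* Stub decoder: writes a black, transparent texel. */
static void
fetch_compressed(int i, int j, int k, float *texel)
{
   (void) i; (void) j; (void) k;
   texel[0] = texel[1] = texel[2] = texel[3] = 0.0f;
}

struct fetch_entry {
   enum fmt format;
   fetch_func fetch_1d;
   fetch_func fetch_2d;
   fetch_func fetch_3d;
};

/* Same shape as the macro in the patch, with FMT_ in place of
 * MESA_FORMAT_.
 */
#define FETCH_COMPRESSED(NAME) \
   {                           \
      FMT_ ## NAME,            \
      fetch_compressed,        \
      fetch_compressed,        \
      fetch_compressed         \
   }

static const struct fetch_entry fetch_table[] = {
   FETCH_COMPRESSED(RGB_DXT1),
   FETCH_COMPRESSED(RGBA_DXT1),
   FETCH_COMPRESSED(ETC1_RGB8),
};
```

Each table row shrinks from six lines to one, which accounts for most of
the 198 deleted lines in the diffstat above.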
[Mesa-dev] [PATCH v3 13/18] i965: use ALIGN_NPOT for setting ASTC mipmap layouts
From: Nanley Chery

ALIGN is changed to ALIGN_NPOT because alignment values are sometimes not
powers of two when working with ASTC.

Signed-off-by: Nanley Chery
---
 src/mesa/drivers/dri/i965/brw_tex_layout.c    | 12 ++--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c
index 998d8c4..4007697 100644
--- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
+++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
@@ -367,7 +367,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
    mt->total_width = mt->physical_width0;
 
    if (mt->compressed) {
-      mt->total_width = ALIGN(mt->physical_width0, mt->align_w);
+      mt->total_width = ALIGN_NPOT(mt->physical_width0, mt->align_w);
    }
 
    /* May need to adjust width to accommodate the placement of
@@ -379,10 +379,10 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
       unsigned mip1_width;
 
       if (mt->compressed) {
-         mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) +
-            ALIGN(minify(mt->physical_width0, 2), bw);
+         mip1_width = ALIGN_NPOT(minify(mt->physical_width0, 1), mt->align_w) +
+            ALIGN_NPOT(minify(mt->physical_width0, 2), bw);
       } else {
-         mip1_width = ALIGN(minify(mt->physical_width0, 1), mt->align_w) +
+         mip1_width = ALIGN_NPOT(minify(mt->physical_width0, 1), mt->align_w) +
             minify(mt->physical_width0, 2);
       }
 
@@ -398,7 +398,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
 
       intel_miptree_set_level_info(mt, level, x, y, depth);
 
-      img_height = ALIGN(height, mt->align_h);
+      img_height = ALIGN_NPOT(height, mt->align_h);
       if (mt->compressed)
          img_height /= bh;
 
@@ -415,7 +415,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
       /* Layout_below: step right after second mipmap.
        */
       if (level == mt->first_level + 1) {
-         x += ALIGN(width, mt->align_w);
+         x += ALIGN_NPOT(width, mt->align_w);
       } else {
          y += img_height;
       }
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 6aa969a..b47f49d0 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -1213,8 +1213,8 @@ intel_miptree_copy_slice(struct brw_context *brw,
    if (dst_mt->compressed) {
       unsigned int i, j;
       _mesa_get_format_block_size(dst_mt->format, &i, &j);
-      height = ALIGN(height, j) / j;
-      width = ALIGN(width, i);
+      height = ALIGN_NPOT(height, j) / j;
+      width = ALIGN_NPOT(width, i);
    }
 
    /* If it's a packed depth/stencil buffer with separate stencil, the blit
--
2.4.2
[Mesa-dev] [PATCH v3 10/18] i965/surface_formats: add support for 2D ASTC surface formats
From: Nanley Chery Intel surface formats default to LDR unless there is hardware support for HDR and the texture is able to be processed in HDR mode. v2: remove extra newlines. v3: follow existing coding style in translate_tex_format(). Signed-off-by: Nanley Chery --- src/mesa/drivers/dri/i965/brw_defines.h | 32 ++ src/mesa/drivers/dri/i965/brw_surface_formats.c | 80 + 2 files changed, 112 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index bfcc442..da5d434 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -504,6 +504,38 @@ #define BRW_SURFACEFORMAT_R8G8B8_UINT0x1C8 #define BRW_SURFACEFORMAT_R8G8B8_SINT0x1C9 #define BRW_SURFACEFORMAT_RAW0x1FF + +#define GEN9_SURFACE_ASTC_HDR_FORMAT_BIT 0x100 + +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_4x4_U8sRGB 0x200 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_5x4_U8sRGB 0x208 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_5x5_U8sRGB 0x209 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_6x5_U8sRGB 0x211 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_6x6_U8sRGB 0x212 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_8x5_U8sRGB 0x221 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_8x6_U8sRGB 0x222 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_8x8_U8sRGB 0x224 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_10x5_U8sRGB0x231 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_10x6_U8sRGB0x232 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_10x8_U8sRGB0x234 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_10x10_U8sRGB 0x236 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_12x10_U8sRGB 0x23E +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_12x12_U8sRGB 0x23F +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_4x4_FLT16 0x240 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_5x4_FLT16 0x248 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_5x5_FLT16 0x249 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_6x5_FLT16 0x251 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_6x6_FLT16 0x252 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_8x5_FLT16 0x261 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_8x6_FLT16 0x262 
+#define BRW_SURFACEFORMAT_ASTC_LDR_2D_8x8_FLT16 0x264 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_10x5_FLT16 0x271 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_10x6_FLT16 0x272 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_10x8_FLT16 0x274 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_10x10_FLT160x276 +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_12x10_FLT160x27E +#define BRW_SURFACEFORMAT_ASTC_LDR_2D_12x12_FLT160x27F + #define BRW_SURFACE_FORMAT_SHIFT 18 #define BRW_SURFACE_FORMAT_MASKINTEL_MASK(26, 18) diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c index 0501606..a896b79 100644 --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c @@ -307,6 +307,34 @@ const struct surface_format_info surface_formats[] = { SF( x, x, x, x, x, x, x, x, x, ETC2_EAC_SRGB8_A8) SF( x, x, x, x, x, x, x, x, x, R8G8B8_UINT) SF( x, x, x, x, x, x, x, x, x, R8G8B8_SINT) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_4x4_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_5x4_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_5x5_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_6x5_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_6x6_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_8x5_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_8x6_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_8x8_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_10x5_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_10x6_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_10x8_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_10x10_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_12x10_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_12x12_FLT16) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_4x4_U8sRGB) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_5x4_U8sRGB) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_5x5_U8sRGB) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_6x5_U8sRGB) + SF(90, 90, x, x, 
x, x, x, x, x, ASTC_LDR_2D_6x6_U8sRGB) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_8x5_U8sRGB) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_8x6_U8sRGB) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_8x8_U8sRGB) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_10x5_U8sRGB) + SF(90, 90, x, x, x, x, x, x, x, ASTC_LDR_2D_10x6_U8sRGB) + SF(90, 90, x, x, x, x, x
[Mesa-dev] [PATCH v3 15/18] i965: change the meaning of cpp for compressed textures
From: Nanley Chery

An ASTC block takes up 16 bytes for all block width and height
configurations. This size is not integrally divisible by all ASTC block
widths. Therefore cpp is changed to mean bytes per block if the texture is
compressed. Because the original definition was bytes per block divided by
block width, all references to the mipmap width must be divided by the
block width. This keeps the address calculation formulas consistent. For
example, the units for miptree_level x_offset and miptree total_width have
changed from pixels to blocks.

v2: reuse preexisting ALIGN_NPOT macro located in an i965 driver file.
v3: move ALIGN_NPOT into separate commit.
    simplify cpp assignment in copy_image_with_blitter().
    update miptree width and offset variables in:
    intel_miptree_copy_slice(), intel_miptree_map_gtt(), and
    brw_miptree_layout_texture_3d().

Signed-off-by: Nanley Chery
---
 src/mesa/drivers/dri/i965/brw_tex_layout.c    | 15 +--
 src/mesa/drivers/dri/i965/intel_copy_image.c  | 19 +--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 13 +++--
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  2 +-
 4 files changed, 14 insertions(+), 35 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c
index ade2940..840a069 100644
--- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
+++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
@@ -396,6 +396,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
 }
 }
+ mt->total_width /= bw;
 mt->total_height = 0;
 for (unsigned level = mt->first_level; level <= mt->last_level; level++) {
@@ -420,7 +421,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
 /* Layout_below: step right after second mipmap.
 */
 if (level == mt->first_level + 1) {
-x += ALIGN_NPOT(width, mt->align_w);
+x += ALIGN_NPOT(width, mt->align_w) / bw;
 } else {
 y += img_height;
 }
@@ -582,12 +583,14 @@ static void
 brw_miptree_layout_texture_3d(struct brw_context *brw,
 struct intel_mipmap_tree *mt)
 {
- unsigned yscale = mt->compressed ? 4 : 1;
-
 mt->total_width = 0;
 mt->total_height = 0;
 unsigned ysum = 0;
+ unsigned bh, bw;
+
+ _mesa_get_format_block_size(mt->format, &bw, &bh);
+
 for (unsigned level = mt->first_level; level <= mt->last_level; level++) {
 unsigned WL = MAX2(mt->physical_width0 >> level, 1);
 unsigned HL = MAX2(mt->physical_height0 >> level, 1);
@@ -604,9 +607,9 @@ brw_miptree_layout_texture_3d(struct brw_context *brw,
 unsigned x = (q % (1 << level)) * wL;
 unsigned y = ysum + (q >> level) * hL;
- intel_miptree_set_image_offset(mt, level, q, x, y / yscale);
- mt->total_width = MAX2(mt->total_width, x + wL);
- mt->total_height = MAX2(mt->total_height, (y + hL) / yscale);
+ intel_miptree_set_image_offset(mt, level, q, x / bw, y / bh);
+ mt->total_width = MAX2(mt->total_width, (x + wL) / bw);
+ mt->total_height = MAX2(mt->total_height, (y + hL) / bh);
 }
 ysum += ALIGN(DL, 1 << level) / (1 << level) * hL;
diff --git a/src/mesa/drivers/dri/i965/intel_copy_image.c b/src/mesa/drivers/dri/i965/intel_copy_image.c
index f4c7eff..93a64b5 100644
--- a/src/mesa/drivers/dri/i965/intel_copy_image.c
+++ b/src/mesa/drivers/dri/i965/intel_copy_image.c
@@ -41,7 +41,6 @@ copy_image_with_blitter(struct brw_context *brw,
 {
 GLuint bw, bh;
 uint32_t src_image_x, src_image_y, dst_image_x, dst_image_y;
- int cpp;
 /* The blitter doesn't understand multisampling at all. */
 if (src_mt->num_samples > 0 || dst_mt->num_samples > 0)
@@ -86,16 +85,6 @@ copy_image_with_blitter(struct brw_context *brw,
 src_y /= (int)bh;
 src_width /= (int)bw;
 src_height /= (int)bh;
-
- /* Inside of the miptree, the x offsets are stored in pixels while
- * the y offsets are stored in blocks. We need to scale just the x
- * offset.
- */
- src_image_x /= bw;
-
- cpp = _mesa_get_format_bytes(src_mt->format);
- } else {
- cpp = src_mt->cpp;
 }
 src_x += src_image_x;
 src_y += src_image_y;
@@ -111,18 +100,12 @@ copy_image_with_blitter(struct brw_context *brw,
 dst_x /= (int)bw;
 dst_y /= (int)bh;
-
- /* Inside of the miptree, the x offsets are stored in pixels while
- * the y offsets are stored in blocks. We need to scale just the x
- * offset.
- */
- dst_image_x /= bw;
 }
 dst_x += dst_image_x;
 dst_y += dst_image_y;
 return intelEmitCopyBlit(brw,
-cpp,
+src_mt->cpp,
 src_mt->pitch,
 src_mt->bo, src_mt->offset,
 src_mt->tiling,
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipma
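The unit change described in the commit message above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the helper names (example_block_byte_offset, example_width_in_blocks) are hypothetical and not part of Mesa, and the second helper rounds up, whereas the patch divides widths that have already been aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a Mesa function: byte offset of the compressed
 * block containing pixel (x, y), where cpp means bytes per block and pitch
 * is the row stride in bytes for one row of blocks. */
static inline uint32_t
example_block_byte_offset(uint32_t x, uint32_t y,
                          uint32_t bw, uint32_t bh,
                          uint32_t cpp, uint32_t pitch)
{
   assert(bw > 0 && bh > 0);
   uint32_t block_x = x / bw;   /* pixels -> blocks, horizontally */
   uint32_t block_y = y / bh;   /* pixels -> blocks, vertically */
   return block_y * pitch + block_x * cpp;
}

/* Hypothetical helper: a width in pixels expressed in blocks, rounded up. */
static inline uint32_t
example_width_in_blocks(uint32_t width, uint32_t bw)
{
   assert(bw > 0);
   return (width + bw - 1) / bw;
}
```

With a 5x5 ASTC block, 16 bytes per block cannot be expressed as a whole number of bytes per pixel column (16 is not divisible by 5), which is why keeping x offsets and widths in blocks avoids the fractional value.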
Re: [Mesa-dev] [PATCH v3 11/18] mesa/macros: add power-of-two assertions for alignment macros
On 06/22/2015 05:02 PM, Nanley Chery wrote:
> From: Nanley Chery
>
> ALIGN and ROUND_DOWN_TO both require that the alignment value passed
> into the macro be a power of two in the comments. Using software assertions
> verifies this to be the case.
>
> v2: use static inline functions instead of gcc-specific statement expressions.
>
> Signed-off-by: Nanley Chery
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp |  2 +-
>  src/mesa/main/macros.h                   | 16 +---
>  2 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 59081ea..1a57784 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -134,7 +134,7 @@ fs_visitor::nir_setup_outputs(nir_shader *shader)
> : var->type->vector_elements;
> if (stage == MESA_SHADER_VERTEX) {
> - for (int i = 0; i < ALIGN(type_size(var->type), 4) / 4; i++) {
> + for (unsigned int i = 0; i < ALIGN(type_size(var->type), 4) / 4; i++) {
> int output = var->data.location + i;
> this->outputs[output] = offset(reg, 4 * i);
> this->output_components[output] = vector_elements;
> diff --git a/src/mesa/main/macros.h b/src/mesa/main/macros.h
> index 0608650..4a640ad 100644
> --- a/src/mesa/main/macros.h
> +++ b/src/mesa/main/macros.h
> @@ -684,7 +684,7 @@ minify(unsigned value, unsigned levels)
> * Note that this considers 0 a power of two.
> */
> static inline bool
> -is_power_of_two(unsigned value)
> +is_power_of_two(uintptr_t value)
> {
> return (value & (value - 1)) == 0;
> }
> @@ -700,7 +700,12 @@ is_power_of_two(unsigned value)
> *
> * \sa ROUND_DOWN_TO()
> */
> -#define ALIGN(value, alignment) (((value) + (alignment) - 1) & ~((alignment) - 1))
> +static inline uintptr_t
> +ALIGN(uintptr_t value, uintptr_t alignment)
> +{
> + assert(is_power_of_two(alignment));
> + return (((value) + (alignment) - 1) & ~((alignment) - 1));

Looks like more than 3-space indentation here and below.

-Brian

> +}
> /**
> * Align a value down to an alignment value
> @@ -713,7 +718,12 @@ is_power_of_two(unsigned value)
> *
> * \sa ALIGN()
> */
> -#define ROUND_DOWN_TO(value, alignment) ((value) & ~(alignment - 1))
> +static inline uintptr_t
> +ROUND_DOWN_TO(uintptr_t value, uintptr_t alignment)
> +{
> + assert(is_power_of_two(alignment));
> + return ((value) & ~(alignment - 1));
> +}
> /** Cross product of two 3-element vectors */
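As a standalone illustration of what the patch above asserts, here is a minimal sketch of mask-based alignment next to a division-based variant that also works for non-power-of-two alignments. The example_* names are hypothetical; only the mask expressions and the power-of-two test are taken from the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same test as the quoted is_power_of_two(): note that it accepts 0. */
static inline bool
example_is_pot(uintptr_t value)
{
   return (value & (value - 1)) == 0;
}

/* Round value up to a multiple of a power-of-two alignment. */
static inline uintptr_t
example_align(uintptr_t value, uintptr_t alignment)
{
   assert(example_is_pot(alignment));
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Round value down to a multiple of a power-of-two alignment. */
static inline uintptr_t
example_round_down_to(uintptr_t value, uintptr_t alignment)
{
   assert(example_is_pot(alignment));
   return value & ~(alignment - 1);
}

/* For contrast: division-based rounding that is valid for any nonzero
 * alignment, at the cost of a divide. */
static inline uintptr_t
example_align_npot(uintptr_t value, uintptr_t alignment)
{
   assert(alignment > 0);
   return (value + alignment - 1) / alignment * alignment;
}
```

Because the power-of-two test treats 0 as a power of two, an alignment of 0 would pass the assertion; whether that case should be rejected separately is not discussed in the thread.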
Re: [Mesa-dev] [PATCH 28/46] glsl: don't lower variable indexing on non-patch tessellation inputs/outputs
On Wednesday, June 17, 2015 01:01:24 AM Marek Olšák wrote:
> From: Marek Olšák
>
> There is no way to lower them, because the array sizes are unknown
> at compile time.
>
> Based on a patch from: Fabian Bieler

I'm a bit confused by the justification given for this patch.

TCS/TES per-vertex inputs:
--------------------------

...are always fixed-size arrays of length gl_MaxPatchVertices, because:

"The length of gl_in is equal to the implementation-dependent maximum
 patch size (gl_MaxPatchVertices)."

"Similarly to the built-in inputs, each user-defined input variable has a
 value for each vertex and thus needs to be declared as arrays or inside
 input blocks declared as arrays. Declaring an array size is optional. If
 no size is specified, it will be taken from the implementation-dependent
 maximum patch size (gl_MaxPatchVertices). If a size is specified, it must
 match the maximum patch size; otherwise, a link-error will occur."

This same text exists for both TCS inputs and TES inputs.

Since we always know the array size, I don't see why we can't do lowering
in this case. I'm pretty new to tessellation shaders, so am I missing
something?

TCS per-patch inputs:
---------------------

...don't exist AFAICT.

TES per-patch inputs:
---------------------

...do exist, require no special handling.

TCS per-vertex outputs:
-----------------------

...are arrays whose size is known at link time, but not necessarily
compile time.

"The length of gl_out is equal to the output patch size specified in the
 tessellation control shader output layout declaration."

"A tessellation control shader may also declare user-defined per-vertex
 output variables. User-defined per-vertex output variables are declared
 with the qualifier out and have a value for each vertex in the output
 patch. Such variables must be declared as arrays or inside output blocks
 declared as arrays. Declaring an array size is optional. If no size is
 specified, it will be taken from the output patch size declared in the
 shader."
Apparently, the index must also be gl_InvocationID when writing: "While per-vertex output variables are declared as arrays indexed by vertex number, each tessellation control shader invocation may write only to those outputs corresponding to its output patch vertex. Tessellation control shaders must use the input variable gl_InvocationID as the vertex number index when writing to per-vertex output variables." So we clearly don't want to do lowering on writes. But for reads, it seems like we could do lowering when the array size is known (such as post-linking). I'm not sure whether or not it's beneficial... It might be nice to add a comment explaining why it makes no sense to lower variable indexing on TCS output writes (with the above spec citation). TES outputs: ...require no special handling. > --- > src/glsl/ir_optimization.h | 5 +-- > src/glsl/lower_variable_index_to_cond_assign.cpp | 43 > +--- > src/glsl/test_optpass.cpp| 3 +- > src/mesa/drivers/dri/i965/brw_shader.cpp | 8 +++-- > src/mesa/program/ir_to_mesa.cpp | 2 +- > src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +- > 6 files changed, 42 insertions(+), 21 deletions(-) > > diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h > index 688a5e1..a174c96 100644 > --- a/src/glsl/ir_optimization.h > +++ b/src/glsl/ir_optimization.h > @@ -114,8 +114,9 @@ bool lower_discard(exec_list *instructions); > void lower_discard_flow(exec_list *instructions); > bool lower_instructions(exec_list *instructions, unsigned what_to_lower); > bool lower_noise(exec_list *instructions); > -bool lower_variable_index_to_cond_assign(exec_list *instructions, > -bool lower_input, bool lower_output, bool lower_temp, bool > lower_uniform); > +bool lower_variable_index_to_cond_assign(gl_shader_stage stage, > +exec_list *instructions, bool lower_input, bool lower_output, > +bool lower_temp, bool lower_uniform); > bool lower_quadop_vector(exec_list *instructions, bool dont_lower_swz); > bool 
lower_const_arrays_to_uniforms(exec_list *instructions); > bool lower_clip_distance(gl_shader *shader); > diff --git a/src/glsl/lower_variable_index_to_cond_assign.cpp > b/src/glsl/lower_variable_index_to_cond_assign.cpp > index d878cb0..b6421f5 100644 > --- a/src/glsl/lower_variable_index_to_cond_assign.cpp > +++ b/src/glsl/lower_variable_index_to_cond_assign.cpp > @@ -335,12 +335,14 @@ struct switch_generator > > class variable_index_to_cond_assign_visitor : public ir_rvalue_visitor { > public: > - variable_index_to_cond_assign_visitor(bool lower_input, > - bool lower_output, > - bool lower_temp, > - bool lower_uniform) > + variable_index_to_cond_assign_visitor(gl_shader_stage stage, > +
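For readers unfamiliar with the pass under discussion, the idea of lowering a variable array index to conditional assignments can be illustrated in plain C. This is a conceptual sketch only: the real pass works on GLSL IR and is not shown here, the function names and EXAMPLE_LEN are hypothetical, and the sketch assumes the array length is known when the lowering runs, which is exactly the condition the review is asking about.

```c
#include <assert.h>

#define EXAMPLE_LEN 4   /* assumed to be known when lowering happens */

/* Direct form: the index is only known at run time. */
static float
example_read_indexed(const float a[EXAMPLE_LEN], int i)
{
   return a[i];
}

/* Lowered form: one conditional assignment per element, so every array
 * access uses a constant index. This needs the length up front. */
static float
example_read_lowered(const float a[EXAMPLE_LEN], int i)
{
   float result = 0.0f;
   if (i == 0) result = a[0];
   if (i == 1) result = a[1];
   if (i == 2) result = a[2];
   if (i == 3) result = a[3];
   return result;
}
```

The lowered form can only be generated once the number of elements is fixed, which is why the distinction between sizes known at compile time and sizes known only at link time matters in the review above.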
Re: [Mesa-dev] [PATCH 5/5] i965/gen9: Allocate YF/YS tiled buffer objects
On Wed, Jun 10, 2015 at 03:30:50PM -0700, Anuj Phogat wrote:
> In case of I915_TILING_{X,Y} we need to pass tiling format to libdrm
> using drm_intel_bo_alloc_tiled(). But, In case of YF/YS tiled buffers
> libdrm need not know about the tiling format because these buffers
> don't have hardware support to be tiled or detiled through a fenced
> region. libdrm still need to know buffer alignment value for its use
> in kernel when resolving the relocation.
>
> Using drm_intel_bo_alloc_for_render() for YF/YS tiled buffers
> satisfy both the above conditions.
>
> Signed-off-by: Anuj Phogat
> Cc: Ben Widawsky
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 86 +--
>  1 file changed, 80 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 615cbfb..d4d9e76 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -522,6 +522,65 @@ intel_lower_compressed_format(struct brw_context *brw, mesa_format format)
> }
> }
>
> +/* This function computes Yf/Ys tiled bo size and alignment. */

It also computes pitch for the yf/ys case

> +static uint64_t
> +intel_get_yf_ys_bo_size(struct intel_mipmap_tree *mt, unsigned *alignment)
> +{
> + const uint32_t bpp = mt->cpp * 8;
> + const uint32_t aspect_ratio = (bpp == 16 || bpp == 64) ? 2 : 1;
> + uint32_t tile_width, tile_height;
> + const uint64_t min_size = 512 * 1024;
> + const uint64_t max_size = 64 * 1024 * 1024;

Where do min/max come from? Add a comment?

> + uint64_t i, stride, size, aligned_y;
> +
> + assert(mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE);
> +
> + switch (bpp) {
> + case 8:
> + tile_height = 64;
> + break;
> + case 16:
> + case 32:
> + tile_height = 32;
> + break;
> + case 64:
> + case 128:
> + tile_height = 16;
> + break;
> + default:
> + tile_height = 0;

make this unreachable()

> + printf("Invalid bits per pixel in %s: bpp = %d\n",
> + __FUNCTION__, bpp);
> + }

I think ideally you should roll this logic into
intel_miptree_get_tile_masks().

> +
> + if (mt->tr_mode == INTEL_MIPTREE_TRMODE_YS)
> + tile_height *= 4;
> +
> + aligned_y = ALIGN(mt->total_height, tile_height);
> +
> + stride = mt->total_width * mt->cpp;
> + tile_width = tile_height * mt->cpp * aspect_ratio;
> + stride = ALIGN(stride, tile_width);
> + size = stride * aligned_y;
> +
> + if (mt->tr_mode == INTEL_MIPTREE_TRMODE_YF) {
> + *alignment = 4096;
> + size = ALIGN(size, 4096);
> + } else {
> + *alignment = 64 * 1024;
> + size = ALIGN(size, 64 * 1024);
> + }

Hmm. I think the above calculation for size is redundant since you already
aligned to tile_width and height, above. Right?

assert((size % 64K) == 0);

> +
> + if (size > max_size) {
> + mt->tr_mode = INTEL_MIPTREE_TRMODE_NONE;
> + return 0;
> + } else {
> + mt->pitch = stride;
> + for (i = min_size; i < size; i <<= 1)
> + ;
> + return i;

I don't understand this. Why don't you just return size? It seems
incredibly wasteful to both start at 512K, and to increment by powers of 2.
Did I miss something?

Also, I don't understand max_size. I must be missing something in the spec
with the min/max values, can you point me to them?

> + }
> +}
>
> struct intel_mipmap_tree *
> intel_miptree_create(struct brw_context *brw,
> @@ -575,12 +634,27 @@ intel_miptree_create(struct brw_context *brw,
>
> unsigned long pitch;
> mt->etc_format = etc_format;
> - mt->bo = drm_intel_bo_alloc_tiled(brw->bufmgr, "miptree",
> - total_width, total_height, mt->cpp,
> - &mt->tiling, &pitch,
> - (expect_accelerated_upload ?
> - BO_ALLOC_FOR_RENDER : 0));
> - mt->pitch = pitch;
> +
> + if (mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE) {
> + unsigned alignment = 0;
> + unsigned long size;
> + size = intel_get_yf_ys_bo_size(mt, &alignment);
> +
> + /* intel_get_yf_ys_bo_size() might change the tr_mode. */
> + if (size > 0 && mt->tr_mode != INTEL_MIPTREE_TRMODE_NONE) {
> + mt->bo = drm_intel_bo_alloc_for_render(brw->bufmgr, "miptree",
> + size, alignment);
> + }
> + }
> +
> + if (mt->tr_mode == INTEL_MIPTREE_TRMODE_NONE) {
> + mt->bo = drm_intel_bo_alloc_tiled(brw->bufmgr, "miptree",
> + total_width, total_height, mt->cpp,
> + &mt->tiling, &pitch,
> + (expect_accelerated_upload ?
> + BO_ALLOC_FOR_RENDER : 0));
> + mt->pitch = pitch;
> + }
>
> /* If the BO is too large to fit in the apertu
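The sizing loop questioned in the review above can be isolated to see what it returns. The following is a minimal sketch of just that loop; example_round_up_pot is a hypothetical name, and the 512 KB starting value is the min_size constant from the quoted patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical isolation of the loop in the quoted patch: start at
 * min_size and double until the result is at least size. */
static uint64_t
example_round_up_pot(uint64_t size, uint64_t min_size)
{
   uint64_t i;
   for (i = min_size; i < size; i <<= 1)
      ;
   return i;
}
```

Under these assumptions a 4 KB request becomes 512 KB, and a request just over 32 MB becomes 64 MB, which is the over-allocation the reviewer is pointing at when asking why the function does not simply return size.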
Re: [Mesa-dev] [PATCH 2/5] i965/gen9: Plugin the code for selecting YF/YS tiling on skl+
On Mon, Jun 22, 2015 at 2:53 PM, Ben Widawsky wrote: > On Wed, Jun 10, 2015 at 03:30:47PM -0700, Anuj Phogat wrote: >> Buffers with Yf/Ys tiling end up using meta upload / download >> paths or the blitter for cases where they used tiled_memcpy paths >> in case of Y tiling. This has exposed some bugs in meta path. To >> avoid any piglit regressions on SKL this patch keeps the Yf/Ys >> tiling disabled at the moment. >> >> V3: Make brw_miptree_choose_tr_mode() actually choose TRMODE. (Ben) >> Few cosmetic changes. >> V4: Get rid of brw_miptree_choose_tr_mode(). >> Take care of all tile resource modes {Yf, Ys, none} for all >> generations at one place. >> >> Signed-off-by: Anuj Phogat >> Cc: Ben Widawsky >> --- >> src/mesa/drivers/dri/i965/brw_tex_layout.c | 97 >> -- >> 1 file changed, 79 insertions(+), 18 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c >> b/src/mesa/drivers/dri/i965/brw_tex_layout.c >> index b9ac4cf..c0ef5cc 100644 >> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c >> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c >> @@ -807,27 +807,88 @@ brw_miptree_layout(struct brw_context *brw, >> enum intel_miptree_tiling_mode requested, >> struct intel_mipmap_tree *mt) >> { >> - mt->tr_mode = INTEL_MIPTREE_TRMODE_NONE; >> + const unsigned bpp = mt->cpp * 8; >> + const bool is_tr_mode_yf_ys_allowed = >> + brw->gen >= 9 && >> + !for_bo && >> + !mt->compressed && >> + /* Enable YF/YS tiling only for color surfaces because depth and >> + * stencil surfaces are not supported in blitter using fast copy >> + * blit and meta PBO upload, download paths. No other paths >> + * currently support Yf/Ys tiled surfaces. >> + * FIXME: Remove this restriction once we have a tiled_memcpy() >> + * path to do depth/stencil data upload/download to Yf/Ys tiled >> + * surfaces. >> + */ > > I think it's more readable to move this comment above the variable > declaration. > Up to you though. 
Also I think "FINISHME" is the more appropriate > classification > for this type of thing. > Sure. >> + _mesa_is_format_color_format(mt->format) && >> + (requested == INTEL_MIPTREE_TILING_Y || >> + requested == INTEL_MIPTREE_TILING_ANY) && > > This is where my tiling flags would have helped a bit since you should be able > to do flags & Y_TILED :P > Yes, I will do a follow up patch to make use of that. >> + (bpp && is_power_of_two(bpp)) && >> + /* FIXME: To avoid piglit regressions keep the Yf/Ys tiling >> + * disabled at the moment. >> + */ >> + false; > > Also, "FINISHME" > >> >> - intel_miptree_set_alignment(brw, mt); >> - intel_miptree_set_total_width_height(brw, mt); >> + /* Lower index (Yf) is the higher priority mode */ >> + const uint32_t tr_mode[3] = {INTEL_MIPTREE_TRMODE_YF, >> +INTEL_MIPTREE_TRMODE_YS, >> +INTEL_MIPTREE_TRMODE_NONE}; >> + int i = is_tr_mode_yf_ys_allowed ? 0 : ARRAY_SIZE(tr_mode) - 1; >> >> - if (!mt->total_width || !mt->total_height) { >> - intel_miptree_release(&mt); >> - return; >> - } >> + while (i < ARRAY_SIZE(tr_mode)) { >> + if (brw->gen < 9) >> + assert(tr_mode[i] == INTEL_MIPTREE_TRMODE_NONE); >> + else >> + assert(tr_mode[i] == INTEL_MIPTREE_TRMODE_YF || >> +tr_mode[i] == INTEL_MIPTREE_TRMODE_YS || >> +tr_mode[i] == INTEL_MIPTREE_TRMODE_NONE); >> >> - /* On Gen9+ the alignment values are expressed in multiples of the block >> -* size >> -*/ >> - if (brw->gen >= 9) { >> - unsigned int i, j; >> - _mesa_get_format_block_size(mt->format, &i, &j); >> - mt->align_w /= i; >> - mt->align_h /= j; >> - } >> + mt->tr_mode = tr_mode[i]; >> + intel_miptree_set_alignment(brw, mt); >> + intel_miptree_set_total_width_height(brw, mt); >> >> - if (!for_bo) >> - mt->tiling = brw_miptree_choose_tiling(brw, requested, mt); >> + if (!mt->total_width || !mt->total_height) { >> + intel_miptree_release(&mt); >> + return; >> + } >> + >> + /* On Gen9+ the alignment values are expressed in multiples of the >> + * block size. 
>> + */ >> + if (brw->gen >= 9) { >> + unsigned int i, j; >> + _mesa_get_format_block_size(mt->format, &i, &j); >> + mt->align_w /= i; >> + mt->align_h /= j; >> + } > > Can we just combine this alignment calculation into > intel_miptree_set_alignment()? > No. intel_miptree_set_total_width_height() called after intel_miptree_set_alignment() needs align_w and align_h values in pixels. We do the division later to directly use mt->align_w and mt->align_h while setting the surface state which needs the values in number of blocks. I have a cleanup patch moving this code to surface state setup. >> + >> +
Re: [Mesa-dev] [PATCH 3/4] i965/gen9: Don't use encrypted MOCS
On Thu, Jun 18, 2015 at 03:41:50PM -0700, Kenneth Graunke wrote:
> On Wednesday, June 17, 2015 03:50:13 PM Ben Widawsky wrote:
> > On gen9+ MOCS is an index into a table. It is 7 bits, and AFAICT, bit 0 is for
> > doing encrypted reads.
> >
> > I don't recall how I decided to do this for BXT. I don't know this patch was
> > ever needed, since it seems nothing is broken today on SKL. Furthermore, this
> > patch may no longer be needed because of the ongoing changes with MOCS setup. It
> > is what is being used/tested, so it's included in the series.
> >
> > The chosen values are the old values left shifted. That was also an arbitrary
> > choice.
> >
> > Cc: Francisco Jerez
> > Signed-off-by: Ben Widawsky
> > ---
> >  src/mesa/drivers/dri/i965/brw_defines.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
> > index bfcc442..5358edc 100644
> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> > @@ -2495,8 +2495,8 @@ enum brw_wm_barycentric_interp_mode {
> > * cache settings. We still use only either write-back or write-through; and
> > * rely on the documented default values.
> > */
> > -#define SKL_MOCS_WB 9
> > -#define SKL_MOCS_WT 5
> > +#define SKL_MOCS_WB 0x12
> > +#define SKL_MOCS_WT 0xa
>
> Yeah, it looks like Kristian made these defines the indices into the
> table, but may have missed that the MOCS field puts that table index in
> [6:1] and bit 0 is something else.
>
> So shifting left by 1 seems like a good plan. Perhaps write it as
>
> #define SKL_MOCS_WB (0b000101 << 1)
> #define SKL_MOCS_WT (0b001001 << 1)
>

You meant this, right (you reversed it, I think)?

#define SKL_MOCS_WB (0b001001 << 1)
#define SKL_MOCS_WT (0b000101 << 1)

> so the index value is written like it is in the documentation, and the
> shift 1 indicates moving it into the right place for MOCS?
>
> Either way,
> Reviewed-by: Kenneth Graunke
>
> Incidentally...the WT value (index 5) appears to skip eLLC - the target
> cache is 01b = "LLC only". That doesn't seem desirable. We probably
> want index 6 instead (0b000110 << 1) which uses both LLC and eLLC.
>
> That said, we shouldn't ever be using WT in the driver - we want to use
> the PTE value. (krh even added a FINISHME comment to that effect.)
>
> I think a proper value for that would be:
> #define SKL_MOCS_PTE (0b10 << 1)
> (Default: 0b10,
> LeCC = 0x00 - use cacheability controls from page table / ...
> TC = LLC/eLLC allowed)
>
> We could either fix the _WT define or just delete it.
>
> >
> > #define MEDIA_VFE_STATE 0x7000
> > /* GEN7 DW2, GEN8+ DW3 */
>

I'll get on this too. Thanks.
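The encoding discussed in this thread, a table index placed in bits [6:1] with bit 0 reserved for something else, can be checked with a few lines of C. The macro and function names below are hypothetical; the index values 9 and 5 and the results 0x12 and 0xa come from the quoted patch. Hex is used instead of the 0b binary literals from the mail, since those are a compiler extension in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: place a MOCS table index into bits [6:1]. */
#define EXAMPLE_MOCS_FROM_INDEX(index) ((uint32_t)(index) << 1)

enum {
   EXAMPLE_MOCS_WB = EXAMPLE_MOCS_FROM_INDEX(9),   /* 001001b << 1 = 0x12 */
   EXAMPLE_MOCS_WT = EXAMPLE_MOCS_FROM_INDEX(5),   /* 000101b << 1 = 0x0a */
};

/* Recover the table index from an encoded MOCS value. */
static inline uint32_t
example_mocs_index(uint32_t mocs)
{
   return (mocs >> 1) & 0x3f;
}

/* Bit 0, which the thread describes as not being part of the index. */
static inline uint32_t
example_mocs_bit0(uint32_t mocs)
{
   return mocs & 1;
}
```

Both old values (9 and 5) have bit 0 set when written directly into the field, while the shifted values leave it clear, which matches the motivation given in the commit message.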
[Mesa-dev] [PATCH 00/16] i965: Finish removing brw_context from the compiler
I started working on this project some time ago to remove brw_context from
the backend compiler. I got a bunch of refactoring done but eventually got
stuck on shader_time and some debug logging stuff. I've finally gotten
around to finishing it and here it is.

Jason Ekstrand (15):
  i965: Replace some instances of brw->gen with devinfo->gen
  i965: Plumb compiler debug logging through a function pointer in brw_compiler
  i965: Remove the dependance on brw_context from the generators
  i965: Move INTEL_DEBUG variable parsing to screen creation time
  i965/fs: Make no16 non-variadic
  i965/fs: Do the no16 perf logging directly in fs_visitor::no16()
  i965/fs: Plumb compiler debug logging through brw_compiler
  i965: Add compiler options to brw_compiler
  i965: Use a single index per shader for shader_time.
  i965: Pull calls to get_shader_time_index out of the visitor
  i965/fs: Add a do_rep_send flag to run_fs
  i965/vs: Pass the current set of clip planes through run() and run_vs()
  i965/vec4: Turn some _mesa_problem calls into asserts
  i965/vec4_vs: Add an explicit use_legacy_snorm_formula flag
  i965: Remove the brw_context from the visitors

Kenneth Graunke (1):
  mesa: Add a va_args variant of _mesa_gl_debug().

 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp    |   3 +-
 src/mesa/drivers/dri/i965/brw_context.c            |  54 ++---
 src/mesa/drivers/dri/i965/brw_context.h            |  15 +--
 src/mesa/drivers/dri/i965/brw_cs.cpp               |  17 ++-
 src/mesa/drivers/dri/i965/brw_fs.cpp               | 127 -
 src/mesa/drivers/dri/i965/brw_fs.h                 |  28 +++--
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp     |  21 ++--
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |   1 -
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp       |  30 ++---
 src/mesa/drivers/dri/i965/brw_program.c            |  67 ---
 src/mesa/drivers/dri/i965/brw_shader.cpp           | 100 +++-
 src/mesa/drivers/dri/i965/brw_shader.h             |  13 ++-
 src/mesa/drivers/dri/i965/brw_vec4.cpp             |  49
 src/mesa/drivers/dri/i965/brw_vec4.h               |  23 ++--
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp   |  22 ++--
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp  |  32 --
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h    |   5 +-
 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp     |   1 -
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp     |  16 +--
 src/mesa/drivers/dri/i965/brw_vec4_vp.cpp          |   9 +-
 src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp  |  16 +--
 src/mesa/drivers/dri/i965/brw_vs.h                 |   8 +-
 src/mesa/drivers/dri/i965/gen6_gs_visitor.h        |   7 +-
 src/mesa/drivers/dri/i965/intel_debug.c            |  13 +--
 src/mesa/drivers/dri/i965/intel_debug.h            |   4 +-
 src/mesa/drivers/dri/i965/intel_screen.c           |   3 +
 src/mesa/main/errors.c                             |  29 +++--
 src/mesa/main/errors.h                             |   9 ++
 28 files changed, 379 insertions(+), 343 deletions(-)

--
2.4.3
[Mesa-dev] [PATCH 02/16] mesa: Add a va_args variant of _mesa_gl_debug().
From: Kenneth Graunke This will be useful for wrapper functions. Signed-off-by: Kenneth Graunke --- src/mesa/main/errors.c | 29 + src/mesa/main/errors.h | 9 + 2 files changed, 30 insertions(+), 8 deletions(-) diff --git a/src/mesa/main/errors.c b/src/mesa/main/errors.c index 16f10dd..b340666 100644 --- a/src/mesa/main/errors.c +++ b/src/mesa/main/errors.c @@ -1413,6 +1413,26 @@ should_output(struct gl_context *ctx, GLenum error, const char *fmtString) void +_mesa_gl_vdebug(struct gl_context *ctx, +GLuint *id, +enum mesa_debug_source source, +enum mesa_debug_type type, +enum mesa_debug_severity severity, +const char *fmtString, +va_list args) +{ + char s[MAX_DEBUG_MESSAGE_LENGTH]; + int len; + + debug_get_id(id); + + len = _mesa_vsnprintf(s, MAX_DEBUG_MESSAGE_LENGTH, fmtString, args); + + log_msg(ctx, source, type, *id, severity, len, s); +} + + +void _mesa_gl_debug(struct gl_context *ctx, GLuint *id, enum mesa_debug_source source, @@ -1420,17 +1440,10 @@ _mesa_gl_debug(struct gl_context *ctx, enum mesa_debug_severity severity, const char *fmtString, ...) 
{ - char s[MAX_DEBUG_MESSAGE_LENGTH]; - int len; va_list args; - - debug_get_id(id); - va_start(args, fmtString); - len = _mesa_vsnprintf(s, MAX_DEBUG_MESSAGE_LENGTH, fmtString, args); + _mesa_gl_vdebug(ctx, id, source, type, severity, fmtString, args); va_end(args); - - log_msg(ctx, source, type, *id, severity, len, s); } diff --git a/src/mesa/main/errors.h b/src/mesa/main/errors.h index e6dc9b5..24f234f 100644 --- a/src/mesa/main/errors.h +++ b/src/mesa/main/errors.h @@ -76,6 +76,15 @@ extern FILE * _mesa_get_log_file(void); extern void +_mesa_gl_vdebug(struct gl_context *ctx, +GLuint *id, +enum mesa_debug_source source, +enum mesa_debug_type type, +enum mesa_debug_severity severity, +const char *fmtString, +va_list args); + +extern void _mesa_gl_debug(struct gl_context *ctx, GLuint *id, enum mesa_debug_source source, -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
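The pattern in this patch, splitting a variadic function into a va_list core plus a thin variadic wrapper, is a common C idiom. The sketch below shows it with the standard vsnprintf; the example_* names and the static buffer are hypothetical stand-ins and do not reproduce the Mesa functions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define EXAMPLE_MAX_MESSAGE 128

/* Stand-in for a log sink: remembers the most recent formatted message. */
static char example_last_message[EXAMPLE_MAX_MESSAGE];

/* va_list core: any wrapper that already holds a va_list can call this. */
static int
example_vdebug(const char *fmt, va_list args)
{
   return vsnprintf(example_last_message, sizeof(example_last_message),
                    fmt, args);
}

/* Variadic front end: only does va_start/va_end and forwards. */
static int
example_debug(const char *fmt, ...)
{
   va_list args;
   va_start(args, fmt);
   int len = example_vdebug(fmt, args);
   va_end(args);
   return len;
}

/* A second wrapper with its own variadic signature. It could not forward
 * its arguments to example_debug(), but it can forward them to the
 * va_list variant. */
static int
example_perf_debug(const char *fmt, ...)
{
   va_list args;
   va_start(args, fmt);
   int len = example_vdebug(fmt, args);
   va_end(args);
   return len;
}
```

This is presumably what the commit message means by "useful for wrapper functions": C has no way to pass a "..." argument list on to another "..." function, so the va_list variant is the piece wrappers can share.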
[Mesa-dev] [PATCH 04/16] i965: Remove the dependance on brw_context from the generators
--- src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 2 +- src/mesa/drivers/dri/i965/brw_cs.cpp | 2 +- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- src/mesa/drivers/dri/i965/brw_fs.h| 4 +++- src/mesa/drivers/dri/i965/brw_fs_generator.cpp| 5 +++-- src/mesa/drivers/dri/i965/brw_vec4.cpp| 4 ++-- src/mesa/drivers/dri/i965/brw_vec4.h | 4 +++- src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 3 ++- src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 2 +- 9 files changed, 17 insertions(+), 11 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp index 9c04137..789520c 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp @@ -29,7 +29,7 @@ brw_blorp_eu_emitter::brw_blorp_eu_emitter(struct brw_context *brw, bool debug_flag) : mem_ctx(ralloc_context(NULL)), - generator(brw->intelScreen->compiler, + generator(brw->intelScreen->compiler, brw, mem_ctx, (void *) rzalloc(mem_ctx, struct brw_wm_prog_key), (struct brw_stage_prog_data *) rzalloc(mem_ctx, struct brw_wm_prog_data), NULL, 0, false, "BLORP") diff --git a/src/mesa/drivers/dri/i965/brw_cs.cpp b/src/mesa/drivers/dri/i965/brw_cs.cpp index f93ca2f..0833404 100644 --- a/src/mesa/drivers/dri/i965/brw_cs.cpp +++ b/src/mesa/drivers/dri/i965/brw_cs.cpp @@ -128,7 +128,7 @@ brw_cs_emit(struct brw_context *brw, return NULL; } - fs_generator g(brw->intelScreen->compiler, + fs_generator g(brw->intelScreen->compiler, brw, mem_ctx, (void*) key, &prog_data->base, &cp->Base, v8.promoted_constants, v8.runtime_check_aads_emit, "CS"); if (INTEL_DEBUG & DEBUG_CS) { diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 2b892f0..615c2f1 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -4069,7 +4069,7 @@ brw_wm_fs_emit(struct brw_context *brw, prog_data->no_8 = false; } - fs_generator g(brw->intelScreen->compiler, + fs_generator 
g(brw->intelScreen->compiler, brw, mem_ctx, (void *) key, &prog_data->base, &fp->Base, v.promoted_constants, v.runtime_check_aads_emit, "FS"); diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 7414b65..1d52ff0 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -398,7 +398,7 @@ public: class fs_generator { public: - fs_generator(const struct brw_compiler *compiler, + fs_generator(const struct brw_compiler *compiler, void *log_data, void *mem_ctx, const void *key, struct brw_stage_prog_data *prog_data, @@ -494,6 +494,8 @@ private: bool patch_discard_jumps_to_fb_writes(); const struct brw_compiler *compiler; + void *log_data; /* Passed to compiler->*_log functions */ + const struct brw_device_info *devinfo; struct brw_codegen *p; diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index d98a40d..2ed0bac 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -121,7 +121,7 @@ brw_reg_from_fs_reg(fs_reg *reg) return brw_reg; } -fs_generator::fs_generator(const struct brw_compiler *compiler, +fs_generator::fs_generator(const struct brw_compiler *compiler, void *log_data, void *mem_ctx, const void *key, struct brw_stage_prog_data *prog_data, @@ -130,7 +130,8 @@ fs_generator::fs_generator(const struct brw_compiler *compiler, bool runtime_check_aads_emit, const char *stage_abbrev) - : compiler(compiler), devinfo(compiler->devinfo), key(key), + : compiler(compiler), log_data(log_data), + devinfo(compiler->devinfo), key(key), prog_data(prog_data), prog(prog), promoted_constants(promoted_constants), runtime_check_aads_emit(runtime_check_aads_emit), debug_flag(false), diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 5e549c4..572bc17 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ 
-1910,7 +1910,7 @@ brw_vs_emit(struct brw_context *brw, return NULL; } - fs_generator g(brw->intelScreen->compiler, + fs_generator g(brw->intelScreen->compiler, brw, mem_ctx, (void *) &c->key, &prog_data->base.base, &c->vp->program.Base, v.promoted_constants, v.runtime_check_aads_emit, "VS"); @@ -1948,7 +1948,7 @@ brw_vs_emit(struct brw_
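The diff above threads an opaque log_data pointer through the generators, with the comment "Passed to compiler->*_log functions". A minimal sketch of that general pattern follows. All names here (example_compiler, debug_log, example_context and so on) are hypothetical; the actual callback names and signatures in brw_compiler are not shown in this message.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical compiler object: holds a logging callback but knows
 * nothing about the driver context type. */
struct example_compiler {
   void (*debug_log)(void *log_data, const char *fmt, ...);
};

/* Hypothetical generator: carries the opaque pointer and only ever hands
 * it back to the compiler's callbacks. */
struct example_generator {
   const struct example_compiler *compiler;
   void *log_data;
};

/* Hypothetical driver context, visible only on the driver side. */
struct example_context {
   char last[64];
   int count;
};

/* Driver-side callback: this is the one place that casts log_data back. */
static void
example_context_log(void *log_data, const char *fmt, ...)
{
   struct example_context *ctx = log_data;
   va_list args;
   va_start(args, fmt);
   vsnprintf(ctx->last, sizeof(ctx->last), fmt, args);
   va_end(args);
   ctx->count++;
}

/* Compiler-side code logs without naming the context type. */
static void
example_generate(struct example_generator *g)
{
   g->compiler->debug_log(g->log_data, "compiled %d instructions", 12);
}
```

In the quoted diff the call sites pass brw as the log_data argument, so the driver context still reaches the logging code, but the generator classes no longer need its definition.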
Re: [Mesa-dev] [PATCH 4/5] i965/gen9: Add XY_FAST_COPY_BLT support to intelEmitCopyBlit()
On Fri, Jun 19, 2015 at 02:41:50PM -0700, Anuj Phogat wrote: > On Wed, Jun 10, 2015 at 3:34 PM, Anuj Phogat wrote: > > This patch enables using XY_FAST_COPY_BLT only for Yf/Ys tiled buffers. > > It can be later turned on for other tiling patterns (X,Y) too. > > > > V3: Flush in between sequential fast copy blits. > > Fix src/dst alignment requirements. > > Make can_fast_copy_blit() helper. > > Use ffs(), is_power_of_two() > > Move overlap computation inside intel_miptree_blit(). > > > > V4: Use _mesa_regions_overlap() function. > > Simplify horizontal and vertical alignment computations. > > > > Signed-off-by: Anuj Phogat > > Cc: Ben Widawsky > > --- > > src/mesa/drivers/dri/i965/intel_blit.c | 295 > > ++- > > src/mesa/drivers/dri/i965/intel_blit.h | 2 + > > src/mesa/drivers/dri/i965/intel_copy_image.c | 2 + > > src/mesa/drivers/dri/i965/intel_reg.h| 16 ++ > > 4 files changed, 268 insertions(+), 47 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/intel_blit.c > > b/src/mesa/drivers/dri/i965/intel_blit.c > > index 5afc771..800ed7e 100644 > > --- a/src/mesa/drivers/dri/i965/intel_blit.c > > +++ b/src/mesa/drivers/dri/i965/intel_blit.c > > @@ -27,6 +27,7 @@ > > > > > > #include "main/mtypes.h" > > +#include "main/blit.h" > > #include "main/context.h" > > #include "main/enums.h" > > #include "main/colormac.h" > > @@ -43,6 +44,23 @@ > > > > #define FILE_DEBUG_FLAG DEBUG_BLIT > > > > +#define SET_TILING_XY_FAST_COPY_BLT(tiling, tr_mode, type) \ > > +({ \ > > + switch (tiling) { \ > > + case I915_TILING_X: \ > > + CMD |= type ## _TILED_X; \ > > + break; \ > > + case I915_TILING_Y: \ > > + if (tr_mode == INTEL_MIPTREE_TRMODE_YS)\ > > + CMD |= type ## _TILED_64K; \ > > + else \ > > + CMD |= type ## _TILED_Y;\ > > + break; \ > > + default: \ > > + unreachable("not reached");\ > > + } \ > > +}) > > + > > static void > > intel_miptree_set_alpha_to_one(struct brw_context *brw, > > struct intel_mipmap_tree *mt, > > @@ -75,6 +93,10 @@ static uint32_t > > 
br13_for_cpp(int cpp) > > { > > switch (cpp) { > > + case 16: > > + return BR13_32323232; > > + case 8: > > + return BR13_16161616; > > case 4: > >return BR13_; > >break; > > @@ -89,6 +111,66 @@ br13_for_cpp(int cpp) > > } > > } > > > > +static uint32_t > > +get_tr_horizontal_align(uint32_t tr_mode, uint32_t cpp, bool is_src) { > > + /* Alignment tables for YF/YS tiled surfaces. */ > > + const uint32_t align_2d_yf[] = {64, 64, 32, 32, 16}; > > + const uint32_t align_2d_ys[] = {256, 256, 128, 128, 64}; If you move the alignment stuff from the other patch series to a more generic place, you could reuse it here. Also, as you pointed out in that other patch, ys = 4 * ys > > + const uint32_t bpp = cpp * 8; > > + const uint32_t shift = is_src ? 17 : 10; > > + uint32_t align; > > + int i = 0; > > + > > + if (tr_mode == INTEL_MIPTREE_TRMODE_NONE) > > + return 0; > > + > > + /* Compute array index. */ > > + assert (bpp >= 8 && bpp <= 128 && is_power_of_two(bpp)); > > + i = ffs(bpp / 8) - 1; > > + > > + align = tr_mode == INTEL_MIPTREE_TRMODE_YF ? > > + align_2d_yf[i] : > > + align_2d_ys[i]; > > + > > + assert(is_power_of_two(align)); > > + > > + /* XY_FAST_COPY_BLT doesn't support horizontal alignment of 16. */ > > + if (align == 16) > > + align = 32; > > + > > + return (ffs(align) - 6) << shift; > > +} > > + > > +static uint32_t > > +get_tr_vertical_align(uint32_t tr_mode, uint32_t cpp, bool is_src) { > > + /* Vertical alignment tables for YF/YS tiled surfaces. */ > > + const unsigned align_2d_yf[] = {64, 32, 32, 16, 16}; > > + const unsigned align_2d_ys[] = {256, 128, 128, 64, 64}; > > + const uint32_t bpp = cpp * 8; > > + const uint32_t shift = is_src ? 15 : 8; > > + uint32_t align; > > + int i = 0; > > + > > + if (tr_mode == INTEL_MIPTREE_TRMODE_NONE) > > + return 0; > > + > > + /* Compute array index. */ > > + assert (bpp >= 8 && bpp <= 128 && is_power_of_two(bpp)); > > + i = ffs(bpp / 8) - 1; > > + > > + align = tr_mode == INTEL_MIPTREE_TRMODE_YF ? 
> > + align_2d_yf[i] : > > +
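For readers following the review comment, here is a rough standalone sketch of the horizontal alignment encoding from the quoted patch. It keeps only the Yf table and derives the Ys value as four times the Yf one, which is what the two tables in the patch show. The enum and function names are made up for this example and are not the driver's; the table values, the shifts (17 for source, 10 for destination) and the "16 becomes 32" rule are taken from the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <strings.h>   /* ffs() (POSIX) */

/* Hypothetical stand-ins for the driver's INTEL_MIPTREE_TRMODE_* values. */
enum example_tr_mode { EXAMPLE_TRMODE_NONE, EXAMPLE_TRMODE_YF, EXAMPLE_TRMODE_YS };

/* Yf horizontal alignment, indexed by log2(cpp); values from the quoted patch. */
static const uint32_t example_align_2d_yf[] = {64, 64, 32, 32, 16};

static uint32_t
example_tr_horizontal_align(enum example_tr_mode mode, uint32_t cpp, bool is_src)
{
   const uint32_t shift = is_src ? 17 : 10;

   if (mode == EXAMPLE_TRMODE_NONE)
      return 0;

   /* cpp must be a power of two in [1, 16], i.e. 8 to 128 bits per pixel. */
   assert(cpp >= 1 && cpp <= 16 && (cpp & (cpp - 1)) == 0);

   uint32_t align = example_align_2d_yf[ffs((int)cpp) - 1];

   /* The Ys table in the patch is exactly four times the Yf table. */
   if (mode == EXAMPLE_TRMODE_YS)
      align *= 4;

   /* The patch notes that a horizontal alignment of 16 is not supported. */
   if (align == 16)
      align = 32;

   /* 32 encodes as 0, 64 as 1, 128 as 2, 256 as 3. */
   return (uint32_t)(ffs((int)align) - 6) << shift;
}
```

With a single table the Yf/Ys relationship lives in one place, which is roughly what sharing the alignment code with the other series would buy.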
[Mesa-dev] [PATCH 10/16] i965: Use a single index per shader for shader_time.
Previously, each shader took 3 shader time indices which were potentially at arbitrary points in the shader time buffer. Now, each shader gets a single index which refers to 3 consecutive locations in the buffer. This simplifies some of the logic at the cost of having a magic 3 in a few places. --- src/mesa/drivers/dri/i965/brw_context.h | 14 + src/mesa/drivers/dri/i965/brw_fs.cpp | 28 -- src/mesa/drivers/dri/i965/brw_fs.h| 3 +- src/mesa/drivers/dri/i965/brw_program.c | 67 +++ src/mesa/drivers/dri/i965/brw_vec4.cpp| 18 +++--- src/mesa/drivers/dri/i965/brw_vec4.h | 10 +--- src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 3 +- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 8 +-- src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp | 2 +- 9 files changed, 53 insertions(+), 100 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index d8fcfff..a7d83f8 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -821,20 +821,10 @@ struct brw_tracked_state { enum shader_time_shader_type { ST_NONE, ST_VS, - ST_VS_WRITTEN, - ST_VS_RESET, ST_GS, - ST_GS_WRITTEN, - ST_GS_RESET, ST_FS8, - ST_FS8_WRITTEN, - ST_FS8_RESET, ST_FS16, - ST_FS16_WRITTEN, - ST_FS16_RESET, ST_CS, - ST_CS_WRITTEN, - ST_CS_RESET, }; struct brw_vertex_buffer { @@ -979,6 +969,8 @@ enum brw_predicate_state { BRW_PREDICATE_STATE_USE_BIT }; +struct shader_times; + /** * brw_context is derived from gl_context. 
*/ @@ -1503,7 +1495,7 @@ struct brw_context const char **names; int *ids; enum shader_time_shader_type *types; - uint64_t *cumulative; + struct shader_times *cumulative; int num_entries; int max_entries; double report_time; diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 460120d..c1bfe86 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -578,38 +578,30 @@ fs_visitor::emit_shader_time_begin() void fs_visitor::emit_shader_time_end() { - enum shader_time_shader_type type, written_type, reset_type; + enum shader_time_shader_type type; switch (stage) { case MESA_SHADER_VERTEX: type = ST_VS; - written_type = ST_VS_WRITTEN; - reset_type = ST_VS_RESET; break; case MESA_SHADER_GEOMETRY: type = ST_GS; - written_type = ST_GS_WRITTEN; - reset_type = ST_GS_RESET; break; case MESA_SHADER_FRAGMENT: if (dispatch_width == 8) { type = ST_FS8; - written_type = ST_FS8_WRITTEN; - reset_type = ST_FS8_RESET; } else { assert(dispatch_width == 16); type = ST_FS16; - written_type = ST_FS16_WRITTEN; - reset_type = ST_FS16_RESET; } break; case MESA_SHADER_COMPUTE: type = ST_CS; - written_type = ST_CS_WRITTEN; - reset_type = ST_CS_RESET; break; default: unreachable("fs_visitor::emit_shader_time_end missing code"); } + int shader_time_index = brw_get_shader_time_index(brw, shader_prog, prog, + type); /* Insert our code just before the final SEND with EOT. */ exec_node *end = this->instructions.get_tail(); @@ -639,20 +631,20 @@ fs_visitor::emit_shader_time_end() * trying to determine the time taken for single instructions. 
*/ ibld.ADD(diff, diff, fs_reg(-2u)); - SHADER_TIME_ADD(ibld, type, diff); - SHADER_TIME_ADD(ibld, written_type, fs_reg(1u)); + SHADER_TIME_ADD(ibld, shader_time_index, 0, diff); + SHADER_TIME_ADD(ibld, shader_time_index, 1, fs_reg(1u)); ibld.emit(BRW_OPCODE_ELSE); - SHADER_TIME_ADD(ibld, reset_type, fs_reg(1u)); + SHADER_TIME_ADD(ibld, shader_time_index, 2, fs_reg(1u)); ibld.emit(BRW_OPCODE_ENDIF); } void fs_visitor::SHADER_TIME_ADD(const fs_builder &bld, -enum shader_time_shader_type type, fs_reg value) +int shader_time_index, int shader_time_subindex, +fs_reg value) { - int shader_time_index = - brw_get_shader_time_index(brw, shader_prog, prog, type); - fs_reg offset = fs_reg(shader_time_index * SHADER_TIME_STRIDE); + int index = shader_time_index * 3 + shader_time_subindex; + fs_reg offset = fs_reg(index * SHADER_TIME_STRIDE); fs_reg payload; if (dispatch_width == 8) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index cffedc0..55a9722 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -278,7 +278,8 @@ public: void emit_shader_time_begin(); void emit_shader_time_end(); void SHADER_TIME_ADD(const brw::fs_builder &bld, -
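To make the "single index, 3 consecutive locations" layout concrete, here is a small sketch of the offset arithmetic the new SHADER_TIME_ADD performs. The slot names and the stride value are assumptions for illustration only (the patch uses bare subindices 0, 1 and 2, and the real SHADER_TIME_STRIDE value is not shown here); the `index * 3 + subindex` computation is from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration; the driver's SHADER_TIME_STRIDE is not
 * visible in this patch. */
#define EXAMPLE_SHADER_TIME_STRIDE 64

/* Hypothetical names for the three per-shader slots. The patch passes the
 * literals 0 (accumulated time), 1 (written count) and 2 (reset count). */
enum example_shader_time_slot {
   EXAMPLE_SLOT_TIME = 0,
   EXAMPLE_SLOT_WRITTEN = 1,
   EXAMPLE_SLOT_RESET = 2,
   EXAMPLE_SLOTS_PER_SHADER = 3   /* the "magic 3" from the commit message */
};

/* Byte offset of one counter in the shader time buffer. */
static uint32_t
example_shader_time_offset(int shader_time_index,
                           enum example_shader_time_slot slot)
{
   assert(shader_time_index >= 0);
   int index = shader_time_index * EXAMPLE_SLOTS_PER_SHADER + (int)slot;
   return (uint32_t)index * EXAMPLE_SHADER_TIME_STRIDE;
}
```

Naming the slot count, as sketched here, would be one way to avoid repeating the literal 3, though the patch as posted keeps the literal.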
[Mesa-dev] [PATCH 14/16] i965/vec4: Turn some _mesa_problem calls into asserts
--- src/mesa/drivers/dri/i965/brw_vec4_vp.cpp | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp b/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp index 92d1085..dcbd240 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_vp.cpp @@ -381,8 +381,7 @@ vec4_vs_visitor::emit_program_code() break; default: - _mesa_problem(ctx, "Unsupported opcode %s in vertex program\n", - _mesa_opcode_string(vpi->Opcode)); + assert(!"Unsupported opcode in vertex program"); } /* Copy the temporary back into the actual destination register. */ @@ -574,15 +573,13 @@ vec4_vs_visitor::get_vp_src_reg(const prog_src_register &src) break; default: - _mesa_problem(ctx, "bad uniform src register file: %s\n", - _mesa_register_file_name((gl_register_file)src.File)); + assert(!"Bad uniform in src register file"); return src_reg(this, glsl_type::vec4_type); } break; default: - _mesa_problem(ctx, "bad src register file: %s\n", -_mesa_register_file_name((gl_register_file)src.File)); + assert(!"Bad src register file"); return src_reg(this, glsl_type::vec4_type); } -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 11/16] i965: Pull calls to get_shader_time_index out of the visitor
--- src/mesa/drivers/dri/i965/brw_cs.cpp | 8 +++- src/mesa/drivers/dri/i965/brw_fs.cpp | 55 --- src/mesa/drivers/dri/i965/brw_fs.h| 7 ++- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 7 ++- src/mesa/drivers/dri/i965/brw_vec4.cpp| 25 ++- src/mesa/drivers/dri/i965/brw_vec4.h | 7 ++- src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 18 +--- src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h | 3 +- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 4 +- src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp | 5 ++- src/mesa/drivers/dri/i965/brw_vs.h| 3 +- src/mesa/drivers/dri/i965/gen6_gs_visitor.h | 5 ++- 12 files changed, 75 insertions(+), 72 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_cs.cpp b/src/mesa/drivers/dri/i965/brw_cs.cpp index 0833404..fa8b5c8 100644 --- a/src/mesa/drivers/dri/i965/brw_cs.cpp +++ b/src/mesa/drivers/dri/i965/brw_cs.cpp @@ -88,10 +88,14 @@ brw_cs_emit(struct brw_context *brw, cfg_t *cfg = NULL; const char *fail_msg = NULL; + int st_index = -1; + if (INTEL_DEBUG & DEBUG_SHADER_TIME) + st_index = brw_get_shader_time_index(brw, prog, &cp->Base, ST_CS); + /* Now the main event: Visit the shader IR and generate our CS IR for it. 
*/ fs_visitor v8(brw, mem_ctx, MESA_SHADER_COMPUTE, key, &prog_data->base, prog, - &cp->Base, 8); + &cp->Base, 8, st_index); if (!v8.run_cs()) { fail_msg = v8.fail_msg; } else if (local_workgroup_size <= 8 * brw->max_cs_threads) { @@ -100,7 +104,7 @@ brw_cs_emit(struct brw_context *brw, } fs_visitor v16(brw, mem_ctx, MESA_SHADER_COMPUTE, key, &prog_data->base, prog, - &cp->Base, 16); + &cp->Base, 16, st_index); if (likely(!(INTEL_DEBUG & DEBUG_NO16)) && !fail_msg && !v8.simd16_unsupported && local_workgroup_size <= 16 * brw->max_cs_threads) { diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index c1bfe86..252196a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -578,31 +578,6 @@ fs_visitor::emit_shader_time_begin() void fs_visitor::emit_shader_time_end() { - enum shader_time_shader_type type; - switch (stage) { - case MESA_SHADER_VERTEX: - type = ST_VS; - break; - case MESA_SHADER_GEOMETRY: - type = ST_GS; - break; - case MESA_SHADER_FRAGMENT: - if (dispatch_width == 8) { - type = ST_FS8; - } else { - assert(dispatch_width == 16); - type = ST_FS16; - } - break; - case MESA_SHADER_COMPUTE: - type = ST_CS; - break; - default: - unreachable("fs_visitor::emit_shader_time_end missing code"); - } - int shader_time_index = brw_get_shader_time_index(brw, shader_prog, prog, - type); - /* Insert our code just before the final SEND with EOT. */ exec_node *end = this->instructions.get_tail(); assert(end && ((fs_inst *) end)->eot); @@ -631,16 +606,16 @@ fs_visitor::emit_shader_time_end() * trying to determine the time taken for single instructions. 
*/ ibld.ADD(diff, diff, fs_reg(-2u)); - SHADER_TIME_ADD(ibld, shader_time_index, 0, diff); - SHADER_TIME_ADD(ibld, shader_time_index, 1, fs_reg(1u)); + SHADER_TIME_ADD(ibld, 0, diff); + SHADER_TIME_ADD(ibld, 1, fs_reg(1u)); ibld.emit(BRW_OPCODE_ELSE); - SHADER_TIME_ADD(ibld, shader_time_index, 2, fs_reg(1u)); + SHADER_TIME_ADD(ibld, 2, fs_reg(1u)); ibld.emit(BRW_OPCODE_ENDIF); } void fs_visitor::SHADER_TIME_ADD(const fs_builder &bld, -int shader_time_index, int shader_time_subindex, +int shader_time_subindex, fs_reg value) { int index = shader_time_index * 3 + shader_time_subindex; @@ -3823,7 +3798,7 @@ fs_visitor::run_vs() assign_common_binding_table_offsets(0); setup_vs_payload(); - if (INTEL_DEBUG & DEBUG_SHADER_TIME) + if (shader_time_index >= 0) emit_shader_time_begin(); emit_nir_code(); @@ -3833,7 +3808,7 @@ fs_visitor::run_vs() emit_urb_writes(); - if (INTEL_DEBUG & DEBUG_SHADER_TIME) + if (shader_time_index >= 0) emit_shader_time_end(); calculate_cfg(); @@ -3871,7 +3846,7 @@ fs_visitor::run_fs() } else if (brw->use_rep_send && dispatch_width == 16) { emit_repclear_shader(); } else { - if (INTEL_DEBUG & DEBUG_SHADER_TIME) + if (shader_time_index >= 0) emit_shader_time_begin(); calculate_urb_setup(); @@ -3906,7 +3881,7 @@ fs_visitor::run_fs() emit_fb_writes(); - if (INTEL_DEBUG & DEBUG_SHADER_TIME) + if (shader_time_index >= 0) emit_shader_time_end(); calculate_cfg(); @@ -3950,7 +3925,7 @@ fs_visitor::run_cs() setu
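The pattern in this patch, as far as the visible hunks show, is that the caller resolves the shader time index once and hands it to the visitor, with -1 meaning "shader time is off". A minimal sketch of that flow follows; every name in it is hypothetical, and the lookup function is only a stand-in for brw_get_shader_time_index().

```c
#include <assert.h>

/* Hypothetical debug flag bit standing in for DEBUG_SHADER_TIME. */
#define EXAMPLE_DEBUG_SHADER_TIME (1u << 0)

/* Hypothetical visitor state: only the fields needed for the sketch. */
struct example_visitor {
   int shader_time_index;     /* -1 when shader time is disabled */
   int timing_instructions;   /* counts emitted begin/end markers */
};

/* Stand-in for the real lookup: hands out the next free index. */
static int
example_lookup_index(int *next_free)
{
   return (*next_free)++;
}

/* Caller side: resolve the index once, before constructing any visitor. */
static int
example_choose_index(unsigned debug_flags, int *next_free)
{
   int st_index = -1;
   if (debug_flags & EXAMPLE_DEBUG_SHADER_TIME)
      st_index = example_lookup_index(next_free);
   return st_index;
}

/* Visitor side: test the index instead of the global debug flag. */
static void
example_visitor_run(struct example_visitor *v)
{
   if (v->shader_time_index >= 0)
      v->timing_instructions++;   /* emit_shader_time_begin() */

   /* ... the shader body would be emitted here ... */

   if (v->shader_time_index >= 0)
      v->timing_instructions++;   /* emit_shader_time_end() */
}
```

The visitor no longer needs to know how indices are allocated, and the same index can be passed to both the SIMD8 and SIMD16 visitors, as brw_cs_emit() does above.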
[Mesa-dev] [PATCH 01/16] i965: Replace some instances of brw->gen with devinfo->gen
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++-- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 8 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 5563c5a..ac65202 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3187,7 +3187,7 @@ fs_visitor::lower_integer_multiplication() fs_reg high(GRF, alloc.allocate(dispatch_width / 8), inst->dst.type, dispatch_width); - if (brw->gen >= 7) { + if (devinfo->gen >= 7) { fs_reg src1_0_w = inst->src[1]; fs_reg src1_1_w = inst->src[1]; @@ -3616,7 +3616,7 @@ fs_visitor::setup_vs_payload() void fs_visitor::setup_cs_payload() { - assert(brw->gen >= 7); + assert(devinfo->gen >= 7); payload.num_regs = 1; } diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 4770838..cafe64a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1344,7 +1344,7 @@ fs_visitor::emit_interpolation_setup_gen6() struct brw_reg g1_uw = retype(brw_vec1_grf(1, 0), BRW_REGISTER_TYPE_UW); fs_builder abld = bld.annotate("compute pixel centers"); - if (brw->gen >= 8 || dispatch_width == 8) { + if (devinfo->gen >= 8 || dispatch_width == 8) { /* The "Register Region Restrictions" page says for BDW (and newer, * presumably): * @@ -1623,7 +1623,7 @@ fs_visitor::emit_single_fb_write(const fs_builder &bld, /* On pre-SNB, we have to interlace the color values. LOAD_PAYLOAD * will do this for us if we just give it a COMPR4 destination. 
*/ - if (brw->gen < 6 && exec_size == 16) + if (devinfo->gen < 6 && exec_size == 16) load->dst.reg |= BRW_MRF_COMPR4; write = ubld.emit(FS_OPCODE_FB_WRITE); @@ -1934,7 +1934,7 @@ fs_visitor::emit_urb_writes() void fs_visitor::emit_cs_terminate() { - assert(brw->gen >= 7); + assert(devinfo->gen >= 7); /* We are getting the thread ID from the compute shader header */ assert(stage == MESA_SHADER_COMPUTE); @@ -1956,7 +1956,7 @@ fs_visitor::emit_cs_terminate() void fs_visitor::emit_barrier() { - assert(brw->gen >= 7); + assert(devinfo->gen >= 7); /* We are getting the barrier ID from the compute shader header */ assert(stage == MESA_SHADER_COMPUTE); -- 2.4.3
[Mesa-dev] [PATCH 13/16] i965/vs: Pass the current set of clip planes through run() and run_vs()
Previously, these were pulled out of the GL context conditionally based on whether we were running ff/ARB or a GLSL program. Now, we just pass them in so that the visitor doesn't have to grab them itself. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++-- src/mesa/drivers/dri/i965/brw_fs.h| 8 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 11 +-- src/mesa/drivers/dri/i965/brw_vec4.cpp| 8 src/mesa/drivers/dri/i965/brw_vec4.h | 4 ++-- src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 4 ++-- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 4 +--- 7 files changed, 20 insertions(+), 23 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index bf04e26..23f60c2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3791,7 +3791,7 @@ fs_visitor::allocate_registers() } bool -fs_visitor::run_vs() +fs_visitor::run_vs(gl_clip_plane *clip_planes) { assert(stage == MESA_SHADER_VERTEX); @@ -3806,7 +3806,7 @@ fs_visitor::run_vs() if (failed) return false; - emit_urb_writes(); + emit_urb_writes(clip_planes); if (shader_time_index >= 0) emit_shader_time_end(); diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 4db5a91..e0a8984 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -84,8 +84,8 @@ public: fs_reg vgrf(const glsl_type *const type); void import_uniforms(fs_visitor *v); - void setup_uniform_clipplane_values(); - void compute_clip_distance(); + void setup_uniform_clipplane_values(gl_clip_plane *clip_planes); + void compute_clip_distance(gl_clip_plane *clip_planes); uint32_t gather_channel(int orig_chan, uint32_t sampler); void swizzle_result(ir_texture_opcode op, int dest_components, @@ -104,7 +104,7 @@ public: void DEP_RESOLVE_MOV(const brw::fs_builder &bld, int grf); bool run_fs(bool do_rep_send); - bool run_vs(); + bool run_vs(gl_clip_plane *clip_planes); bool run_cs(); void optimize(); void 
allocate_registers(); @@ -271,7 +271,7 @@ public: fs_reg src0_alpha, unsigned components, unsigned exec_size, bool use_2nd_half = false); void emit_fb_writes(); - void emit_urb_writes(); + void emit_urb_writes(gl_clip_plane *clip_planes); void emit_cs_terminate(); void emit_barrier(); diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 9ce8491..395394c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1715,9 +1715,8 @@ fs_visitor::emit_fb_writes() } void -fs_visitor::setup_uniform_clipplane_values() +fs_visitor::setup_uniform_clipplane_values(gl_clip_plane *clip_planes) { - gl_clip_plane *clip_planes = brw_select_clip_planes(ctx); const struct brw_vue_prog_key *key = (const struct brw_vue_prog_key *) this->key; @@ -1731,7 +1730,7 @@ fs_visitor::setup_uniform_clipplane_values() } } -void fs_visitor::compute_clip_distance() +void fs_visitor::compute_clip_distance(gl_clip_plane *clip_planes) { struct brw_vue_prog_data *vue_prog_data = (struct brw_vue_prog_data *) prog_data; @@ -1760,7 +1759,7 @@ void fs_visitor::compute_clip_distance() if (outputs[clip_vertex].file == BAD_FILE) return; - setup_uniform_clipplane_values(); + setup_uniform_clipplane_values(clip_planes); const fs_builder abld = bld.annotate("user clip distances"); @@ -1781,7 +1780,7 @@ void fs_visitor::compute_clip_distance() } void -fs_visitor::emit_urb_writes() +fs_visitor::emit_urb_writes(gl_clip_plane *clip_planes) { int slot, urb_offset, length; struct brw_vs_prog_data *vs_prog_data = @@ -1796,7 +1795,7 @@ fs_visitor::emit_urb_writes() /* Lower legacy ff and ClipVertex clipping to clip distances */ if (key->base.userclip_active && !prog->UsesClipDistanceOut) - compute_clip_distance(); + compute_clip_distance(clip_planes); /* If we don't have any valid slots to write, just do a minimal urb write * send to terminate the shader. 
*/ diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 093802c..9c45034 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1706,7 +1706,7 @@ vec4_visitor::emit_shader_time_write(int shader_time_subindex, src_reg value) } bool -vec4_visitor::run() +vec4_visitor::run(gl_clip_plane *clip_planes) { sanity_param_count = prog->Parameters->NumParameters; @@ -1728,7 +1728,7 @@ vec4_visitor::run() base_ir = NULL; if (key->userclip_active && !prog->UsesClipDistanceOut) - setup_uniform_clipplane_values(); + setup_uniform_clipplane_values(clip_pla
[Mesa-dev] [PATCH 09/16] i965: Add compiler options to brw_compiler
This creates the options at screen creation time and then we just copy them into the context at context creation time. We also move is_scalar to the brw_compiler structure. We also end up manually setting some values that the core would have set by default for us. Fortunately, there are only two non-zero shader compiler option defaults that we aren't overriding anyway so this isn't a big deal. --- src/mesa/drivers/dri/i965/brw_context.c | 46 ++ src/mesa/drivers/dri/i965/brw_context.h | 1 - src/mesa/drivers/dri/i965/brw_shader.cpp | 49 +++- src/mesa/drivers/dri/i965/brw_shader.h | 3 ++ src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +- src/mesa/drivers/dri/i965/intel_screen.c | 1 + 6 files changed, 56 insertions(+), 46 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 327a668..33cdbd2 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -50,6 +50,7 @@ #include "brw_context.h" #include "brw_defines.h" +#include "brw_shader.h" #include "brw_draw.h" #include "brw_state.h" @@ -68,8 +69,6 @@ #include "tnl/t_pipeline.h" #include "util/ralloc.h" -#include "glsl/nir/nir.h" - /*** * Mesa's Driver Functions ***/ @@ -558,48 +557,12 @@ brw_initialize_context_constants(struct brw_context *brw) ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxInputComponents = 128; } - static const nir_shader_compiler_options nir_options = { - .native_integers = true, - /* In order to help allow for better CSE at the NIR level we tell NIR - * to split all ffma instructions during opt_algebraic and we then - * re-combine them as a later step. - */ - .lower_ffma = true, - .lower_sub = true, - }; - /* We want the GLSL compiler to emit code that uses condition codes */ for (int i = 0; i < MESA_SHADER_STAGES; i++) { - ctx->Const.ShaderCompilerOptions[i].MaxIfDepth = brw->gen < 6 ? 
16 : UINT_MAX; - ctx->Const.ShaderCompilerOptions[i].EmitCondCodes = true; - ctx->Const.ShaderCompilerOptions[i].EmitNoNoise = true; - ctx->Const.ShaderCompilerOptions[i].EmitNoMainReturn = true; - ctx->Const.ShaderCompilerOptions[i].EmitNoIndirectInput = true; - ctx->Const.ShaderCompilerOptions[i].EmitNoIndirectOutput = -(i == MESA_SHADER_FRAGMENT); - ctx->Const.ShaderCompilerOptions[i].EmitNoIndirectTemp = -(i == MESA_SHADER_FRAGMENT); - ctx->Const.ShaderCompilerOptions[i].EmitNoIndirectUniform = false; - ctx->Const.ShaderCompilerOptions[i].LowerClipDistance = true; + ctx->Const.ShaderCompilerOptions[i] = + brw->intelScreen->compiler->glsl_compiler_options[i]; } - ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].OptimizeForAOS = true; - ctx->Const.ShaderCompilerOptions[MESA_SHADER_GEOMETRY].OptimizeForAOS = true; - - if (brw->scalar_vs) { - /* If we're using the scalar backend for vertex shaders, we need to - * configure these accordingly. - */ - ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].EmitNoIndirectOutput = true; - ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].EmitNoIndirectTemp = true; - ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].OptimizeForAOS = false; - - ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].NirOptions = &nir_options; - } - - ctx->Const.ShaderCompilerOptions[MESA_SHADER_FRAGMENT].NirOptions = &nir_options; - ctx->Const.ShaderCompilerOptions[MESA_SHADER_COMPUTE].NirOptions = &nir_options; - /* ARB_viewport_array */ if (brw->gen >= 6 && ctx->API == API_OPENGL_CORE) { ctx->Const.MaxViewports = GEN6_NUM_VIEWPORTS; @@ -832,9 +795,6 @@ brwCreateContext(gl_api api, if (INTEL_DEBUG & DEBUG_AUB) drm_intel_bufmgr_gem_set_aub_dump(brw->bufmgr, true); - if (brw->gen >= 8 && !(INTEL_DEBUG & DEBUG_VEC4VS)) - brw->scalar_vs = true; - brw_initialize_context_constants(brw); ctx->Const.ResetStrategy = notify_reset diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 
58119ee..d8fcfff 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1137,7 +1137,6 @@ struct brw_context bool has_pln; bool no_simd8; bool use_rep_send; - bool scalar_vs; /** * Some versions of Gen hardware don't do centroid interpolation correctly diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index 3ac5ef1..683946b 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -84,6 +84,53 @@ brw_compiler_create(void *mem_ctx, const struct brw_device_info *devinfo) brw_fs_alloc_reg_sets(compiler); brw_vec4_alloc_reg_set(compiler); +
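The added half of this diff (the new code in brw_compiler_create) is cut off in the archive, so the following is only a guess at its shape, written as a standalone sketch: options are filled in once per compiler at screen creation, and context creation reduces to a per-stage struct copy. All type and field names are hypothetical and only three of the options are modelled; the conditions (gen >= 8 for the scalar VS, gen < 6 for the if-depth limit, fragment-only indirect-output lowering, AOS for vertex and geometry) are taken from the lines the patch removes from brw_context.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

enum example_stage {
   EXAMPLE_VERTEX, EXAMPLE_GEOMETRY, EXAMPLE_FRAGMENT, EXAMPLE_COMPUTE,
   EXAMPLE_STAGES
};

/* Hypothetical subset of the per-stage GLSL compiler options. */
struct example_options {
   unsigned max_if_depth;
   bool emit_no_indirect_output;
   bool optimize_for_aos;
};

/* Hypothetical stand-in for brw_compiler: one per screen. */
struct example_compiler {
   bool scalar_vs;
   struct example_options glsl_compiler_options[EXAMPLE_STAGES];
};

/* Hypothetical stand-in for the per-context constants. */
struct example_context {
   struct example_options shader_compiler_options[EXAMPLE_STAGES];
};

/* Screen creation time: decide everything once. */
static void
example_compiler_init(struct example_compiler *c, int gen, bool force_vec4_vs)
{
   c->scalar_vs = gen >= 8 && !force_vec4_vs;

   for (int i = 0; i < EXAMPLE_STAGES; i++) {
      struct example_options *o = &c->glsl_compiler_options[i];
      o->max_if_depth = gen < 6 ? 16 : UINT_MAX;
      o->emit_no_indirect_output = (i == EXAMPLE_FRAGMENT);
      o->optimize_for_aos = (i == EXAMPLE_VERTEX || i == EXAMPLE_GEOMETRY);
   }

   if (c->scalar_vs) {
      /* The scalar backend needs different vertex-stage settings. */
      c->glsl_compiler_options[EXAMPLE_VERTEX].emit_no_indirect_output = true;
      c->glsl_compiler_options[EXAMPLE_VERTEX].optimize_for_aos = false;
   }
}

/* Context creation time: a plain copy, as in the new loop body above. */
static void
example_context_init(struct example_context *ctx,
                     const struct example_compiler *c)
{
   for (int i = 0; i < EXAMPLE_STAGES; i++)
      ctx->shader_compiler_options[i] = c->glsl_compiler_options[i];
}
```

Because the context now copies whole structs, any field the compiler setup does not assign is whatever the compiler structure was initialised with, which is presumably why the commit message mentions setting some core defaults by hand.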
[Mesa-dev] [PATCH 07/16] i965/fs: Do the no16 perf logging directly in fs_visitor::no16()
While we're at it, we'll drop the note about 10-20% performance loss. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 13 ++--- 1 file changed, 2 insertions(+), 11 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index a9d9f37..40e2c44 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -710,12 +710,7 @@ fs_visitor::no16(const char *msg) } else { simd16_unsupported = true; - if (brw->perf_debug) { - if (no16_msg) -ralloc_strcat(&no16_msg, msg); - else -no16_msg = ralloc_strdup(mem_ctx, msg); - } + perf_debug("SIMD16 shader failed to compile: %s", msg); } } @@ -4042,14 +4037,10 @@ brw_wm_fs_emit(struct brw_context *brw, /* Try a SIMD16 compile */ v2.import_uniforms(&v); if (!v2.run_fs()) { -perf_debug("SIMD16 shader failed to compile, falling back to " - "SIMD8 at a 10-20%% performance cost: %s", v2.fail_msg); +perf_debug("SIMD16 shader failed to compile: %s", v2.fail_msg); } else { simd16_cfg = v2.cfg; } - } else { - perf_debug("SIMD16 shader unsupported, falling back to " -"SIMD8 at a 10-20%% performance cost: %s", v.no16_msg); } } -- 2.4.3