On 2018-04-09 01:06 PM, Bas Vermeulen wrote: > The parameters for the compute engine are wrong when using > an E8860 on a big endian machine. > To fix this, convert the contents of struct dispatch_packet > to little endian. > > This ensures that get_global_id(0) and similar functions > in the OpenCL code get the correct endian values, and > makes my simple OpenCL program work correctly. > > Signed-off-by: Bas Vermeulen <b...@daedalean.ai> > --- > src/gallium/drivers/radeonsi/si_compute.c | 24 ++++++++++++------------ > 1 file changed, 12 insertions(+), 12 deletions(-) > > diff --git a/src/gallium/drivers/radeonsi/si_compute.c > b/src/gallium/drivers/radeonsi/si_compute.c > index dfede47605..8ac5b262c4 100644 > --- a/src/gallium/drivers/radeonsi/si_compute.c > +++ b/src/gallium/drivers/radeonsi/si_compute.c > @@ -564,18 +564,18 @@ static void si_setup_user_sgprs_co_v2(struct si_context > *sctx, > /* Upload dispatch ptr */ > memset(&dispatch, 0, sizeof(dispatch)); > > - dispatch.workgroup_size_x = info->block[0]; > - dispatch.workgroup_size_y = info->block[1]; > - dispatch.workgroup_size_z = info->block[2]; > + dispatch.workgroup_size_x = util_cpu_to_le16(info->block[0]); > + dispatch.workgroup_size_y = util_cpu_to_le16(info->block[1]); > + dispatch.workgroup_size_z = util_cpu_to_le16(info->block[2]); > > - dispatch.grid_size_x = info->grid[0] * info->block[0]; > - dispatch.grid_size_y = info->grid[1] * info->block[1]; > - dispatch.grid_size_z = info->grid[2] * info->block[2]; > + dispatch.grid_size_x = util_cpu_to_le32(info->grid[0] * > info->block[0]); > + dispatch.grid_size_y = util_cpu_to_le32(info->grid[1] * > info->block[1]); > + dispatch.grid_size_z = util_cpu_to_le32(info->grid[2] * > info->block[2]); > > - dispatch.private_segment_size = program->private_size; > - dispatch.group_segment_size = program->local_size; > + dispatch.private_segment_size = > util_cpu_to_le32(program->private_size); > + dispatch.group_segment_size = > util_cpu_to_le32(program->local_size); > > - dispatch.kernarg_address = kernel_args_va; > + dispatch.kernarg_address = util_cpu_to_le64(kernel_args_va); > > u_upload_data(sctx->b.const_uploader, 0, sizeof(dispatch), > 256, &dispatch, &dispatch_offset, > @@ -652,9 +652,9 @@ static bool si_upload_compute_input(struct si_context > *sctx, > > if (!code_object) { > for (i = 0; i < 3; i++) { > - kernel_args[i] = info->grid[i]; > - kernel_args[i + 3] = info->grid[i] * info->block[i]; > - kernel_args[i + 6] = info->block[i]; > + kernel_args[i] = util_cpu_to_le32(info->grid[i]); > + kernel_args[i + 3] = util_cpu_to_le32(info->grid[i] * > info->block[i]); > + kernel_args[i + 6] = util_cpu_to_le32(info->block[i]); > } > } > >
This patch is Reviewed-by: Michel Dänzer <michel.daen...@amd.com> For patch 1, I agree with Gert that a single version of the code using explicit shifts & masks would be better than multiple versions using bit-fields. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev