Thanks Jo!

So now I'm working on the OpenCL port. I have this function in the IOP:

#ifdef HAVE_OPENCL
int process_cl(struct dt_iop_module_t *self, dt_dev_pixelpipe_iop_t *piece, cl_mem dev_in, cl_mem dev_out,
               const dt_iop_roi_t *const roi_in, const dt_iop_roi_t *const roi_out)
{
  dt_iop_profilegamma_data_t *d = (dt_iop_profilegamma_data_t *)piece->data;
  dt_iop_profilegamma_global_data_t *gd = (dt_iop_profilegamma_global_data_t *)self->data;

  cl_int err = -999;
  const int devid = piece->pipe->devid;
  const int width = roi_in->width;
  const int height = roi_in->height;
  const float noise = powf(2., d->noise_level);
  const float grey = d->grey_point / 100.;

  size_t sizes[3] = { ROUNDUPWD(width), ROUNDUPHT(height), 1 };

  dt_opencl_set_kernel_arg(devid, gd->kernel_profilegamma_log, 0, sizeof(cl_mem), (void *)&dev_in);
  dt_opencl_set_kernel_arg(devid, gd->kernel_profilegamma_log, 1, sizeof(cl_mem), (void *)&dev_out);
  dt_opencl_set_kernel_arg(devid, gd->kernel_profilegamma_log, 2, sizeof(int), (void *)&width);
  dt_opencl_set_kernel_arg(devid, gd->kernel_profilegamma_log, 3, sizeof(int), (void *)&height);
  dt_opencl_set_kernel_arg(devid, gd->kernel_profilegamma_log, 4, sizeof(float), (void *)&(d->camera_factor));
  dt_opencl_set_kernel_arg(devid, gd->kernel_profilegamma_log, 5, sizeof(float), (void *)&(d->dynamic_range));
  dt_opencl_set_kernel_arg(devid, gd->kernel_profilegamma_log, 6, sizeof(float), (void *)&noise);
  dt_opencl_set_kernel_arg(devid, gd->kernel_profilegamma_log, 7, sizeof(float), (void *)&(d->shadows_range));
  dt_opencl_set_kernel_arg(devid, gd->kernel_profilegamma_log, 8, sizeof(float), (void *)&grey);

  err = dt_opencl_enqueue_kernel_2d(devid, gd->kernel_profilegamma_log, sizes);
  if(err != CL_SUCCESS) goto error;
  return TRUE;

error:
  dt_print(DT_DEBUG_OPENCL, "[opencl_profilegamma_log] couldn't enqueue kernel! %d\n", err);
  return FALSE;
}
#endif

And this one in basic.cl:

kernel void
profilegamma_log (read_only image2d_t in, write_only image2d_t out, unsigned int width, unsigned int height,
        float factor, float range, float noise_level, float shadows_range, float grey)
{
  const unsigned int x = get_global_id(0);
  const unsigned int y = get_global_id(1);

  if(x >= height || y >= width) return;

  float4 i = read_imagef(in, sampleri, (int2)(x, y));
  const float4 min_val = noise_level / (noise_level + grey);
  const float4 Logmin_val = log2(min_val);
  i.xyz = (log2(factor * (i.xyz + noise_level) / (grey + noise_level)) - shadows_range) / range;

  write_imagef(out, (int2)(x, y), i);
}

When I try to run it, I get:

1,376046 [opencl_create_kernel] successfully loaded kernel `profilegamma_log' (165) for device 0 …
7,995011 [opencl_events_flush] could not get event info for '[Read Image (from device to host)]': -9999
7,995018 [opencl_events_flush] could not get event info for 'profilegamma_log': -9999
7,995021 [opencl_pixelpipe] could not run module 'profile_gamma' on gpu. falling back to cpu path
7,995208 [opencl_pixelpipe (b)] late opencl error detected while copying back to cpu buffer: -5

Any clue about what could be happening?

Thanks,

Aurélien.


On 18/09/2018 at 03:26, johannes hanika wrote:
> heya,
>
> [..]
>> Actually, I even wonder if OpenCL is relevant for this as it's a linear 
>> operation performed on only one pixel at the time over the flatten array. I 
>> wouldn't be surprised if the OpenCL version were slower on some systems than 
>> a good SSE2 version.
> maybe on some systems. the thing with opencl is that you need to copy
> the buffer to the gpu and back at the end. if you have one module that
> interrupts the pipeline, you'll need to copy more (get your input
> buffer back to the cpu, process, copy back to gpu). this slows down
> the whole process significantly, even if the module would run at same
> speed on both devices.
>
>
>> Considering the code itself, my only remarks are for this line:
>>       for(size_t k = 1; k < (size_t)ch * roi_out->width * roi_out->height; k++)
>> First, is there a reason why you are using a size_t type? int or unsigned 
>> would be fine I think, and you wouldn't need a cast.
> you definitely want 64 bits for the counter if you go width*height
> (times channel count here, too). size_t happens to be unsigned 64-bit
> int on many systems. using stdint.h you could use uint64_t to be even
> clearer and maybe more portable. note that you could have used a
> nested loop for y and for x together with an openmp annotation
> "collapse(2)" to get similar results.
>
> and yes, please start at 0 :)
>
> cheers,
>  jo
> ___________________________________________________________________________
> darktable developer mailing list
> to unsubscribe send a mail to darktable-dev+unsubscr...@lists.darktable.org
>
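
The two loop styles mentioned in the quoted reply can be sketched as follows. This is an illustration only: `scale_flat`, `scale_nested`, and `element_count` are hypothetical names, and multiplying by a gain stands in for whatever per-pixel operation the module performs. The OpenMP pragma is guarded so the code also builds without OpenMP.

```c
#include <stddef.h>
#include <stdint.h>

/* Flat loop over an interleaved buffer. The element count is computed in
 * size_t so that ch * width * height does not overflow a 32-bit int. */
static void scale_flat(float *buf, int width, int height, int ch, float gain)
{
  const size_t n = (size_t)ch * width * height;
  for(size_t k = 0; k < n; k++) buf[k] *= gain; /* starts at 0, not 1 */
}

/* Nested y/x loops; collapse(2) lets OpenMP distribute the combined
 * y*x iteration space across threads. */
static void scale_nested(float *buf, int width, int height, int ch, float gain)
{
#ifdef _OPENMP
#pragma omp parallel for collapse(2)
#endif
  for(int y = 0; y < height; y++)
    for(int x = 0; x < width; x++)
    {
      float *px = buf + ((size_t)y * width + x) * ch;
      for(int c = 0; c < ch; c++) px[c] *= gain;
    }
}

/* Element count with an explicit 64-bit type, independent of the width
 * of size_t on the target platform. */
static uint64_t element_count(int width, int height, int ch)
{
  return (uint64_t)ch * (uint64_t)width * (uint64_t)height;
}
```

Both functions touch every element exactly once, so they should give the same result; the nested form simply keeps each loop counter small enough for an `int`.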

