Ulrich Pegelow <ulrich.pege...@tongareva.de> writes:

>>> Please consider that changes in that place also will require changes to
>>> the equivalent OpenCL code which might be anything but trivial.
>> 
>> Oh no! I was thinking of the dt_iop_clip_and_zoom_demosaic_{half,third}_*() 
>> functions in imageop_math.c. Those don't have OpenCL implementations, 
>> though? Certainly getting into revising OpenCL code would be a way more 
>> invasive effort.
>> 
>
> Well there are:
>
> demosaic_vng.cl:clip_and_zoom_demosaic_third_size_xtrans()
> demosaic_ppg.cl:clip_and_zoom_demosaic_half_size()
>

Oh no indeed! I haven't touched the floating point variants, and utterly failed 
to note this.

I've only been working on the non-float versions. These at least have no OpenCL 
equivalents, yes? They are only called from _init_f() in mipmap_cache.c for 
previews (which then never pass through the float downscale), as these are now 
downscaled while still mosaiced (which, alas, results in artifacts, but allows 
for processing mosaiced data through the pre-demosaic stages of the pipeline).

I believe that the float downscale code (CPU or OpenCL) is called in demosaic 
in the case of low quality thumbnails or in the full pipeline when the image is 
zoomed out and "demosaicing for zoomed out darkroom mode" is "always bilinear 
(fast)".

>> I also remember now that I wanted to look into whether taking advantage of 
>> box filters being separable would speed up the code. But that would involve 
>> allocating a bit of memory so might also be something about which to be 
>> conservative.
>
> I have not looked into your code, so I don't know how large the base of 
> the box filters is. But probably you implement them as gliding window 
> filters. That's exactly something you can't implement efficiently in OpenCL.
>

Right now the implementation is way too naive, not even a gliding window, just 
iterating over the source pixels. But a gliding window would be better.

> We have that situation in some modules like highpass coming from times 
> much before OpenCL. In the end I implemented a gaussian filter in OpenCL 
> that tries to mimic the box filter as close as possible. Still that is 
> not an ideal way ...

I actually tried a Gaussian filter and overall it produced very nice results, 
but was slow and gave artifacts around the edges of highlights (see 
https://redmine.darktable.org/issues/11167#note-28). If I recall, those 
highlight artifacts were actually worse, as the larger sigma of the Gaussian 
filter spread them around a bit more. But your OpenCL Gaussian implementation 
looks really nice, and makes me want try it for the float variants which are 
called later on in the pipeline, when highlights shouldn't be a problem.

The upshot of 11167 seemed to be to implement good-enough and fast-enough 
downscaling code for now (CPU and SSE path), and find a better algorithm 
eventually. If the uint16 variants have no OpenCL equivalents, I think there's 
latitude there to improve the CPU/SSE paths for 2.4.0.

For the float variants, in the case of thumbnails and zoomed out darkroom view 
I don't see any artifacts with the extant code on the CPU path. I'm thinking 
it's fine for now to leave the float code as is, hence requiring no changes to 
the corresponding OpenCL code. But eventually to consider converting the float 
variants to use a Gaussian filter.

Dan

> Ulrich
>

[...]

___________________________________________________________________________
darktable developer mailing list
to unsubscribe send a mail to darktable-dev+unsubscr...@lists.darktable.org

Reply via email to