Hi Bjoern,

Hm, the global memory size is 5.78 GiB and the max allocation size is
2.89 GiB. I had a look at darktable's OpenCL code: the value of
opencl_memory_requirement from darktablerc is compared against
CL_DEVICE_GLOBAL_MEM_SIZE. So in your case the check should be
512 MB < 5.78 GiB, and the device should be used.

The relevant error message should be "[opencl_init] discarding device
$DEVICE due to insufficient global memory ($GLOBAL_MEM MB)". Do you see
that error message when running darktable with "-d opencl", and if yes,
what does it say?
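
To make the check concrete, here is a minimal sketch of the comparison as
I understand it. This is not darktable's actual code: the function name
check_device is hypothetical, and I am assuming that the requirement is
given in MB and converted to bytes before the comparison, and that the
value printed in the message is the device's global memory in MB.

```python
MIB = 1024 * 1024  # bytes per MB as assumed here (binary megabytes)


def check_device(name, global_mem_bytes, requirement_mb):
    """Hypothetical stand-in for the memory check done during OpenCL init.

    global_mem_bytes: the value reported for CL_DEVICE_GLOBAL_MEM_SIZE
    requirement_mb:   opencl_memory_requirement from darktablerc (assumed MB)
    Returns (accepted, message); message is None when the device is kept.
    """
    required_bytes = requirement_mb * MIB
    if global_mem_bytes < required_bytes:
        # Message layout follows the text quoted in this mail; the exact
        # formatting in darktable may differ.
        message = (
            "[opencl_init] discarding device %s due to insufficient "
            "global memory (%d MB)" % (name, global_mem_bytes // MIB)
        )
        return False, message
    return True, None


# Values taken from the clinfo output quoted below.
accepted, message = check_device("Intel(R) Gen9 HD Graphics NEO", 6206062592, 512)
```

With the numbers from your clinfo output the device passes this check,
which is why I would expect it to be used.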

I would understand it if the device/driver was blacklisted, or if the
system didn't have enough memory, but I'm curious as to why it works
when you set opencl_memory_requirement to slightly less than 512 MB. It
feels like a rather arbitrary limit.

cheers,
Simon

On 08.03.19 at 16:22, Björn Sozumschein wrote:
> Hi Simon,
> 
> thank you for the information!
> 
> I'm running Arch Linux by the way. The output of clinfo is:
> """
> Number of platforms                               1
>   Platform Name                                   Intel(R) OpenCL HD Graphics
>   Platform Vendor                                 Intel(R) Corporation
>   Platform Version                                OpenCL 2.1
>   Platform Profile                                FULL_PROFILE
>   Platform Extensions                             cl_khr_3d_image_writes
> cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images
> cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
> cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics
> cl_khr_local_int32_extended_atomics cl_intel_subgroups
> cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir
> cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics
> cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints
> cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64
> cl_khr_subgroups cl_khr_il_program
> cl_intel_spirv_device_side_avc_motion_estimation
> cl_intel_spirv_media_block_io cl_intel_spirv_subgroups
> cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv
> cl_intel_packed_yuv cl_intel_motion_estimation
> cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing
>   Platform Host timer resolution                  1ns
>   Platform Extensions function suffix             INTEL
> 
>   Platform Name                                   Intel(R) OpenCL HD Graphics
> Number of devices                                 1
>   Device Name                                     Intel(R) Gen9 HD Graphics NEO
>   Device Vendor                                   Intel(R) Corporation
>   Device Vendor ID                                0x8086
>   Device Version                                  OpenCL 2.1 NEO
>   Driver Version                                  19.02.12143
>   Device OpenCL C Version                         OpenCL C 2.0
>   Device Type                                     GPU
>   Device Profile                                  FULL_PROFILE
>   Device Available                                Yes
>   Compiler Available                              Yes
>   Linker Available                                Yes
>   Max compute units                               24
>   Max clock frequency                             1050MHz
>   Device Partition                                (core)
>     Max number of sub-devices                     0
>     Supported partition types                     None
>     Supported affinity domains                    (n/a)
>   Max work item dimensions                        3
>   Max work item sizes                             256x256x256
>   Max work group size                             256
>   Preferred work group size multiple              32
>   Max sub-groups per work group                   32
>   Sub-group sizes (Intel)                         8, 16, 32
>   Preferred / native vector sizes                
>     char                                                16 / 16     
>     short                                                8 / 8      
>     int                                                  4 / 4      
>     long                                                 1 / 1      
>     half                                                 8 / 8       (cl_khr_fp16)
>     float                                                1 / 1      
>     double                                               1 / 1       (cl_khr_fp64)
>   Half-precision Floating-point support           (cl_khr_fp16)
>     Denormals                                     Yes
>     Infinity and NANs                             Yes
>     Round to nearest                              Yes
>     Round to zero                                 Yes
>     Round to infinity                             Yes
>     IEEE754-2008 fused multiply-add               Yes
>     Support is emulated in software               No
>   Single-precision Floating-point support         (core)
>     Denormals                                     Yes
>     Infinity and NANs                             Yes
>     Round to nearest                              Yes
>     Round to zero                                 Yes
>     Round to infinity                             Yes
>     IEEE754-2008 fused multiply-add               Yes
>     Support is emulated in software               No
>     Correctly-rounded divide and sqrt operations  Yes
>   Double-precision Floating-point support         (cl_khr_fp64)
>     Denormals                                     Yes
>     Infinity and NANs                             Yes
>     Round to nearest                              Yes
>     Round to zero                                 Yes
>     Round to infinity                             Yes
>     IEEE754-2008 fused multiply-add               Yes
>     Support is emulated in software               No
>   Address bits                                    64, Little-Endian
>   Global memory size                              6206062592 (5.78GiB)
>   Error Correction support                        No
>   Max memory allocation                           3103031296 (2.89GiB)
>   Unified memory for Host and Device              Yes
>   Shared Virtual Memory (SVM) capabilities        (core)
>     Coarse-grained buffer sharing                 Yes
>     Fine-grained buffer sharing                   No
>     Fine-grained system sharing                   No
>     Atomics                                       No
>   Minimum alignment for any data type             128 bytes
>   Alignment of base address                       1024 bits (128 bytes)
>   Preferred alignment for atomics                
>     SVM                                           64 bytes
>     Global                                        64 bytes
>     Local                                         64 bytes
>   Max size for global variable                    65536 (64KiB)
>   Preferred total size of global vars             3103031296 (2.89GiB)
>   Global Memory cache type                        Read/Write
>   Global Memory cache size                        524288 (512KiB)
>   Global Memory cache line size                   64 bytes
>   Image support                                   Yes
>     Max number of samplers per kernel             16
>     Max size for 1D images from buffer            193939456 pixels
>     Max 1D or 2D image array size                 2048 images
>     Base address alignment for 2D image buffers   4 bytes
>     Pitch alignment for 2D image buffers          4 pixels
>     Max 2D image size                             16384x16384 pixels
>     Max planar YUV image size                     16384x16380 pixels
>     Max 3D image size                             16384x16384x2048 pixels
>     Max number of read image args                 128
>     Max number of write image args                128
>     Max number of read/write image args           128
>   Max number of pipe args                         16
>   Max active pipe reservations                    1
>   Max pipe packet size                            1024
>   Local memory type                               Local
>   Local memory size                               65536 (64KiB)
>   Max number of constant args                     8
>   Max constant buffer size                        3103031296 (2.89GiB)
>   Max size of kernel argument                     1024
>   Queue properties (on host)                     
>     Out-of-order execution                        Yes
>     Profiling                                     Yes
>   Queue properties (on device)                   
>     Out-of-order execution                        Yes
>     Profiling                                     Yes
>     Preferred size                                131072 (128KiB)
>     Max size                                      67108864 (64MiB)
>   Max queues on device                            1
>   Max events on device                            1024
>   Prefer user sync for interop                    Yes
>   Profiling timer resolution                      83ns
>   Execution capabilities                         
>     Run OpenCL kernels                            Yes
>     Run native kernels                            No
>     Sub-group independent forward progress        Yes
>     IL version                                    SPIR-V_1.0
>     SPIR versions                                 1.2
>   printf() buffer size                            4194304 (4MiB)
>   Built-in kernels                               
> block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
>   Motion Estimation accelerator version (Intel)   2
>     Device-side AVC Motion Estimation version     1
>       Supports texture sampler use                Yes
>       Supports preemption                         No
>   Device Extensions                               cl_khr_3d_image_writes
> cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images
> cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
> cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics
> cl_khr_local_int32_extended_atomics cl_intel_subgroups
> cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir
> cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics
> cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints
> cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64
> cl_khr_subgroups cl_khr_il_program
> cl_intel_spirv_device_side_avc_motion_estimation
> cl_intel_spirv_media_block_io cl_intel_spirv_subgroups
> cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv
> cl_intel_packed_yuv cl_intel_motion_estimation
> cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing
> 
> NULL platform behavior
>   clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel(R) OpenCL HD Graphics
>   clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [INTEL]
>   clCreateContext(NULL, ...) [default]            Success [INTEL]
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
>     Platform Name                                 Intel(R) OpenCL HD Graphics
>     Device Name                                   Intel(R) Gen9 HD Graphics NEO
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
>     Platform Name                                 Intel(R) OpenCL HD Graphics
>     Device Name                                   Intel(R) Gen9 HD Graphics NEO
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
>   clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
>     Platform Name                                 Intel(R) OpenCL HD Graphics
>     Device Name                                   Intel(R) Gen9 HD Graphics NEO
> 
> ICD loader properties
>   ICD loader Name                                 OpenCL ICD Loader
>   ICD loader Vendor                               OCL Icd free software
>   ICD loader Version                              2.2.12
>   ICD loader Profile                              OpenCL 2.2
> """
> 
> Best,
> Bjoern
> 
> On Fri, 8 March 2019 at 13:12, Sturm Flut
> <sturmf...@lieberbiber.de> wrote:
> 
>     Hi,
> 
>     On 08.03.19 at 09:47, Björn Sozumschein wrote:
>     > However, as far as I understand, although 512 MB VRAM is reported, the
>     > integrated graphics may allocate more shared memory
> 
>     Yes and no. On one hand it depends on the chip generation, operating
>     system, driver and system configuration. Most of the current generation
>     Intel GPUs have a hard limit at either 32 GB or half the amount of RAM
>     installed in the system, whichever is lower (see [1]). On the other hand
>     the operating system must have free RAM left when the GPU driver wants
>     to allocate some more.
> 
>     Could you maybe post the output of the clinfo command (Linux
>     distributions have it in their repositories, Windows version here [2] at
>     the bottom)? On my ASUS notebook with an Intel Graphics 620 and 16 GB of
>     RAM it reports ~6 GB for "Global memory size" and 2 GB for "Max memory
>     allocation" when the system is idle.
> 
>     I see that most ThinkPad X1 Carbon with the Intel Graphics 520 seem to
>     have shipped with 8 or 16 GB of RAM. Since the Intel OpenCL driver
>     really seems to report slightly less than 512 MB in your case, there is
>     probably a reason for that.
> 
>     kind regards,
>     Simon
> 
> 
>     [1] https://www.intel.com/content/www/us/en/support/articles/000020962/graphics-drivers.html
> 
>     [2] https://github.com/Oblomov/clinfo
> 
> ___________________________________________________________________________
> darktable developer mailing list to unsubscribe send a mail to
> darktable-dev+unsubscr...@lists.darktable.org