[Lldb-commits] [lldb] [MLIR] Enabling Intel GPU Integration. (PR #65539)

2023-09-06 Thread Ronan Keryell via lldb-commits

https://github.com/keryell commented:

Quite interesting!
At some point it would be nice to have a design document or some documentation 
somewhere explaining how all these MLIR runners work, including this one.
Globally, this PR adds a SYCL runner, but it is very specific to Intel Level Zero.
It would be nice to have some generalization in the future, such as SYCL using 
the OpenCL interoperability interface to run the SPIR-V kernels, or even native 
kernels.
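
For illustration, a minimal sketch of what such an OpenCL-interoperability path could look like (none of these names come from this PR; it assumes a SYCL implementation with an OpenCL backend whose driver accepts SPIR-V, i.e. OpenCL 2.1+ or `cl_khr_il_program`):
```cpp
#include <CL/cl.h>
#include <sycl/sycl.hpp>

#include <stdexcept>

// Illustrative sketch only, not part of this PR: build a SPIR-V module
// through the OpenCL interoperability interface of a SYCL context.
static cl_kernel loadSpirvKernelViaOpenCL(const sycl::context &syclContext,
                                          const void *spirv, size_t spirvSize,
                                          const char *kernelName) {
  // Extract the native OpenCL context from the SYCL context.
  cl_context clContext = sycl::get_native<sycl::backend::opencl>(syclContext);

  cl_int err = CL_SUCCESS;
  // Requires OpenCL 2.1 or the cl_khr_il_program extension.
  cl_program program = clCreateProgramWithIL(clContext, spirv, spirvSize, &err);
  if (err != CL_SUCCESS)
    throw std::runtime_error("clCreateProgramWithIL failed");

  // Build for all devices of the context.
  if (clBuildProgram(program, 0, nullptr, nullptr, nullptr, nullptr) !=
      CL_SUCCESS)
    throw std::runtime_error("clBuildProgram failed");

  cl_kernel kernel = clCreateKernel(program, kernelName, &err);
  if (err != CL_SUCCESS)
    throw std::runtime_error("clCreateKernel failed");
  return kernel;
}
```
The resulting `cl_kernel` could then be wrapped back into a `sycl::kernel` through the usual backend interop, which is what would make such a runner less tied to Level Zero.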

https://github.com/llvm/llvm-project/pull/65539


[Lldb-commits] [lldb] [MLIR] Enabling Intel GPU Integration. (PR #65539)

2023-09-06 Thread Ronan Keryell via lldb-commits


@@ -116,6 +116,7 @@ add_definitions(-DMLIR_ROCM_CONVERSIONS_ENABLED=${MLIR_ENABLE_ROCM_CONVERSIONS})
 
 set(MLIR_ENABLE_CUDA_RUNNER 0 CACHE BOOL "Enable building the mlir CUDA runner")
 set(MLIR_ENABLE_ROCM_RUNNER 0 CACHE BOOL "Enable building the mlir ROCm runner")
+set(MLIR_ENABLE_SYCL_RUNNER 0 CACHE BOOL "Enable building the mlir Sycl runner")

keryell wrote:

Please spell SYCL correctly.
```suggestion
set(MLIR_ENABLE_SYCL_RUNNER 0 CACHE BOOL "Enable building the mlir SYCL runner")
```
One could argue that `mlir` should be spelled `MLIR`, but that train seems to 
have left a long time ago. :-)

https://github.com/llvm/llvm-project/pull/65539


[Lldb-commits] [lldb] [MLIR] Enabling Intel GPU Integration. (PR #65539)

2023-09-06 Thread Ronan Keryell via lldb-commits

https://github.com/keryell edited 
https://github.com/llvm/llvm-project/pull/65539


[Lldb-commits] [lldb] [MLIR] Enabling Intel GPU Integration. (PR #65539)

2023-09-06 Thread Ronan Keryell via lldb-commits


@@ -0,0 +1,68 @@
+# CMake find_package() module for SYCL Runtime
+#
+# Example usage:
+#
+# find_package(SyclRuntime)

keryell wrote:

Shouldn't it be
```suggestion
# find_package(SYCLRuntime)
```
everywhere?

https://github.com/llvm/llvm-project/pull/65539


[Lldb-commits] [lldb] [MLIR] Enabling Intel GPU Integration. (PR #65539)

2023-09-06 Thread Ronan Keryell via lldb-commits


@@ -0,0 +1,223 @@
+//===- SyclRuntimeWrappers.cpp - MLIR SYCL wrapper library ----------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
+// Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Implements C wrappers around the sycl runtime library.
+//
+//===----------------------------------------------------------------------===//
+
+#include <algorithm>
+#include <array>
+#include <atomic>
+#include <cassert>
+#include <cfloat>
+#include <cstdint>
+#include <cstdio>
+#include <cstdlib>
+#include <stdexcept>
+#include <tuple>
+#include <vector>
+
+#include <level_zero/ze_api.h>
+#include <map>
+#include <mutex>
+#include <sycl/ext/oneapi/backend/level_zero.hpp>
+#include <sycl/sycl.hpp>
+
+#ifdef _WIN32
+#define SYCL_RUNTIME_EXPORT __declspec(dllexport)
+#else
+#define SYCL_RUNTIME_EXPORT
+#endif // _WIN32
+
+namespace {
+
+template <typename F>
+auto catchAll(F &&func) {
+  try {
+    return func();
+  } catch (const std::exception &e) {
+    fprintf(stdout, "An exception was thrown: %s\n", e.what());
+    fflush(stdout);
+    abort();
+  } catch (...) {
+    fprintf(stdout, "An unknown exception was thrown\n");
+    fflush(stdout);
+    abort();
+  }
+}
+
+#define L0_SAFE_CALL(call)                                                     \
+  {                                                                            \
+    ze_result_t status = (call);                                               \
+    if (status != ZE_RESULT_SUCCESS) {                                         \
+      fprintf(stdout, "L0 error %d\n", status);                                \
+      fflush(stdout);                                                          \
+      abort();                                                                 \
+    }                                                                          \
+  }
+
+} // namespace
+
+static sycl::device getDefaultDevice() {
+  auto platformList = sycl::platform::get_platforms();
+  for (const auto &platform : platformList) {
+    auto platformName = platform.get_info<sycl::info::platform::name>();
+    bool isLevelZero = platformName.find("Level-Zero") != std::string::npos;
+    if (!isLevelZero)
+      continue;
+
+    return platform.get_devices()[0];
+  }
+  throw std::runtime_error("getDefaultDevice failed");
+}
+
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wglobal-constructors"
+
+// Create global device and context
+sycl::device syclDevice = getDefaultDevice();
+sycl::context syclContext = sycl::context(syclDevice);
+
+#pragma clang diagnostic pop
+
+struct QUEUE {
+  sycl::queue syclQueue_;
+
+  QUEUE() { syclQueue_ = sycl::queue(syclContext, syclDevice); }
+};
+
+static void *allocDeviceMemory(QUEUE *queue, size_t size, bool isShared) {
+  void *memPtr = nullptr;
+  if (isShared) {
+    memPtr = sycl::aligned_alloc_shared(64, size, syclDevice, syclContext);
+  } else {
+    memPtr = sycl::aligned_alloc_device(64, size, syclDevice, syclContext);
+  }
+  if (memPtr == nullptr) {
+    throw std::runtime_error("mem allocation failed!");
+  }
+  return memPtr;
+}
+
+static void deallocDeviceMemory(QUEUE *queue, void *ptr) {
+  sycl::free(ptr, queue->syclQueue_);
+}
+
+static ze_module_handle_t loadModule(const void *data, size_t dataSize) {
+  assert(data);
+  ze_module_handle_t zeModule;
+  ze_module_desc_t desc = {ZE_STRUCTURE_TYPE_MODULE_DESC,
+                           nullptr,
+                           ZE_MODULE_FORMAT_IL_SPIRV,
+                           dataSize,
+                           (const uint8_t *)data,
+                           nullptr,
+                           nullptr};
+  auto zeDevice =
+      sycl::get_native<sycl::backend::ext_oneapi_level_zero>(syclDevice);
+  auto zeContext =
+      sycl::get_native<sycl::backend::ext_oneapi_level_zero>(syclContext);
+  L0_SAFE_CALL(zeModuleCreate(zeContext, zeDevice, &desc, &zeModule, nullptr));
+  return zeModule;
+}
+
+static sycl::kernel *getKernel(ze_module_handle_t zeModule, const char *name) {
+  assert(zeModule);
+  assert(name);
+  ze_kernel_handle_t zeKernel;
+  sycl::kernel *syclKernel;
+  ze_kernel_desc_t desc = {};
+  desc.pKernelName = name;
+
+  L0_SAFE_CALL(zeKernelCreate(zeModule, &desc, &zeKernel));
+  sycl::kernel_bundle<sycl::bundle_state::executable> kernelBundle =
+      sycl::make_kernel_bundle<sycl::backend::ext_oneapi_level_zero,
+                               sycl::bundle_state::executable>({zeModule},
+                                                               syclContext);
+
+  auto kernel = sycl::make_kernel<sycl::backend::ext_oneapi_level_zero>(
+      {kernelBundle, zeKernel}, syclContext);
+  syclKernel = new sycl::kernel(kernel);
+  return syclKernel;
+}
+
+static void launchKernel(QUEUE *queue, sycl::kernel *kernel, size_t gridX,
+                         size_t gridY, size_t gridZ, size_t blockX,
+                         size_t blockY, size_t blockZ, size_t sharedMemBytes,
+                         void **params, size_t paramsCount) {
+  auto syclGlobalRange =
+      ::sycl::range<3>(blockZ * gridZ, blockY * gridY, blockX * gridX);
+  auto syclLocalRange = ::sycl::range<3>(blockZ, blockY, blockX);
+  sycl::nd_range<3> syclNdRange(
+      sycl::nd_range<3>(syclGlobalRange, syclLocalRange));

[Lldb-commits] [lldb] [MLIR] Enabling Intel GPU Integration. (PR #65539)

2023-09-07 Thread Ronan Keryell via lldb-commits


@@ -811,8 +812,13 @@ LogicalResult ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite(
   // descriptor.
   Type elementPtrType = this->getElementPtrType(memRefType);
   auto stream = adaptor.getAsyncDependencies().front();
+
+  auto isHostShared = rewriter.create<LLVM::ConstantOp>(
+      loc, llvmInt64Type, rewriter.getI64IntegerAttr(isShared));
+
   Value allocatedPtr =
-      allocCallBuilder.create(loc, rewriter, {sizeBytes, stream}).getResult();
+      allocCallBuilder.create(loc, rewriter, {sizeBytes, stream, isHostShared})
+          .getResult();

keryell wrote:

Technically, SYCL provides a more abstract memory-management model with 
`sycl::buffer` and `sycl::accessor`, which define an implicit asynchronous task 
graph. The allocation details are left to the implementation; whether 
allocation happens synchronously or asynchronously is up to the implementers.
Here the lower-level synchronous USM memory-management API of SYCL is used 
instead, similar to CUDA/HIP memory management.
So, should the `async` allocation in the example be synchronous instead?
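
To make the contrast concrete, here is a minimal illustrative sketch of the two styles (not code from this PR): buffers/accessors let the runtime schedule allocations and kernels behind an implicit task graph, while USM is explicit and CUDA-like.
```cpp
#include <sycl/sycl.hpp>

// Illustrative only: buffer/accessor vs. USM memory management in SYCL.

// Buffers/accessors: the runtime owns the storage and builds an implicit
// asynchronous task graph from accessor dependencies.
void bufferStyle(sycl::queue &q, size_t n) {
  sycl::buffer<float, 1> buf{sycl::range<1>(n)};
  q.submit([&](sycl::handler &cgh) {
    sycl::accessor acc{buf, cgh, sycl::write_only, sycl::no_init};
    cgh.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) { acc[i] = 1.0f; });
  });
  // The buffer destructor waits for all outstanding work that uses it.
}

// USM: explicit, synchronous allocation, close to CUDA/HIP malloc-style
// management; this is the level the runner wrappers in this PR target.
void usmStyle(sycl::queue &q, size_t n) {
  float *data = sycl::malloc_device<float>(n, q);
  q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) { data[i] = 1.0f; })
      .wait();
  sycl::free(data, q);
}
```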

https://github.com/llvm/llvm-project/pull/65539


[Lldb-commits] [lldb] [MLIR] Enabling Intel GPU Integration. (PR #65539)

2023-09-12 Thread Ronan Keryell via lldb-commits


@@ -811,8 +812,13 @@ LogicalResult ConvertAllocOpToGpuRuntimeCallPattern::matchAndRewrite(
   // descriptor.
   Type elementPtrType = this->getElementPtrType(memRefType);
   auto stream = adaptor.getAsyncDependencies().front();
+
+  auto isHostShared = rewriter.create<LLVM::ConstantOp>(
+      loc, llvmInt64Type, rewriter.getI64IntegerAttr(isShared));
+
   Value allocatedPtr =
-      allocCallBuilder.create(loc, rewriter, {sizeBytes, stream}).getResult();
+      allocCallBuilder.create(loc, rewriter, {sizeBytes, stream, isHostShared})
+          .getResult();

keryell wrote:

I guess that if the runtime actually uses synchronous allocation behind the 
scenes and produces an always-ready async token, it works, even if it is not 
optimal.
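
As a rough sketch of that scenario (the wrapper name and signature below are assumptions, not taken from the patch), the runner-side allocation can simply be a blocking USM call, so any async token tied to it is trivially satisfied by the time dependent operations run:
```cpp
#include <sycl/sycl.hpp>

#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical sketch, not code from the patch: a runner-side allocation
// wrapper with the shape the async gpu.alloc lowering expects. The USM
// allocation below is synchronous, so the async token associated with the
// gpu.alloc is effectively always ready when later operations consume it.
extern "C" void *mgpuMemAlloc(uint64_t size, sycl::queue *queue,
                              bool isShared) { // name/signature assumed
  sycl::device device = queue->get_device();
  sycl::context context = queue->get_context();
  void *ptr = isShared ? sycl::aligned_alloc_shared(64, size, device, context)
                       : sycl::aligned_alloc_device(64, size, device, context);
  if (!ptr) {
    fprintf(stderr, "SYCL USM allocation of %zu bytes failed\n", (size_t)size);
    abort();
  }
  return ptr;
}
```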

https://github.com/llvm/llvm-project/pull/65539