> Why? Your module::argument::image_size and ::image_format cases don't > touch the explicit_arg iterator at all AFAICT, so it will be left > pointing one past the last general semantic argument
Ok, my mistake, I didn't think this through. > Hmmm... So you only need it as padding? Wouldn't it be possible for > you to declare samplers to be 0 bytes? Maybe it can be done by changing the type of the sampler arg from i32 to an empty struct. I'll have to try this, I don't know if it will work. On Sun, Jul 26, 2015 at 2:40 PM, Francisco Jerez <curroje...@riseup.net> wrote: > Zoltán Gilián <zoltan.gil...@gmail.com> writes: > >>> auto img = dynamic_cast<image_argument &>(**(explicit_arg - 1)) >> >> Ok, so it should be (explicit_arg - 2) for the image format, I >> presume. > > Why? Your module::argument::image_size and ::image_format cases don't > touch the explicit_arg iterator at all AFAICT, so it will be left > pointing one past the last general semantic argument > >> This will be incorrect, however, if the targets that need implicit >> argument for format metadata are indeed a strict superset of the ones >> that need dimension, as you mentioned before. The targets that only >> need format will break this code. Should I swap the order of the >> format and dimension implicit args to make this approach work under >> the aforementioned assumption? >> > Why would it be incorrect? The kernel::_args vector explicit_arg > iterates in contains explicit arguments only so it shouldn't make a > difference in which order you emit them as long as you don't interleave > other explicit arguments in between. > >>> It also seems like you've got rid of the static casts you had >>> previously? >> >> That is a mistake, I'll fix it. >> >>> My expectation here was that the compiler would be able to hard-code >>> sampler indices in the shader without the API passing any actual data in >>> the input buffer. Doesn't that work for you? >> >> Yes, that's correct, but this is also the case with images. If image >> index is uploaded explicitly, I don't see why it can't be done with >> sampler indices. But probably it's a better idea to send the sampler >> value rather than the index, in case the kernel needs it (e.g. the >> normalized or non-normalized nature of texture coordinates may have to >> be specified in the fetch instruction itself, and not by hardware >> registers), so I'll definitely change this. > > Some hardware (e.g. Intel's) will need an index (which can probably be > hardcoded in the shader for now) into a table of sampler configurations > which can be set-up later on by the driver at state-emit time. In any > case we can always add a new argument semantic enum later on for the > target to select sampler config vs index if different drivers turn out > to need different things. > >> But the bigger problem is, that the byte offsets of the kernel >> arguments are computed considering the sampler argument too, so the >> binary expects it to be present in the input vector. Furthermore I >> can't erase the sampler argument from the IR, because it is needed to >> make it possible for get_kernel_args to detect the sampler. But if >> sampler_argument::bind doesn't append 4 bytes (clang compiles >> sampler_t to i32) to the input vector, the binary will try to load the >> following arguments from wrong locations. > > Hmmm... So you only need it as padding? Wouldn't it be possible for > you to declare samplers to be 0 bytes? > >> >>> I don't think it's a good idea to use such obvious names for the >>> implicit argument types. Couldn't they collide with some type declared >> by the user that happens to have the same name? How about >> "__clover_image_size" or something similar? >> >> Indeed, that's a good point. I'd go with something like >> "__opencl_image_*" or "__llvm_image_*", because these strings will be >> added to llvm, and non-clover code may depend on them in the future. >> > Sounds good to me. > >>> (Identifiers starting with >>> double underscore are reserved for the implementation at least on C and >>> GLSL, not sure about OpenCL-C) >> >> I believe this is true for OpenCL C too, since it is an extension to >> C99. Identifiers starting with double underscores are reserved in C99. >> > IIRC identifiers starting with double underscore and single underscore > followed by some upper-case letter were reserved even in the original > ANSI C spec. > >> On Sat, Jul 25, 2015 at 1:06 PM, Francisco Jerez <curroje...@riseup.net> >> wrote: >>> Zoltan Gilian <zoltan.gil...@gmail.com> writes: >>> >>>> Read-only and write-only image arguments are recognized and >>>> distinguished. >>>> Attributes of the image arguments are passed to the kernel as implicit >>>> arguments. >>>> --- >>>> src/gallium/state_trackers/clover/core/kernel.cpp | 46 ++++++- >>>> src/gallium/state_trackers/clover/core/kernel.hpp | 13 +- >>>> src/gallium/state_trackers/clover/core/memory.cpp | 2 +- >>>> src/gallium/state_trackers/clover/core/module.hpp | 4 +- >>>> .../state_trackers/clover/llvm/invocation.cpp | 147 >>>> ++++++++++++++++++++- >>>> 5 files changed, 198 insertions(+), 14 deletions(-) >>>> >>>> diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp >>>> b/src/gallium/state_trackers/clover/core/kernel.cpp >>>> index 0756f06..1a6c28f 100644 >>>> --- a/src/gallium/state_trackers/clover/core/kernel.cpp >>>> +++ b/src/gallium/state_trackers/clover/core/kernel.cpp >>>> @@ -158,13 +158,18 @@ >>>> kernel::exec_context::bind(intrusive_ptr<command_queue> _q, >>>> auto margs = find(name_equals(kern.name()), m.syms).args; >>>> auto msec = find(type_equals(module::section::text), m.secs); >>>> auto explicit_arg = kern._args.begin(); >>>> + image_argument *last_image_arg = nullptr; >>>> >>>> for (auto &marg : margs) { >>>> switch (marg.semantic) { >>>> - case module::argument::general: >>>> + case module::argument::general: { >>>> + auto image_arg = >>>> dynamic_cast<image_argument*>(explicit_arg->get()); >>>> + if (image_arg) { >>>> + last_image_arg = image_arg; >>>> + } >>>> (*(explicit_arg++))->bind(*this, marg); >>>> break; >>>> - >>>> + } >>>> case module::argument::grid_dimension: { >>>> const cl_uint dimension = grid_offset.size(); >>>> auto arg = argument::create(marg); >>>> @@ -182,6 +187,36 @@ >>>> kernel::exec_context::bind(intrusive_ptr<command_queue> _q, >>>> } >>>> break; >>>> } >>>> + case module::argument::image_size: { >>>> + assert(last_image_arg); >>>> + auto img = last_image_arg->get(); >>> >>> Instead of carrying around an extra variable during the loop, you could >>> achieve the same effect more locally by doing: >>> >>> | auto img = dynamic_cast<image_argument &>(**(explicit_arg - 1)) >>> | .get(); >>> >>> The cast to reference would also make sure that the argument is of the >>> specified type or otherwise throw std::bad_cast which is as good as an >>> assertion failure. >>> >>>> + std::vector<cl_uint> image_size({ >>>> + img->width(), >>>> + img->height(), >>>> + img->depth()}); >>> >>> Parentheses are not necessary around the curly braces, you could just: >>> >>> | std::vector<cl_uint> image_size { >>> | img->width(), >>> | img->height(), >>> | img->depth() >>> | }; >>> >>> It also seems like you've got rid of the static casts you had >>> previously? These image methods return a size_t value which may be of >>> different size than cl_uint and cause a warning. >>> >>>> + for (auto x: image_size) { >>> >>> Leave a space around the colon for consistency, also in the cases below. >>> >>>> + auto arg = argument::create(marg); >>>> + >>>> + arg->set(sizeof(x), &x); >>>> + arg->bind(*this, marg); >>>> + } >>>> + break; >>>> + } >>>> + case module::argument::image_format: { >>>> + assert(last_image_arg); >>>> + auto img = last_image_arg->get(); >>>> + cl_image_format fmt = img->format(); >>>> + std::vector<cl_uint> image_format({ >>>> + fmt.image_channel_data_type, >>>> + fmt.image_channel_order}); >>> >>> Same comments apply here. >>> >>>> + for (auto x: image_format) { >>>> + auto arg = argument::create(marg); >>>> + >>>> + arg->set(sizeof(x), &x); >>>> + arg->bind(*this, marg); >>>> + } >>>> + break; >>>> + } >>>> } >>>> } >>>> >>>> @@ -532,6 +567,13 @@ kernel::sampler_argument::set(size_t size, const void >>>> *value) { >>>> void >>>> kernel::sampler_argument::bind(exec_context &ctx, >>>> const module::argument &marg) { >>>> + auto v = bytes(ctx.samplers.size()); >>>> + >>>> + extend(v, module::argument::zero_ext, marg.target_size); >>>> + byteswap(v, ctx.q->device().endianness()); >>>> + align(ctx.input, marg.target_align); >>>> + insert(ctx.input, v); >>>> + >>> >>> My expectation here was that the compiler would be able to hard-code >>> sampler indices in the shader without the API passing any actual data in >>> the input buffer. Doesn't that work for you? >>> >>>> st = s->bind(*ctx.q); >>>> ctx.samplers.push_back(st); >>>> } >>>> diff --git a/src/gallium/state_trackers/clover/core/kernel.hpp >>>> b/src/gallium/state_trackers/clover/core/kernel.hpp >>>> index d6432a4..007ffcc 100644 >>>> --- a/src/gallium/state_trackers/clover/core/kernel.hpp >>>> +++ b/src/gallium/state_trackers/clover/core/kernel.hpp >>>> @@ -190,7 +190,14 @@ namespace clover { >>>> pipe_surface *st; >>>> }; >>>> >>>> - class image_rd_argument : public argument { >>>> + class image_argument : public argument { >>>> + public: >>>> + const image *get() const { return img; } >>> >>> Please write the return statement on its own line for consistency. >>> >>>> + protected: >>>> + image *img; >>>> + }; >>>> + >>>> + class image_rd_argument : public image_argument { >>>> public: >>>> virtual void set(size_t size, const void *value); >>>> virtual void bind(exec_context &ctx, >>>> @@ -198,11 +205,10 @@ namespace clover { >>>> virtual void unbind(exec_context &ctx); >>>> >>>> private: >>>> - image *img; >>>> pipe_sampler_view *st; >>>> }; >>>> >>>> - class image_wr_argument : public argument { >>>> + class image_wr_argument : public image_argument { >>>> public: >>>> virtual void set(size_t size, const void *value); >>>> virtual void bind(exec_context &ctx, >>>> @@ -210,7 +216,6 @@ namespace clover { >>>> virtual void unbind(exec_context &ctx); >>>> >>>> private: >>>> - image *img; >>>> pipe_surface *st; >>>> }; >>>> >>>> diff --git a/src/gallium/state_trackers/clover/core/memory.cpp >>>> b/src/gallium/state_trackers/clover/core/memory.cpp >>>> index 055336a..b852e68 100644 >>>> --- a/src/gallium/state_trackers/clover/core/memory.cpp >>>> +++ b/src/gallium/state_trackers/clover/core/memory.cpp >>>> @@ -189,7 +189,7 @@ image2d::image2d(clover::context &ctx, cl_mem_flags >>>> flags, >>>> const cl_image_format *format, size_t width, >>>> size_t height, size_t row_pitch, >>>> void *host_ptr) : >>>> - image(ctx, flags, format, width, height, 0, >>>> + image(ctx, flags, format, width, height, 1, >>>> row_pitch, 0, height * row_pitch, host_ptr) { >>>> } >>>> >>>> diff --git a/src/gallium/state_trackers/clover/core/module.hpp >>>> b/src/gallium/state_trackers/clover/core/module.hpp >>>> index 9d65688..5db0548 100644 >>>> --- a/src/gallium/state_trackers/clover/core/module.hpp >>>> +++ b/src/gallium/state_trackers/clover/core/module.hpp >>>> @@ -72,7 +72,9 @@ namespace clover { >>>> enum semantic { >>>> general, >>>> grid_dimension, >>>> - grid_offset >>>> + grid_offset, >>>> + image_size, >>>> + image_format >>>> }; >>>> >>>> argument(enum type type, size_t size, >>>> diff --git a/src/gallium/state_trackers/clover/llvm/invocation.cpp >>>> b/src/gallium/state_trackers/clover/llvm/invocation.cpp >>>> index 967284d..e5cd59c 100644 >>>> --- a/src/gallium/state_trackers/clover/llvm/invocation.cpp >>>> +++ b/src/gallium/state_trackers/clover/llvm/invocation.cpp >>>> @@ -340,18 +340,91 @@ namespace { >>>> PM.run(*mod); >>>> } >>>> >>>> + // Kernel metadata >>>> + >>>> + const llvm::MDNode * >>>> + get_kernel_metadata(const llvm::Function *kernel_func) { >>>> + auto mod = kernel_func->getParent(); >>>> + auto kernels_node = mod->getNamedMetadata("opencl.kernels"); >>>> + if (!kernels_node) { >>>> + return nullptr; >>>> + } >>>> + >>>> + const llvm::MDNode *kernel_node = nullptr; >>>> + for (unsigned i = 0; i < kernels_node->getNumOperands(); ++i) { >>>> +#if HAVE_LLVM >= 0x0306 >>>> + auto func = llvm::mdconst::dyn_extract<llvm::Function>( >>>> +#else >>>> + auto func = llvm::dyn_cast<llvm::Function>( >>>> +#endif >>>> + >>>> kernels_node->getOperand(i)->getOperand(0)); >>>> + if (func == kernel_func) { >>>> + kernel_node = kernels_node->getOperand(i); >>>> + break; >>>> + } >>>> + } >>>> + >>>> + return kernel_node; >>>> + } >>>> + >>>> + llvm::MDNode* >>>> + node_from_op_checked(const llvm::MDOperand &md_operand, >>>> + llvm::StringRef expect_name, >>>> + unsigned expect_num_args) >>>> + { >>>> + auto node = llvm::cast<llvm::MDNode>(md_operand); >>>> + assert(node->getNumOperands() == expect_num_args && >>>> + "Wrong number of operands."); >>>> + >>>> + auto str_node = llvm::cast<llvm::MDString>(node->getOperand(0)); >>>> + assert(str_node->getString() == expect_name && >>>> + "Wrong metadata node name."); >>>> + >>>> + return node; >>>> + } >>>> + >>>> + struct kernel_arg_md { >>>> + llvm::StringRef type_name; >>>> + llvm::StringRef access_qual; >>>> + kernel_arg_md(llvm::StringRef type_name_, llvm::StringRef >>>> access_qual_): >>>> + type_name(type_name_), access_qual(access_qual_) {} >>>> + }; >>>> + >>>> + std::vector<kernel_arg_md> >>>> + get_kernel_arg_md(const llvm::Function *kernel_func) { >>>> + auto num_args = kernel_func->getArgumentList().size(); >>>> + >>>> + auto kernel_node = get_kernel_metadata(kernel_func); >>>> + auto aq = node_from_op_checked(kernel_node->getOperand(2), >>>> + "kernel_arg_access_qual", num_args + >>>> 1); >>>> + auto ty = node_from_op_checked(kernel_node->getOperand(3), >>>> + "kernel_arg_type", num_args + 1); >>>> + >>>> + auto res = std::vector<kernel_arg_md>(); >>>> + res.reserve(num_args); >>> >>> std::vector<kernel_arg_md> res; >>> >>>> + for (unsigned i = 0; i < num_args; ++i) { >>>> + res.push_back(kernel_arg_md( >>>> + llvm::cast<llvm::MDString>(ty->getOperand(i+1))->getString(), >>>> + >>>> llvm::cast<llvm::MDString>(aq->getOperand(i+1))->getString())); >>>> + } >>>> + >>>> + return res; >>>> + } >>>> + >>>> std::vector<module::argument> >>>> get_kernel_args(const llvm::Module *mod, const std::string >>>> &kernel_name, >>>> const clang::LangAS::Map &address_spaces) { >>>> >>>> std::vector<module::argument> args; >>>> llvm::Function *kernel_func = mod->getFunction(kernel_name); >>>> + assert(kernel_func && "Kernel name not found in module."); >>>> + auto arg_md = get_kernel_arg_md(kernel_func); >>>> >>>> llvm::DataLayout TD(mod); >>>> + llvm::Type *size_type = >>>> + TD.getSmallestLegalIntType(mod->getContext(), sizeof(cl_uint) * >>>> 8); >>>> >>>> - for (llvm::Function::const_arg_iterator I = >>>> kernel_func->arg_begin(), >>>> - E = kernel_func->arg_end(); I != E; >>>> ++I) { >>>> - const llvm::Argument &arg = *I; >>>> + for (const auto &arg: kernel_func->args()) { >>>> >>>> llvm::Type *arg_type = arg.getType(); >>>> const unsigned arg_store_size = TD.getTypeStoreSize(arg_type); >>>> @@ -369,6 +442,68 @@ namespace { >>>> unsigned target_size = TD.getTypeStoreSize(target_type); >>>> unsigned target_align = TD.getABITypeAlignment(target_type); >>>> >>>> + llvm::StringRef type_name = arg_md[arg.getArgNo()].type_name; >>>> + llvm::StringRef access_qual = arg_md[arg.getArgNo()].access_qual; >>>> + >>>> + // Image >>>> + bool is_image2d = type_name == "image2d_t"; >>>> + bool is_image3d = type_name == "image3d_t"; >>> >>> These should be marked const. >>> >>>> + if (is_image2d || is_image3d) { >>>> + bool is_write_only = access_qual == "write_only"; >>>> + bool is_read_only = access_qual == "read_only"; >>>> + >>> >>> And these too. >>> >>>> + typename module::argument::type marg_type; >>>> + if (is_image2d && is_read_only) { >>>> + marg_type = module::argument::image2d_rd; >>>> + } else if (is_image2d && is_write_only) { >>>> + marg_type = module::argument::image2d_wr; >>>> + } else if (is_image3d && is_read_only) { >>>> + marg_type = module::argument::image3d_rd; >>>> + } else if (is_image3d && is_write_only) { >>>> + marg_type = module::argument::image3d_wr; >>>> + } else { >>>> + assert(0 && "Wrong image access qualifier"); >>>> + } >>>> + >>>> + args.push_back(module::argument(marg_type, >>>> + arg_store_size, target_size, >>>> + target_align, >>>> + module::argument::zero_ext)); >>>> + continue; >>>> + } >>>> + >>>> + // Sampler >>>> + if (type_name == "sampler_t") { >>>> + args.push_back(module::argument(module::argument::sampler, >>>> + arg_store_size, target_size, >>>> + target_align, >>>> + module::argument::zero_ext)); >>>> + continue; >>>> + } >>>> + >>>> + // Image size implicit argument >>>> + if (type_name == "image_size") { >>> >>> I don't think it's a good idea to use such obvious names for the >>> implicit argument types. Couldn't they collide with some type declared >>> by the user that happens to have the same name? How about >>> "__clover_image_size" or something similar? (Identifiers starting with >>> double underscore are reserved for the implementation at least on C and >>> GLSL, not sure about OpenCL-C) >>> >>>> + args.push_back(module::argument(module::argument::scalar, >>>> + sizeof(cl_uint), >>>> + >>>> TD.getTypeStoreSize(size_type), >>>> + >>>> TD.getABITypeAlignment(size_type), >>>> + module::argument::zero_ext, >>>> + >>>> module::argument::image_size)); >>>> + continue; >>>> + } >>>> + >>>> + // Image format implicit argument >>>> + if (type_name == "image_format") { >>>> + args.push_back(module::argument(module::argument::scalar, >>>> + sizeof(cl_uint), >>>> + >>>> TD.getTypeStoreSize(size_type), >>>> + >>>> TD.getABITypeAlignment(size_type), >>>> + module::argument::zero_ext, >>>> + >>>> module::argument::image_format)); >>>> + continue; >>>> + } >>>> + >>>> + // Other types >>>> if (llvm::isa<llvm::PointerType>(arg_type) && >>>> arg.hasByValAttr()) { >>>> arg_type = >>>> >>>> llvm::dyn_cast<llvm::PointerType>(arg_type)->getElementType(); >>>> @@ -413,9 +548,6 @@ namespace { >>>> // Append implicit arguments. XXX - The types, ordering and >>>> // vector size of the implicit arguments should depend on the >>>> // target according to the selected calling convention. >>>> - llvm::Type *size_type = >>>> - TD.getSmallestLegalIntType(mod->getContext(), sizeof(cl_uint) * >>>> 8); >>>> - >>>> args.push_back( >>>> module::argument(module::argument::scalar, sizeof(cl_uint), >>>> TD.getTypeStoreSize(size_type), >>>> @@ -744,6 +876,9 @@ clover::compile_program_llvm(const std::string &source, >>>> std::vector<char> code = compile_native(mod, triple, processor, >>>> get_debug_flags() & >>>> DBG_ASM, >>>> r_log); >>>> + // Target-specific passes may alter functions >>>> + kernels.clear(); >>>> + find_kernels(mod, kernels); >>> >>> Bah... To make this sort of mistake impossible I'd get rid of the >>> "kernels" argument of optimize() and build_module_native/llvm(), and >>> make them call find_kernels() directly. It would also be nice to have >>> find_kernels() return its result as return value instead of returning >>> void, so you could declare kernel arrays as const and initialize them at >>> the same point. >>> >>>> m = build_module_native(code, mod, kernels, address_spaces, >>>> r_log); >>>> break; >>>> } >>>> -- >>>> 2.4.6 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev