[PATCH] D29758: [OpenMP] Parallel reduction on the NVPTX device.

Alexey Bataev via Phabricator via cfe-commits Fri, 10 Feb 2017 07:43:11 -0800

ABataev added inline comments.


================
Comment at: lib/CodeGen/CGOpenMPRuntime.h:956-962
   virtual void emitReduction(CodeGenFunction &CGF, SourceLocation Loc,
                              ArrayRef<const Expr *> Privates,
                              ArrayRef<const Expr *> LHSExprs,
                              ArrayRef<const Expr *> RHSExprs,
                              ArrayRef<const Expr *> ReductionOps,
-                             bool WithNowait, bool SimpleReduction);
+                             bool WithNowait, bool SimpleReduction,
+                             OpenMPDirectiveKind ReductionKind);
----------------
Number of parameters is getting too big, maybe it is better to aggregate them 
into a struct/class?


================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:118-133
+// GPU Configuration:  This information can be derived from cuda registers,
+// however, providing compile time constants helps generate more efficient
+// code.  For all practical purposes this is fine because the configuration
+// is the same for all known NVPTX architectures.
+enum MachineConfiguration : unsigned {
+  WarpSize = 32,
+  // Number of bits required to represent a lane identifier, which is
----------------
It's better to use `///` style of comments here


================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:653-675
+    /// Build int32_t __kmpc_shuffle_int32(int32_t element,
+    /// int16_t lane_offset, int16_t warp_size);
+    llvm::Type *TypeParams[] = {CGM.Int32Ty, CGM.Int16Ty, CGM.Int16Ty};
+    llvm::FunctionType *FnTy =
+        llvm::FunctionType::get(CGM.Int32Ty, TypeParams, /*isVarArg*/ false);
+    RTLFn = CGM.CreateRuntimeFunction(FnTy, "__kmpc_shuffle_int32");
+    break;
----------------
Use `//` instead of `///`


================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:963-965
+enum CopyAction : unsigned {
+  RemoteLaneToThread,
+  ThreadCopy,
----------------
Comments here?


================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:969-974
+// Emit instructions to copy a Reduce list, which contains partially
+// aggregated values, in the specified direction.
+//
+// RemoteLaneToThread: Copy over a Reduce list from a remote lane in
+//   the warp using shuffle instructions.
+// ThreadCopy: Make a copy of a Reduce list on the thread's stack.
----------------
Use `///`


================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:1272
+
+// Emit a helper that reduces data across two OpenMP threads (lanes)
+// in the same warp.  It uses shuffle instructions to copy over data from
----------------
`///` style here


================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp:1488
+
+//
+// Design of OpenMP reductions on the GPU
----------------
`///` here


================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.h:245
 
-public:
+  /// \brief Emit a code for reduction clause.
+  ///
----------------
Bo \brief


================
Comment at: lib/CodeGen/CGOpenMPRuntimeNVPTX.h:263
+
+  /// \brief Returns specified OpenMP runtime function for the current OpenMP
+  /// implementation.  Specialized for the NVPTX device.
----------------
No \brief


https://reviews.llvm.org/D29758



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D29758: [OpenMP] Parallel reduction on the NVPTX device.

Reply via email to