For UPC code generation, we're building an alternate method of accessing thread-local data that does not depend on operating system support for the __thread qualifier.

The motivation for this change is that __thread has varying levels of support across operating system/hardware platforms, and that when it is used extensively we've seen capacity limitations on some target systems. UPC programs, when compiled in "pthreads mode", implicitly define all normal (file-scoped or static) variables as thread-local, which can lead to many TLS variables or to a TLS section that is quite large.

The alternate implementation of TLS begins by targeting all TLS variables to a specially named section. As an example, the declaration

    __thread int x;

can be thought of as being rewritten into:

    int x __attribute__ ((section("tls_section")));

The runtime will allocate a per-thread block of memory that is the size of "tls_section" and is initialized from the contents of that dummy section. The per-thread TLS base address will be maintained in an OS-dependent fashion and returned by a function called __get_tls() (which may in turn call an OS-supplied function, for example pthread_getspecific()).

All references to 'x' will be rewritten by the UPC-specific gimplify pass into:

    *((&x - __tls_section_start) + __get_tls())

Above, "&x" is the address of 'x' derived in the conventional fashion, as its address inside the TLS dummy section, which starts at the address given by "__tls_section_start".

The gimplify code that currently implements this calculation looks like this:

    tls_base = lookup_name (get_identifier (UPC_TLS_BEGIN_NAME_STR));
    if (!tls_base)
      fatal_error ("UPC thread-local section start address not found. "
                   "Cannot find a definition for " UPC_TLS_BEGIN_NAME_STR);
    tls_base = build1 (ADDR_EXPR, char_ptr_type, tls_base);
    /* Refer to a shadow variable so that we don't try to re-gimplify
     * this TLS variable reference.
     */
    var_addr = shadow_var_addr (var_decl);
    tls_offset = build_binary_op (MINUS_EXPR,
                                  convert (ptrdiff_type_node, var_addr),
                                  convert (ptrdiff_type_node, tls_base), 0);
    if (!useless_type_conversion_p (sizetype, TREE_TYPE (tls_offset)))
      tls_offset = convert (sizetype, tls_offset);
    tls_var_addr = build2 (POINTER_PLUS_EXPR, char_ptr_type,
                           cfun->upc_thread_ctx_tmp, tls_offset);
    tls_ref = build_fold_indirect_ref (tls_var_addr);
    *expr_p = tls_ref;
    return GS_OK;

(If you see any opportunities to improve/correct this code, please feel free to comment.)

Above, you'll see a reference to "cfun->upc_thread_ctx_tmp"; this is a temporary variable that holds the value returned from __get_tls(). The idea is to call __get_tls() only once, upon entry to the current function being compiled, and to re-use its value where needed.

I made a first attempt at implementing this caching of the __get_tls() value, but have so far been unsuccessful. Here's the current implementation:

    if (!cfun->upc_thread_ctx_tmp)
      {
        const char *libfunc_name = UPC_GET_TLS_LIBCALL;
        tree libfunc, lib_call, tmp;
        libfunc = lookup_name (get_identifier (libfunc_name));
        if (!libfunc)
          internal_error ("runtime function %s not found", libfunc_name);
        lib_call = build_function_call (libfunc, NULL_TREE);
        if (!lang_hooks.types_compatible_p (char_ptr_type,
                                            TREE_TYPE (lib_call)))
          lib_call = build1 (NOP_EXPR, char_ptr_type, lib_call);
        tmp = create_tmp_var_raw (char_ptr_type, "TLS");
        TREE_READONLY (tmp) = 1;
        DECL_INITIAL (tmp) = lib_call;
        /* Record the TLS base address at the outermost level of
         * this function.
         */
        DECL_CONTEXT (tmp) = current_function_decl;
        DECL_SEEN_IN_BIND_EXPR_P (tmp) = 1;
        declare_vars (tmp, DECL_SAVED_TREE (current_function_decl), false);
        cfun->upc_thread_ctx_tmp = tmp;
      }

(The code from "TREE_READONLY" to "DECL_SEEN_IN_BIND_EXPR_P" above is cribbed from "gimple_add_tmp_var()" and "gimplify_init_constructor()".)

The idea above is to initialize a temporary variable at the outer scope of the current function.
Presumably, setting the initial value to the value returned by __get_tls(), and then calling "declare_vars()" to declare the temporary variable at the outermost scope of the function, should do the job, but this code isn't having the intended effect. My sense is that the DECL_INITIAL() value above is being ignored and that no code is being generated for it; it also seems possible that it won't be properly rescanned for gimplification.

I'd appreciate any observations that you might have on why the implementation above doesn't work, and on how to re-implement this section of code so that it has the desired effect. Perhaps there is code in GCC that currently does something like this, which I can refer to?

There are some workarounds that I can think of, including just calling __get_tls() every time it's needed and letting the optimizer commonize calls to that function (on the assumption that the function is declared with __attribute__ ((const))), but I'd rather find a way that generates reasonable code without the need for an optimization pass to fix things up.

Thanks in advance for your help/suggestions.
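
P.S. For anyone who would like to experiment with the addressing scheme outside the compiler, here is a rough source-level sketch of what the rewritten code is meant to do. It is only an illustration under stated assumptions, not the UPC runtime: the names get_tls(), tls_offset(), TLS_REF(), struct job, worker() and run_demo() are made up for this example, and the section bounds rely on the __start_tls_section/__stop_tls_section symbols that GNU-compatible ELF linkers provide for sections whose names are valid C identifiers. The worker fetches the TLS base once on entry and re-uses it, which is the shape I'd like the generated code to have.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Would-be __thread variables live in a named section; this copy is
   used only as the initialization image for each thread's block.  */
#define TLS_VAR __attribute__ ((section ("tls_section")))
TLS_VAR int x = 5;
TLS_VAR int y = 7;

/* Assumption: the linker (GNU ld, lld on ELF) defines these bounds.  */
extern char __start_tls_section[];
extern char __stop_tls_section[];

static pthread_key_t tls_key;
static pthread_once_t tls_once = PTHREAD_ONCE_INIT;

static void
tls_key_init (void)
{
  /* free() releases a thread's block when that thread exits.  */
  pthread_key_create (&tls_key, free);
}

/* Hypothetical stand-in for __get_tls(): return this thread's block,
   allocating and initializing it from the section image on first use.  */
static char *
get_tls (void)
{
  pthread_once (&tls_once, tls_key_init);
  char *base = pthread_getspecific (tls_key);
  if (!base)
    {
      size_t size = (size_t) (__stop_tls_section - __start_tls_section);
      base = malloc (size);
      if (!base)
        abort ();
      memcpy (base, __start_tls_section, size);
      pthread_setspecific (tls_key, base);
    }
  return base;
}

/* Byte offset of a variable inside the dummy section.  */
static size_t
tls_offset (const void *addr)
{
  uintptr_t off = (uintptr_t) addr - (uintptr_t) __start_tls_section;
  assert (off < (uintptr_t) (__stop_tls_section - __start_tls_section));
  return (size_t) off;
}

/* Source-level form of *((&var - __tls_section_start) + __get_tls()).  */
#define TLS_REF(base, var) \
  (*(__typeof__ (var) *) ((base) + tls_offset (&(var))))

struct job { int delta; int x_out; int y_out; };

static void *
worker (void *arg)
{
  struct job *job = arg;
  char *const tls = get_tls ();   /* fetched once on entry, then re-used */
  TLS_REF (tls, x) += job->delta;
  TLS_REF (tls, y) += TLS_REF (tls, x);
  job->x_out = TLS_REF (tls, x);
  job->y_out = TLS_REF (tls, y);
  return NULL;
}

/* Run two threads with different deltas; results[] receives x and y as
   seen by each thread.  Returns 0 on success, -1 on failure.  */
int
run_demo (int results[4])
{
  struct job jobs[2] = { { 1, 0, 0 }, { 10, 0, 0 } };
  pthread_t tids[2];
  int started = 0;
  while (started < 2
         && pthread_create (&tids[started], NULL, worker,
                            &jobs[started]) == 0)
    started++;
  for (int i = 0; i < started; i++)
    pthread_join (tids[i], NULL);
  if (started < 2)
    return -1;
  results[0] = jobs[0].x_out;
  results[1] = jobs[0].y_out;
  results[2] = jobs[1].x_out;
  results[3] = jobs[1].y_out;
  return 0;
}
```

Calling run_demo() from main() should show that each thread updates only its own copy of 'x' and 'y', while the image in "tls_section" keeps its initial values. On older C libraries this may need to be built with -pthread.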