On Thu, Mar 26, 2015 at 11:41:30PM +0300, Ilya Verbin wrote: > Here is the latest patch for libgomp and mic plugin. > make check-target-libgomp using intelmic emul passed. > Also I used a testcase from the attachment.
This applies cleanly. > Latest ptx part is here, I guess: > https://gcc.gnu.org/ml/gcc-patches/2015-02/msg01407.html But the one Julian posted doesn't apply on top of your patch. If there is any interdiff needed on top of your patch, can it be posted against trunk + your patch? > +/* Insert mapping of host -> target address pairs to splay tree. */ > + > +static void > +gomp_splay_tree_insert_mapping (struct gomp_device_descr *devicep, > + struct addr_pair *host_addr, > + struct addr_pair *tgt_addr) > +{ > + struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt)); > + tgt->refcount = 1; > + tgt->array = gomp_malloc (sizeof (*tgt->array)); > + tgt->tgt_start = tgt_addr->start; > + tgt->tgt_end = tgt_addr->end; > + tgt->to_free = NULL; > + tgt->list_count = 0; > + tgt->device_descr = devicep; > + splay_tree_node node = tgt->array; > + splay_tree_key k = &node->key; > + k->host_start = host_addr->start; > + k->host_end = host_addr->end; > + k->tgt_offset = 0; > + k->refcount = 1; > + k->copy_from = false; > + k->tgt = tgt; > + node->left = NULL; > + node->right = NULL; > + splay_tree_insert (&devicep->mem_map, node); > +} What is the reason to register and allocate these one at a time, rather than using one struct target_mem_desc with one tgt->array for all splay tree nodes registered from one image? Perhaps you would just use tgt_start of 0 and tgt_end of 0 too (to make it clear it is special) and just use tgt_offset relative to that (i.e. absolute), but having to malloc each node individually and having to malloc a target_mem_desc for each one sounds expensive. Everything is freed just once anyway, isn't it? > @@ -654,6 +727,18 @@ void > GOMP_offload_register (void *host_table, enum offload_target_type > target_type, > void *target_data) > { > + int i; > + gomp_mutex_lock (®ister_lock); > + > + /* Load image to all initialized devices. */ > + for (i = 0; i < num_devices; i++) > + { > + struct gomp_device_descr *devicep = &devices[i]; > + if (devicep->type == target_type && devicep->is_initialized) > + gomp_offload_image_to_device (devicep, host_table, target_data); Shouldn't either this function, or gomp_offload_image_to_device lock also devicep->lock mutex and unlock at the end? Where exactly I guess depends on if the devicep->* hook calls should be guarded with the mutex or not. If yes, it should be this function and gomp_init_device. > + if (devicep->type != target_type || !devicep->is_initialized) > + continue; > + Similarly. > + devicep->unload_image_func (devicep->target_id, target_data); > + > + /* Remove mapping from splay tree. */ > + for (j = 0; j < num_funcs; j++) > + { > + struct splay_tree_key_s k; > + k.host_start = (uintptr_t) host_func_table[j]; > + k.host_end = k.host_start + 1; > + splay_tree_remove (&devicep->mem_map, &k); > + } > + > + for (j = 0; j < num_vars; j++) > + { > + struct splay_tree_key_s k; > + k.host_start = (uintptr_t) host_var_table[j*2]; > + k.host_end = k.host_start + (uintptr_t) host_var_table[j*2+1]; > + splay_tree_remove (&devicep->mem_map, &k); > + } > + } Aren't you leaking here all the tgt->array and tgt allocations here? Though, if you change it to just two allocations (one tgt, one array), you'd need to free just once. Jakub