Nathaniel Smith added the comment:

It's not terribly difficult to write a crude-but-effective aligned allocator on 
top of raw malloc:

def aligned_malloc(size, alignment):
    assert alignment < 255
    raw_pointer = (uint8*) malloc(size + alignment)
    shift = alignment - (raw_pointer % alignment)
    assert 0 < shift <= alignment
    aligned_pointer = raw_pointer + shift
    *(aligned_pointer - 1) = shift
    return aligned_pointer

def aligned_free(uint8* pointer):
    shift = *(pointer - 1)
    free(pointer - shift)
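
For reference, the pseudocode above might be rendered in C roughly as follows. This is only a sketch under the same assumptions (alignment below 255, so the shift fits in the single byte stored just before the returned pointer); the NULL and overflow checks are additions that are not in the pseudocode.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Return a pointer aligned to `alignment` bytes, or NULL on failure.
   The byte just before the returned pointer records how far the
   pointer was shifted from the start of the raw malloc() block. */
void *aligned_malloc(size_t size, size_t alignment)
{
    /* The shift is stored in one byte, so it has to fit in a uint8_t. */
    assert(alignment > 0 && alignment < 255);
    if (size > SIZE_MAX - alignment)
        return NULL;                      /* size + alignment would overflow */

    uint8_t *raw = malloc(size + alignment);
    if (raw == NULL)
        return NULL;

    /* Always shift by at least 1 so there is room for the header byte. */
    size_t shift = alignment - ((uintptr_t)raw % alignment);
    assert(0 < shift && shift <= alignment);

    uint8_t *aligned = raw + shift;
    aligned[-1] = (uint8_t)shift;
    return aligned;
}

/* Release a block obtained from aligned_malloc(); plain free() must
   not be called on it, because the pointer is not the start of the
   malloc() block. */
void aligned_free(void *pointer)
{
    if (pointer == NULL)
        return;
    uint8_t *aligned = pointer;
    free(aligned - aligned[-1]);
}
```

Note that the cost is up to `alignment` extra bytes per allocation, which is what makes this crude.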

But this fallback and the official Win32 API both disallow the use of plain 
free() (as Victor points out in msg196834), so we can't just add an 
aligned_malloc slot to the PyMemAllocator struct. This kind of aligned 
allocation is effectively its own memory domain.

If native aligned allocation support were added to PyMalloc then it could 
potentially do better (e.g. by noticing that it already has a block on its 
freelist with the requested alignment and just returning that instead of 
overallocating). This might be the ideal solution for Raymond's use case, but I 
have no idea how much work it would be to mess around with PyMalloc innards.

Numpy doesn't currently use aligned allocation for anything, but we'd like to 
keep our options open. If we do end up using it in the future then there's a 
reasonable chance we might want to use it *without* the GIL held (e.g. for 
allocating temporary buffers inside C loops). OTOH we are also happy to 
implement the aligned allocation ourselves (either on top of the system APIs or 
directly) -- we just don't want to lose tracemalloc support when we do.

For numpy's purposes, I think the best approach would be to add a tracemalloc 
"escape valve", with an interface like:

PyMem_RecordAlloc(const char* domain, void* tag, size_t quantity)
PyMem_RecordRealloc(const char* domain, void* old_tag, void* new_tag,
                    size_t new_quantity)
PyMem_RecordFree(const char* domain, void* tag)

where the idea is that after someone allocates memory (or potentially other 
discrete resources) directly without going through PyMem_*, they can then 
call these functions to tell tracemalloc what they just did.

This would be useful in a number of cases: in addition to tracking aligned 
allocations, it would make it possible to re-use the tracemalloc infrastructure 
to track GPU buffers allocated by CUDA/GPGPU-type code, mmap usage, hugetlbfs 
usage, etc. Potentially even open file descriptors if one wants to go there 
(seems pretty useful, actually).
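
To make the calling convention concrete, here is a toy sketch of what such bookkeeping could look like. The `record_alloc`, `record_realloc`, `record_free` and `domain_total` functions are hypothetical stand-ins invented for illustration, not CPython APIs; they only mirror the shape of the proposed interface, using a fixed-size table keyed by (domain, tag).

```c
#include <stddef.h>
#include <string.h>

#define MAX_RECORDS 64

/* One tracked resource. A NULL domain marks an empty slot. The domain
   string is not copied, so it must outlive the record. */
struct record {
    const char *domain;
    void *tag;
    size_t quantity;
};

static struct record records[MAX_RECORDS];

static struct record *find_record(const char *domain, void *tag)
{
    for (size_t i = 0; i < MAX_RECORDS; i++) {
        if (records[i].domain != NULL && records[i].tag == tag
                && strcmp(records[i].domain, domain) == 0)
            return &records[i];
    }
    return NULL;
}

/* Hypothetical analogue of the proposed PyMem_RecordAlloc.
   Returns 0 on success, -1 if the table is full. */
int record_alloc(const char *domain, void *tag, size_t quantity)
{
    for (size_t i = 0; i < MAX_RECORDS; i++) {
        if (records[i].domain == NULL) {
            records[i].domain = domain;
            records[i].tag = tag;
            records[i].quantity = quantity;
            return 0;
        }
    }
    return -1;
}

/* Hypothetical analogue of PyMem_RecordRealloc: retag and resize.
   Returns -1 if (domain, old_tag) is not being tracked. */
int record_realloc(const char *domain, void *old_tag, void *new_tag,
                   size_t new_quantity)
{
    struct record *r = find_record(domain, old_tag);
    if (r == NULL)
        return -1;
    r->tag = new_tag;
    r->quantity = new_quantity;
    return 0;
}

/* Hypothetical analogue of PyMem_RecordFree.
   Returns -1 if (domain, tag) is not being tracked. */
int record_free(const char *domain, void *tag)
{
    struct record *r = find_record(domain, tag);
    if (r == NULL)
        return -1;
    r->domain = NULL;
    return 0;
}

/* Sum of the quantities currently recorded in one domain. */
size_t domain_total(const char *domain)
{
    size_t total = 0;
    for (size_t i = 0; i < MAX_RECORDS; i++) {
        if (records[i].domain != NULL
                && strcmp(records[i].domain, domain) == 0)
            total += records[i].quantity;
    }
    return total;
}
```

Because the tag is an opaque pointer and the quantity is just a number, the same three calls could cover an aligned buffer, a GPU buffer handle, or an mmap region, each under its own domain name.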

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18835>
_______________________________________