Currently, the VM `PooledAllocator` releases its memory only when the underlying device fails to allocate more memory: https://github.com/apache/tvm/blob/553778885388a9eff4d611e1022baecd75c69088/src/runtime/vm/pooled_allocator.h#L60-L65. This causes a program crash when running repeated inference with a dynamic batch size. See https://github.com/apache/tvm/issues/8233#issuecomment-862664330 for a minimal repro.
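To make the current behaviour concrete, here is a minimal, self-contained C++ sketch of a release-only-on-failure pool. It is not the TVM code: `FakeDevice` and `SketchPooledAllocator` are hypothetical names, the device is just a byte counter with a fixed capacity, and a buffer is modelled by its size alone.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <unordered_map>

// Hypothetical stand-in for a device with a fixed memory budget.
// Not TVM's DeviceAPI; it only counts bytes so the sketch runs anywhere.
struct FakeDevice {
  std::size_t capacity;
  std::size_t in_use = 0;

  void Alloc(std::size_t nbytes) {
    if (in_use + nbytes > capacity) throw std::bad_alloc();
    in_use += nbytes;
  }
  void Free(std::size_t nbytes) { in_use -= nbytes; }
};

// Simplified model of a pooled allocator that releases only on failure.
class SketchPooledAllocator {
 public:
  SketchPooledAllocator(FakeDevice* dev, std::size_t page_size)
      : dev_(dev), page_size_(page_size) {
    assert(page_size > 0);
  }

  // Returns the page-rounded size of the buffer handed out.
  std::size_t Alloc(std::size_t nbytes) {
    std::size_t size = (nbytes + page_size_ - 1) / page_size_ * page_size_;
    auto it = pool_.find(size);
    if (it != pool_.end() && it->second > 0) {
      --it->second;  // reuse a cached buffer of exactly this size
      return size;
    }
    try {
      dev_->Alloc(size);
    } catch (const std::bad_alloc&) {
      ReleaseAll();       // the only place cached memory is given back
      dev_->Alloc(size);  // retry once; may throw again
    }
    return size;
  }

  // Freed buffers are cached in the pool, not returned to the device.
  void Free(std::size_t size) { ++pool_[size]; }

  // Return every cached buffer to the device.
  void ReleaseAll() {
    for (auto& kv : pool_) {
      for (std::size_t i = 0; i < kv.second; ++i) dev_->Free(kv.first);
      kv.second = 0;
    }
  }

 private:
  FakeDevice* dev_;
  std::size_t page_size_;
  std::unordered_map<std::size_t, std::size_t> pool_;  // size -> cached count
};
```

In this model, every new batch size adds a differently sized buffer to the pool, and cached buffers of other sizes are never reused. The device therefore fills up with idle memory until an allocation actually fails.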
It seems there are two issues with this strategy:

1. `AllocDataSpace` can be called outside of `PooledAllocator`, by `NDArray::Empty(...)`: https://github.com/apache/tvm/blob/4d9bc9b4a3e9e8d3420efe60a52964fcd4c29c8d/src/runtime/ndarray.cc#L196-L197. That call is not protected by try/catch, so if almost all memory is held by `PooledAllocator` and `NDArray::Empty` is called, the program crashes with the following error:

```
terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what():  [19:12:54] /home/masa/projects/dev/tvm/src/runtime/vulkan/vulkan_stream.cc:123:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (__e == VK_SUCCESS) is false: Vulkan Error, code=-13: Unknown Vulkan error code
Stack trace:
  0: tvm::runtime::vulkan::VulkanStream::Synchronize()
  1: _ZN3tvm7runtime6vulkan15VulkanDeviceAPI13FreeDataSpac
  2: tvm::runtime::NDArray::Internal::DefaultDeleter(tvm::runtime::Object*)
  3: tvm::runtime::NDArray::CopyTo(DLDevice const&) const
  4: tvm::runtime::vm::CopyTo(tvm::runtime::ObjectRef, DLDevice const&)
  5: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_6>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  6: TVMFuncCall
```

2. Even if I fix the above problem by making sure that all allocations go through `PooledAllocator`, my program still crashes due to too much allocation of host memory (I haven't looked into why so much host memory is allocated when I'm running on a GPU target).
Also, if I use the CPU target, the program is simply killed once it reaches the memory limit, before `try/catch` gets a chance to catch the memory allocation failure.

So I think we need a better way to decide when to call `ReleaseAll()` early if necessary. Should we add a device API to query the maximum available memory and call `ReleaseAll()` when we reach, say, 90%? This doesn't work if other memory-hungry processes are running...

cc @ganler @comaniac @yuchenj @trevor-m for thoughts.
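As a rough illustration of the 90% idea, the check could be written as a small policy object like the one below. Everything here is an assumption for discussion: `ReleasePolicy` and `ShouldReleaseBefore` are hypothetical names, and `total_bytes` stands in for whatever a new device query would return. No such query is assumed to exist in the current device API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical policy object; none of these names exist in TVM.
// `total_bytes` stands in for the value a new device query would return.
class ReleasePolicy {
 public:
  ReleasePolicy(std::size_t total_bytes, double threshold)
      : total_bytes_(total_bytes), threshold_(threshold) {
    assert(threshold > 0.0 && threshold <= 1.0);
  }

  // True when serving `request` would push the bytes held by the allocator
  // (live buffers plus cached ones) past the threshold, and there is cached
  // memory that a ReleaseAll() could actually give back.
  bool ShouldReleaseBefore(std::size_t live, std::size_t cached,
                           std::size_t request) const {
    if (cached == 0) return false;  // nothing to release
    double limit = threshold_ * static_cast<double>(total_bytes_);
    return static_cast<double>(live + cached + request) > limit;
  }

 private:
  std::size_t total_bytes_;
  double threshold_;
};
```

An allocator would call `ShouldReleaseBefore(...)` at the top of its allocation path and run `ReleaseAll()` first when it returns true. The sketch counts only the bytes this allocator holds, so it has exactly the weakness mentioned above: memory used by other processes is invisible to it.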