@masahi Yeah, I also ran into this issue a few months ago. If an OOM happens, the 
exception just escapes... So I added another try/catch block and tried to fix 
it by calling `ReleaseAll` on OOM. The exception behavior is very weird and I 
was not able to debug it (the exception just escaped and I could not catch it 
in GDB).
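A minimal Python model of the workaround described above (catch the OOM, call `ReleaseAll`, retry once) might look like the following. This is not TVM's C++ `PooledAllocator`; `ToyDevice`, `PooledAllocator`, `release_all`, and `OutOfMemory` are hypothetical stand-ins that only track byte counts, so the sketch shows the control flow rather than real device behavior.

```python
class OutOfMemory(Exception):
    """Raised by the simulated device when a request cannot be satisfied."""


class ToyDevice:
    """Hypothetical stand-in for a GPU: tracks bytes in use against a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def alloc(self, nbytes):
        if self.used + nbytes > self.capacity:
            raise OutOfMemory(f"requested {nbytes}, free {self.capacity - self.used}")
        self.used += nbytes
        return nbytes  # in this model a "buffer" is just its size

    def free(self, nbytes):
        self.used -= nbytes


class PooledAllocator:
    """Caches freed buffers by size; on OOM, releases the cache and retries once."""

    def __init__(self, device):
        self.device = device
        self.pool = {}  # size -> list of cached buffers of that size

    def alloc(self, nbytes):
        cached = self.pool.get(nbytes)
        if cached:
            return cached.pop()  # reuse a cached buffer of the exact size
        try:
            return self.device.alloc(nbytes)
        except OutOfMemory:
            # Give cached-but-unused memory back to the device, then retry once.
            # If the retry also fails, the exception propagates to the caller.
            self.release_all()
            return self.device.alloc(nbytes)

    def free(self, buf):
        # Keep the buffer for reuse instead of returning it to the device.
        self.pool.setdefault(buf, []).append(buf)

    def release_all(self):
        # Return every cached buffer to the device and empty the cache.
        for bufs in self.pool.values():
            for buf in bufs:
                self.device.free(buf)
        self.pool.clear()
```

Note that the retry only helps when the pool itself is holding the memory; a genuine OOM still raises on the second attempt, which is exactly where an exception that "escapes" would surface.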

I am not sure whether calling `ReleaseAll` in advance would help. What about 
creating a global memory state per device (though that would be a big change)? Or 
simply unifying all memory allocation into a "PoolAllocator" (like what 
TensorFlow does), which would also let users control the memory limit. Or, at 
the least, the memory pool should not hold a very large memory chunk (e.g., 1 GB).
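The last two ideas, a user-set memory limit and a cap on how large a cached chunk may be, could be sketched roughly as below. This is a hypothetical, accounting-only Python model: `LimitedPoolAllocator`, `memory_limit`, and `max_cached_chunk` are illustrative names, not TVM or TensorFlow APIs, and no real device is involved.

```python
class LimitedPoolAllocator:
    """Hypothetical pool allocator with a user-set memory limit and a cap on
    the size of any single chunk kept in the cache."""

    def __init__(self, memory_limit, max_cached_chunk):
        self.memory_limit = memory_limit          # total bytes this pool may hold
        self.max_cached_chunk = max_cached_chunk  # larger chunks are not cached
        self.in_use = 0        # bytes currently handed out to callers
        self.cached = {}       # size -> number of cached chunks of that size
        self.cached_bytes = 0  # total bytes sitting in the cache

    def held(self):
        # Everything the pool is responsible for: live buffers plus cache.
        return self.in_use + self.cached_bytes

    def alloc(self, nbytes):
        if self.cached.get(nbytes, 0) > 0:
            # Cache hit: these bytes already count toward the limit.
            self.cached[nbytes] -= 1
            self.cached_bytes -= nbytes
            self.in_use += nbytes
            return nbytes
        if self.held() + nbytes > self.memory_limit:
            self.release_all()  # drop the cache before giving up
        if self.held() + nbytes > self.memory_limit:
            raise MemoryError(
                f"limit {self.memory_limit} exceeded by request of {nbytes}")
        self.in_use += nbytes
        return nbytes

    def free(self, nbytes):
        self.in_use -= nbytes
        if nbytes <= self.max_cached_chunk:
            self.cached[nbytes] = self.cached.get(nbytes, 0) + 1
            self.cached_bytes += nbytes
        # Otherwise the chunk is treated as returned to the device right away.

    def release_all(self):
        self.cached.clear()
        self.cached_bytes = 0
```

With `max_cached_chunk=256`, freeing a 600-byte buffer drops it instead of pinning it in the cache, which is the "don't hold a huge chunk" rule in miniature; the limit check then lets users bound total pool usage.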

See also: 

https://github.com/apache/tvm/pull/8285

https://discuss.tvm.apache.org/t/logfatal-may-skip-some-important-errors-exceptions/10281