Hi Weston, thank you for the input! I was watching memory usage with "top -p `pidof test`", and the size of the resident memory is not reduced.
With the new counter I saw the memory is freed immediately on the Arrow side, so this is related to my allocator. I actually disabled jemalloc/mimalloc during the Arrow build, but didn't realize the glibc allocator would also behave this way. I'll do more debugging on the allocator side then. Thanks again!

thanks, -yuan

On Fri, Jun 18, 2021 at 10:21 AM Weston Pace <weston.p...@gmail.com> wrote:
> The only owner of input_batch that I can see here is the shared_ptr
> that you are resetting, so I would expect the memory to be freed.
>
> How are you measuring memory usage? The dynamic allocators (mimalloc
> / jemalloc) don't always release memory as soon as they possibly can.
> Even malloc will sometimes be forced to hang onto memory due to
> fragmentation issues, etc. Can you try measuring memory usage with
> arrow::default_memory_pool()->bytes_allocated(); ?
>
> On Thu, Jun 17, 2021 at 3:48 PM ZHOU Yuan <dunk...@gmail.com> wrote:
> >
> > Hi Arrow developers,
> >
> > I ran into a memory footprint issue after releasing a record batch
> > manually. The logic of my program is:
> > 0. read many record batches
> > 1. process these batches
> > 2. dump the intermediate results to disk
> > 3. close the batches
> > 4. logic for other operations
> >
> > I expect the memory footprint to drop after stage #3, but it looks
> > like the memory is not released. I then wrote a small test program to
> > check the behavior. Running under GDB, the destructor of the record
> > batch is indeed called in "input_batch.reset()", but the memory is not
> > released until I kill the whole program.
> >
> > I understand the lifetime of the record batch is controlled by the
> > number of owners of the shared_ptr, so it will be released eventually,
> > but are there any APIs or ways to release it manually in the middle of
> > my program?
> >
> > Attached is the testing code snippet. Thanks!
> >
> > =======
> > auto f0 = field("f0", float64());
> > auto f1 = field("f1", uint32());
> > auto sch = arrow::schema({f0, f1});
> >
> > std::vector<std::string> input_data_string = {
> >     "[10, NaN, 4, 50, 52, 32, 11]",
> >     "[11, 13, 5, 51, null, 33, 12]"};
> >
> > // prepare input record batch
> > std::vector<std::shared_ptr<Array>> array_list;
> > int length = -1;
> > int i = 0;
> > for (auto data : input_data_string) {
> >   std::shared_ptr<Array> a0;
> >   ASSERT_NOT_OK(arrow::ipc::internal::json::ArrayFromJSON(
> >       sch->field(i++)->type(), data.c_str(), &a0));
> >   if (length == -1) {
> >     length = a0->length();
> >   }
> >   assert(length == a0->length());
> >   array_list.push_back(a0);
> > }
> >
> > auto input_batch = RecordBatch::Make(sch, length, std::move(array_list));
> >
> > input_batch.reset();  // should be freed here?
> > std::this_thread::sleep_for(std::chrono::seconds(20));
> >
> > thanks, -yuan