The only owner of input_batch that I can see here is the shared_ptr
that you are resetting, so I would expect the memory to be freed.

How are you measuring memory usage?  The dynamic allocators (mimalloc
/ jemalloc) don't always return memory to the OS as soon as they
possibly can.  Even malloc will sometimes be forced to hold on to
memory because of fragmentation.  Can you try measuring memory usage
with arrow::default_memory_pool()->bytes_allocated()?
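
To illustrate what I mean, here is a minimal stdlib-only sketch.  It
does not use Arrow: TrackedBuffer, g_bytes_allocated and the two helper
functions are hypothetical names made up for this example, with the
counter standing in for what a pool-level number such as
bytes_allocated() reports.  It shows that resetting the last shared_ptr
runs the destructor right away and brings the counter back to zero,
while a second owner keeps the bytes alive.  What the OS reports for
the process can lag behind this counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a memory pool's byte counter (not Arrow's API).
static std::atomic<std::int64_t> g_bytes_allocated{0};

// Hypothetical buffer that reports its size to the counter on
// construction and subtracts it again on destruction.
class TrackedBuffer {
 public:
  explicit TrackedBuffer(std::size_t size)
      : size_(size), data_(new std::uint8_t[size]) {
    g_bytes_allocated += static_cast<std::int64_t>(size_);
  }
  ~TrackedBuffer() { g_bytes_allocated -= static_cast<std::int64_t>(size_); }

  TrackedBuffer(const TrackedBuffer&) = delete;
  TrackedBuffer& operator=(const TrackedBuffer&) = delete;

 private:
  std::size_t size_;
  std::unique_ptr<std::uint8_t[]> data_;
};

// Counter value observed right after the only owner is reset.
std::int64_t BytesAfterSoleOwnerReset(std::size_t size) {
  auto batch = std::make_shared<TrackedBuffer>(size);
  batch.reset();  // last owner gone: the destructor runs here
  return g_bytes_allocated.load();
}

// Counter value observed after reset() while a second owner still exists.
std::int64_t BytesAfterResetWithSecondOwner(std::size_t size) {
  auto batch = std::make_shared<TrackedBuffer>(size);
  auto other = batch;  // second owner
  batch.reset();       // only drops one reference
  assert(other.use_count() == 1);
  return g_bytes_allocated.load();  // buffer is still alive at this point
}
```

If the counter stays above zero after your reset(), something else is
still holding a reference.  If it drops to zero but the process
footprint does not, the allocator is holding on to the freed memory.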

On Thu, Jun 17, 2021 at 3:48 PM ZHOU Yuan <dunk...@gmail.com> wrote:
>
> Hi Arrow developers,
>
> I ran into a memory footprint issue after releasing a record batch
> manually. The logic of my program is:
> 0. read many record batches
> 1. process these batches
> 2. dump the intermediate results to disk
> 3. close the batches
> 4. run other operations
>
> I expected the memory footprint to drop after stage #3, but it looks
> like the memory is not released.
> I then wrote a small test program to check the behavior. Running it under
> GDB, the destructor of the RecordBatch
> is indeed called in "input_batch.reset()", but the memory is not
> released until I terminate the whole program.
>
> I understand that the lifetime of a RecordBatch is controlled by the number
> of owners of its shared_ptr, so it will be released eventually,
> but are there any APIs or other ways to release it manually in the middle
> of my program?
>
> The test code snippet is below. Thanks!
>
> =======
>   auto f0 = field("f0", float64());
>   auto f1 = field("f1", uint32());
>   auto sch = arrow::schema({f0, f1});
>
>   std::vector<std::string> input_data_string = {
>       "[10, NaN, 4, 50, 52, 32, 11]",
>       "[11, 13, 5, 51, null, 33, 12]"};
>
>   // prepare input record Batch
>   std::vector<std::shared_ptr<Array>> array_list;
>   int length = -1;
>   int i = 0;
>   for (auto data : input_data_string) {
>     std::shared_ptr<Array> a0;
>     ASSERT_NOT_OK(arrow::ipc::internal::json::ArrayFromJSON(
>         sch->field(i++)->type(), data.c_str(), &a0));
>     if (length == -1) {
>       length = a0->length();
>     }
>     assert(length == a0->length());
>     array_list.push_back(a0);
>   }
>
>   auto input_batch = RecordBatch::Make(sch, length, std::move(array_list));
>
>   input_batch.reset();  // should be freed here?
>   std::this_thread::sleep_for(std::chrono::seconds(20));
>
> thanks, -yuan
