The mailing list cannot handle attachments or images. Can you upload the flame graphs to a gist?
On Fri, May 12, 2023 at 6:55 PM SHI BEI <shibei...@foxmail.com> wrote:

> What I meant is that shared_ptr has a large overhead, which shows up
> clearly in the CPU flame graph. In my testing scenario there are 10
> Parquet files, each 1.3 GB in size, with no compression applied to the
> data within the files. Each row group in those files has 65536 rows.
> In each test, all files are read 10 times to make it easier to capture
> the CPU flame graph. To verify the issue described above, I controlled
> the number of calls to the RecordBatchReader::ReadNext interface by
> adjusting the number of rows read on each call. The CPU flame graph
> captures are as follows:
>
> 1) batch_size = 2048
>
> [flame graph image; attachment not delivered to the list]
>
> 2) batch_size = 65536
>
> [flame graph image; attachment not delivered to the list]
>
> ------------------------------
> SHI BEI
> shibei...@foxmail.com
>
>
> Original message
> From: "Weston Pace" <weston.p...@gmail.com>
> Date: 2023/5/13 2:30
> To: "dev" <dev@arrow.apache.org>
> Subject: Re: Reusing RecordBatch objects and their memory space
>
> I think there are perhaps various things being discussed here:
>
> * Reusing large blocks of memory
>
> I don't think the memory pools actually provide this kind of reuse
> (e.g. they aren't like "connection pools" or "thread pools"). I'm
> pretty sure that when you allocate a new buffer on a pool, it always
> triggers an allocation on the underlying allocator. That being said,
> I think this is generally fine. Allocators themselves (e.g. malloc,
> jemalloc) will keep and reuse blocks of memory before returning them
> to the OS, though this reuse can be defeated by things like
> fragmentation.
>
> One potential exception to the "let allocators handle the reuse" rule
> would be cases where you are frequently allocating buffers of exactly
> the same size (or you are OK with the buffers being larger than you
> need so that you can reuse them). For example, packet pools are very
> common in network programming. In this case you can perhaps be more
> efficient than the allocator, since you know the buffers all have the
> same size.
>
> It's not entirely clear to me that this would be useful for reading
> Parquet.
>
> * shared_ptr overhead
>
> Every time a shared_ptr is copied there is an atomic increment of the
> reference count, and every time one is destroyed there is an atomic
> decrement. These atomic operations introduce memory fences, which can
> foil compiler optimizations and are costly on their own.
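>
> As a minimal sketch of where that cost comes from (this is generic
> C++, not Arrow's internals): copying a shared_ptr bumps the atomic
> reference count, while borrowing it by const reference does not.
>
>   #include <cstdint>
>   #include <memory>
>   #include <vector>
>
>   struct Batch { std::vector<int64_t> values; };
>
>   // Copies the shared_ptr: atomic increment on entry, and an atomic
>   // decrement (plus a possible destructor run) when `b` goes out of
>   // scope.
>   int64_t SumByValue(std::shared_ptr<Batch> b) {
>     int64_t sum = 0;
>     for (int64_t v : b->values) sum += v;
>     return sum;
>   }
>
>   // Borrows the control block: no ref-count traffic at all.
>   int64_t SumByRef(const std::shared_ptr<Batch>& b) {
>     int64_t sum = 0;
>     for (int64_t v : b->values) sum += v;
>     return sum;
>   }
>
> Called in a tight loop, the by-value version pays two atomic
> operations per call. With many columns per batch (each array holding
> its own shared_ptrs to buffers) this multiplies quickly.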
> > I'm using the RecordBatchReader::ReadNext interface to read Parquet
> > data in my project, and I've noticed that there are a lot of
> > temporary object destructors being generated during usage.
>
> Can you clarify what you mean here? When I read this sentence I
> thought of something completely different than the previous two things
> mentioned :)  At one time I had a suspicion that thrift was generating
> a lot of small allocations while reading the parquet metadata, and
> that this was leading to fragmentation of the system allocator
> (thrift's allocations do not go through the memory pool / jemalloc,
> and we have a bit of a habit in datasets of keeping parquet metadata
> around to speed up future reads). I never did investigate this
> further, though.
>
> On Fri, May 12, 2023 at 10:48 AM David Li wrote:
>
> > I can't find it anymore, but there is a quite old issue that made
> > the same observation: RecordBatch's heavy use of shared_ptr in C++
> > can lead to a lot of overhead just calling destructors. That may be
> > something to explore more (e.g. I think someone had tried to
> > "unbox" some of the fields in RecordBatch).
> >
> > On Fri, May 12, 2023, at 13:04, Will Jones wrote:
> > > Hello,
> > >
> > > I'm not sure if there are easy ways to avoid calling the
> > > destructors. However, I would point out that memory reuse is
> > > handled through memory pools; if you have one enabled, it
> > > shouldn't be handing memory back to the OS between each
> > > iteration.
> > >
> > > Best,
> > >
> > > Will Jones
> > >
> > > On Fri, May 12, 2023 at 9:59 AM SHI BEI wrote:
> > >
> > >> Hi community,
> > >>
> > >> I'm using the RecordBatchReader::ReadNext interface to read
> > >> Parquet data in my project, and I've noticed that there are a
> > >> lot of temporary object destructors being generated during
> > >> usage. Has the community considered providing an interface to
> > >> reuse RecordBatch objects and their memory space for storing
> > >> data?
> > >>
> > >> SHI BEI
> > >> shibei...@foxmail.com
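
To make the discussion concrete, the read loop in question looks
roughly like the following (a sketch only; the exact
GetRecordBatchReader overloads vary between Arrow releases, and the
rows-per-call knob corresponds, if I remember right, to
parquet::ArrowReaderProperties::set_batch_size). Every ReadNext call
materializes a brand-new RecordBatch whose arrays and buffers are all
held by shared_ptr, so a smaller batch_size means proportionally more
batch teardowns, which is consistent with the difference between the
batch_size=2048 and batch_size=65536 flame graphs:

  #include <memory>
  #include <numeric>
  #include <vector>

  #include <arrow/io/file.h>
  #include <arrow/record_batch.h>
  #include <arrow/result.h>
  #include <arrow/status.h>
  #include <parquet/arrow/reader.h>

  arrow::Status ReadAll(const std::string& path) {
    ARROW_ASSIGN_OR_RAISE(auto infile,
                          arrow::io::ReadableFile::Open(path));

    // Open the Parquet file against the default memory pool.
    std::unique_ptr<parquet::arrow::FileReader> reader;
    ARROW_RETURN_NOT_OK(parquet::arrow::OpenFile(
        infile, arrow::default_memory_pool(), &reader));

    // Read every row group; older releases return the batch reader
    // through a shared_ptr out-parameter instead of a unique_ptr.
    std::vector<int> row_groups(reader->num_row_groups());
    std::iota(row_groups.begin(), row_groups.end(), 0);
    std::unique_ptr<arrow::RecordBatchReader> rb_reader;
    ARROW_RETURN_NOT_OK(
        reader->GetRecordBatchReader(row_groups, &rb_reader));

    std::shared_ptr<arrow::RecordBatch> batch;
    while (true) {
      ARROW_RETURN_NOT_OK(rb_reader->ReadNext(&batch));
      if (batch == nullptr) break;  // end of stream
      // ... process `batch` ...
      // Overwriting `batch` on the next ReadNext releases the whole
      // tree of shared_ptrs (batch -> arrays -> buffers); that
      // teardown is where the destructor time shows up.
    }
    return arrow::Status::OK();
  }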