On Thu, Feb 7, 2019 at 11:21 AM Andrii Nakryiko <andrii.nakry...@gmail.com> wrote: > > On Tue, Feb 5, 2019 at 10:25 PM Alexei Starovoitov > <alexei.starovoi...@gmail.com> wrote: > > > > On Tue, Feb 05, 2019 at 09:46:14PM -0800, Andrii Nakryiko wrote: > > > On Tue, Feb 5, 2019 at 7:07 PM Alexei Starovoitov > > > <alexei.starovoi...@gmail.com> wrote: > > > > > > > > On Tue, Feb 05, 2019 at 04:29:49PM -0800, Andrii Nakryiko wrote: > > > > > This patch exposes two new APIs btf__get_raw_data_size() and > > > > > btf__get_raw_data() that allows to get a copy of raw BTF data out of > > > > > struct btf. This is useful for external programs that need to > > > > > manipulate > > > > > raw data, e.g., pahole using btf__dedup() to deduplicate BTF type info > > > > > and then writing it back to file. > > > > > > > > > > Signed-off-by: Andrii Nakryiko <andr...@fb.com> > > > > > Acked-by: Song Liu <songliubrav...@fb.com> > > > > > --- > > > > > tools/lib/bpf/btf.c | 10 ++++++++++ > > > > > tools/lib/bpf/btf.h | 2 ++ > > > > > tools/lib/bpf/libbpf.map | 2 ++ > > > > > 3 files changed, 14 insertions(+) > > > > > > > > > > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c > > > > > index 1c2ba7182400..34bfb3641aac 100644 > > > > > --- a/tools/lib/bpf/btf.c > > > > > +++ b/tools/lib/bpf/btf.c > > > > > @@ -437,6 +437,16 @@ int btf__fd(const struct btf *btf) > > > > > return btf->fd; > > > > > } > > > > > > > > > > +__u32 btf__get_raw_data_size(const struct btf *btf) > > > > > +{ > > > > > + return btf->data_size; > > > > > +} > > > > > + > > > > > +void btf__get_raw_data(const struct btf *btf, char *data) > > > > > +{ > > > > > + memcpy(data, btf->data, btf->data_size); > > > > > +} > > > > > > > > I cannot think of any other way to use this api, > > > > but to call btf__get_raw_data_size() first, > > > > then malloc that much memory and then call btf__get_raw_data() > > > > to store btf into it. > > > > > > > > If so, may be api should be single call that allocates, copies, > > > > and returns pointer to new mem and its size? > > > > Probably less error prone? > > > > > > > > > > I don't have strong preference, but providing pointer to allocated memory > > > seems more flexible and allows more clever/optimal use of memory from > > > caller > > > side. E.g., instead of doing two mallocs, you can imagine doing something > > > like: > > > > > > int max_size = max(btf__get_raw_data_size(btf), > > > btf_ext__get_raw_data_size(btf_ext)); > > > char *m = malloc(max_size); > > > btf__get_raw_data(btf, m); > > > dump_btf_section_to_file(m, some_file); > > > btf_ext__get_raw_data(btf_ext, m); > > > dump_btf_ext_section_to_file(m, some_file); > > > free(m); > > > > > > Also, pointer to memory could be mmap()'ed file, for instance. In general, > > > for a library it might be a good thing to not be prescriptive as to how > > > one > > > gets that piece of memory. > > > > Plausible, but I'd like to see pahole patches to be convinced ;) > > > > Here's a summary of proposed ways to expose raw data through new api, > with pros/cons. > > 1. Originally proposed two functions. `int btf__get_raw_data_size()` > to get size, `void btf__get_raw_data(void* buf)` to write raw data to > a provided buf. > > Pros: > - allows maximal flexibility for users of this API. They can manage > memory as it's convenient for them (e.g., reusing same buffer for > multiple btf and btf_ext raw data). > - allows using mmap()'ed memory, as allocation and memory ownership > is delegated to user > > Cons: > - has potential of buffer overflows, if user doesn't provide big enough > buffer > > > 2. Alexei's proposal to combine getting size in single function that > internally allocates new memory buffer, copies data and returns it to > users to use and later free. > > Pros: > - one less API function > - more straightforward usage, it's hard to misuse it (except for > memory leaking, if memory is not freed) > > Cons: > - always allocated for each call > - least flexible approach, doesn't allow caller to manage memory, > prevents any kind of direct write to mmap()'ed file > > 3. Daniel proposed realloc-like approach, where caller optionally > provides memory buffer, but we always call realloc() internally to > ensure we have long enough buffer. > > Pros: > - allows callers to provide their memory buffer (similar to approach > #1, but see cons below) > - prevents user error with providing too small buffer (similar to approach > #2) > > Cons: > - realloc expects that memory was allocated by previous malloc() > call, so caller can't allocate bigger chunk of memory and provide > pointer inside that area (behavior is undefined in that case). This > requirement is not immediately obvious, so this approach feels more > error prone than either of approach #1 and #2 > - still doesn't allow mmap()'ed usage, again due to realloc()'s requirements >
There is actually approach #4 - just return const void* to an internal memory buffer. This is trivial for struct btf, will require just slight changes for struct btf_ext, but it puts all the control in user's hands without imposing any unnecessary allocations. This approach seems to provide best of all approaches with no downsides. > > Approach #3 looks most subtly-error-prone, as it's too easy to just > provide pointer that's not at the beginning of malloc()'ed memory, but > this might not be detected immediately, and could potentially lead to > silent memory corruption. > > I'd still go with approach #1 as it provides least restrictive API, > even though approach #2 will provide marginally better usability for > common cases. > > Alexei, Daniel, which approach you'd prefer in the end after > considering all pros and cons?