On Mon, Feb 10, 2025 at 10:21:38AM -0600, Yan Zhai wrote:
> Hi Brian, Jiri
> 
> thanks for the comments.
> 
> On Mon, Feb 10, 2025 at 8:47 AM Brian Vazquez <bria...@google.com> wrote:
> >
> > On Mon, Feb 10, 2025 at 4:19 AM Jiri Olsa <olsaj...@gmail.com> wrote:
> > >
> > > On Sun, Feb 09, 2025 at 11:22:35PM -0800, Yan Zhai wrote:
> > > > The generic_map_lookup_batch currently returns EINTR if it fails with
> > > > ENOENT and retries several times on bpf_map_copy_value. The next batch
> > > > would start from the same location, presuming it's a transient issue.
> > > > This is incorrect if a map can actually have "holes", i.e.
> > > > "get_next_key" can return a key that does not point to a valid value. At
> > > > least the array of maps type may contain such holes legitly. Right now
> > > > these holes show up, generic batch lookup cannot proceed any more. It
> > > > will always fail with EINTR errors.
> > > >
> > > > Rather, do not retry in generic_map_lookup_batch. If it finds a non
> > > > existing element, skip to the next key. This simple solution comes with
> > > > a price that transient errors may not be recovered, and the iteration
> > > > might cycle back to the first key under parallel deletion. For example,
> > >
> > > probably stupid question, but why not keep the retry logic and when
> > > it fails then instead of returning EINTR just jump to the next key
> > >
> > > jirka
> >
> > +1, keeping the retry logic but moving to the next key on error sounds
> > like a sensible approach.
> >
> I made the trade off since retry would consistently fail for the array
> of maps, so it is merely wasting cycles to ever do so. It is already
> pretty slow to read these maps today from userspace (for us we read
> them for accounting/monitoring purposes), so it is nice to save a few
> cycles especially for sparse maps. E.g. We use inner maps to store
> protocol specific actions in an array of maps with 256 slots, but
> usually only a few common protocols like TCP/UDP/ICMP are populated,
> leaving most "holes". On the other hand, I personally feel it is
> really "fragile" if users rely heavily on this logic to survive
> concurrent lookup and deletion. Would it make more sense to provide
> concurrency guarantee with map specific ops like hash map?

Brian, any details on the EINTR path? is that just to survive concurent
batch-lookup and delete?

if that's important use case I guess the map specific function would be
possible, because it's broken for maps with holes as you described

thanks,
jirka

Reply via email to