[vpp-dev] Macros to support prefetch w/o code copy.

Christian Hopps Thu, 11 Jul 2019 21:16:00 -0700

[re-sending w/o extra email addresses]

Hi vpp-dev,


So I'm writing a plugin to support IPTFS 
(https://tools.ietf.org/html/draft-hopps-ipsecme-iptfs-01), and during this 
process I've written a macro that prefetches the second half of N buffers 
(where N is a power of 2) while acting on the first half (i.e., the standard 
prefetch pattern in vpp).

The macro lets one write only the "single buffer" case code, without keeping 
a separate copy of the code for each power of 2 to be supported.

One slight variation on standard VPP style is that the macro and code use 
beginning and end pointers for buffer index arrays instead of an array index 
and array length.
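
To make the begin/end-pointer convention concrete, here is a minimal standalone sketch in plain C with no VPP headers. The names `sum_by_index` and `sum_by_range` are hypothetical and exist only for this illustration; they walk the same array both ways.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index-and-length style: the caller tracks an array base plus a count. */
uint32_t
sum_by_index (const uint32_t *v, size_t n)
{
  uint32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += v[i];
  return sum;
}

/* Begin/end-pointer style: [b, eb) is a half-open range, and the number
   of elements still to visit is always eb - b. */
uint32_t
sum_by_range (const uint32_t *b, const uint32_t *eb)
{
  uint32_t sum = 0;
  assert (b <= eb);
  for (; b < eb; b++)
    sum += *b;
  return sum;
}
```

With the range form, "how many are left" never needs a separate counter, which is what lets a macro advance `b` on the caller's behalf.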

I'm sending this mail b/c it may be a little while before I can submit my code, 
and the arm guys are currently adding prefetch code so I thought this could be 
useful for that. Also I might get some useful feedback on the code as well. :)

Here are the prefetch macros:

 /* Return half the number of buffers left in [start, end), capped at limit. */
 static inline u32
 half_range (vlib_buffer_t **start, vlib_buffer_t **end, u32 limit)
 {
   u32 range = (end - start) / 2;
   return range < limit ? range : limit;
 }

 #define FOREACH_PREFETCH(b, eb)                                              \
   do                                                                         \
     {                                                                        \
       vlib_buffer_t **bp, **ebp;                                             \
       for (u32 half = half_range ((b), (eb), CLIB_N_PREFETCHES); (b) < (eb); \
            half = half_range ((b), (eb), CLIB_N_PREFETCHES))                 \
         {                                                                    \
                                                                              \
           /* prefetch second half */                                         \
           if (half)                                                          \
             for (bp = (b) + half, ebp = bp + half; bp < ebp; bp++)           \
               vlib_prefetch_buffer_header (*bp, LOAD);                       \
           else                                                               \
             half = 1; /* Process at least one next */                        \
                                                                              \
           /* process first half or last one */                               \
           for (ebp = (b) + half; (b) < ebp; (b)++)

 #define END_FOREACH_PREFETCH \
   }                          \
   }                          \
   while (0)
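
The brace structure of such a macro pair can be hard to see through the backslashes, so here is a stripped-down sketch of the same trick in plain C. `FOREACH_CHUNKED`, `END_FOREACH_CHUNKED` and `sum_even` are hypothetical names for this illustration only, and the chunking rule is deliberately simpler than the prefetch logic above. The point is that the user's braces become the body of the inner `for`, so `continue` moves on to the next element.

```c
#include <stddef.h>

/* Opening macro: starts a do-block, an outer loop over chunks, and an
   inner for loop whose body is supplied by the caller. */
#define FOREACH_CHUNKED(p, ep, chunk)                             \
  do                                                              \
    {                                                             \
      while ((p) < (ep))                                          \
        {                                                         \
          const int *_stop =                                      \
            (ep) - (p) > (chunk) ? (p) + (chunk) : (ep);          \
          for (; (p) < _stop; (p)++)

/* Closing macro: ends the outer loop and the do-block. */
#define END_FOREACH_CHUNKED \
  }                         \
  }                         \
  while (0)

/* Sum the even values in [p, ep), visiting them in chunks of 4. */
int
sum_even (const int *p, const int *ep)
{
  int sum = 0;
  FOREACH_CHUNKED (p, ep, 4)
  {
    if (*p & 1)
      continue; /* skips to the inner for loop's increment of p */
    sum += *p;
  }
  END_FOREACH_CHUNKED;
  return sum;
}
```

Because the body runs inside the inner loop, a `break` there would only leave the current chunk, not the whole walk, which is worth keeping in mind when using this style.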

Here are the macros I use to handle filling up next frames in parallel
and working with arrays using end pointers vs. indices and lengths.

/* in node_funcs.h */

 #define vlib_get_next_frame_macro_p(vm,node,next_index,vectors,evectors,alloc_new_frame) \
   do {                                                              \
     vlib_frame_t * _f                                               \
       = vlib_get_next_frame_internal ((vm), (node), (next_index),   \
                                       (alloc_new_frame));           \
     u32 _n = _f->n_vectors;                                         \
     (vectors) = vlib_frame_vector_args (_f) + _n * sizeof ((vectors)[0]);\
     (evectors) = (vectors) + (VLIB_FRAME_SIZE - _n);                \
   } while (0)

 #define vlib_get_next_frame_p(vm,node,next_index,vectors,evectors)  \
   vlib_get_next_frame_macro_p (vm, node, next_index,                \
                                vectors, evectors,                   \
                                /* alloc new frame */ 0)

/* in iptfs code.. */

 #define vlib_put_get_next_frame(vm, node, ni, to, eto)              \
   do                                                                \
     {                                                               \
       /* Put the frame if it is full */                             \
       if ((to) && (eto) != (to))                                    \
         ;                                                           \
       else                                                          \
         {                                                           \
           if ((to))                                                 \
             vlib_put_next_frame ((vm), (node), (ni), (eto) - (to)); \
           vlib_get_next_frame_p ((vm), (node), (ni), (to), (eto));  \
         }                                                           \
     }                                                               \
   while (0)

 #define vlib_put_get_next_frame_a(vm, node, ni, toa, etoa) \
   vlib_put_get_next_frame (vm, node, ni, (toa)[(ni)], (etoa)[(ni)])
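
The put/get logic above can be tried outside of VPP with a small mock. The sketch below is only an approximation under stated assumptions: `mock_frame_t` and the `mock_*` functions are hypothetical stand-ins, the frame size is shrunk to 4, and handing off a full frame is modelled by bumping a counter and reusing the same storage, which is not how vlib manages frames.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SIZE 4 /* tiny stand-in for VLIB_FRAME_SIZE */

typedef struct
{
  uint32_t slots[FRAME_SIZE];
  uint32_t n_vectors; /* slots currently in use */
  uint32_t n_flushed; /* how many full frames were handed off */
} mock_frame_t;

/* Stand-in for vlib_get_next_frame_p: expose free slots as [to, eto). */
void
mock_get_frame (mock_frame_t *f, uint32_t **to, uint32_t **eto)
{
  *to = f->slots + f->n_vectors;
  *eto = f->slots + FRAME_SIZE;
}

/* Stand-in for vlib_put_next_frame: record usage, hand off a full frame. */
void
mock_put_frame (mock_frame_t *f, uint32_t n_left)
{
  f->n_vectors = FRAME_SIZE - n_left;
  if (n_left == 0)
    {
      f->n_flushed++;
      f->n_vectors = 0; /* the next get starts on an empty frame */
    }
}

/* Same decision as vlib_put_get_next_frame: act only when no frame is
   held yet (to == NULL) or the held frame is full (to == eto). */
void
mock_put_get_frame (mock_frame_t *f, uint32_t **to, uint32_t **eto)
{
  if (*to && *to != *eto)
    return; /* room left, keep filling */
  if (*to)
    mock_put_frame (f, (uint32_t) (*eto - *to)); /* always 0 here */
  mock_get_frame (f, to, eto);
}

/* Enqueue indices 0..n-1, then put the partially filled frame. */
void
mock_enqueue (mock_frame_t *f, uint32_t n)
{
  uint32_t *to = NULL, *eto = NULL;
  for (uint32_t i = 0; i < n; i++)
    {
      mock_put_get_frame (f, &to, &eto);
      *to++ = i;
    }
  if (to)
    mock_put_frame (f, (uint32_t) (eto - to));
}
```

Enqueuing 6 indices into a zero-initialised `mock_frame_t` hands off one full frame and leaves two entries in the current one, mirroring the final "put any next frames we've used" step in the node function.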

Finally here's the payoff, a slightly trimmed down use of the above using only 
a single copy of the functional code. In this case it's going to start with 
prefetching 8 buffers and handling 8, then 4, then 2, then 1, then none (it 
starts with 8 b/c CLIB_N_PREFETCHES is 16, but any power of 2 would work). 
Anyway the normal pattern in VPP would require 5 copies of the code in this 
case instead of the single copy used here.

 static inline uword
 iptfs_encap_node_inline (vlib_main_t *vm, vlib_node_runtime_t *node,
                          vlib_frame_t *frame, int is_tun)
 {
   vlib_buffer_t *bufs[VLIB_FRAME_SIZE];
   vlib_buffer_t **b = bufs, **eb = bufs + frame->n_vectors;
   vlib_get_buffers (vm, vlib_frame_vector_args (frame), bufs,
                     frame->n_vectors);
   /* Per-next-node write pointer and end pointer; NULL until first use */
   u32 *to[IPTFS_ENCAP_N_NEXT] = {};
   u32 *eto[IPTFS_ENCAP_N_NEXT] = {};
   u32 ni;

   while (b < eb)
     {
       FOREACH_PREFETCH (b, eb)
       {
         /*
          * Handle single buffer found in (*b)
          */

         u8 *data = vlib_buffer_get_current (*b);

         /* ... do stuff with packet in buffer (*b) ... */

         if (send to SOME_NEXT_NODE condition)
           {
             ni = SOME_NEXT_NODE_INDEX;
             vlib_put_get_next_frame_a (vm, node, ni, to, eto);
             *to[ni]++ = vlib_get_buffer_index (vm, *b);
             continue;
           }
         if (drop condition)
           {
           dropit:
             vlib_put_get_next_frame_a (vm, node, IPTFS_ENCAP_NEXT_DROP, to,
                                        eto);
             *to[IPTFS_ENCAP_NEXT_DROP]++ = vlib_get_buffer_index (vm, *b);
             (*b)->error = node->errors[IPTFS_ENCAP_ERROR_Q_FULL];
             continue;
           }
       }
       END_FOREACH_PREFETCH;
     }

   /* Put any next frames we've used */
   for (u32 i = 0; i < IPTFS_ENCAP_N_NEXT; i++)
     if (to[i])
       vlib_put_next_frame (vm, node, i, eto[i] - to[i]);

   return frame->n_vectors;
 }
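
The 8, 4, 2, 1 sequence described above can be checked with a standalone sketch that records how many buffers each pass prefetches and handles. Everything here is an assumption made for illustration: `buf_t` stands in for vlib_buffer_t, `N_PREFETCHES` for CLIB_N_PREFETCHES, `prefetch_schedule` is a hypothetical name, and `__builtin_prefetch` (a GCC/Clang builtin) replaces vlib_prefetch_buffer_header.

```c
#include <stddef.h>

#define N_PREFETCHES 16 /* assumed stand-in for CLIB_N_PREFETCHES */

typedef struct
{
  int data;
} buf_t; /* stand-in for vlib_buffer_t */

/* Walk [b, eb) the way the macro does and record, per pass, how many
   buffers were prefetched and how many were handled. Returns the number
   of passes; at most max_passes entries are written to each array. */
int
prefetch_schedule (buf_t **b, buf_t **eb, int *prefetched, int *handled,
                   int max_passes)
{
  int n_passes = 0;
  while (b < eb)
    {
      int half = (int) ((eb - b) / 2);
      if (half > N_PREFETCHES)
        half = N_PREFETCHES;

      /* Prefetch buffers that later passes will touch. */
      for (buf_t **bp = b + half; bp < b + 2 * half; bp++)
        __builtin_prefetch (*bp, 0 /* read */);

      int n_handle = half ? half : 1; /* always make progress */
      for (buf_t **ebp = b + n_handle; b < ebp; b++)
        (*b)->data += 1; /* the single-buffer body */

      if (n_passes < max_passes)
        {
          prefetched[n_passes] = half;
          handled[n_passes] = n_handle;
        }
      n_passes++;
    }
  return n_passes;
}
```

For 16 buffers this takes five passes, prefetching 8, 4, 2, 1 and 0 buffers while handling 8, 4, 2, 1 and 1, all from one copy of the loop body.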


Thanks,
Chris.
