> -----Original Message----- > From: Neil Horman [mailto:nhorman at tuxdriver.com] > Sent: Thursday, May 29, 2014 6:48 AM > To: Richardson, Bruce > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2 2/5] distributor: new packet distributor > library > > > + > > +/* flush the distributor, so that there are no outstanding packets in > > flight or > > + * queued up. */ > Its not clear to me that this is a distributor only function. You modified > the > comments to indicate that lcores can't preform double duty as both a worker > and > a distributor, which is fine, but it implies that there is a clear distinction > between functions that are 'worker' functions and 'distributor' functions. > While its for the most part clear-ish (workers call rte_distributor_get_pkt > and > rte_distibutor_return_pkt, distibutors calls rte_distributor_create/process. > This is in a grey area. the analogy I'm thinking of here are kernel > workqueues. > Theres a specific workqueue thread that processes the workqueue, but any > process > can sync or flush the workqueue, leading me to think this process can be > called > by a worker lcore.
I can update comments here further, but I was hoping the way things were right now was clear enough. In the header and C files, I have the functions explicitly split up into distributor and worker function sets, with a big block of text in the header at the start of each section explaining the threading use of the follow functions. > > > +int > > +rte_distributor_flush(struct rte_distributor *d) > > +{ > > + unsigned wkr, total_outstanding = 0; > > + unsigned flushed = 0; > > + unsigned ret_start = d->returns.start, > > + ret_count = d->returns.count; > > + > > + for (wkr = 0; wkr < d->num_workers; wkr++) > > + total_outstanding += d->backlog[wkr].count + > > + !!(d->in_flight_tags[wkr]); > > + > > + wkr = 0; > > + while (flushed < total_outstanding) { > > + > > + if (d->in_flight_tags[wkr] != 0 || d->backlog[wkr].count) { > > + const int64_t data = d->bufs[wkr].bufptr64; > > + uintptr_t oldbuf = 0; > > + > > + if (data & RTE_DISTRIB_GET_BUF) { > > + flushed += (d->in_flight_tags[wkr] != 0); > > + if (d->backlog[wkr].count) { > > + d->bufs[wkr].bufptr64 = > > + backlog_pop(&d- > >backlog[wkr]); > > + /* we need to mark something as being > > + * in-flight, but it doesn't matter what > > + * as we never check it except > > + * to check for non-zero. > > + */ > > + d->in_flight_tags[wkr] = 1; > > + } else { > > + d->bufs[wkr].bufptr64 = > > + > RTE_DISTRIB_GET_BUF; > > + d->in_flight_tags[wkr] = 0; > > + } > > + oldbuf = data >> RTE_DISTRIB_FLAG_BITS; > > + } else if (data & RTE_DISTRIB_RETURN_BUF) { > > + if (d->backlog[wkr].count == 0 || > > + move_backlog(d, wkr) == 0) { > > + /* only if we move backlog, > > + * process this packet */ > > + d->bufs[wkr].bufptr64 = 0; > > + oldbuf = data >> > RTE_DISTRIB_FLAG_BITS; > > + flushed++; > > + d->in_flight_tags[wkr] = 0; > > + } > > + } > > + > > + store_return(oldbuf, d, &ret_start, &ret_count); > > + } > > + > I know the comments for move_backlog say you use that function here rather > than > what you do in distributor_process because you're tracking the flush count > here. > That said, if you instead recomputed the total_outstanding count on each loop > iteration, and tested it for 0, I think you could just reduce the flush > operation to a looping call to rte_distributor_process. It would save you > having to maintain the flush code and the move_backlog code separately, which > would be a nice savings. Yes, agreed, I should have spotted that myself. I'll look to rework this as soon as I can. > > > + if (++wkr == d->num_workers) > > + wkr = 0; > Nit: wkr = ++wkr % d->num_workers avoids the additional branch in your loop > a) branch should be easily predictable as the number of workers doesn't change. So long as branch doesn't mis-predict there should be little or no perf penalty to having it. b) The compare plus update can also be done branchless using a "cmov" instruction if we want branch free code. c) The modulus operator is very slow and takes far more cycles than a branch would do. If we could limit the number of workers to a power of two, then an & operation would work nicely, but that is too big a restriction to have. So, in short, I think I'll keep this snippet as-is. :-)