On 12/23/24 19:43, Christian Schulte wrote:
> On 12/23/24 17:37, Geoff Steckel wrote:
>> On 12/23/24 11:20 AM, Gábor LENCSE wrote:
>>> Under Linux, one can use the isolcpus kernel command line
>>> parameter to exclude certain cores from the scheduler.
>>> I use the DPDK rte_eal_remote_launch() function to start a thread on
>>> an isolated CPU core.
>>>
>>> Is there anything similar under OpenBSD?
>>>
>> Is there any reason why multiple processes with shared memory segments
>> won't work as well?
>> In my experience there's little if any performance difference.
>> Unless the application is truly SIMD, the additional security
>> and ease of debugging pay off quickly.
>>
>> In my experience, OpenBSD does a good job spreading compute-bound
>> processes over all available CPUs.
>>
> Not criticizing OpenBSD in any way. Let me try to explain a common use
> case. There is a data source capable of providing at most X bytes per
> second. The application needs to be set up so that it can receive those
> X bytes per second without spin locking or waiting for data. If it
> "polls" too fast, it slows down the whole system waiting for data. If
> it "polls" too slowly, it cannot process those bytes fast enough. Those
> bytes need to be processed, so there is a receiving process which must
> be able to consume exactly those X bytes per second. That consumer also
> needs to be designed so that it can process those bytes in parallel as
> fast as possible. If the consumer is sized too small, the producer
> starts spin locking or the like and cannot keep up with the data rate,
> because the consumer does not process the data fast enough. If the
> consumer is sized too big, the consumer starts spin locking while
> waiting for the producer to provide more data. I am searching for an
> API that makes the application adapt to those situations automatically.
> When the data rate on the receiving side decreases, the consumer side
> should not need Y processes running in parallel, all spin locking while
> waiting for more data. When the data rate on the receiving side
> increases, the consumer needs to add compute so as not to slow down the
> receiver. Does this make things clearer?
>
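The setup quoted above is essentially the classic bounded-buffer
producer/consumer problem. As a rough point of reference, here is a
minimal sketch with POSIX semaphores; CAPACITY, queue_put() and
queue_get() are invented names, not anything from this thread. Both
sizing failure modes become cheap blocking instead of spin locking,
because sem_wait() sleeps in the kernel until the other side catches
up.

#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define CAPACITY 1024			/* power of two, so size_t
					   index wrap stays correct */

static void *ring[CAPACITY];
static size_t head, tail;
static sem_t slots, items;		/* free slots / queued items */
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;

void
queue_init(void)
{
	sem_init(&slots, 0, CAPACITY);	/* producer may run ahead */
	sem_init(&items, 0, 0);		/* nothing queued yet */
}

void
queue_put(void *m)			/* receiving side */
{
	sem_wait(&slots);		/* sleeps if consumers lag */
	pthread_mutex_lock(&mu);
	ring[tail++ % CAPACITY] = m;
	pthread_mutex_unlock(&mu);
	sem_post(&items);
}

void *
queue_get(void)				/* each consumer thread */
{
	sem_wait(&items);		/* sleeps if producer is slow */
	pthread_mutex_lock(&mu);
	void *m = ring[head++ % CAPACITY];
	pthread_mutex_unlock(&mu);
	sem_post(&slots);
	return m;
}

A pool sized "too big" under this scheme costs almost nothing: a
thread sleeping in sem_wait() uses no CPU, unlike a spinning one.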
Hmm. I am using this: <https://mongoose.ws/> to handle a websocket.
Every websocket message received needs to be processed in parallel, so
there is a queue the receiving thread enqueues messages to, and
multiple threads dequeue messages from that queue.

I'd like some kind of API to size the capacity of the queue and the
number of dequeuing threads automatically, based on the data rate of
the websocket. With very few bytes per second, a small queue capacity
and a single thread processing the websocket data will do without any
locking. With lots of bytes per second - a burst - enlarge the queue
capacity and start enough threads to dequeue things in time. I am
quite sure I am failing to describe the issue I am having. Just making
everything "big enough" to handle those bursts slows things down in
every other situation. Maybe the APIs I am using just do not provide
the features I am after.

--
Christian
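A rough sketch of the elastic part being asked for here, assuming
plain pthreads; all names (enqueue(), worker(), MAX_WORKERS, the two
thresholds) are invented for illustration. The receiving thread would
call enqueue() from the mongoose event callback for each incoming
websocket message (MG_EV_WS_MSG in recent Mongoose versions, but check
the headers). Workers are spawned only when the backlog exceeds a
per-worker threshold and retire after sitting idle, so the pool size
follows the websocket's data rate by itself.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_WORKERS 8		/* hard upper bound on the pool */
#define IDLE_TIMEOUT_SEC 2	/* idle worker retires after this */
#define BACKLOG_PER_WORKER 16	/* spawn once depth exceeds this */

typedef struct msg {
	struct msg *next;
	size_t len;
	char data[];
} msg_t;

static struct {
	pthread_mutex_t mu;
	pthread_cond_t nonempty;
	msg_t *head, *tail;
	size_t depth;
	int workers;
} q = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };

static void
process(msg_t *m)		/* application work goes here */
{
	free(m);
}

static void *
worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&q.mu);
	for (;;) {
		while (q.head == NULL) {
			struct timespec ts;
			clock_gettime(CLOCK_REALTIME, &ts);
			ts.tv_sec += IDLE_TIMEOUT_SEC;
			/* Sleep, don't spin; retire if idle too long. */
			if (pthread_cond_timedwait(&q.nonempty, &q.mu,
			    &ts) != 0 && q.head == NULL) {
				q.workers--;
				pthread_mutex_unlock(&q.mu);
				return NULL;
			}
		}
		msg_t *m = q.head;
		q.head = m->next;
		if (q.head == NULL)
			q.tail = NULL;
		q.depth--;
		pthread_mutex_unlock(&q.mu);
		process(m);
		pthread_mutex_lock(&q.mu);
	}
}

/* Called by the receiving thread for every websocket message. */
void
enqueue(const void *data, size_t len)
{
	msg_t *m = malloc(sizeof *m + len);
	if (m == NULL)
		return;		/* or apply backpressure here */
	m->next = NULL;
	m->len = len;
	memcpy(m->data, data, len);

	pthread_mutex_lock(&q.mu);
	if (q.tail != NULL)
		q.tail->next = m;
	else
		q.head = m;
	q.tail = m;
	q.depth++;
	/* Grow the pool only when the backlog says we are too slow. */
	if (q.workers < MAX_WORKERS &&
	    q.depth > (size_t)q.workers * BACKLOG_PER_WORKER) {
		pthread_t t;
		if (pthread_create(&t, NULL, worker, NULL) == 0) {
			pthread_detach(t);
			q.workers++;
		}
	}
	pthread_cond_signal(&q.nonempty);
	pthread_mutex_unlock(&q.mu);
}

At a low data rate this degenerates to a single worker that sleeps
between messages; during a burst the depth check spawns workers up to
MAX_WORKERS, and once the burst is over they time out and exit, so
nothing spins in the quiet case.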