On Mon, 12 Jun 2023 at 20:24, Andres Freund <and...@anarazel.de> wrote:
>
> Hi,
>
> On 2023-06-12 16:23:14 +0400, Pavel Borisov wrote:
> > Is the following true or not?
> >
> > 1. If we switch processes to threads but leave the amount of session
> > local variables unchanged, there would be hardly any performance gain.
>
> False.
>
> >
> > 2. If we move some backend's local variables into shared memory then
> > the performance gain would be very near to what we get with threads
> > having equal amount of session-local variables.
>
> False.
>
> >
> > In other words, the overall goal in principle is to gain from less
> > memory copying wherever it doesn't add the burden of locks for
> > concurrent variables access?
>
> False.
>
> Those points seems pretty much unrelated to the potential gains from switching
> to a threading model. The main advantages are:

I think that they're practical performance-related questions about the
benefits of performing a technical migration that could involve
significant development time, take years to complete, and uncover
problems that cause reliability issues for a stable, proven database
management system.

> 1) We'd gain from being able to share state more efficiently (using normal
>    pointers) and more dynamically (not needing to pre-allocate). That'd remove
>    a good amount of complexity. As an example, consider the work we need to do
>    to ferry tuples from one process to another. Even if we just continue to
>    use shm_mq, in a threading world we could just put a pointer in the queue,
>    but have the tuple data be shared between the processes etc.
>
>    Eventually this could include removing the 1:1 connection<->process/thread
>    model. That's possible to do with processes as well, but considerably
>    harder.

This reads like a code quality argument: that's worthwhile, but I don't
see how it supports your 'False' assertions. Do two queries running in
separate processes spend much time allocating and waiting on resources
that could be shared within a single thread?

> 2) Making context switches cheaper / sharing more resources at the OS and
>    hardware level.

That seems valid. Even so, I would expect that for many queries, I/O
access and row processing time are the bulk of the work, and that
context switches to/from other query processes are relatively
negligible.
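
To make sure I am reading point 1 correctly, here is a minimal sketch in C
of what "just put a pointer in the queue" could look like between two
threads. It is not PostgreSQL code: DemoTuple, PtrSlot and every function
name below are hypothetical stand-ins, and the one-slot mailbox is only a
toy replacement for a real queue such as shm_mq. The one thing it shows is
that the consumer receives the same address the producer allocated, so the
tuple bytes are never copied. As far as I understand it, separate processes
cannot rely on this in general, because an ordinary heap pointer is only
meaningful inside the address space that created it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tuple; not PostgreSQL's real tuple format. */
typedef struct DemoTuple {
    int  id;
    char payload[64];
} DemoTuple;

/* One-slot mailbox that carries only a pointer between threads.
 * Kept deliberately simple: it supports a single hand-off at a time. */
typedef struct PtrSlot {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    DemoTuple      *item;   /* NULL means the slot is empty */
} PtrSlot;

static void slot_init(PtrSlot *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->ready, NULL);
    s->item = NULL;
}

/* Publish a pointer; the tuple contents themselves are not copied. */
static void slot_put(PtrSlot *s, DemoTuple *t)
{
    pthread_mutex_lock(&s->lock);
    s->item = t;
    pthread_cond_signal(&s->ready);
    pthread_mutex_unlock(&s->lock);
}

/* Block until a pointer is available, then take ownership of it. */
static DemoTuple *slot_take(PtrSlot *s)
{
    DemoTuple *t;

    pthread_mutex_lock(&s->lock);
    while (s->item == NULL)
        pthread_cond_wait(&s->ready, &s->lock);
    t = s->item;
    s->item = NULL;
    pthread_mutex_unlock(&s->lock);
    return t;
}

/* Producer thread: allocate a tuple on the ordinary heap and hand it over. */
static void *producer(void *arg)
{
    PtrSlot   *s = arg;
    DemoTuple *t = malloc(sizeof *t);

    if (t == NULL)
        abort();            /* keep the sketch simple: no recovery path */
    t->id = 42;
    strcpy(t->payload, "row data");
    slot_put(s, t);
    return t;               /* report the address so the caller can compare */
}

/* Returns 1 if the consumer saw the very same address the producer
 * allocated, with the contents intact; returns 0 otherwise. */
int demo_pointer_handoff(void)
{
    PtrSlot    s;
    pthread_t  th;
    void      *sent = NULL;
    DemoTuple *got;
    int        same;

    slot_init(&s);
    if (pthread_create(&th, NULL, producer, &s) != 0)
        return 0;
    got = slot_take(&s);
    pthread_join(th, &sent);

    same = (got == sent) && got->id == 42 &&
           strcmp(got->payload, "row data") == 0;

    free(got);
    pthread_mutex_destroy(&s.lock);
    pthread_cond_destroy(&s.ready);
    return same;
}
```

Calling demo_pointer_handoff() from a main() and building with -pthread is
enough to try it. If this matches what you mean, my question stands: the
sketch removes a copy and some bookkeeping, but it says nothing about how
much of a typical query's run time that copy accounts for.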