Thomas Hallgren wrote:
Hi all,
And thanks for very good input regarding a remote alternative to
PL/Java (thread titled "Shared Memory"). I'm convinced that such an
alternative would be a great addition to PL/Java and increase the
number of users. The work to create such a platform that has the
stability and quality of today's PL/Java is significant (I really do
think it is a production-grade product today). So significant in fact,
that I'm beginning to think of a third alternative. An alternative
that would combine the performance of using in-process calls with the
benefits of sharing a JVM. The answer is of course to make the backend
multi-threaded.
This question has been debated before and always promptly rejected.
One major reason is of course that it will not bring any benefits over
the current multi-process approach on a majority of the platforms
where PostgreSQL is used. A process-switch is just as fast as a
thread-switch on Linux based systems. Over the last year however,
something has happened that certainly speaks in favor of
multi-threading. PostgreSQL is getting widely adopted on Windows. On
Windows, a process-switch is at least 5 times more expensive than a
thread-switch. In order to do appropriate locking, PostgreSQL is forced
to do a fair amount of switching during transaction processing so the
gain in using a multi-threaded approach on Windows is probably
significant. The same is true for other OSes where process-switching
is relatively expensive.
There are other benefits as well. PostgreSQL would no longer need
shared memory and semaphores, and a lot more resources could be shared
between backend processes. The one major drawback of a multi-threaded
approach (the one that's been the main argument for the defenders of
the current approach) is vulnerability. If one thread is messing
things up, then the whole system will be brought to a halt (on the
other hand, that can be said about the current shared-memory approach
as well). The cure for this is to have a system that, to the extent
possible, prevents this from happening. How would that be possible?
Well, such systems are widely used today. Huge companies use them in
mission critical applications all over the world. They are called
Virtual Machines. Two types in particular are gaining more and more
ground. The .NET based CLR and the Java VM.
Although there's an Open Source initiative called Mono that implements
the CLR, I still don't see it as a viable alternative to create a
production-grade multi-platform database. Microsoft's CLR is of course
confined to Microsoft platforms. The Java VMs are however a different
matter altogether. And with the java.nio.channels package that was
introduced in Java 1.4 and the java.util.concurrent package from Java
5.0, Java has taken major steps forward in being a very feasible
platform for a database implementation. There's actually nothing
stopping you from doing a high-performance MVCC system using Java
today. A SQL parser would be based on JavaCC technology (the grammar
is already written although it needs small adjustments to comply with
the PostgreSQL dialect). Lots of technology is there out-of-the-box
such as regular expressions, hash-maps, linked lists, etc. Not to
forget an exceptionally great threading system, now providing atomic
operations, semaphores, copy-on-write arrays etc. In short, everything
that a database implementor could ever wish for.
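
To make the java.util.concurrent primitives mentioned above concrete,
here is a minimal sketch, not taken from PL/Java or PostgreSQL. The
class name, the method name and the "backend-N" labels are hypothetical,
and it is written for a current JDK (lambdas, diamond operator) rather
than for Java 5.0. It uses an atomic counter, a semaphore that limits how
many workers run at once, and a copy-on-write list:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

public class ConcurrencySketch {

    // Starts `workers` threads that each bump a shared counter `perWorker`
    // times and returns the final counter value. All names are illustrative.
    static long runWorkers(int workers, int perWorker) {
        AtomicLong counter = new AtomicLong();                 // lock-free atomic increments
        Semaphore slots = new Semaphore(2);                    // at most two workers active at once
        List<String> finished = new CopyOnWriteArrayList<>();  // safe to read while others append

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            final String name = "backend-" + i;
            threads[i] = new Thread(() -> {
                try {
                    slots.acquire();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    for (int n = 0; n < perWorker; n++) {
                        counter.incrementAndGet();
                    }
                    finished.add(name);
                } finally {
                    slots.release();
                }
            });
            threads[i].start();
        }

        // Wait for every worker before reading the result.
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 1000));
    }
}
```

Because every increment goes through AtomicLong, no update is lost even
though the workers share one address space without explicit locks.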
The third alternative for PL/Java, an approach that gets more viable
every minute I think about it, is to implement the PostgreSQL backend
completely in Java. I'm involved in the development of one of the
commercial JVMs. I know that an enormous amount of resources are
constantly devoted to performance optimizations. The days when a
complex system written in C or C++ could outperform a JVM have passed.
A static optimizer can only do so well. A JVM, that collects
heuristics, communicates with the CPU about cache usage etc., can be a
great deal smarter on how the final machine code will be optimized,
and re-optimized should the conditions change. It would be great if
PostgreSQL could benefit from all this research.
If a commercial JVM is perceived as a problem, then combine^h^h^hpile
the code with GNU gcj instead of gcc like today.
The list of advantages can be made a mile long. There's no point in
listing everything here. From my own standpoint, I'm of course
thinking first and foremost about the advantages with PL/Java. It will
become the absolute most efficient PL of them all. Other languages,
for which no good Java implementation exists (unlike Python, which has
Jython), can be implemented using JNI. The most common functions
used by say, PL/Perl could probably be implemented as callbacks into
the Java domain in order to make the changes in the respective PL
minimal.
We already do use threads on Windows to a limited extent to do things
like timers and pseudo-signal handling.
If this were a greenfield project then your arguments would have force.
But for how long would you like to suspend Postgres development activity
while we re-implement everything in Java? Not to mention the effort to
recruit new developers to replace those who leave because they can't or
don't want to be part of the effort.
For better or worse, PostgreSQL is written in C, and I can't see that
changing.
It might be interesting to take a frozen code base for PostgreSQL and
reimplement it in Java, and then run some comparisons, both for
performance and crash stability. I just counted roughly 100k lines of
source code, so a reimplementation effort would be distinctly non-trivial.
cheers
andrew