Re: [HACKERS] Remote PL/Java, Summary

Andrew Dunstan Sat, 01 Apr 2006 05:13:37 -0800


Thomas Hallgren wrote:

Hi all,
And thanks for the very good input regarding a remote alternative to PL/Java (thread titled "Shared Memory"). I'm convinced that such an alternative would be a great addition to PL/Java and increase the number of users. The work to create such a platform that has the stability and quality of today's PL/Java is significant (I really do think it is a production-grade product today). So significant in fact, that I'm beginning to think of a third alternative. An alternative that would combine the performance of using in-process calls with the benefits of sharing a JVM. The answer is of course to make the backend multi-threaded.
This question has been debated before and always promptly rejected. One major reason is of course that it will not bring any benefits over the current multi-process approach on a majority of the platforms where PostgreSQL is used. A process-switch is just as fast as a thread-switch on Linux-based systems. Over the last year however, something has happened that certainly speaks in favor of multi-threading. PostgreSQL is getting widely adopted on Windows. On Windows, a process-switch is at least 5 times more expensive than a thread-switch. In order to do appropriate locking, PostgreSQL is forced to do a fair amount of switching during transaction processing so the gain in using a multi-threaded approach on Windows is probably significant. The same is true for other OSes where process-switching is relatively expensive.
There are other benefits as well. PostgreSQL would no longer need shared memory and semaphores and a lot more resources could be shared between backend processes. The one major drawback of a multi-threaded approach (the one that's been the main argument for the defenders of the current approach) is vulnerability. If one thread is messing things up, then the whole system will be brought to a halt (on the other hand, that can be said about the current shared-memory approach as well). The cure for this is to have a system that, to the extent possible, prevents this from happening. How would that be possible? Well, such systems are widely used today. Huge companies use them in mission critical applications all over the world. They are called Virtual Machines. Two types in particular are gaining more and more ground. The .NET based CLR and the Java VM.
Although there's an Open Source initiative called Mono that implements the CLR, I still don't see it as a viable alternative to create a production-grade multi-platform database. Microsoft's CLR is of course confined to Microsoft platforms. The Java VMs are however a different matter altogether. And with the java.nio.channels package that was introduced in Java 1.4 and the java.util.concurrent package from Java 5.0, Java has taken major steps forward in being a very feasible platform for a database implementation. There's actually nothing stopping you from doing a high-performance MVCC system using Java today. A SQL parser would be based on JavaCC technology (the grammar is already written although it needs small adjustments to comply with the PostgreSQL dialect). Lots of technology is there out-of-the-box such as regular expressions, hash-maps, linked lists, etc. Not to forget an exceptionally great threading system, now providing atomic operations, semaphores, copy-on-write arrays etc. In short, everything that a database implementor could ever wish for.
The third alternative for PL/Java, an approach that gets more viable every minute I think about it, is to implement the PostgreSQL backend completely in Java. I'm involved in the development of one of the commercial JVMs. I know that an enormous amount of resources are constantly devoted to performance optimizations. The days when a complex system written in C or C++ could outperform a JVM have passed. A static optimizer can only do so well. A JVM that collects heuristics, communicates with the CPU about cache usage etc., can be a great deal smarter on how the final machine code will be optimized, and re-optimized should the conditions change. It would be great if PostgreSQL could benefit from all this research.
If a commercial JVM is perceived as a problem, then combine^h^h^hpile the code with GNU gcj instead of gcc like today.
The list of advantages can be made a mile long. There's no point in listing everything here. From my own standpoint, I'm of course thinking first and foremost about the advantages with PL/Java. It will become the absolute most efficient PL of them all. Other languages, for which no good Java implementation exists (I'm thinking Jython for Python, etc.), can be implemented using JNI. The most common functions used by, say, PL/Perl could probably be implemented as callbacks into the Java domain in order to make the changes in the respective PL minimal.

We already do use threads on Windows to a limited extent to do things like timers and pseudo-signal handling.

If this were a greenfields project then your arguments would have force. But for how long would you like to suspend Postgres development activity while we re-implement everything in Java? Not to mention the effort to recruit new developers to replace those who leave because they can't or don't want to be part of the effort.

For better or worse, PostgreSQL is written in C, and I can't see that changing.

It might be interesting to take a frozen code base for PostgreSQL and reimplement it in Java, and then run some comparisons, both for performance and crash stability. I just counted roughly 100k lines of source code, so a reimplementation effort would be distinctly non-trivial.


cheers

andrew
