On 28-Mar-06, at 10:48 AM, Thomas Hallgren wrote:

Hi Simon,
Thanks for your input. All good points. I actually did some work using Java stored procedures on DB2 a while back but I had managed to forget (or repress :-) ) all about the FENCED/NOT FENCED stuff. The current discussion definitely puts it in a different perspective. I think PL/Java has a pretty good 'NOT FENCED' implementation, as does many other PL's, but no PL has yet come up with a FENCED solution.

What exactly is a FENCED solution ? If it is simply a remote connection to a single JVM then pl-j already does that.

This FENCED/NOT FENCED terminology would be a good way to differentiate between the two approaches. Any chance of that syntax making it into the PostgreSQL grammar, should the need arise?

Some more comments inline:

Simon Riggs wrote:
Just some thoughts from afar: DB2 supports in-process and out-of- process external function calls (UDFs) that it refers to as UNFENCED and FENCED
procedures. For Java only, IBM have moved to supporting *only* FENCED
procedures for Java functions, i.e. having a single JVM for all
connections.
>
Are you sure about this? As I recall it a FENCED stored procedure executed in a remote JVM of it's own. A parameter could be used that either caused a new JVM to be instantiated for each stored procedure call or to be kept for the duration of the session. The former would yield really horrible performance but keep memory utilization at a minimum. The latter would get a more acceptable performance but waste more memory (in par with PL/Java today).


Each connection's Java function runs as a thread on a
single dedicated JVM-only process.
If that was true, then different threads could share dirty session data. I wanted to do that using DB2 but found it impossible. That was a while back though.

That approach definitely does increase the invocation time, but it
significantly reduces the resources associated with the JVM, as well as
allowing memory management to be more controllable (bliss...). So the
overall picture could be more CPU and memory resources for each
connection in the connection pool.
My very crude measurements indicate that the overhead of using a separate JVM is between 6-15MB of real memory per connection. Today, you get about 10MB/$ and servers configured with 4GB RAM or more are not uncommon.

I'm not saying that the overhead doesn't matter. Of course it does. But the time when you needed to be extremely conservative with memory usage has passed. It might be far less expensive to buy some extra memory then to invest in SMP architectures to minimize IPC overhead.

My point is, even fairly large app-servers (using connection pools with up to 200 simultaneous connections) can run using relatively inexpensive boxes such as an AMD64 based server with 4GB RAM and show very good throughput with the current implementation.


If you have a few small Java functions centralisation would not be good, but if you have a whole application architecture with many connections
executing reasonable chunks of code then this can be a win.
One thing to remembered is that a 'chunk of code' that executes in a remote JVM and uses JDBC will be hit by the IPC overhead on each interaction over the JDBC connection. I.e. the overhead is not just limited to the actual call of the UDF, it's also imposed on all database accesses that the UDF makes in turn.


In that environment we used Java for major database functions, with SQL
functions for small extensions.
My guess is that those major database functions did a fair amount of JDBC. Am I right?


Also the Java invocation time we should be celebrating is that by having Java in the database the Java<->DB time is much less than it would be if
we had a Java stack sitting on another server.

I think the cases when you have a Tomcat or JBoss sitting on the same physical server as the actual database are very common. One major reason being that you don't want network overhead between the middle tier and the backend. Moving logic into the database instead of keeping it in the middle tier is often done to get rid of the last hurdle, the overhead of IPC.


Regards,
Thomas Hallgren


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

              http://www.postgresql.org/docs/faq



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to