On 28-Mar-06, at 10:48 AM, Thomas Hallgren wrote:
Hi Simon,
Thanks for your input. All good points. I actually did some work
using Java stored procedures on DB2 a while back but I had managed
to forget (or repress :-) ) all about the FENCED/NOT FENCED stuff.
The current discussion definitely puts it in a different
perspective. I think PL/Java has a pretty good 'NOT FENCED'
implementation, as does many other PL's, but no PL has yet come up
with a FENCED solution.
What exactly is a FENCED solution ? If it is simply a remote
connection to a single JVM then pl-j already does that.
This FENCED/NOT FENCED terminology would be a good way to
differentiate between the two approaches. Any chance of that syntax
making it into the PostgreSQL grammar, should the need arise?
Some more comments inline:
Simon Riggs wrote:
Just some thoughts from afar: DB2 supports in-process and out-of-
process
external function calls (UDFs) that it refers to as UNFENCED and
FENCED
procedures. For Java only, IBM have moved to supporting *only* FENCED
procedures for Java functions, i.e. having a single JVM for all
connections.
>
Are you sure about this? As I recall it a FENCED stored procedure
executed in a remote JVM of it's own. A parameter could be used
that either caused a new JVM to be instantiated for each stored
procedure call or to be kept for the duration of the session. The
former would yield really horrible performance but keep memory
utilization at a minimum. The latter would get a more acceptable
performance but waste more memory (in par with PL/Java today).
Each connection's Java function runs as a thread on a
single dedicated JVM-only process.
If that was true, then different threads could share dirty session
data. I wanted to do that using DB2 but found it impossible. That
was a while back though.
That approach definitely does increase the invocation time, but it
significantly reduces the resources associated with the JVM, as
well as
allowing memory management to be more controllable (bliss...). So the
overall picture could be more CPU and memory resources for each
connection in the connection pool.
My very crude measurements indicate that the overhead of using a
separate JVM is between 6-15MB of real memory per connection.
Today, you get about 10MB/$ and servers configured with 4GB RAM or
more are not uncommon.
I'm not saying that the overhead doesn't matter. Of course it does.
But the time when you needed to be extremely conservative with
memory usage has passed. It might be far less expensive to buy some
extra memory then to invest in SMP architectures to minimize IPC
overhead.
My point is, even fairly large app-servers (using connection pools
with up to 200 simultaneous connections) can run using relatively
inexpensive boxes such as an AMD64 based server with 4GB RAM and
show very good throughput with the current implementation.
If you have a few small Java functions centralisation would not be
good,
but if you have a whole application architecture with many
connections
executing reasonable chunks of code then this can be a win.
One thing to remembered is that a 'chunk of code' that executes in
a remote JVM and uses JDBC will be hit by the IPC overhead on each
interaction over the JDBC connection. I.e. the overhead is not just
limited to the actual call of the UDF, it's also imposed on all
database accesses that the UDF makes in turn.
In that environment we used Java for major database functions,
with SQL
functions for small extensions.
My guess is that those major database functions did a fair amount
of JDBC. Am I right?
Also the Java invocation time we should be celebrating is that by
having
Java in the database the Java<->DB time is much less than it would
be if
we had a Java stack sitting on another server.
I think the cases when you have a Tomcat or JBoss sitting on the
same physical server as the actual database are very common. One
major reason being that you don't want network overhead between the
middle tier and the backend. Moving logic into the database instead
of keeping it in the middle tier is often done to get rid of the
last hurdle, the overhead of IPC.
Regards,
Thomas Hallgren
---------------------------(end of
broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend