RE: [PHP-DEV] Making parallel database queries from PHP

Arend van Beelen Sat, 10 Nov 2007 03:28:15 -0800

Hi Scott,

thanks for your thorough reply! I'm glad to see the first option appears to be 
preferred, as it was my first choise as well. The main reason I had some doubts 
was because a colleague of mine quite strongly argued (though I was still not 
entirely convinced) that I would find myself in an aweful lot of trouble when 
trying to implement a multi-threaded library inside a single-threaded PHP 
environment. Which is the problem I am hoping to solve by only exporting a 
single-threaded API.


For the record, I do have some answers to your questions...
1) At the moment we use Apache2 with the prefork MPM. However, there is some 
thought going into switching to lighttpd, so I don't think it would be a good 
choice to make hard dependencies on Apache right now. The machines we're using 
are all dual-cores and quad-cores, so yes, it would be silly not to make use of 
their power :)
2) I will be the main developer of the library. I have personal experience with 
assembly, C, C++, Java, PHP and whatnot. However, last few years I've only been 
professionaly working with PHP so I will need to make some effort to get back 
into low-level programming. Fortunately, I've got some colleagues which are 
also good C and C++ programmers, so I think the experience is there :)
3) MySQL is an absolute must. But it would be nice not to make the library too 
dependant on MySQL (there's probably little need for that as well), so it can 
be extended to other databases in the future.
4) Target platform is Gentoo Linux x86-64.

Thanks again!
Arend.

-----Oorspronkelijk bericht-----
Van: Scott A. Guyer [mailto:[EMAIL PROTECTED]
Verzonden: za 10-11-2007 3:46
Aan: Arend van Beelen
Onderwerp: RE: [PHP-DEV] Making parallel database queries from PHP
 
Hi Arend,

The first and second options differ primarily in who owns the scheduling of
tasks (DB tasks).  In the first option, you assign tasks to threads and
allow the OS to schedule threads.  In the second option, you are the
scheduler.  When you view it this way, the pros/cons are fairly clear.  The
latter option gives you all the control to schedule the way you want to
schedule.  But this control places all the risk on you as well.  And you may
very well be painting yourself into a corner with respect to being able to
take advantage of any improvements in OS/job/thread scheduling in the
future.  Which I'm betting is going to be rampant again as multi-core is
sooooo readily available.  

One sorta fuzzy intangible I can think of is this.  Your first choice is the
most common pattern for concurrency these days.  Why?  I think because it is
easier to implement and because OS thread handling is MUCH better these days
than it used to be say 15 years ago (lighter weight, better scheduling,
multi-core optimizations, etc.).  Contrast that to the second option.  Where
you are kinda hoping any 3rd party libraries (and DB libraries) are written
to support async-I/O.  This just isn't as common as you might expect these
days.  So you might be constraining yourself a little with the libraries you
could expect to use (of course, this depends a great deal on precisely which
libraries you will use, but it is a risk).  Thread safe libraries are more
common than async-I/O libraries in my experience.  Async-I/O was making a
little comeback in recent years.  But I can't say with any certainty that it
is prevalent in the libraries you might depend on.

In both cases, you could implement based on Apache APR library which would
get you up and running nicely in apache on Windows, Unix, MAC.  So that's a
plus.

Turning briefly to the 3rd option...this really only benefits you in two
cases.  (1) It completely encapsulates you code so that any failures in your
code will not bring down the PHP (or its hosting app server).  (2) You have
some one-off (perhaps proprietary or legacy) code base that you would not be
able to embed in apache/iis/php nicely (e.g., conflicts in threading, memory
management, etc.).  It many not be a bad way to prototype as you work out
some kinks in your code.  However, I don't favor this approach primarily
because it adds an install dependency and a little extra IPC overhead from
PHP to your daemon.  Additionally, this option may add a greater portability
burden if you were trying to move your daemon amongst the common OSes. So I
don't consider it a long term option.

To conclude, I would favor your option 1, ceteris paribus. I would just take
a hard look at any dependent code you are expecting to utilize in your code.
That's where the rubber will meet the road.

Other factors not fully considered which may impact your decision:
(1) If in Apache, any particular MPM?  All MPMs?  Do you have a deployment
that already uses about as many threads as your hardware can handle?
(2) Any particular skills (or lack of skills) for the developers of this
library?
(3) Which DBs are a must for you?  Which are nice-to-haves?
(4) Target platform(s)?

Hope that helps.  Cheers,
-Scott

PS - sorry for length :-(



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Arend van
Beelen
Sent: Friday, November 09, 2007 8:27 PM
To: internals@lists.php.net
Subject: [PHP-DEV] Making parallel database queries from PHP

Hi there,

I am researching the possibility of developing a shared library which can
perform database queries in parallel to multiple databases. One important
requirement is that I will be able to use this functionality from PHP.
Because I know PHP is not thread-safe due to other libraries, I am wondering
what would be the best way to implement this. Right now I can imagine three
solutions:

- Use multiple threads to connect to the databases, but let the library
export a blocking single-threaded API. So, PHP calls a function in the
library, this function spawns new threads, which do the real work. Meanwhile
the function waits for the threads to finish, and when all threads are done
it returns the final result back to PHP.
- Use a single thread and asynchronous socket communication. So, PHP calls
the library function and this function handles all connections within the
same thread using asynchronous communication, and returns the result to PHP
when all communication is completed.
- Use a daemon on the localhost. Make a connection from PHP to the daemon,
the daemon handles all the connections to the databases and passes the
result back to the connection made from PHP.

Can someone give me some advise about advantages of using one approach or
another? Please keep in mind that I'm hoping for a solution which will be
both stable and minimizes overhead.

Thanks,
Arend.

-- 
Arend van Beelen jr.
"If you want my address, it's number one at the end of the bar."

RE: [PHP-DEV] Making parallel database queries from PHP

Reply via email to