Hi Scott,

Thanks for your thorough reply! I'm glad to see the first option appears to be preferred, as it was my first choice as well. The main reason I had some doubts was that a colleague of mine argued quite strongly (though I was still not entirely convinced) that I would find myself in an awful lot of trouble when trying to implement a multi-threaded library inside a single-threaded PHP environment. That is the problem I am hoping to solve by only exporting a single-threaded API.
For the record, I do have some answers to your questions...

1) At the moment we use Apache2 with the prefork MPM. However, there is some thought going into switching to lighttpd, so I don't think it would be a good choice to introduce hard dependencies on Apache right now. The machines we're using are all dual-cores and quad-cores, so yes, it would be silly not to make use of their power :)

2) I will be the main developer of the library. I have personal experience with assembly, C, C++, Java, PHP and whatnot. However, for the last few years I've only been working professionally with PHP, so I will need to make some effort to get back into low-level programming. Fortunately, I've got some colleagues who are also good C and C++ programmers, so I think the experience is there :)

3) MySQL is an absolute must. But it would be nice not to make the library too dependent on MySQL (there's probably little need for that anyway), so it can be extended to other databases in the future.

4) Target platform is Gentoo Linux x86-64.

Thanks again!
Arend.

-----Original Message-----
From: Scott A. Guyer [mailto:[EMAIL PROTECTED]
Sent: Sat 10-11-2007 3:46
To: Arend van Beelen
Subject: RE: [PHP-DEV] Making parallel database queries from PHP

Hi Arend,

The first and second options differ primarily in who owns the scheduling of tasks (DB tasks). In the first option, you assign tasks to threads and allow the OS to schedule the threads. In the second option, you are the scheduler. When you view it this way, the pros/cons are fairly clear. The latter option gives you all the control to schedule the way you want to schedule. But this control places all the risk on you as well. And you may very well be painting yourself into a corner with respect to being able to take advantage of any improvements in OS/job/thread scheduling in the future, which I'm betting are going to be rampant again now that multi-core is sooooo readily available.

One sorta fuzzy intangible I can think of is this.
Your first choice is the most common pattern for concurrency these days. Why? I think because it is easier to implement and because OS thread handling is MUCH better these days than it used to be, say, 15 years ago (lighter weight, better scheduling, multi-core optimizations, etc.). Contrast that with the second option, where you are kinda hoping any 3rd party libraries (and DB libraries) are written to support async-I/O. This just isn't as common as you might expect these days. So you might be constraining yourself a little with the libraries you could expect to use (of course, this depends a great deal on precisely which libraries you will use, but it is a risk). Thread-safe libraries are more common than async-I/O libraries in my experience. Async-I/O has been making a little comeback in recent years. But I can't say with any certainty that it is prevalent in the libraries you might depend on.

In both cases, you could build on the Apache APR library, which would get you up and running nicely in Apache on Windows, Unix and Mac. So that's a plus.

Turning briefly to the 3rd option... this really only benefits you in two cases. (1) It completely encapsulates your code so that any failures in your code will not bring down PHP (or its hosting app server). (2) You have some one-off (perhaps proprietary or legacy) code base that you would not be able to embed in Apache/IIS/PHP nicely (e.g., conflicts in threading, memory management, etc.). It may not be a bad way to prototype as you work out some kinks in your code. However, I don't favor this approach, primarily because it adds an install dependency and a little extra IPC overhead from PHP to your daemon. Additionally, this option may add a greater portability burden if you were trying to move your daemon amongst the common OSes. So I don't consider it a long-term option.

To conclude, I would favor your option 1, ceteris paribus. I would just take a hard look at any dependent code you are expecting to utilize in your code.
That's where the rubber will meet the road.

Other factors not fully considered which may impact your decision:

(1) If in Apache, any particular MPM? All MPMs? Do you have a deployment that already uses about as many threads as your hardware can handle?
(2) Any particular skills (or lack of skills) for the developers of this library?
(3) Which DBs are a must for you? Which are nice-to-haves?
(4) Target platform(s)?

Hope that helps.

Cheers,
-Scott

PS - sorry for length :-(

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Arend van Beelen
Sent: Friday, November 09, 2007 8:27 PM
To: internals@lists.php.net
Subject: [PHP-DEV] Making parallel database queries from PHP

Hi there,

I am researching the possibility of developing a shared library which can perform database queries in parallel to multiple databases. One important requirement is that I will be able to use this functionality from PHP. Because I know PHP is not thread-safe due to other libraries, I am wondering what would be the best way to implement this. Right now I can imagine three solutions:

- Use multiple threads to connect to the databases, but let the library export a blocking single-threaded API. So, PHP calls a function in the library, and this function spawns new threads, which do the real work. Meanwhile the function waits for the threads to finish, and when all threads are done it returns the final result back to PHP.

- Use a single thread and asynchronous socket communication. So, PHP calls the library function and this function handles all connections within the same thread using asynchronous communication, and returns the result to PHP when all communication is completed.

- Use a daemon on the localhost. Make a connection from PHP to the daemon; the daemon handles all the connections to the databases and passes the result back over the connection made from PHP.

Can someone give me some advice about the advantages of using one approach or another?
Please keep in mind that I'm hoping for a solution which will both be stable and minimize overhead.

Thanks,
Arend.

--
Arend van Beelen jr.
"If you want my address, it's number one at the end of the bar."
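
For illustration only, the first of the three approaches in the original message (worker threads hidden behind a blocking, single-threaded API) might be sketched roughly as below. This is a minimal sketch under stated assumptions, not code from the thread: `db_task`, `run_task` and `run_parallel` are made-up names, and `run_task` merely stands in for a real database client call by reporting the length of the query text. It assumes POSIX threads and needs to be built with `-pthread`.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One unit of work: the query to run and a slot for its outcome.
 * All names here are hypothetical and exist only for this sketch. */
typedef struct {
    const char *sql;   /* query text (input) */
    size_t result;     /* placeholder result (output) */
    int ok;            /* set to 1 once the worker has finished */
} db_task;

/* Stand-in for a real database call; it only measures the query text. */
static void *run_task(void *arg)
{
    db_task *task = arg;
    task->result = strlen(task->sql);
    task->ok = 1;
    return NULL;
}

/* Blocking entry point: starts one thread per task, waits for all of
 * them, then returns. The caller never sees a thread.
 * Returns 0 on success, -1 if memory or a thread could not be obtained. */
int run_parallel(db_task *tasks, size_t count)
{
    pthread_t *threads = malloc(count * sizeof *threads);
    size_t started = 0;
    int status = 0;

    if (count > 0 && threads == NULL)
        return -1;

    for (; started < count; started++) {
        tasks[started].ok = 0;
        if (pthread_create(&threads[started], NULL,
                           run_task, &tasks[started]) != 0) {
            status = -1;   /* stop starting threads, still join the rest */
            break;
        }
    }

    /* Wait for every thread that was actually started. */
    for (size_t i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    free(threads);
    return status;
}
```

A caller would fill an array of `db_task`, call `run_parallel(tasks, n)`, and read each `result` after it returns. Because all threads are joined before the function returns, the calling environment only ever observes an ordinary blocking call.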