Hi Donal,

Thanks for your suggestion. While I think this approach might provide some
quick short-term fixes, there is actually a much bigger problem we are trying
to attack. I don't know exactly how much detail I can give, but I will provide
some background to give a bit more insight into the situation...

We are dealing with literally hundreds of webservers and hundreds of database
servers, and we are expanding both on a frequent basis. Whenever we increase
the number of webservers, the databases become our bottleneck, and vice versa.
We realize we won't magically solve any of these bottlenecks simply by
introducing parallel querying on the databases. We have lots of tables which
are divided over more than a dozen database clusters, and we are getting more
and more tables which become so big they have to be spread out over multiple
databases. Because of the distribution of these tables, querying them becomes
increasingly hard, and we are approaching a limit where further distribution
will become virtually undoable with our current approach, which is to query
the various databases serially from PHP and manually merge the results. If we
continue down this path, our PHP application will have to perform ever more
queries serially, and the latencies will keep adding up, not to mention the
code maintenance required for finding the correct databases to query and
merging all the results. Therefore we will need parallelization techniques
that can transparently handle communication with the databases, both to keep
our latencies low and to relieve our PHP application from having to deal with
all the distributed databases.
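
To give an idea, here is a rough sketch of what that serial approach looks
like today (the shard hosts, table and column names below are made up, of
course):

<?php
// Rough sketch of the current serial approach; shard hosts, table and
// column names are made up for illustration only.
$shards = array(
    mysqli_connect('db1.example.internal', 'app', 'secret', 'users_shard_1'),
    mysqli_connect('db2.example.internal', 'app', 'secret', 'users_shard_2'),
    mysqli_connect('db3.example.internal', 'app', 'secret', 'users_shard_3'),
);

$rows = array();
foreach ($shards as $link) {
    // Each round trip blocks the request; with N shards the latencies add up.
    $result = mysqli_query($link, 'SELECT id, name FROM users WHERE active = 1');
    while ($row = mysqli_fetch_assoc($result)) {
        $rows[] = $row;
    }
    mysqli_free_result($result);
}

// Merge and sort the partial results by hand in PHP.
function compare_by_name($a, $b) { return strcmp($a['name'], $b['name']); }
usort($rows, 'compare_by_name');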

Thanks!
Arend.


-----Original Message-----
From: Donal McMullan [mailto:[EMAIL PROTECTED]
Sent: Sat 10-11-2007 13:43
To: Arend van Beelen
CC: internals@lists.php.net; Alexey Zakhlestin
Subject: Re: [PHP-DEV] Making parallel database queries from PHP
 
Hi Arend -

If your webserver CPUs are already maxed out, that problem won't go away 
on its own, but once you've solved that (optimized your code or added 
more webservers), the curl_multi_* functions might help you out.

A cheap way to parallelize your database or data-object access is to
implement a kind of service-oriented architecture, where you have one
PHP script* that does little except get data from a database, serialize
that data, and return it to your main PHP script.

The main PHP script uses the curl_multi_init, curl_multi_add_handle, 
etc. functions to call this script multiple times in parallel, returning 
different data objects for each call.
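
A bare-bones sketch of the calling side could look something like this (the
endpoint URL and query parameters are just placeholders, and fetch.php is
assumed to serialize() whatever it returns):

<?php
// Placeholder endpoints for the data-service script described above.
$urls = array(
    'http://data.example.internal/fetch.php?object=profile&user=42',
    'http://data.example.internal/fetch.php?object=orders&user=42',
    'http://data.example.internal/fetch.php?object=messages&user=42',
);

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

// Drive all requests concurrently until they have finished.
$running = 0;
do {
    curl_multi_exec($mh, $running);
    if ($running > 0) {
        curl_multi_select($mh);  // wait for activity instead of busy-looping
    }
} while ($running > 0);

// Collect the serialized data objects, one per call.
$objects = array();
foreach ($handles as $i => $ch) {
    $objects[$i] = unserialize(curl_multi_getcontent($ch));
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);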

Because this introduces latency into the data retrieval trip, it will be 
slower for most applications. Some circumstances that might make it 
viable include:
  * you have > 1 data store
  * you have multiple slow queries that aren't interdependent
  * you have to do expensive processing on the data you retrieve
  * you have lots of slack (CPU, RAM, processes) on the webservers

In its favor - it should take just a couple of hours to prototype. If 
you have a single canonical data store, you might find that as soon as 
you enable parallel queries against the database, your database becomes 
the bottleneck, and throughput doesn't actually increase. This technique 
should reveal that as a potential problem without much development cost.

Interested to know how you proceed.

Donal McMullan

-----------------------------------------------------------------------
Donal @ Catalyst.Net.NZ          PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/           PHYS: Level 2, 150-154 Willis St
OFFICE: +64(4)803-2372                              MOB: +64(21)661-254
-----------------------------------------------------------------------

*actually - Java's a pretty good option for this tier too.

Arend van Beelen wrote:
> While I can see the theoretical advantage of this, I wonder how much there is
> to gain in practice (at least for us, that is).
> 
> In our current codebase, when a database query is done, PHP can only continue
> once it has the result anyway, so it would require serious code modifications
> to make use of such functionality. Also, while it may theoretically shorten
> page load times, our webservers are already constrained by CPU load anyway, so
> we would probably not be able to get more pageviews out of it either.
> 
> -----Original Message-----
> From: Alexey Zakhlestin [mailto:[EMAIL PROTECTED]
> Sent: Sat 10-11-2007 11:31
> To: Arend van Beelen
> CC: internals@lists.php.net
> Subject: Re: [PHP-DEV] Making parallel database queries from PHP
>  
> I would prefer to have some function which would check whether the
> requested data is already available (if it is not, I would still be
> able to do something useful while waiting)
> 
> On 11/10/07, Arend van Beelen <[EMAIL PROTECTED]> wrote:
>> Hi there,
>>
>> I am researching the possibility of developing a shared library which can
>> perform database queries in parallel to multiple databases. One important
>> requirement is that I will be able to use this functionality from PHP.
>> Because I know PHP is not thread-safe due to other libraries, I am wondering
>> what would be the best way to implement this. Right now I can imagine three
>> solutions:
>>
>> - Use multiple threads to connect to the databases, but let the library
>> export a blocking single-threaded API. So, PHP calls a function in the
>> library, this function spawns new threads, which do the real work. Meanwhile
>> the function waits for the threads to finish, and when all threads are done
>> it returns the final result back to PHP.
>> - Use a single thread and asynchronous socket communication. So, PHP calls
>> the library function and this function handles all connections within the
>> same thread using asynchronous communication, and returns the result to PHP
>> when all communication is completed.
>> - Use a daemon on the localhost. Make a connection from PHP to the daemon,
>> the daemon handles all the connections to the databases and passes the
>> result back to the connection made from PHP.
>>
>> Can someone give me some advice about the advantages of using one approach or
>> another? Please keep in mind that I'm hoping for a solution which will be
>> both stable and minimizes overhead.
>>
>> Thanks,
>> Arend.
>>
>> --
>> Arend van Beelen jr.
>> "If you want my address, it's number one at the end of the bar."
>>
> 
> 




