2012/7/21 Paulo Ricardo Motta Gomes <pauloricard...@gmail.com>

> Have you monitored the CPU utilization of the proxy server and the storage
> nodes? I did similar tests with Swift and the proxy server exhausted its
> capacity with only a few concurrent requests for very small objects.
CPU utilization on the proxy reaches 100% on all cores at the beginning, and
60~70% on the storage nodes; it goes up and down periodically. I have done
several system tunings (sysctl, ulimit, etc.). Even so, the cluster can
sustain 500+ concurrent requests for 4K objects.

> If you notice object servers are not overloaded, but the proxy is
> overloaded, a solution might be to have more proxy servers if you hav

The result is still the same with multiple proxy servers (2-4), each with a
powerful swift-bench client of its own. I also ran a test against a
particular storage node using swift-bench's direct client function and got a
very similar result: once enough objects have been uploaded, the

    for chunk in iter(lambda: reader(self.network_chunk_size), ''):

loop periodically takes a lot of time.

> It seems a problem of overload, since there are only 4 servers in the
> system and a large level of concurrency. Have you tried slowly increasing
> the concurrency to find the point where the problem starts? This point may
> be the capacity of your system.

Last week I got more servers from another hardware provider, with more
CPU/RAM/disks (12 disks per storage node). That Swift cluster kept up better
performance for a longer time, but unfortunately, after about 15,000,000
objects, the performance dropped to half and the failures appeared. I'm
concerned that the ratio of total objects to the number of disks causes this
effect in large deployments (e.g. cloud storage providers, telecom, banks);
a rough estimate of that ratio is sketched at the end of my replies below.
Really confusing...

> Also, are you using persistent connections to the proxy server to send the
> object? If so, maybe try to renew them once in a while.

As far as I know, swift-bench renews connections on each round: it creates a
connection pool with concurrency = x connections, and I believe those
connections are renewed every round. Something strange is that the
performance goes back to its initial level once I flush all data on the
storage nodes (whether by reformatting the disks or by rm).

Thanks for your reply
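To make the objects-per-disk ratio concrete, here is a rough back-of-the-envelope
estimate. The replica count and the total disk count of the second cluster are
assumptions for illustration only (the node count is not stated above):

    # Rough estimate of on-disk object files per device (illustrative numbers).
    objects = 15000000        # logical objects uploaded (second cluster)
    replicas = 3              # assumed Swift default replica count
    disks = 4 * 12            # assumed: 4 storage nodes x 12 disks each
    files_per_disk = objects * replicas / disks
    print(files_per_disk)     # ~937,500 object files per disk

With close to a million small files per filesystem, inode and dentry cache
pressure alone could plausibly slow down writes, which would be consistent
with performance recovering after the disks are wiped.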
> 2012/7/20 Kuo Hugo <tonyt...@gmail.com>
>
>> Hi Sam, and all OpenStackers,
>>
>> This is Hugo. I'm facing an issue with a performance *degradation* in
>> Swift, and I have spent the last few days trying to figure out its cause.
>>
>> Environment:
>> Swift version: master branch, latest code
>> Tried on Ubuntu 12.04/11.10
>> 1 swift-proxy: 32GB RAM / CPU 4*2 / 1Gb NIC*2
>> 3 storage nodes: each 32GB RAM / CPU 4*2 / 2TB*7 / 1Gb NIC*2
>>
>> The storage nodes run only the main workers (object-server,
>> container-server, account-server).
>>
>> I'm testing with 4K objects via swift-bench.
>>
>> Per-round bench.conf:
>> object_size = 4096
>> Concurrency: 200
>> Object number: 200000
>> Containers: 200
>> no deletes
>>
>> At the beginning everything works fine in my environment; the average PUT
>> rate reaches 1200/s. After several rounds of testing, the performance
>> drops to 300~400/s, and after more rounds failures appear, with errors
>> like these in the proxy's log:
>>
>> Jul 20 18:44:54 angryman-proxy-01 proxy-server ERROR with Object server
>> 192.168.100.101:36000/DISK5 re: Trying to get final status of PUT to
>> /v1/AUTH_admin/9cbb3f9336b34019a6e7651adfc06a86_51/87b48a3474c7485c95aeef95c6911afb:
>> Timeout (10s) (txn: txb4465d895c9345be95d81632db9729af) (client_ip:
>> 172.168.1.2)
>> Jul 20 18:44:54 angryman-proxy-01 proxy-server ERROR with Object server
>> 192.168.100.101:36000/DISK4 re: Trying to get final status of PUT to
>> /v1/AUTH_admin/9cbb3f9336b34019a6e7651adfc06a86_50/7405e5824cff411f8bb3ecc7c52ffd5a:
>> Timeout (10s) (txn: txe0efab51f99945a7a09fa664b821777f) (client_ip:
>> 172.168.1.2)
>> Jul 20 18:44:55 angryman-proxy-01 proxy-server ERROR with Object server
>> 192.168.100.101:36000/DISK5 re: Trying to get final status of PUT to
>> /v1/AUTH_admin/9cbb3f9336b34019a6e7651adfc06a86_33/f322f4c08b124666bf7903812f4799fe:
>> Timeout (10s) (txn: tx8282ecb118434f828b9fb269f0fb6bd0) (client_ip:
>> 172.168.1.2)
>>
>> I traced the object-server code in swift/obj/server.py and inserted a
>> timer around
>> https://github.com/openstack/swift/blob/master/swift/obj/server.py#L591
>>
>> for chunk in iter(lambda: reader(self.network_chunk_size), ''):
>>
>> (a rough sketch of that instrumentation follows the list below). It seems
>> that the reader sometimes takes a long time to receive data from
>> wsgi.input -- not on every request, but in what looks like periodic
>> bursts.
>>
>> Checking Swift's history, I found your commit
>> https://github.com/openstack/swift/commit/783f16035a8e251d2138eb5bbaa459e9e4486d90
>> which is the only one close to my issue, so I'm hoping you have some
>> suggestions for me.
>>
>> My considerations:
>>
>> 1. Could this be caused by the greenio switch?
>>
>> 2. Is it related to the number of objects already on the storage disks?
>>
>> 3. Has anyone exercised Swift with small objects at a high request rate?
>>
>> 4. The performance never goes back to 1200/s on its own. The only way to
>> restore it is to flush all data from the disks; once the disks are clean,
>> the performance returns to its best.
>>
>> 5. I re-read the entire object-server workflow for handling a PUT
>> request, and I don't understand why the number of existing objects should
>> affect reading data from wsgi.input. With 4K objects there is no need for
>> chunking, as far as I know.
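A minimal sketch of the kind of timer described above, wrapping each
wsgi.input read in the object server's PUT handler. The helper name, the use
of self.logger, and the "Reader:" log prefix are illustrative guesses based
on the log lines quoted below, not the exact patch:

    import time

    def timed_reader(read, chunk_size, logger):
        """Wrap wsgi.input.read() and log how long each network read takes."""
        def _read():
            start = time.time()
            chunk = read(chunk_size)
            logger.info('Reader: %f' % (time.time() - start))
            return chunk
        return _read

    # Inside ObjectController.PUT, the original loop
    #     for chunk in iter(lambda: reader(self.network_chunk_size), ''):
    # becomes something like:
    reader = timed_reader(request.environ['wsgi.input'].read,
                          self.network_chunk_size, self.logger)
    for chunk in iter(reader, ''):
        upload_size += len(chunk)
        # ... existing etag / write logic unchanged ...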
>> The time consumed by *reader(self.network_chunk_size)*:
>>
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.001391
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.001839
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.00164
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.002786
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 2.716707
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 1.005659
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.055982
>> Jul 20 17:09:36 angryman-storage-01 object-server Reader: 0.002205
>>
>> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 0.000968
>> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 0.001328
>> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 10.003368
>> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 0.001243
>> Jul 20 18:39:14 angryman-storage-01 object-server WTF: 0.001562
>>
>> Jul 20 17:52:41 angryman-storage-01 object-server WTF: 0.001067
>> Jul 20 17:52:41 angryman-storage-01 object-server WTF: 13.804413
>> Jul 20 17:52:41 angryman-storage-01 object-server WTF: 5.301166
>> Jul 20 17:52:41 angryman-storage-01 object-server WTF: 0.001167
>>
>> Could this be a bug in eventlet or in Swift? Please let me know whether I
>> should file a bug against Swift.
>>
>> Appreciate ~
>>
>> --
>> +Hugo Kuo+
>> tonyt...@gmail.com
>> +886 935004793
>
>
> --
> Paulo Ricardo
>
> --
> European Master in Distributed Computing
> Royal Institute of Technology - KTH
> Instituto Superior Técnico - IST
> http://paulormg.com

--
+Hugo Kuo+
tonyt...@gmail.com
+886 935004793
_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp