Hi all, I have a performance issue with Thrift in Python. Does anyone have experience with Thrift in Python?
My question on Stack Overflow: http://stackoverflow.com/questions/14171227/why-is-thrift-binary-protocol-serialization-so-much-slow

Copying the question here (open Stack Overflow for the pretty-printed version):

I'm a newbie with Thrift. I wrote a Thrift server in Python, and the client in Python too. Here is my Thrift definition:

    struct RatingByReport {
        1: required string ticker,
        2: required i32 cnt_institution,
        3: optional list<string> strong_buy,
        4: optional list<string> buy,
        5: optional list<string> neutral,
        6: optional list<string> sell,
        7: optional list<string> strong_sell,
        8: optional i32 cnt_maintain,
        9: optional i32 cnt_upgrade,
        10: optional i32 cnt_downgrade,
        11: optional i32 avg_score,
        12: optional string adjustment
    }

    struct TableRatingByReport {
        1: required list<string> head,
        2: required list<RatingByReport> body,
        3: optional struct.CadaTranslation translation
    }

    service china {
        void ping(),
        TableRatingByReport rating_byreport(1:string ticker) throws (1:struct.CadaInternalError error)
    }

Here is my server side:

    handler = StockChinaHandler()
    processor = china.Processor(handler)
    #startup()
    transport = TSocket.TServerSocket(port=30303)
    tfactory = TTransport.TBufferedTransportFactory()
    pfactory = TBinaryProtocol.TBinaryProtocolFactory()

    server = TServer.TSimpleServer(processor, transport, tfactory, pfactory)
    #server = TProcessPoolServer.TProcessPoolServer(processor, transport,
    #                                               tfactory, pfactory)

    print "Start server..."
    import cProfile
    print >>open('/tmp/test.log', 'w'), cProfile.run('server.serve()', sort='cumulative')
    #server.serve()
    print "done!"

Client side:

    # Make socket
    transport = TSocket.TSocket('localhost', 30303)

    # Buffering is critical. Raw sockets are very slow
    transport = TTransport.TBufferedTransport(transport)

    # Wrap in a protocol
    protocol = TBinaryProtocol.TBinaryProtocol(transport)

    # Create a client to use the protocol encoder
    client = china.Client(protocol)

    # Connect!
    transport.open()

    client.ping()
    print "ping()"

    msg = client.rating_byreport('2012-01-04')
    print msg

    transport.close()

cProfile result:

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000  230.968  230.968 <string>:1(<module>)
            1    0.000    0.000  230.968  230.968 TServer.py:74(serve)
            3    0.000    0.000  225.967   75.322 TSocket.py:172(accept)
            3    0.000    0.000  225.967   75.322 socket.py:194(accept)
            3  225.967   75.322  225.967   75.322 {method 'accept' of '_socket.socket' objects}
            5    0.003    0.001    4.993    0.999 china.py:140(process)
            1    0.000    0.000    3.200    3.200 china.py:177(process_rating_byreport)
            1    0.000    0.000    2.366    2.366 china.py:500(write)
            1    0.003    0.003    2.366    2.366 ttypes.py:515(write)
         1455    0.261    0.000    2.363    0.002 ttypes.py:364(write)
       155556    0.246    0.000    1.995    0.000 TCompactProtocol.py:38(nested)
       145880    0.298    0.000    1.640    0.000 TCompactProtocol.py:255(__writeString)
           18    1.370    0.076    1.370    0.076 {method 'recv' of '_socket.socket' objects}
            5    0.000    0.000    1.292    0.258 TCompactProtocol.py:306(readMessageBegin)
           13    0.000    0.000    1.292    0.099 TCompactProtocol.py:286(__readUByte)
           26    0.000    0.000    1.291    0.050 TTransport.py:54(readAll)
           26    0.000    0.000    1.291    0.050 TTransport.py:154(read)
            5    0.000    0.000    1.291    0.258 TSocket.py:101(read)

In my case, the TableRatingByReport instance has a body with 1400 rows (list<RatingByReport>), and it took over 3 seconds (in the function process_rating_byreport, which is auto-generated by Thrift) to produce the binary content. I don't know why it is so slow. Using json to serialize the same data takes less than 200 ms. I'm wondering whether I'm using Thrift in the wrong way? Thanks.

-- 
http://mail.python.org/mailman/listinfo/python-list
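[Editor's note] To reproduce the json baseline the question cites without any Thrift dependency, here is a stdlib-only sketch that times pure serialization of synthetic data shaped roughly like the 1400-row table (the field values are hypothetical, invented for illustration); this isolates serialization cost from socket time, which the profile shows is dominated by accept():

```python
import json
import time

# Synthetic rows shaped roughly like RatingByReport
# (hypothetical values, for timing only).
rows = [
    {
        "ticker": "T%04d" % i,
        "cnt_institution": 10,
        "strong_buy": ["inst_a", "inst_b"],
        "buy": ["inst_c"],
        "neutral": [],
        "sell": [],
        "strong_sell": [],
        "cnt_maintain": 5,
        "cnt_upgrade": 2,
        "cnt_downgrade": 1,
        "avg_score": 3,
        "adjustment": "maintain",
    }
    for i in range(1400)
]
table = {"head": ["ticker", "rating"], "body": rows}

# Time only the serialization step, no network involved.
start = time.perf_counter()
payload = json.dumps(table)
elapsed = time.perf_counter() - start
print("json: %d bytes in %.3f s" % (len(payload), elapsed))
```

On the Thrift side, it may also be worth checking whether the C-accelerated binary protocol is available: the pure-Python struct writers seen in the profile are far slower than TBinaryProtocol.TBinaryProtocolAcceleratedFactory, which uses the compiled fastbinary extension when it is installed.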