Hi Vratko. Have you looked at [0]?
[0] https://gerrit.fd.io/r/c/vpp/+/30350

On Wed, Feb 3, 2021 at 9:44 AM Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) <vrpo...@cisco.com> wrote:

> Hello people interested in PAPI (VPP's Python API client library).
>
> Hello again.
> This is an update e-mail, adding some information,
> while still asking basically the same questions.
>
> Since my first e-mail, there was some private communication,
> mostly related to reasons the vanilla performance is not good,
> and how improvements to VAT [2] can help.
>
> > The exact code change is [0], but that may be hard to review.
>
> The current patch set [3] is a little better.
>
> > For that I have created [1], which shows the changed VPP PAPI code.
>
> Still mostly unfinished, I need to familiarize myself better with shmem transport.
>
> The main inputs came from Peter, who expressed
> dislike [4] of how brittle the fast binary message generation is,
> and he prefers "we will call a vector operation and PAPI just executes it".
>
> Let me summarize the current options as I see them.
>
> 1. Keep the status quo.
> That means using VAT for some operations (e.g. adding multiple routes [5]),
> and creating "exec scripts" [6] for operations without a VAT one-liner.
> Pros: No work needed, good speed, old VPP versions are supported.
> Cons: Relying on VAT functionality (outside API compatibility rules).
>
> 2. Support "vector operations" in VPP via binary API.
> This will probably need a new VPP plugin to host the implementations.
> Pros: Fast speed, small CSIT work, guarded by API compatibility rules.
> Cons: New VPP plugin of questionable usefulness outside CSIT,
> plugin maintainer needed, old VPP versions not supported.
>
> 3. VPP PAPI improvements only.
> No changes to VPP API, just changes to PAPI to allow better speed for socket interactions.
> CSIT would need a fast way to synthesize binary messages.
> Pros: Small VPP work, good speed, only "official" VPP API is used.
> Cons: Brittle CSIT message handling, old VPP versions not supported.
>
> 4. CSIT hacks only (Gerrit 26019).
> No changes to VPP API nor PAPI. CSIT code messes with PAPI internals.
> CSIT needs a fast way to synthesize binary messages.
> Pros: Code is ready, good speed, old VPP versions are supported.
> Cons: Brittle CSIT message handling, risky with respect to VPP PAPI changes.
>
> The open questions:
> Do you see any other options?
> Did I miss some important pros or cons?
> Which option do you prefer?
>
> Vratko.
>
> [2] https://lists.fd.io/g/vpp-dev/topic/78362835#18092
> [3] https://gerrit.fd.io/r/c/csit/+/26019/140
> [4] https://gerrit.fd.io/r/c/csit/+/26019/140#message-314d168d8951b539e588e644a875624f5ca3fb77
> [5] https://github.com/FDio/csit/blob/b5073afc4a941ea33ce874e016fe86384ae7a60d/resources/templates/vat/vpp_route_add.vat
> [6] https://github.com/FDio/csit/blob/b5073afc4a941ea33ce874e016fe86384ae7a60d/resources/libraries/python/TestConfig.py#L121-L150
>
> *From:* vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> *On Behalf Of* Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via lists.fd.io
> *Sent:* Thursday, 2020-May-07 18:35
> *To:* vpp-dev@lists.fd.io
> *Cc:* csit-...@lists.fd.io
> *Subject:* [vpp-dev] Faster PAPI
>
> Hello people interested in PAPI (VPP's Python API client library).
>
> In CSIT, our tests are using PAPI to interact with VPP.
> We are using socket transport (instead of shared memory transport),
> as VPP is running on machines separate from machines running the tests.
> We use SSH to forward the socket between the machines.
>
> Some of our scale tests need to send a high number of commands towards VPP.
> The largest test sends 4 million commands (ip_route_add_del with ip6 addresses).
> You can imagine that can take a while.
> Even using PAPI in asynchronous mode, it takes tens of minutes per million commands.
>
> I was able to speed that up considerably, just by changing code on the CSIT side.
> The exact code change is [0], but that may be hard to review.
> Gerrit does not even recognize the new PapiSocketExecutor.py
> to be an edited copy of the old PapiExecutor.py file.
>
> That code relies on the fact that Python is a quite permissive language,
> not really distinguishing private fields and methods from public ones.
> So the current code is vulnerable to refactors of VPP PAPI code.
> Also, pylint (static code analysis tool CSIT uses) is complaining.
>
> The proper fix is to change the VPP PAPI code,
> so that it exposes the inner parts the new CSIT code needs to access
> (or some abstractions of them).
>
> For that I have created [1], which shows the changed VPP PAPI code.
> Commit message contains a simplified example of how the new features can be used.
>
> The changed VPP code allows three performance improvements.
>
> 1. Capturing raw bytes sent.
> For complicated commands, many CPU cycles are spent serializing
> command arguments (basically nested python dicts) into bytes.
> If the user (CSIT code) has access to the message as serialized by PAPI (VPP code),
> the user can choose a faster method to create subsequent serialized data.
> Implementing this on the CSIT side improved the speed, but not greatly enough.
> (See bytes_template.py in [0] for the faster data generator.)
> The VPP code [1] introduces fields remember_sent and last_sent.
>
> 2. Reading replies without de-serializing them.
> This was already possible by calling transport.q.get(),
> but had next to no effect on PAPI speed.
> Replies are usually short, so deserialization does not take too many cycles.
>
> 3. Disabling the background reader thread.
> By default, socket transport creates (upon connect) a background thread,
> which select()s on the socket, reads any messages,
> and put()s them to transport.q (multiprocessing.Queue).
> I am not sure whether it is the multithreading (waiting for the Python interpreter
> to switch between threads), or Queue (locks, its own reader thread),
> but overall this was the remaining bottleneck.
> The VPP code exposes public methods for stopping and starting the thread.
>
> Back to point 2:
> With the reading thread stopped, transport.q is not filled,
> so another way to read the reply is needed.
> The VPP code contained transport._read(),
> the underscore hinting this is an internal method
> (leading to the above-mentioned pylint complaints).
> The VPP change [1] renames that method to read_message(),
> adding a docstring explaining it has to be used
> when the reading thread is stopped.
>
> Finally, with all 3 improvements, CSIT will be able
> to execute a million PAPI commands in around 15 seconds.
>
> Even if something like [1] is merged to VPP,
> CSIT will still use [0] for some time,
> so we are able to test older VPP versions.
>
> So, any comments on [1], or other ideas
> on what changes are needed on the VPP side
> so users can achieve good PAPI speed using public PAPI methods?
>
> Vratko.
>
> [0] https://gerrit.fd.io/r/c/csit/+/26019/108
> [1] https://gerrit.fd.io/r/c/vpp/+/26946/1
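
For anyone trying to picture improvement 1 in the quoted mail (serialize one message the slow way, then derive the following messages from its bytes), here is a minimal sketch of the general idea. It is not the code from bytes_template.py and it does not use PAPI at all: the message layout, `serialize_slow` and `BytesTemplate` are hypothetical names invented for illustration, and the real ip_route_add_del wire format is different.

```python
import ipaddress
import struct

# Hypothetical fixed-size layout, for illustration only (NOT the real VPP
# wire format): u16 message id, u32 context, 16-byte IPv6 address, u8 length.
MSG = struct.Struct("!HI16sB")
CONTEXT_OFFSET = 2   # the u32 context starts right after the u16 message id
ADDRESS_OFFSET = 6   # the 16-byte address starts right after the context


def serialize_slow(msg_id, context, address, prefix_len):
    """Stand-in for the full (expensive) serializer; here it is just a pack."""
    return MSG.pack(msg_id, context, address, prefix_len)


class BytesTemplate:
    """Keep one serialized message and overwrite only the fields that vary."""

    def __init__(self, first_message):
        self._buf = bytearray(first_message)

    def render(self, context, address):
        # Patch the two varying fields in place, then hand out a copy.
        struct.pack_into("!I", self._buf, CONTEXT_OFFSET, context)
        self._buf[ADDRESS_OFFSET:ADDRESS_OFFSET + 16] = address
        return bytes(self._buf)


# Serialize once, then generate the remaining messages from the template.
base = ipaddress.IPv6Address("2001:db8::")
template = BytesTemplate(serialize_slow(42, 0, base.packed, 64))
messages = [template.render(i, (base + i).packed) for i in range(1, 1001)]
```

The hard-coded offsets are exactly where the brittleness mentioned in the thread comes from: if the message definition changes, the template silently produces wrong bytes unless the offsets are re-derived from a freshly serialized message.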