Hi Vratko.

Have you looked at [0]?

[0] https://gerrit.fd.io/r/c/vpp/+/30350

On Wed, Feb 3, 2021 at 9:44 AM Vratko Polak -X (vrpolak - PANTHEON TECH SRO
at Cisco) <vrpo...@cisco.com> wrote:

> > Hello people interested in PAPI (VPP's Python API client library).
> Hello again.
> This is an update e-mail, adding some information,
> while still asking basically the same questions.
> Since my first e-mail, there was some private communication,
> mostly related to reasons the vanilla performance is not good,
> and how improvements to VAT [2] can help.
> > The exact code change is [0], but that may be hard to review.
> The current patch set [3] is a little better.
> > For that I have created [1], which shows the changed VPP PAPI code.
> Still mostly unfinished; I need to familiarize myself better with shmem transport.
> The main inputs came from Peter, who expressed
> dislike [4] of how brittle the fast binary message generation is,
> and who prefers "we will call a vector operation and PAPI just executes it".
> Let me summarize the current options as I see them.
> 1. Keep the status quo.
> That means using VAT for some operations (e.g. adding multiple routes [5]),
> and creating "exec scripts" [6] for operations without a VAT one-liner.
> Pros: No work needed, good speed, old VPP versions are supported.
> Cons: Relying on VAT functionality (outside API compatibility rules).
> 2. Support "vector operations" in VPP via binary API.
> This will probably need a new VPP plugin to host the implementations.
> Pros: Fast speed, small CSIT work, guarded by API compatibility rules.
> Cons: New VPP plugin of questionable usefulness outside CSIT,
> plugin maintainer needed, old VPP versions not supported.
> 3. VPP PAPI improvements only.
> No changes to VPP API, just changes to PAPI to allow better speed for
> socket interactions.
> CSIT would need a fast way to synthesize binary messages.
> Pros: Small VPP work, good speed, only "official" VPP API is used.
> Cons: Brittle CSIT message handling, old VPP versions not supported.
> 4. CSIT hacks only (Gerrit 26019).
> No changes to VPP API nor PAPI. CSIT code messes with PAPI internals.
> CSIT needs a fast way to synthesize binary messages.
> Pros: Code is ready, good speed, old VPP versions are supported.
> Cons: Brittle CSIT message handling, risky with respect to VPP PAPI
> changes.
> The open questions:
> Do you see any other options?
> Did I miss some important pros or cons?
> Which option do you prefer?
> Vratko.
> [2] https://lists.fd.io/g/vpp-dev/topic/78362835#18092
> [3] https://gerrit.fd.io/r/c/csit/+/26019/140
> [4] https://gerrit.fd.io/r/c/csit/+/26019/140#message-314d168d8951b539e588e644a875624f5ca3fb77
> [5] https://github.com/FDio/csit/blob/b5073afc4a941ea33ce874e016fe86384ae7a60d/resources/templates/vat/vpp_route_add.vat
> [6] https://github.com/FDio/csit/blob/b5073afc4a941ea33ce874e016fe86384ae7a60d/resources/libraries/python/TestConfig.py#L121-L150
> *From:* vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> *On Behalf Of *Vratko
> Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via lists.fd.io
> *Sent:* Thursday, 2020-May-07 18:35
> *To:* vpp-dev@lists.fd.io
> *Cc:* csit-...@lists.fd.io
> *Subject:* [vpp-dev] Faster PAPI
> Hello people interested in PAPI (VPP's Python API client library).
> In CSIT, our tests are using PAPI to interact with VPP.
> We are using socket transport (instead of shared memory transport),
> as VPP is running on machines separate from machines running the tests.
> We use SSH to forward the socket between the machines.
> Some of our scale tests need to send a high number of commands towards VPP.
> The largest test sends 4 million commands (ip_route_add_del with ip6
> addresses).
> You can imagine that can take a while.
> Even using PAPI in asynchronous mode, it takes tens of minutes per million
> commands.
> I was able to speed that up considerably, just by changing code on CSIT
> side.
> The exact code change is [0], but that may be hard to review.
> Gerrit does not even recognize the new PapiSocketExecutor.py
> to be an edited copy of the old PapiExecutor.py file.
> That code relies on the fact that Python is quite a permissive language,
> not really distinguishing private fields and methods from public ones.
> So the current code is vulnerable to refactors of VPP PAPI code.
> Also, pylint (the static code analysis tool CSIT uses) complains about it.
> The proper fix is to change the VPP PAPI code,
> so that it exposes the inner parts the new CSIT code needs to access
> (or some abstractions of them).
> For that I have created [1], which shows the changed VPP PAPI code.
> Commit message contains a simplified example of how the new features can
> be used.
> The changed VPP code allows three performance improvements.
> 1. Capturing raw bytes sent.
> For complicated commands, many CPU cycles are spent serializing
> command arguments (basically nested python dicts) into bytes.
> If the user (CSIT code) has access to the message as serialized by PAPI (VPP
> code),
> the user can choose a faster method to create subsequent serialized data.
> Implementing this on the CSIT side improved the speed, but not enough.
> (See bytes_template.py in [0] for the faster data generator.)
> The VPP code [1] introduces fields remember_sent and last_sent.
> 2. Reading replies without de-serializing them.
> This was already possible by calling transport.q.get(),
> but had next to no effect on PAPI speed.
> Replies are usually short, so deserialization does not take too many
> cycles.
> 3. Disabling the background reader thread.
> By default, socket transport creates (upon connect) a background thread,
> which select()s on the socket, reads any messages,
> and put()s them to transport.q (multiprocessing.Queue).
> I am not sure whether it is the multithreading (waiting for Python
> interpreter
> to switch between threads), or Queue (locks, its own reader thread),
> but overall this was the remaining bottleneck.
> The VPP code exposes public methods for stopping and starting the thread.
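A minimal sketch of the byte-template idea from point 1 above, assuming a made-up fixed message layout. The layout, the message id 42, and the names slow_serialize and generate_messages are hypothetical and exist only for illustration; this is not the real ip_route_add_del wire format, nor the code in bytes_template.py. The idea is to serialize the first message the slow way once, then patch only the fields that change.

```python
import ipaddress
import struct

# Hypothetical layout (NOT the real VPP wire format):
# 2-byte message id, 4-byte context, 16-byte IPv6 address, 1-byte prefix length.
LAYOUT = struct.Struct("!HI16sB")
CONTEXT_OFFSET = 2
ADDRESS_OFFSET = 6


def slow_serialize(msg_id, context, address, prefix_len):
    """Stand-in for the full serializer that walks nested arguments."""
    packed = ipaddress.IPv6Address(address).packed
    return LAYOUT.pack(msg_id, context, packed, prefix_len)


def generate_messages(first_address, count, msg_id=42, prefix_len=128):
    """Yield serialized messages for consecutive IPv6 addresses.

    Only the first message goes through slow_serialize; the rest are
    produced by overwriting the context and address bytes in a template.
    """
    template = bytearray(slow_serialize(msg_id, 0, first_address, prefix_len))
    base = int(ipaddress.IPv6Address(first_address))
    for index in range(count):
        struct.pack_into("!I", template, CONTEXT_OFFSET, index)
        address_bytes = (base + index).to_bytes(16, "big")
        template[ADDRESS_OFFSET:ADDRESS_OFFSET + 16] = address_bytes
        yield bytes(template)
```

The brittleness mentioned elsewhere in this thread is visible here: the hard-coded offsets silently produce wrong messages if the message definition changes.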
> Back to point 2:
> With the reading thread stopped, transport.q is not filled,
> so another way to read the reply is needed.
> The VPP code contained transport._read(),
> the underscore hinting this is an internal method
> (leading to the above-mentioned pylint complaints).
> The VPP change [1] renames that method to read_message(),
> adding a docstring explaining it has to be used
> when the reading thread is stopped.
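A rough sketch of reading replies directly from the socket when no background reader thread is filling a queue. The framing here (a 4-byte big-endian length prefix) and the function names are assumptions for illustration only; they are not the actual PAPI header format or the read_message() implementation in [1].

```python
import socket
import struct

# Hypothetical framing: 4-byte big-endian payload length, then the payload.
HEADER = struct.Struct("!I")


def recv_exactly(sock, size):
    """Block until exactly `size` bytes have been read from the socket."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed in the middle of a message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_framed_message(sock):
    """Read one raw message in the calling thread, without a queue."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)


def send_framed_message(sock, payload):
    """Send one message using the same hypothetical framing."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


if __name__ == "__main__":
    # Local demonstration over a connected socket pair.
    sender, receiver = socket.socketpair()
    send_framed_message(sender, b"raw reply bytes")
    print(read_framed_message(receiver))
    sender.close()
    receiver.close()
```

The caller gets the raw bytes back and decides whether to deserialize them at all, which matches the intent of points 2 and 3 above.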
> Finally, with all 3 improvements, CSIT will be able
> to execute a million PAPI commands in around 15 seconds.
> Even if something like [1] is merged to VPP,
> CSIT will still use [0] for some time,
> so we are able to test older VPP versions.
> So, any comments on [1], or other ideas
> on what changes are needed on VPP side
> so users can achieve good PAPI speed using public PAPI methods?
> Vratko.
> [0] https://gerrit.fd.io/r/c/csit/+/26019/108
> [1] https://gerrit.fd.io/r/c/vpp/+/26946/1