Re: [vpp-dev] [csit-dev] Faster PAPI

Ole Troan Wed, 03 Feb 2021 23:47:06 -0800

> 1. Keep the status quo.
> That means using VAT for some operations (e.g. adding multiple routes [5]),
> and creating "exec scripts" [6] for operations without VAT one-liner.
> Pros: No work needed, good speed, old VPP versions are supported.
> Cons: Relying on VAT functionality (outside API compatibility rules).


1b) Pre-generate a JSON file with all commands and load that into VPP with VAT2.
1c) Use a debug CLI script file and "exec"

> 2. Support "vector operations" in VPP via binary API.
> This will probably need a new VPP plugin to host the implementations.
> Pros: Fast speed, small CSIT work, guarded by API compatibility rules.
> Cons: New VPP plugin of questionable usefulness outside CSIT,
> plugin maintainer needed, old VPP versions not supported.

Could you elaborate on this idea?

> 3. VPP PAPI improvements only.
> No changes to VPP API, just changes to PAPI to allow better speed for socket 
> interactions.
> CSIT would need a fast way to synthesize binary messages.
> Pros: Small VPP work, good speed, only "official" VPP API is used.
> Cons: Brittle CSIT message handling, old VPP versions not supported.

What would this be?
E.g. we could do serializations in C with a small Python wrapper.

> 4. CSIT hacks only (Gerrit 26019).
> No changes to VPP API nor PAPI. CSIT code messes with PAPI internals.
> CSIT needs a fast way to synthesize binary messages.
> Pros: Code is ready, good speed, old VPP versions are supported.
> Cons: Brittle CSIT message handling, risky with respect to VPP PAPI changes.

This is the idea explained below, where you serialize a message once and
replay it, right?
In addition to tweaking the reader thread etc.?
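
A minimal sketch of the serialize-once-and-replay idea, under loud assumptions:
the layout below is made up for illustration (it is not the real
ip_route_add_del encoding, and not the bytes_template.py code from the CSIT
change), and the names LAYOUT, ADDR_OFFSET, serialize and replay are
hypothetical.

```python
import ipaddress
import struct

# Hypothetical wire layout, NOT the real VPP message encoding:
# u16 message id, u32 context, u8 is_add, 16-byte IPv6 address, u8 prefix length.
LAYOUT = struct.Struct("!HIB16sB")
ADDR_OFFSET = 7  # 2 + 4 + 1 bytes precede the address field


def serialize(msg_id, context, is_add, address, prefix_len):
    """Slow path: build the whole message from Python values."""
    return LAYOUT.pack(msg_id, context, is_add, address.packed, prefix_len)


def replay(template, first_address, count):
    """Fast path: copy the template once, then overwrite only the address."""
    buf = bytearray(template)
    start = int(first_address)
    for i in range(count):
        buf[ADDR_OFFSET:ADDR_OFFSET + 16] = (start + i).to_bytes(16, "big")
        yield bytes(buf)


first = ipaddress.IPv6Address("2001:db8::")
template = serialize(42, 0, 1, first, 128)
messages = list(replay(template, first, 3))
```

The point is only that the per-message work shrinks to one 16-byte slice
assignment; how much that helps for real PAPI messages would have to be
measured.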

5) Paul's proposal. I don't know if he has measured performance impact on that.

6) The 4-million-route test in CSIT generates its routes according to a set of
   parameters.
   We could add a small plugin on the VPP side that exposes an API of the sort
   "create <n> routes from <prefix>".

Is the bottleneck the serialisation, the message RTT, or both?
If it is the serialisation, doing it in C should help.
If it is the RTT, larger messages are one option.
Uploading all messages to VPP and then asking VPP to process them, receiving a
single reply, is also an option.
Is this your "vector" idea?
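
To illustrate the RTT side, here is a rough sketch of sending every request
before reading any reply, so that the round trip is paid once rather than per
message. Everything in it is an assumption: the 4-byte length framing, the
fake_server stand-in for VPP, and the helper names are hypothetical and do not
reflect the actual socket transport.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!I")  # hypothetical framing: 4-byte payload length


def recv_exact(sock, size):
    """Read exactly `size` bytes, looping over short reads."""
    chunks = []
    while size:
        chunk = sock.recv(size)
        if not chunk:
            raise ConnectionError("peer closed the socket")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def fake_server(sock, count):
    """Stand-in for VPP: acknowledge each framed request with its length."""
    for _ in range(count):
        (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
        recv_exact(sock, length)
        sock.sendall(HEADER.pack(length))


def pipelined(sock, payloads):
    """Send all requests in one write, then collect the replies in one pass."""
    sock.sendall(b"".join(HEADER.pack(len(p)) + p for p in payloads))
    return [HEADER.unpack(recv_exact(sock, HEADER.size))[0] for _ in payloads]


client, server = socket.socketpair()
payloads = [b"route-%d" % i for i in range(100)]
worker = threading.Thread(target=fake_server, args=(server, len(payloads)))
worker.start()
replies = pipelined(client, payloads)
worker.join()
client.close()
server.close()
```

With a much larger batch the sender would have to interleave reads with
writes (or rely on a single reply for the whole upload) to avoid filling the
socket buffers.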

> The open questions:
> Do you see any other options?
> Did I miss some important pros or cons?
> Which option do you prefer?

The lowest hanging fruit is likely 6. But longer term I'd prefer a more generic 
solution.
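
For 6), the expansion such a "create <n> routes from <prefix>" call would do
is roughly the following. This is a Python sketch for illustration only; a
real implementation would live in a VPP plugin in C, and expand_routes is a
hypothetical name, not an existing API.

```python
import ipaddress


def expand_routes(first_prefix, count):
    """Yield `count` consecutive prefixes of equal length, starting at first_prefix."""
    net = ipaddress.ip_network(first_prefix)
    step = net.num_addresses  # distance between neighbouring prefixes
    for i in range(count):
        yield f"{net.network_address + i * step}/{net.prefixlen}"


routes = list(expand_routes("2001:db8::/64", 3))
```

Only the starting prefix and the count cross the API, so the message count
no longer grows with the number of routes.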

Best regards,
Ole

> [2] https://lists.fd.io/g/vpp-dev/topic/78362835#18092
> [3] https://gerrit.fd.io/r/c/csit/+/26019/140
> [4] https://gerrit.fd.io/r/c/csit/+/26019/140#message-314d168d8951b539e588e644a875624f5ca3fb77
> [5] https://github.com/FDio/csit/blob/b5073afc4a941ea33ce874e016fe86384ae7a60d/resources/templates/vat/vpp_route_add.vat
> [6] https://github.com/FDio/csit/blob/b5073afc4a941ea33ce874e016fe86384ae7a60d/resources/libraries/python/TestConfig.py#L121-L150
> 
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Vratko Polak -X 
> (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via lists.fd.io
> Sent: Thursday, 2020-May-07 18:35
> To: vpp-dev@lists.fd.io
> Cc: csit-...@lists.fd.io
> Subject: [vpp-dev] Faster PAPI
> 
> Hello people interested in PAPI (VPP's Python API client library).
> 
> In CSIT, our tests are using PAPI to interact with VPP.
> We are using socket transport (instead of shared memory transport),
> as VPP is running on machines separate from machines running the tests.
> We use SSH to forward the socket between the machines.
> 
> Some of our scale tests need to send a high number of commands towards VPP.
> The largest test sends 4 million commands (ip_route_add_del with ip6 
> addresses).
> You can imagine that can take a while.
> Even using PAPI in asynchronous mode, it takes tens of minutes per million 
> commands.
> 
> I was able to speed that up considerably, just by changing code on CSIT side.
> The exact code change is [0], but that may be hard to review.
> Gerrit does not even recognize the new PapiSocketExecutor.py
> to be an edited copy of the old PapiExecutor.py file.
> 
> That code relies on the fact that Python is quite a permissive language,
> not really distinguishing private fields and methods from public ones.
> So the current code is vulnerable to refactors of VPP PAPI code.
> Also, pylint (static code analysis tool CSIT uses) is complaining.
> 
> The proper fix is to change the VPP PAPI code,
> so that it exposes the inner parts the new CSIT code needs to access
> (or some abstractions of them).
> 
> For that I have created [1], which shows the changed VPP PAPI code.
> Commit message contains a simplified example of how the new features can be 
> used.
> 
> The changed VPP code allows three performance improvements.
> 
> 1. Capturing raw bytes sent.
> For complicated commands, many CPU cycles are spent serializing
> command arguments (basically nested python dicts) into bytes.
> If the user (CSIT code) has access to the message as serialized by PAPI (VPP
> code),
> the user can choose a faster method to create subsequent serialized data.
> Implementing this on CSIT side improved the speed, but not greatly enough.
> (See bytes_template.py in [0] for the faster data generator.)
> The VPP code [1] introduces fields remember_sent and last_sent.
> 
> 2. Reading replies without de-serializing them.
> This was already possible by calling transport.q.get(),
> but had next to no effect on PAPI speed.
> Replies are usually short, so deserialization does not take too many cycles.
> 
> 3. Disabling the background reader thread.
> By default, socket transport creates (upon connect) a background thread,
> which select()s on the socket, reads any messages,
> and put()s them to transport.q (multiprocessing.Queue).
> I am not sure whether it is the multithreading (waiting for Python interpreter
> to switch between threads), or Queue (locks, its own reader thread),
> but overall this was the remaining bottleneck.
> The VPP code exposes public methods for stopping and starting the thread.
> 
> Back to point 2:
> With the reading thread stopped, transport.q is not filled,
> so another way to read the reply is needed.
> The VPP code contained transport._read(),
> the underscore hinting this is an internal method
> (leading to the abovementioned pylint complaints).
> The VPP change [1] renames that method to read_message(),
> adding a docstring explaining it has to be used
> when the reading thread is stopped.
> 
> Finally, with all 3 improvements, CSIT will be able
> to execute a million PAPI commands in around 15 seconds.
> 
> Even if something like [1] is merged to VPP,
> CSIT will still use [0] for some time,
> so we are able to test older VPP versions.
> 
> So, any comments on [1], or other ideas
> on what changes are needed on VPP side
> so users can achieve good PAPI speed using public PAPI methods?
> 
> Vratko.
> 
> [0] https://gerrit.fd.io/r/c/csit/+/26019/108
> [1] https://gerrit.fd.io/r/c/vpp/+/26946/1
> 
> 
>
