It seems to me that the root of the problem is points 2/6. Can we address
the issue by adding support for an iterator/generator type to the API,
passing it across the wire, and having the API service construct the
concrete commands on the VPP side?

If I misunderstand and these are all separate problems with PAPI, then we
are better off addressing the deficiencies in a way that is consistent across
all clients rather than looking for side-channel alternatives.

Paul



On Thu, Apr 8, 2021 at 9:29 AM Vratko Polak -X (vrpolak - PANTHEON TECH SRO
at Cisco) <vrpo...@cisco.com> wrote:

> Now back to this branch of the conversation.
>
> 1a) would be:
> >> That means using VAT for some operations (e.g. adding multiple routes
> [5]),
> >> and creating "exec scripts" [6] for operations without VAT one-liner.
>
> > 1c) Use a debug CLI script file and "exec"
>
> Very similar to 1a), but replacing VAT1 with exec and CLI.
> I think lately VAT1 is actually more stable than CLI,
> and one-liners are faster than long scripts,
> so no real pros for 1c) (unless VAT1 is going to be deprecated).
>
> > 1b) Pre-generate a JSON file with all commands and load that into VPP
> with VAT2.
>
> After thinking about this, I think it is worth a try.
> VAT2 uses simple logic to deserialize data and call the binary API,
> making it fairly resistant to behavior changes
> (as we have crcchecker to guard production APIs).
> On the CSIT side the code would not look much different
> from the 1a) case we support today.
> I will probably create a proof of concept in Gerrit
> to see what the performance is.
>
> Contrary to pure PAPI solutions,
> many keywords in CSIT would need to know how to emit
> their command for the JSON+VAT2 call (not just for PAPI+socket),
> but that is not much trouble.
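>
> For illustration, emitting such a file from Python could look roughly like
> the sketch below; the exact JSON schema VAT2 expects (e.g. the "_msgname"
> key and the field layout) and the exact vat2 invocation are assumptions I
> still need to verify:
>
>     import json
>
>     def route_entries(count):
>         """Yield one VAT2-style JSON object per ip_route_add_del call."""
>         for i in range(count):
>             yield {
>                 "_msgname": "ip_route_add_del",  # assumed message-name key
>                 "is_add": True,
>                 "route": {
>                     "prefix": f"2001:db8:{i >> 16:x}:{i & 0xffff:x}::/64",
>                     "n_paths": 0,
>                 },
>             }
>
>     with open("routes.json", "w") as out:
>         json.dump(list(route_entries(1_000_000)), out)
>     # then on the DUT, something like: vat2 routes.json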
>
> >> 2. Support "vector operations" in VPP via binary API.
> >
> > Could you elaborate on this idea?
>
> I think your 6) below is the same thing.
>
> >> 3. VPP PAPI improvements only.
> >> No changes to VPP API, just changes to PAPI to allow better speed for
> socket interactions.
>
> This is what https://gerrit.fd.io/r/c/vpp/+/31920
> (mentioned in the other branch of this conversation)
> is an example of.
>
> >> CSIT would need a fast way to synthesize binary messages.
> >
> > What would this be?
> > E.g. we could do the serialization in C with a small Python wrapper.
>
> Multiple possibilities. The main thing is
> this would be done purely using CSIT code,
> so no dependency on VPP PAPI (or any other) code.
> (So it does not need to be decided in this conversation.)
>
>
> https://gerrit.fd.io/r/c/csit/+/26019/140/resources/libraries/python/bytes_template.py
> is an example. It is Python, but fast enough for CSIT purposes.
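>
> The idea in that file, very roughly (a simplified sketch, not the actual
> bytes_template.py code): let PAPI serialize one message, keep the raw
> bytes, and for every further message overwrite only the varying field at a
> known offset with struct, instead of re-serializing the whole nested dict.
>
>     import struct
>
>     def make_template(first_message_bytes, varying_field_offset):
>         """Return a function producing each next message by patching one
>         big-endian u32 field in a copy of the first serialized message."""
>         buf = bytearray(first_message_bytes)
>
>         def nth(value):
>             struct.pack_into(">I", buf, varying_field_offset, value)
>             return bytes(buf)
>
>         return nth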
>
> >> 4. CSIT hacks only (Gerrit 26019).
> >
> > This is the idea explained below, where you serialize the message once and
> replay it, right?
> > In addition to tweaking the reader thread etc?
>
> > 5) Paul's proposal. I don't know if he has measured the performance impact
> of that.
>
> Which one, vapi+swig?
> I am not sure what the complete solution looks like.
> I believe vapi needs to be on the VPP machine,
> but the CSIT Python stuff (for swig) is on a separate machine.
> I do not see swig having any transport capabilities,
> so we would need to insert them (socket, file transfer, something else)
> somewhere, and I do not see a good place.
>
> > 6) The "add 4 million routes" operation that CSIT uses generates routes according
> to a set of parameters.
> >    We could add a small plugin on the VPP side that exposes an API of
> the sort
> >    "create <n> routes from <prefix>".
>
> Subtypes:
> 6a) Official plugin in VPP repo, needs a maintainer.
> 6b) Plugin outside VPP repo (perhaps in CSIT). Needs to be built and
> deployed.
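>
> For 6), the CSIT-side call would then shrink to a single PAPI command,
> something like the sketch below (the message and field names are
> hypothetical, such an API does not exist today):
>
>     # One round trip instead of millions; names invented for illustration.
>     reply = papi.api.route_bulk_add(
>         prefix="2001:db8::/64",   # first prefix
>         count=4_000_000,          # how many consecutive prefixes to add
>         via="2001:db8:ffff::1",   # next hop shared by all of them
>     )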
>
> > Is it the serialisation, or the message RTT or both?
> > If serialisation, doing it in C should help.
> > If it's the RTT, larger messages are one option.
>
> There are four known bottlenecks, described below: two related to RTT,
> one to serialization, and one to the volume of data forwarded over SSH.
> I do not recall the real numbers,
> but for the sake of discussion, you can assume the bottlenecks are removed
> in the following order, and that removing each one
> halves the configuration time.
>
> First bottleneck: Synchronous commands.
> Solution: Send multiple commands before reading replies.
> No support from VPP needed, except maybe reporting
> how big the last message sent was (to avoid the UDS buffers getting full).
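>
> In pseudo-Python the pattern is just the following (assuming calls that
> return without waiting for the reply; transport.q.get() is the raw reply
> read mentioned in the original mail below, and the window size is a number
> I made up):
>
>     WINDOW = 100  # requests kept in flight, sized so UDS buffers do not fill
>
>     pending = 0
>     for args in all_route_args:  # an iterable of per-route argument dicts
>         papi.api.ip_route_add_del(**args)  # send without waiting for reply
>         pending += 1
>         if pending == WINDOW:
>             for _ in range(pending):
>                 papi.transport.q.get()  # drain raw replies
>             pending = 0
>     for _ in range(pending):
>         papi.transport.q.get()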
>
> Second bottleneck: Background threads the VPP PAPI code uses
> to read replies asynchronously from the socket, deserialize them,
> and put them on a queue for the user to read.
> Solution: Hack vpp_transport_socket.VppTransport after connect
> to stop message_thread and read from the socket directly.
> Or use a different VppTransport implementation
> that does not start the thread (nor the multiprocessing queues) in the first place.
> 26019 does the former, 31920 does the latter.
> This bottleneck is the primary reason I started this conversation.
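>
> The "read from the socket directly" part is small. From memory, the socket
> framing is a 16-byte big-endian header (queue pointer, message length,
> timestamp) followed by the message body, so the direct read is roughly the
> sketch below (header layout to be double-checked against
> vpp_transport_socket.py):
>
>     import struct
>
>     HEADER = struct.Struct(">QII")  # framing used by the socket transport, IIRC
>
>     def recv_exactly(sock, count):
>         data = b""
>         while len(data) < count:
>             chunk = sock.recv(count - len(data))
>             if not chunk:
>                 raise ConnectionError("API socket closed")
>             data += chunk
>         return data
>
>     def read_reply(sock):
>         """Read one framed API message straight from the connected socket,
>         bypassing the background thread and the multiprocessing queue."""
>         _, msglen, _ = HEADER.unpack(recv_exactly(sock, HEADER.size))
>         return recv_exactly(sock, msglen)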
>
> Third bottleneck: Serialization of many commands (and deserialization of
> responses).
> I do not recall whether the slower part is the VPP PAPI code (vpp_serializer)
> or CSIT preparing the arguments to serialize; say it is both.
> There are many solutions to this one, important from the CSIT reviewers' point of view,
> but not that important for VPP developers (PAPI or otherwise).
>
> Fourth bottleneck: SSH forwarding the volume of data between
> socket endpoints on different machines.
> Solution: Forward just a single command, and have a utility/plugin/whatever
> on the VPP machine execute the implied bulk work quickly.
> This sidesteps all the previous bottlenecks,
> so it is the secondary reason for this conversation.
>
> Personally, I do not like putting CSIT utilities on the VPP machine,
> but sometimes it is the least evil (e.g. we do that for reading the stats
> segment).
>
> 1a) with VAT1 is a fourth-bottleneck solution;
> that is why we still use it.
> 6a) is also a fourth-bottleneck solution, but with a stable API.
>
> 26019 contains solutions for the first, second and third bottlenecks.
> 31920 is a solution for the second bottleneck, assuming the first bottleneck
> is solved as in 26019 and that for the third bottleneck we use 26019 or something
> else.
>
> 1b) solves (avoids) the first and second bottlenecks,
> but it has its own third bottleneck.
> Luckily, many binary-data serialization solutions
> could be adapted for JSON data serialization.
>
> Vratko.
>
> -----Original Message-----
> From: otr...@employees.org <otr...@employees.org>
> Sent: Thursday, 2021-February-04 08:47
> To: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) <
> vrpo...@cisco.com>
> Cc: Paul Vinciguerra <pvi...@vinciconsulting.com>; Peter Mikus -X (pmikus
> - PANTHEON TECH SRO at Cisco) <pmi...@cisco.com>; csit-...@lists.fd.io;
> vpp-dev@lists.fd.io
> Subject: Re: [csit-dev] Faster PAPI
>
> > 1. Keep the status quo.
> > That means using VAT for some operations (e.g. adding multiple routes
> [5]),
> > and creating "exec scripts" [6] for operations without VAT one-liner.
> > Pros: No work needed, good speed, old VPP versions are supported.
> > Cons: Relying on VAT functionality (outside API compatibility rules).
>
> 1b) Pre-generate a JSON file with all commands and load that into VPP with
> VAT2.
> 1c) Use a debug CLI script file and "exec"
>
> > 2. Support "vector operations" in VPP via binary API.
> > This will probably need a new VPP plugin to host the implementations.
> > Pros: Fast speed, small CSIT work, guarded by API compatibility rules.
> > Cons: New VPP plugin of questionable usefulness outside CSIT,
> > plugin maintainer needed, old VPP versions not supported.
>
> Could you elaborate on this idea?
>
> > 3. VPP PAPI improvements only.
> > No changes to VPP API, just changes to PAPI to allow better speed for
> socket interactions.
> > CSIT would need a fast way to synthesize binary messages.
> > Pros: Small VPP work, good speed, only "official" VPP API is used.
> > Cons: Brittle CSIT message handling, old VPP versions not supported.
>
> What would this be?
> E.g. we could do the serialization in C with a small Python wrapper.
>
> > 4. CSIT hacks only (Gerrit 26019).
> > No changes to VPP API nor PAPI. CSIT code messes with PAPI internals.
> > CSIT needs a fast way to synthesize binary messages.
> > Pros: Code is ready, good speed, old VPP versions are supported.
> > Cons: Brittle CSIT message handling, risky with respect to VPP PAPI
> changes.
>
> This is the idea explained below, where you serialize the message once and
> replay it, right?
> In addition to tweaking the reader thread etc?
>
> 5) Paul's proposal. I don't know if he has measured the performance impact of
> that.
>
> 6) The "add 4 million routes" operation that CSIT uses generates routes according to
> a set of parameters.
>    We could add a small plugin on the VPP side that exposes an API of the
> sort
>    "create <n> routes from <prefix>".
>
> Is it the serialisation, or the message RTT or both?
> If serialisation, doing it in C should help.
> If it's the RTT, larger messages are one option.
> Uploading all messages to VPP and then asking VPP to process them, receiving
> a single reply, is also an option.
> Is this your "vector" idea?
>
> > The open questions:
> > Do you see any other options?
> > Did I miss some important pros or cons?
> > Which option do you prefer?
>
> The lowest hanging fruit is likely 6. But longer term I'd prefer a more
> generic solution.
>
> Best regards,
> Ole
>
> > [2] https://lists.fd.io/g/vpp-dev/topic/78362835#18092
> > [3] https://gerrit.fd.io/r/c/csit/+/26019/140
> > [4]
> https://gerrit.fd.io/r/c/csit/+/26019/140#message-314d168d8951b539e588e644a875624f5ca3fb77
> > [5]
> https://github.com/FDio/csit/blob/b5073afc4a941ea33ce874e016fe86384ae7a60d/resources/templates/vat/vpp_route_add.vat
> > [6]
> https://github.com/FDio/csit/blob/b5073afc4a941ea33ce874e016fe86384ae7a60d/resources/libraries/python/TestConfig.py#L121-L150
> >
> > From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Vratko
> Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via lists.fd.io
> > Sent: Thursday, 2020-May-07 18:35
> > To: vpp-dev@lists.fd.io
> > Cc: csit-...@lists.fd.io
> > Subject: [vpp-dev] Faster PAPI
> >
> > Hello people interested in PAPI (VPP's Python API client library).
> >
> > In CSIT, our tests use PAPI to interact with VPP.
> > We use the socket transport (instead of the shared memory transport),
> > as VPP runs on machines separate from the machines running the tests.
> > We use SSH to forward the socket between the machines.
> >
> > Some of our scale tests need to send a high number of commands towards VPP.
> > The largest test sends 4 million commands (ip_route_add_del with ip6
> addresses).
> > You can imagine that can take a while.
> > Even using PAPI in asynchronous mode, it takes tens of minutes per
> million commands.
> >
> > I was able to speed that up considerably, just by changing code on the CSIT
> side.
> > The exact code change is [0], but that may be hard to review.
> > Gerrit does not even recognize the new PapiSocketExecutor.py
> > as an edited copy of the old PapiExecutor.py file.
> >
> > That code relies on the fact that Python is a quite permissive language,
> > not really distinguishing private fields and methods from public ones.
> > So the current code is vulnerable to refactors of the VPP PAPI code.
> > Also, pylint (the static code analysis tool CSIT uses) is complaining.
> >
> > The proper fix is to change the VPP PAPI code,
> > so that it exposes the inner parts the new CSIT code needs to access
> > (or some abstractions of them).
> >
> > For that I have created [1], which shows the changed VPP PAPI code.
> > The commit message contains a simplified example of how the new features can
> be used.
> >
> > The changed VPP code allows three performance improvements.
> >
> > 1. Capturing raw bytes sent.
> > For complicated commands, many CPU cycles are spent serializing
> > command arguments (basically nested Python dicts) into bytes.
> > If the user (CSIT code) has access to the message as serialized by PAPI (VPP
> code),
> > the user can choose a faster method to create subsequent serialized data.
> > Implementing this on the CSIT side improved the speed, but not by
> enough.
> > (See bytes_template.py in [0] for the faster data generator.)
> > The VPP code [1] introduces fields remember_sent and last_sent.
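> >
> > With [1] the intended usage is roughly as below (simplified; the commit
> > message of [1] has the authoritative example, and I am glossing over
> > whether the fields live on the client object or on the transport):
> >
> >     papi.remember_sent = True                # new field from [1]
> >     papi.api.ip_route_add_del(**first_args)  # first_args: a normal argument dict
> >     template = papi.last_sent                # raw bytes of that message,
> >                                              # reusable as a byte template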
> >
> > 2. Reading replies without de-serializing them.
> > This was already possible by calling transport.q.get(),
> > but had next to no effect on PAPI speed.
> > Replies are usually short, so deserialization does not take too many
> cycles.
> >
> > 3. Disabling the background reader thread.
> > By default, the socket transport creates (upon connect) a background thread,
> > which select()s on the socket, reads any messages,
> > and put()s them to transport.q (a multiprocessing.Queue).
> > I am not sure whether it is the multithreading (waiting for the Python
> interpreter
> > to switch between threads), or the Queue (locks, its own reader thread),
> > but overall this was the remaining bottleneck.
> > The VPP code exposes public methods for stopping and starting the thread.
> >
> > Back to point 2:
> > With the reading thread stopped, transport.q is not filled,
> > so another way to read the reply is needed.
> > The VPP code contained transport._read(),
> > the underscore hinting that this is an internal method
> > (leading to the abovementioned pylint complaints).
> > The VPP change [1] renames that method to read_message(),
> > adding a docstring explaining that it has to be used
> > when the reading thread is stopped.
> >
> > Finally, with all 3 improvements, CSIT will be able
> > to execute a million PAPI commands in around 15 seconds.
> >
> > Even if something like [1] is merged into VPP,
> > CSIT will still use [0] for some time,
> > so that we are able to test older VPP versions.
> >
> > So, any comments on [1], or other ideas
> > on what changes are needed on the VPP side
> > so users can achieve good PAPI speed using public PAPI methods?
> >
> > Vratko.
> >
> > [0] https://gerrit.fd.io/r/c/csit/+/26019/108
> > [1] https://gerrit.fd.io/r/c/vpp/+/26946/1
> >
> > 
> >
>
>