It seems to me that the root of the problem is points 2/6. Can we address the issue by adding support for an iterator/generator type to the API, passing it across the wire, and having the API service construct the concrete commands on the VPP side?
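To make that concrete, here is a rough sketch of what a generator-driven bulk call could look like from the client side. Everything in it is hypothetical (route_params, bulk_add_routes and their parameters do not exist in PAPI or VPP today); the point is only that the client would ship a compact, lazily evaluated description of the work instead of millions of individual messages, and the expansion into concrete ip_route_add_del calls would happen next to VPP.

import ipaddress
from typing import Dict, Iterator

def route_params(prefix: str, count: int) -> Iterator[Dict]:
    """Compact, lazily evaluated description of <count> /128 routes under <prefix>."""
    base = ipaddress.ip_network(prefix).network_address
    for i in range(count):
        yield {"dst": f"{base + i}/128", "via": "2001:db8::1", "is_add": True}

def bulk_add_routes(params: Iterator[Dict]) -> int:
    """Stand-in for the hypothetical VPP-side service that would expand the
    description into concrete ip_route_add_del calls locally; here it only
    counts the routes so the sketch runs on its own."""
    return sum(1 for _ in params)

if __name__ == "__main__":
    # The client would send only the template plus (prefix, count),
    # not 4 million individual messages.
    print(bulk_add_routes(route_params("2001:db8:1::/64", 10)))

How such a description would be encoded in an .api message is exactly the compatibility question discussed further down the thread.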
If I misunderstand and these are all separate problems with PAPI, then we are better off addressing the deficiencies in a way that is consistent across all clients rather than looking for side-channel alternatives.

Paul

On Thu, Apr 8, 2021 at 9:29 AM Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) <vrpo...@cisco.com> wrote:
> Now back to this branch of the conversation.
>
> 1a) would be:
> >> That means using VAT for some operations (e.g. adding multiple routes [5]),
> >> and creating "exec scripts" [6] for operations without a VAT one-liner.
>
> > 1c) Use a debug CLI script file and "exec"
>
> Very similar to 1a), but replacing VAT1 with exec and CLI.
> I think lately VAT1 is actually more stable than CLI,
> and one-liners are faster than long scripts,
> so there are no real pros for 1c) (unless VAT1 is going to be deprecated).
>
> > 1b) Pre-generate a JSON file with all commands and load that into VPP with VAT2.
>
> After thinking about this, I think it is worth a try.
> VAT2 uses simple logic to deserialize data and call the binary API,
> making it fairly resistant to behavior changes
> (as we have crcchecker to guard production APIs).
> On the CSIT side the code would not look much different
> from the 1a) case we support today.
> I will probably create a proof of concept in Gerrit
> to see what the performance is.
>
> Contrary to pure PAPI solutions,
> many keywords in CSIT would need to know how to emit
> their command for the json+VAT2 call (not just for PAPI+socket),
> but that is not much trouble.
>
> >> 2. Support "vector operations" in VPP via binary API.
>
> > Could you elaborate on this idea?
>
> I think your 6) below is the same thing.
>
> >> 3. VPP PAPI improvements only.
> >> No changes to VPP API, just changes to PAPI to allow better speed for socket interactions.
>
> This is what https://gerrit.fd.io/r/c/vpp/+/31920
> (mentioned in the other branch of this conversation)
> is an example of.
>
> >> CSIT would need a fast way to synthesize binary messages.
>
> > What would this be?
> > E.g. we could do serializations in C with a small Python wrapper.
>
> Multiple possibilities. The main thing is
> that this would be done purely in CSIT code,
> so there would be no dependency on VPP PAPI (or any other) code.
> (So it does not need to be decided in this conversation.)
>
> https://gerrit.fd.io/r/c/csit/+/26019/140/resources/libraries/python/bytes_template.py
> is an example. It is Python, but fast enough for CSIT purposes.
>
> >> 4. CSIT hacks only (Gerrit 26019).
>
> > This is the idea explained below, where you serialize the message once and replay it, right?
> > In addition to tweaking the reader thread etc.?
>
> > 5) Paul's proposal. I don't know if he has measured the performance impact of that.
>
> Which one, vapi+swig?
> I am not sure what the complete solution looks like.
> I believe vapi needs to be on the VPP machine,
> but the CSIT Python stuff (for swig) is on a separate machine.
> I do not see swig having any transport capabilities,
> so we would need to insert them (socket, file transfer, something else)
> somewhere, and I do not see a good place.
>
> > 6) The "add 4 million routes" test that CSIT uses generates routes according to a set of parameters.
> > We could add a small plugin on the VPP side that exposes an API of the sort
> > "create <n> routes from <prefix>".
>
> Subtypes:
> 6a) Official plugin in the VPP repo, needs a maintainer.
> 6b) Plugin outside the VPP repo (perhaps in CSIT). Needs to be built and deployed.
>
> > Is it the serialisation, or the message RTT, or both?
> > If serialisation, doing it in C should help.
> > If it's the RTT, larger messages are one option.
>
> There are three known bottlenecks (two for RTT, one for serialization),
> plus a fourth one related to the SSH-forwarded transport.
> I do not recall the real numbers,
> but for the sake of discussion you can assume
> that removing the bottlenecks in the following order
> halves the configuration time with each bottleneck removed.
>
> First bottleneck: Sync commands.
> Solution: Send multiple commands before reading replies.
> No support from VPP is needed, except maybe reporting
> how big the last message sent was (to avoid UDS buffers getting full).
>
> Second bottleneck: The background threads VPP PAPI code uses
> to read replies asynchronously from the socket, deserialize them,
> and put them into a queue for the user to read.
> Solution: Hack vpp_transport_socket.VppTransport after connect
> to stop message_thread and read from the socket directly.
> Or use a different VppTransport implementation
> that does not start the thread (nor multiprocessing queues) in the first place.
> 26019 does the former, 31920 does the latter.
> This bottleneck is the primary reason I started this conversation.
>
> Third bottleneck: Serialization of many commands (and deserialization of responses).
> I do not recall whether the slower part is VPP PAPI code (vpp_serializer)
> or CSIT preparing the arguments to serialize. Say it is both.
> There are many solutions to this one; it is important from the CSIT reviewers' point of view,
> but not that important for VPP developers (PAPI or otherwise).
>
> Fourth bottleneck: SSH forwarding the volume of data between
> socket endpoints on different machines.
> Solution: Forward just a single command, and have a utility/plugin/whatever
> on the VPP machine execute the implied bulk work quickly.
> This sidesteps all the previous bottlenecks,
> so it is the secondary reason for this conversation.
>
> Personally, I do not like putting CSIT utilities on the VPP machine,
> but sometimes it is the least evil (e.g. we do that for reading the stats segment).
>
> 1a) with VAT1 is a fourth-bottleneck solution;
> that is why we still use it.
> 6a) is also a fourth-bottleneck solution, but with a stable API.
>
> 26019 contains solutions for the first, second and third bottlenecks.
> 31920 is a solution for the second bottleneck, assuming the first bottleneck
> is solved as in 26019 and the third bottleneck is handled by 26019 or something else.
>
> 1b) solves (avoids) the first and second bottlenecks,
> but it has its own version of the third bottleneck.
> Luckily, many binary data serialization solutions
> could be adapted for JSON data serialization.
>
> Vratko.
>
> -----Original Message-----
> From: otr...@employees.org <otr...@employees.org>
> Sent: Thursday, 2021-February-04 08:47
> To: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) <vrpo...@cisco.com>
> Cc: Paul Vinciguerra <pvi...@vinciconsulting.com>; Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco) <pmi...@cisco.com>; csit-...@lists.fd.io; vpp-dev@lists.fd.io
> Subject: Re: [csit-dev] Faster PAPI
>
> > 1. Keep the status quo.
> > That means using VAT for some operations (e.g. adding multiple routes [5]),
> > and creating "exec scripts" [6] for operations without a VAT one-liner.
> > Pros: No work needed, good speed, old VPP versions are supported.
> > Cons: Relying on VAT functionality (outside API compatibility rules).
>
> 1b) Pre-generate a JSON file with all commands and load that into VPP with VAT2.
> 1c) Use a debug CLI script file and "exec"
>
> > 2. Support "vector operations" in VPP via binary API.
> > This will probably need a new VPP plugin to host the implementations.
> > Pros: Fast speed, small CSIT work, guarded by API compatibility rules.
> > Cons: New VPP plugin of questionable usefulness outside CSIT,
> > plugin maintainer needed, old VPP versions not supported.
>
> Could you elaborate on this idea?
>
> > 3. VPP PAPI improvements only.
> > No changes to VPP API, just changes to PAPI to allow better speed for socket interactions.
> > CSIT would need a fast way to synthesize binary messages.
> > Pros: Small VPP work, good speed, only "official" VPP API is used.
> > Cons: Brittle CSIT message handling, old VPP versions not supported.
>
> What would this be?
> E.g. we could do serializations in C with a small Python wrapper.
>
> > 4. CSIT hacks only (Gerrit 26019).
> > No changes to VPP API nor PAPI. CSIT code messes with PAPI internals.
> > CSIT needs a fast way to synthesize binary messages.
> > Pros: Code is ready, good speed, old VPP versions are supported.
> > Cons: Brittle CSIT message handling, risky with respect to VPP PAPI changes.
>
> This is the idea explained below, where you serialize the message once and replay it, right?
> In addition to tweaking the reader thread etc.?
>
> 5) Paul's proposal. I don't know if he has measured the performance impact of that.
>
> 6) The "add 4 million routes" test that CSIT uses generates routes according to a set of parameters.
> We could add a small plugin on the VPP side that exposes an API of the sort
> "create <n> routes from <prefix>".
>
> Is it the serialisation, or the message RTT, or both?
> If serialisation, doing it in C should help.
> If it's the RTT, larger messages are one option.
> Uploading all messages to VPP and then asking VPP to process them, receiving a single reply, is also an option.
> Is this your "vector" idea?
>
> > The open questions:
> > Do you see any other options?
> > Did I miss some important pros or cons?
> > Which option do you prefer?
>
> The lowest-hanging fruit is likely 6. But longer term I'd prefer a more generic solution.
>
> Best regards,
> Ole
>
> > [2] https://lists.fd.io/g/vpp-dev/topic/78362835#18092
> > [3] https://gerrit.fd.io/r/c/csit/+/26019/140
> > [4] https://gerrit.fd.io/r/c/csit/+/26019/140#message-314d168d8951b539e588e644a875624f5ca3fb77
> > [5] https://github.com/FDio/csit/blob/b5073afc4a941ea33ce874e016fe86384ae7a60d/resources/templates/vat/vpp_route_add.vat
> > [6] https://github.com/FDio/csit/blob/b5073afc4a941ea33ce874e016fe86384ae7a60d/resources/libraries/python/TestConfig.py#L121-L150
>
> > From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via lists.fd.io
> > Sent: Thursday, 2020-May-07 18:35
> > To: vpp-dev@lists.fd.io
> > Cc: csit-...@lists.fd.io
> > Subject: [vpp-dev] Faster PAPI
> >
> > Hello people interested in PAPI (VPP's Python API client library).
> >
> > In CSIT, our tests are using PAPI to interact with VPP.
> > We are using socket transport (instead of shared memory transport),
> > as VPP is running on machines separate from the machines running the tests.
> > We use SSH to forward the socket between the machines.
> >
> > Some of our scale tests need to send a high number of commands towards VPP.
> > The largest test sends 4 million commands (ip_route_add_del with ip6 addresses).
> > You can imagine that can take a while.
> > Even using PAPI in asynchronous mode, it takes tens of minutes per million commands.
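(For context, the per-command pattern described above looks roughly like the sketch below. It is illustrative only, not CSIT code; VPPApiClient, connect() and api.ip_route_add_del() are real PAPI entry points, but the route/path field values are written from memory and would need checking against the .api definitions of the VPP version in use. The point is that every iteration pays Python-side serialization plus a full round trip over the SSH-forwarded socket.)

import ipaddress
from vpp_papi import VPPApiClient

# One PAPI call per route: each iteration pays serialization in Python
# plus a full round trip over the (SSH-forwarded) API socket.
vpp = VPPApiClient(server_address="/run/vpp/api.sock")
vpp.connect("scale-demo")

base = ipaddress.ip_network("2001:db8::/64").network_address
for i in range(1000):          # the real test does 4 million of these
    vpp.api.ip_route_add_del(
        is_add=True,
        route={
            "prefix": f"{base + i}/128",
            "n_paths": 1,
            # Path fields are assumptions for the sketch; check the .api files.
            "paths": [{"sw_if_index": 1, "proto": 1}],  # 1 = IPv6 next-hop proto
        },
    )

vpp.disconnect()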
> > I was able to speed that up considerably, just by changing code on the CSIT side.
> > The exact code change is [0], but that may be hard to review.
> > Gerrit does not even recognize the new PapiSocketExecutor.py
> > as an edited copy of the old PapiExecutor.py file.
> >
> > That code relies on the fact that Python is quite a permissive language,
> > not really distinguishing private fields and methods from public ones.
> > So the current code is vulnerable to refactors of VPP PAPI code.
> > Also, pylint (the static code analysis tool CSIT uses) is complaining.
> >
> > The proper fix is to change the VPP PAPI code,
> > so that it exposes the inner parts the new CSIT code needs to access
> > (or some abstractions of them).
> >
> > For that I have created [1], which shows the changed VPP PAPI code.
> > The commit message contains a simplified example of how the new features can be used.
> >
> > The changed VPP code allows three performance improvements.
> >
> > 1. Capturing raw bytes sent.
> > For complicated commands, many CPU cycles are spent serializing
> > command arguments (basically nested Python dicts) into bytes.
> > If the user (CSIT code) has access to the message as serialized by PAPI (VPP code),
> > the user can choose a faster method to create subsequent serialized data.
> > Implementing this on the CSIT side improved the speed, but not by enough.
> > (See bytes_template.py in [0] for the faster data generator.)
> > The VPP code [1] introduces the fields remember_sent and last_sent.
> >
> > 2. Reading replies without de-serializing them.
> > This was already possible by calling transport.q.get(),
> > but had next to no effect on PAPI speed.
> > Replies are usually short, so deserialization does not take too many cycles.
> >
> > 3. Disabling the background reader thread.
> > By default, socket transport creates (upon connect) a background thread,
> > which select()s on the socket, reads any messages,
> > and put()s them into transport.q (a multiprocessing.Queue).
> > I am not sure whether it is the multithreading (waiting for the Python interpreter
> > to switch between threads) or the Queue (locks, its own reader thread),
> > but overall this was the remaining bottleneck.
> > The VPP code exposes public methods for stopping and starting the thread.
> >
> > Back to point 2:
> > With the reading thread stopped, transport.q is not filled,
> > so another way to read the reply is needed.
> > The VPP code contained transport._read(),
> > the underscore hinting this is an internal method
> > (leading to the above-mentioned pylint complaints).
> > The VPP change [1] renames that method to read_message(),
> > adding a docstring explaining it has to be used
> > when the reading thread is stopped.
> >
> > Finally, with all 3 improvements, CSIT will be able
> > to execute a million PAPI commands in around 15 seconds.
> >
> > Even if something like [1] is merged to VPP,
> > CSIT will still use [0] for some time,
> > so we are able to test older VPP versions.
> >
> > So, any comments on [1], or other ideas
> > on what changes are needed on the VPP side
> > so users can achieve good PAPI speed using public PAPI methods?
> >
> > Vratko.
> >
> > [0] https://gerrit.fd.io/r/c/csit/+/26019/108
> > [1] https://gerrit.fd.io/r/c/vpp/+/26946/1
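Putting the three improvements together, the intended CSIT-side usage looks roughly like the sketch below. It is an illustration of the approach in [0] and [1], not a copy of either: vpp.transport, message_thread, transport.q and transport._read() are the internals named in this thread, they differ between PAPI versions, and [1] would replace the direct pokes with public stop/start and read_message() calls whose exact signatures are not reproduced here.

from vpp_papi import VPPApiClient

vpp = VPPApiClient(server_address="/run/vpp/api.sock")
vpp.connect("fast-client")
transport = vpp.transport  # PAPI internal; what [0] pokes at and [1] would wrap

# Improvement 3: stop the background reader thread (transport.message_thread)
# so its select()/multiprocessing.Queue overhead disappears.  The clean way to
# stop it is version specific; [1] proposes public stop/start methods, so no
# exact call is shown here.

# Improvement 1: serialize one template message through PAPI, then patch only
# the varying bytes for subsequent commands (bytes_template.py in [0]) and
# write them to the socket in batches, collecting replies afterwards.

# Improvement 2: with the reader thread stopped, transport.q stays empty,
# so replies are read straight off the socket.
reply = transport._read()  # [1] renames this internal helper to read_message()

vpp.disconnect()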