I've also observed unexplained latency in transformations, and I think it's
time we dig into this more. The reason we're observing an increase in
latency without a corresponding increase in CPU load is that TS simply isn't
doing anything; it appears to just be rescheduling transformations in
certain situations.

Does anyone have cycles to investigate?

On Wed, Mar 11, 2015 at 12:31 AM, Brian Rectanus <brect...@gmail.com> wrote:

> All,
>
> I am looking for advice on tuning the performance of a plugin. As some may
> know, I have a plugin for trafficserver (using 4.2.2 w/hwloc) that does a
> lot of inspection of the HTTP traffic (github/ironbee). As such, it can
> introduce a fair amount of latency due to what should be high CPU usage
> from parsing, normalizing and looking for various patterns in the HTTP.
> This is what I expect to see at least, but that is not how the server is
> acting.
>
> What I am seeing:
>
> * Without the plugin loaded I see great performance and the machine is
> basically idle (4% cpu or so) - I am using Ixia's ixLoad to generate a very
> consistent load.
>
> * With the plugin loaded and fully configured, the machine is slightly more
> than idle (12% cpu or so), but transactions per second drop by 25x.
>
> * The default thread setting of 1.5 * cores seems to be a very poor setting
> (50x slower). Manually setting an obscenely high thread count (200) works a
> bit better, but the best setting is 2 threads (1 is bad, 3 is bad, but 2
> works some 7-10 times faster). Using accept threads is also very poor.
>
> Best balance (far better than others) is:
>
> CONFIG proxy.config.exec_thread.autoconfig INT 0
> CONFIG proxy.config.exec_thread.autoconfig.scale FLOAT 1.5
> CONFIG proxy.config.exec_thread.limit INT 2
> CONFIG proxy.config.accept_threads INT 0
> CONFIG proxy.config.exec_thread.affinity INT 2
> CONFIG proxy.config.task_threads INT 3
>
> Everything else is pretty much default - caching is disabled. The above is
> about 15x faster (in tx/s) than the default settings.
>
> * Profiling (perf and Zoom profiler) with the 2 thread max setting shows
> that two threads are active, one far more than the other
>
> * Profiling with the 1.5 x cores setting (e.g., 12 threads in this case as
> there are 8 cores) shows 4-5 threads active, but far less active than with
> the 2 thread max setting - most threads are always idle
>
> * First thought was blocking and lock contention, but the profiler does not
> show any.
>
> * Next thought was malloc() speed issues, so I tried jemalloc (and
> tcmalloc), which helps slightly, but not much (we use memory pools, so much
> is pre-allocated anyhow)
>
> I attached a screenshot of the profiler timeline, but I am not sure it will
> come through on the list. The plugin does not block, but should be using
> lots of CPU for parsing, running regex, etc. It also uses a lot of extra RAM
> for normalizing HTTP, etc. However, I am not seeing high CPU nor am I seeing
> high RAM usage. It is like it just cannot get CPU, but the system is idle -
> the more threads I add, the less CPU it gets, as if the extra accounting is
> getting in the way.
>
> * I expect high CPU utilization, but the machine is mostly idle.
> * I expect all the cores (8 of them) to get used, but really only 1-2 are
> somewhat used.
> * I expect the threads to be saturated with work, but they are mostly idle.
>
> Any ideas why there is a complete lack of CPU/thread utilization?
>
> Any ideas what to look at?
>
> Any ideas what I can enable (tools I could use) to get more insight into
> what is happening?
>
> Cheers!
> -B
>
> --
> Brian Rectanus
>
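For reference, the thread settings quoted above can be read as follows. This is a minimal sketch based on how the records.config documentation describes these options: with autoconfig enabled, the exec thread count is the core count times the scale factor; with autoconfig set to 0, the limit is used and the scale is ignored. The function name and the truncation to an integer are assumptions made for illustration, not Traffic Server code.

```python
def effective_exec_threads(autoconfig: int, scale: float, limit: int, cores: int) -> int:
    """Hypothetical helper: approximate exec thread count for the given settings."""
    if autoconfig:
        # autoconfig on: thread count scales with the number of cores
        # (truncating to an integer is an assumption of this sketch)
        return max(1, int(cores * scale))
    # autoconfig off: the explicit limit is used and the scale has no effect
    return limit


# Default-style settings on the 8-core machine described in the message
default_threads = effective_exec_threads(autoconfig=1, scale=1.5, limit=2, cores=8)

# The tuned settings from the message (autoconfig 0, limit 2)
tuned_threads = effective_exec_threads(autoconfig=0, scale=1.5, limit=2, cores=8)

print(default_threads, tuned_threads)
```

Under this reading, the autoconfig.scale line in the tuned configuration is inert, and the comparison in the message is effectively 12 exec threads against 2.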
