Hi Willy -

Apologies if this comes through multiple times; I'm having some mail
difficulties.

On Tue, Apr 7, 2015 at 7:11 PM, Willy Tarreau <[email protected]> wrote:
> Just a question, did you find any benefit in doing it with a new tag
> compared to %[path] ? It may just be a matter of convenience, I'm just
> wondering.

When I attempt to use %[path] in a log-format directive, I get the
following errors:

  parsing [/etc/haproxy/haproxy.cfg:83] : 'log-format' : sample fetch
<path> may not be reliably used here because it needs 'HTTP request
headers' which is not available here.

Limited testing confirms that the path is not printed in the log when
%[path] is included in log-format either. Perhaps I'm missing a
configuration option that allows the use of that sample fetch in the
logs?

> Other comments on the code below :
>
>> +char* strip_uri_params(char *str) {
>> +     int spaces = 0;
>> +     int end = strlen(str);
>> +
>> +     int i;
>> +     int path_end = end;
>> +     for (i = 0; i < end; i++) {
>> +             if (str[i] == ' ' && spaces == 0) {
>> +                     spaces++;
>> +             } else if (str[i] == '?' || (str[i] == ' ' && spaces > 0)) {
>> +                     path_end = i;
>> +                     break;
>> +             }
>> +     }
>
> What's the purpose of counting spaces to stop at the second one ? You
> cannot not have any in the path, so I'm a bit puzzled.

It does seem counter-intuitive, but the uri variable at src/log.c#1546
is actually the
full HTTP request line. Consider the following requests:

  GET /foo HTTP/1.1
  GET /foo?bar HTTP/1.1

They should both produce an identical path (as the only difference is
the query string), but they will not unless we strip the line at either
the first question mark *or* the second space. Stripping at only the
first question mark would produce logs like:

  GET /foo HTTP/1.1
  GET /foo

...which is undesirable.


> Here I don't understand, you seem to be doing malloc+strncpy+strlen+free
> just to get an integer which is the position of the question mark in the
> chain that is determined very earély in your strip function, is that it ?
> If so, that's totally overkill and not even logical!
>
> Thanks,
> Willy
>

You're right - this patch was not the most efficient or elegant. Chalk
it up to not fully understanding the logging system, and my general lack
of knowledge in C. I've included another patch that I think solves the
problem much more elegantly, building on what you suggested.

I've also attached an updated patch file in case my mail client messes
up tabs/spaces or something.

Thank you so much!

- Andrew Hayworth

>From 8cda7475e6b456636f61c48c2132ecf32f4c23b1 Mon Sep 17 00:00:00 2001
From: Andrew Hayworth <[email protected]>
Date: Tue, 7 Apr 2015 21:42:53 +0000
Subject: [PATCH] Add a new log format variable "%p" that spits out the
 sanitized request path

It's often undesirable to log query params - and in some cases, it can
create legal compliance problems. This commit adds a new log format
variable that logs the HTTP verb and the path requested sans query
string (and additionally ommitting the protocol). For example, the
following HTTP request line:

  GET /foo?bar=baz HTTP/1.1

becomes:

  GET /foo

with this log format variable.
---
 include/types/log.h |  1 +
 src/log.c           | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/types/log.h b/include/types/log.h
index c7e47ea..3205ce6 100644
--- a/include/types/log.h
+++ b/include/types/log.h
@@ -90,6 +90,7 @@ enum {
        LOG_FMT_HDRREQUESTLIST,
        LOG_FMT_HDRRESPONSLIST,
        LOG_FMT_REQ,
+       LOG_FMT_PATH,
        LOG_FMT_HOSTNAME,
        LOG_FMT_UNIQUEID,
        LOG_FMT_SSL_CIPHER,
diff --git a/src/log.c b/src/log.c
index 1a5ad25..af89e00 100644
--- a/src/log.c
+++ b/src/log.c
@@ -108,6 +108,7 @@ static const struct logformat_type logformat_keywords[] = {
        { "hs", LOG_FMT_HDRRESPONS, PR_MODE_TCP, LW_RSPHDR, NULL },
/* header response */
        { "hsl", LOG_FMT_HDRRESPONSLIST, PR_MODE_TCP, LW_RSPHDR, NULL
},  /* header response list */
        { "ms", LOG_FMT_MS, PR_MODE_TCP, LW_INIT, NULL },       /*
accept date millisecond */
+       { "p", LOG_FMT_PATH, PR_MODE_HTTP, LW_REQ, NULL },  /* path */
        { "pid", LOG_FMT_PID, PR_MODE_TCP, LW_INIT, NULL }, /* log pid */
        { "r", LOG_FMT_REQ, PR_MODE_HTTP, LW_REQ, NULL },  /* request */
        { "rc", LOG_FMT_RETRIES, PR_MODE_TCP, LW_BYTES, NULL },  /* retries */
@@ -1539,6 +1540,29 @@ int build_logline(struct session *s, char *dst,
size_t maxsize, struct list *lis


                                last_isspace = 0;
                                break;

+                       case LOG_FMT_PATH: // %p
+                               if (tmp->options & LOG_OPT_QUOTE)
+                                       LOGCHAR('"');
+                               uri = txn->uri ? txn->uri : "<BADREQ>";
+                               ret = encode_string(tmplog, dst + maxsize,
+                                                      '#',
url_encode_map, uri);
+                               if (ret == NULL || *ret != '\0')
+                                       goto out;
+
+                               // Cut off request line at first
occurrence of '?' which signals the beginning of
+                               // request params (and end of path).
If no params are present, cut off at last space
+                               // which otherwise signals the end of the path.
+                               uri = strchr(tmplog, '?');
+                               if (uri == NULL) {
+                                       uri = strrchr(tmplog, ' ');
+                               }
+                               tmplog = uri ? uri : ret;
+
+                               if (tmp->options & LOG_OPT_QUOTE)
+                                       LOGCHAR('"');
+                               last_isspace = 0;
+                               break;
+
                        case LOG_FMT_PID: // %pid
                                if (tmp->options & LOG_OPT_HEXA) {
                                        iret = snprintf(tmplog, dst +
maxsize - tmplog, "%04X", pid);
--
2.1.3

Attachment: 0001-Add-a-new-log-format-variable-p-that-spits-out-the-s.patch
Description: Binary data

Reply via email to