I did some research on encodings as a follow-up.  It seems that both '+'
and '%20' are considered valid encodings for spaces, the '+' form
specifically in 'application/x-www-form-urlencoded' data.  There are several
sources for this information; here are a few:

http://www.w3schools.com/tags/ref_urlencode.asp
http://stackoverflow.com/questions/1634271/url-encoding-the-space-character-or-20
http://en.wikipedia.org/wiki/Percent-encoding

Given this information, I modified MHD's 'MHD_http_unescape' function to
accept the '+' sign as a space, and it worked as expected.  It was just one
additional 'case' at the top of the switch (see below): five lines of code.
It could be shortened a bit if the 'default' clause were changed so that the
wpos++ and rpos++ moved outside the switch, but I didn't want to be
presumptuous.

In 'internal.c':
----------------------------------
size_t
MHD_http_unescape (void *cls,
                   struct MHD_Connection *connection,
                   char *val)
{
  char *rpos = val;
  char *wpos = val;
  char *end;
  unsigned int num;
  char buf3[3];

  while ('\0' != *rpos)
    {
      switch (*rpos)
        {
        case '+':
          /* '+' encodes a space in application/x-www-form-urlencoded data */
          *wpos = ' ';
          wpos++;
          rpos++;
          break;
        case '%':
          if ( ('\0' == rpos[1]) ||
               ('\0' == rpos[2]) )
          {
            *wpos = '\0';
            return wpos - val;
          }
          buf3[0] = rpos[1];
          ....
----------------------------------


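In URL encoding, a literal '+' is encoded as "%2B", so this solution should
work all the time (i.e., it won't inadvertently turn a real '+' into a
space).  Because unescaping is a single pass, a '+' produced by decoding
"%2B" is written out directly and never revisited by the new case.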

That said, I'm not sure this is the correct solution.  Thoughts/comments?
Worthwhile addition to MHD, or is this wrong for some reason?

I can't think of a reason this would be a bad thing to include, but I'm
certainly open to other ideas, including just not using MHD's post processor
at all.


Ken



On Wed, Sep 17, 2014 at 8:44 AM, Kenneth Mastro wrote:

> All,
>
> I'm using MHD's post-processor to process form data and several AJAX
> requests.  I have noticed that when the encoding is
> 'application/x-www-form-urlencoded', strings with spaces contain a '+' sign
> instead of the spaces.
>
> For form data, if I explicitly set the encoding to 'multipart/form-data',
> the strings are parsed properly and there are no '+'s, which is how I've
> been getting around the problem (I assumed I was doing something wrong and
> haven't had time to dig into it).  However, this isn't working for my AJAX
> requests - setting the encoding to 'multipart/form-data' breaks things in
> ways I haven't fully investigated yet.  I consider that a hack anyway, so
> I don't really want to pursue it.  I need to figure out why
> 'application/x-www-form-urlencoded' isn't working for me.
>
> In looking at the 'Content-Type' the server is receiving for the AJAX
> requests, it is 'application/x-www-form-urlencoded; charset=UTF-8'.  I
> thought the charset might be causing an issue, but I'm having trouble
> getting jQuery to not use UTF-8.  From the jQuery ajax page: "The W3C
> XMLHttpRequest specification dictates that the charset is always UTF-8;
> specifying another charset will not force the browser to change the
> encoding."  I.e., I'm stuck with UTF-8 because it's the standard, which I'm
> fine with.  Regardless, MHD successfully creates the post processor, so
> it's seeing the actual base encoding (this works because it only compares
> the first chunk of chars of the content type - essentially ignoring the
> charset part).
>
> MHD does not seem to provide an option for REPLACING a header (i.e., using
> MHD_set_connection_value only ADDS a header - it won't replace the existing
> Content-Type header), so even if I actually could be sure the data was
> ASCII, I can't fix this in the server without doing my own POST
> processing.  I doubt that would work anyway unless I could get the web page
> / browser to not do UTF-8 somehow.  (Although I think ASCII is a subset of
> UTF-8, maybe there are differences even in those low-numbered characters
> I'm not aware of?)
>
> Anyway, in short, my question is: Is the MHD post processor just failing
> on 'application/x-www-form-urlencoded' data?  I.e., is it not converting
> the '+'s back to spaces when it should?  Or does MHD not work with UTF-8
> encoded data (despite all the characters being in the ASCII range), so that
> I need to do my own POST processing?  Or does this actually work and I'm
> just doing something wrong?
>
>
> Thanks much,
> Ken
>
>
