I've tracked this back to the following commit:

commit fa8b9b971453e960062a7e677bb09a7849e59744
Author: Greg Farnum <gr...@hq.newdream.net>
Date:   Fri Apr 2 13:14:12 2010 -0700
    rgw: convert + to space in url_decode

diff --git a/src/rgw/rgw_common.cc b/src/rgw/rgw_common.cc
index 6330fe2..da9debc 100644
--- a/src/rgw/rgw_common.cc
+++ b/src/rgw/rgw_common.cc
@@ -122,7 +122,12 @@ bool url_decode(string& src_str, string& dest_str)

   while (*src) {
     if (*src != '%') {
-      dest[pos++] = *src++;
+      if (*src != '+') {
+       dest[pos++] = *src++;
+      } else {
+       dest[pos++] = ' ';
+       ++src;
+      }
     } else {
       src++;
       char c1 = hex_to_num(*src++);


I'm not sure why this was implemented, though. My guess is that this function has to handle URL query parameters as well as file paths, but I don't understand the code well enough to tell.
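If the guess above is right, one way to serve both uses is to let the caller decide whether `+` means a space. Below is a minimal sketch of that idea. It assumes a hypothetical `url_decode_component()` with a `plus_as_space` flag, which is not the actual rgw `url_decode(string&, string&)` signature shown in the diff. The function always percent-decodes, but maps `+` to a space only when asked, as for query strings. Object paths would pass `false`.

```cpp
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if it is not a hex digit.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical decoder: percent-decodes src into dest. '+' becomes ' ' only
// when plus_as_space is true (query-string / form-data semantics); in a path
// it is kept as a literal '+'. Returns false on a malformed %XX sequence.
bool url_decode_component(const std::string& src, std::string& dest,
                          bool plus_as_space) {
    dest.clear();
    for (std::string::size_type i = 0; i < src.size(); ++i) {
        const char c = src[i];
        if (c == '%') {
            if (i + 2 >= src.size()) return false;  // truncated escape
            const int hi = hex_value(src[i + 1]);
            const int lo = hex_value(src[i + 2]);
            if (hi < 0 || lo < 0) return false;     // not a hex escape
            dest += static_cast<char>(hi * 16 + lo);
            i += 2;                                 // skip the two hex digits
        } else if (c == '+' && plus_as_space) {
            dest += ' ';
        } else {
            dest += c;
        }
    }
    return true;
}
```

With `plus_as_space = false`, `adduser_3.113+nmu3ubuntu3_all.deb` decodes unchanged. With `true` it becomes `adduser_3.113 nmu3ubuntu3_all.deb`, which is the mangled object name in the debug log quoted later in the thread.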
On 6/30/2014 5:41 PM, Brian Rak wrote:
Just for reference, I've opened http://tracker.ceph.com/issues/8702

On 6/26/2014 10:18 PM, Brian Rak wrote:
My current workaround plan is to just upload both versions of the file... I think this is probably the simplest solution with the least possibility of breaking later on.
On 6/26/2014 6:35 PM, Craig Lewis wrote:
Note that wget did URL-encode the space ("test file" became "test%20file"), because it knows that a space is never valid in a URL. It can't know whether you meant an actual plus or an encoded space in "test+file", so it left it alone.
I will say that I would prefer that the + be left alone. If I have a static file named "test+file", Apache will serve it correctly.
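The wget behavior described here can be approximated with a path encoder that only escapes characters which are illegal in a URL path. The sketch below assumes RFC 3986 path rules: unreserved characters, sub-delims, `:`, `@` and `/` pass through, and every other byte becomes `%HH`. `encode_path` is a hypothetical name, and this is not wget's actual code. A space comes out as `%20`, and `+` is a legal sub-delim, so it is copied unchanged.

```cpp
#include <string>

// Hypothetical path encoder following RFC 3986: letters, digits and the
// punctuation legal in a path (unreserved, sub-delims, ':', '@', '/') are
// copied as-is; every other byte is written as %HH.
std::string encode_path(const std::string& path) {
    static const char hex[] = "0123456789ABCDEF";
    const std::string allowed_punct = "-._~!$&'()*+,;=:@/";
    std::string out;
    for (unsigned char c : path) {
        const bool alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                           (c >= '0' && c <= '9');
        if (alnum || allowed_punct.find(static_cast<char>(c)) != std::string::npos) {
            out += static_cast<char>(c);  // '+' takes this branch and is kept
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

The encoder has no way to mark a `+` as "really a space". That is why the server must not guess.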


How badly do you need this to work right now? If you need it now, I can suggest a workaround. This is a dirty hack, and I'm not saying it's a good idea; it's more of a thought exercise.
A quick Google search indicates that mod_rewrite might help: http://stackoverflow.com/questions/459667/how-to-encode-special-characters-using-mod-rewrite-apache
But that might make the problem worse for other characters. If it does, I'm sure I could get it working with an Apache hook. Off the top of my head, I'd try a PerlFixupHandler (http://perl.apache.org/docs/2.0/user/handlers/http.html#PerlFixupHandler) to replace all + characters with the correct escape sequence, %2B. I know mod_python can hook into Apache too. I don't know whether nginx has a similar capability.
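The heart of the proposed hook is a string rewrite applied before the request reaches radosgw. Below is a minimal sketch of just that rewrite. It is written in C++ to match the rest of the thread, not as an actual Perl or mod_python handler. `escape_plus_in_path` is a hypothetical name. The sketch assumes the query string, if any, starts at the first `?` and is left untouched, since `+` legitimately means a space there.

```cpp
#include <string>

// Hypothetical rewrite: escape every literal '+' in the path portion of a
// request URI as "%2B", so a decoder that maps '+' to ' ' still yields '+'.
// Everything from the first '?' onward (the query string) is copied as-is.
std::string escape_plus_in_path(const std::string& uri) {
    const std::string::size_type query_start = uri.find('?');
    const std::string::size_type path_end =
        (query_start == std::string::npos) ? uri.size() : query_start;

    std::string out;
    out.reserve(uri.size());
    for (std::string::size_type i = 0; i < path_end; ++i) {
        if (uri[i] == '+') {
            out += "%2B";
        } else {
            out += uri[i];
        }
    }
    out.append(uri, path_end, std::string::npos);  // query string unchanged
    return out;
}
```

`%2B` decodes to `+` under both conventions. So the object name survives whichever way radosgw treats a bare `+`.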

As with all dirty hacks, if you actually implement it, you'll want to watch the release notes. Once you work around a bug, someone will fix the bug and break your hack.



On Thu, Jun 26, 2014 at 8:54 AM, Brian Rak <b...@gameservers.com> wrote:
    In my first post, I linked to this:
    http://stackoverflow.com/questions/1005676/urls-and-plus-signs

    Per the definition of application/x-www-form-urlencoded:
    http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1

    "Control names and values are escaped. Space characters are
    replaced by `+', and then reserved characters are escaped as
    described in [RFC1738]
    <http://www.w3.org/TR/html401/references.html#ref-RFC1738>,"

    The whole "+ means space" convention applies only to the query
    portion of the URL, not to the filename.
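Here is a minimal sketch of the application/x-www-form-urlencoded rule quoted above, applied to a single control name or value. Spaces become `+`, and every other non-alphanumeric byte becomes `%HH`. That is the strict HTML 4.01 reading. Line-break normalization to CR LF is omitted. `form_encode` is a hypothetical name. A literal `+` must be escaped as `%2B` here. This is why `+` can only be read as a space where this encoding is in effect, i.e. in the query string.

```cpp
#include <string>

// Hypothetical encoder for one application/x-www-form-urlencoded name or
// value (HTML 4.01, section 17.13.4.1): ' ' -> '+', ASCII letters and
// digits unchanged, every other byte -> %HH. CR LF normalization is omitted.
std::string form_encode(const std::string& value) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : value) {
        const bool alnum = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                           (c >= '0' && c <= '9');
        if (c == ' ') {
            out += '+';                    // space is the special case
        } else if (alnum) {
            out += static_cast<char>(c);
        } else {
            out += '%';                    // includes a literal '+' -> %2B
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```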

    I've done some testing with nginx, and this is how it behaves:

    On the server, somewhere in the webroot:

    echo space > "test file"

    Then, from a client:
    $ wget --spider "http://example.com/test/test file"

    Spider mode enabled. Check if remote file exists.
    --2014-06-26 11:46:54-- http://example.com/test/test%20file
    Connecting to example.com:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 6 [application/octet-stream]
    Remote file exists.

    $ wget --spider "http://example.com/test/test+file"

    Spider mode enabled. Check if remote file exists.
    --2014-06-26 11:46:57-- http://example.com/test/test+file
    Connecting to example.com:80... connected.
    HTTP request sent, awaiting response... 404 Not Found

    Remote file does not exist -- broken link!!!

    These tests were served from files on the regular filesystem; I
    wasn't using radosgw for this. Feel free to repeat them with the web
    server of your choice; you'll find the same thing happens.

    Converting + to a space while URL decoding the path is not the
    correct behavior.



    On 6/26/2014 11:36 AM, Sylvain Munaut wrote:
    Hi,

    Based on the debug log, radosgw is definitely the software that's
    incorrectly parsing the URL.  For example:


    2014-06-25 17:30:37.383134 7f7c6cfa9700 20
    REQUEST_URI=/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb
    2014-06-25 17:30:37.383199 7f7c6cfa9700 10
    s->object=ubuntu/pool/main/a/adduser/adduser_3.113 nmu3ubuntu3_all.deb
    s->bucket=ubuntu

    I'll dig into this some more, but it definitely looks like radosgw is the
    one that's decoding the + character here. How else would it receive
    REQUEST_URI with the + in it, but then a little later have a space in
    its place?
    Note that AFAIK, in FastCGI, REQUEST_URI is _supposed_ to be a
    URL-encoded version and should be URL-decoded by the FastCGI handler. So
    converting the + to ' ' seems valid to me.


    Cheers,

        Sylvain

    _______________________________________________
    ceph-users mailing list
    ceph-users@lists.ceph.com
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com