I've tracked this back to the following commit:
commit fa8b9b971453e960062a7e677bb09a7849e59744
Author: Greg Farnum <gr...@hq.newdream.net>
Date: Fri Apr 2 13:14:12 2010 -0700
rgw: convert + to space in url_decode
diff --git a/src/rgw/rgw_common.cc b/src/rgw/rgw_common.cc
index 6330fe2..da9debc 100644
--- a/src/rgw/rgw_common.cc
+++ b/src/rgw/rgw_common.cc
@@ -122,7 +122,12 @@ bool url_decode(string& src_str, string& dest_str)
while (*src) {
if (*src != '%') {
- dest[pos++] = *src++;
+ if (*src != '+') {
+ dest[pos++] = *src++;
+ } else {
+ dest[pos++] = ' ';
+ ++src;
+ }
} else {
src++;
char c1 = hex_to_num(*src++);
I'm not sure why this was implemented, though. My guess is that this
function has to handle URL query parameters as well as file paths, but I
don't understand the code well enough to tell.
On 6/30/2014 5:41 PM, Brian Rak wrote:
Just for reference, I've opened http://tracker.ceph.com/issues/8702
On 6/26/2014 10:18 PM, Brian Rak wrote:
My current workaround plan is to just upload both versions of the
file... I think this is probably the simplest solution with the least
possibility of breaking later on.
On 6/26/2014 6:35 PM, Craig Lewis wrote:
Note that wget did URL encode the space ("test file" became
"test%20file"), because it knows that a space is never valid. It
can't know if you meant an actual plus or an encoded space in
"test+file", so it left it alone.
I will say that I would prefer that the + be left alone. If I have
a static "test+file", Apache will serve that static file correctly.
How badly do you need this to work, right now? If you need it now,
I can suggest a workaround. This is a dirty hack, and I'm not saying
it's a good idea. It's more of a thought exercise.
A quick Google search suggests that mod_rewrite might help:
http://stackoverflow.com/questions/459667/how-to-encode-special-characters-using-mod-rewrite-apache
But that might make the problem worse for other characters... If it
does, I'm sure I could get it working by installing an Apache hook.
Off the top of my head, I'd try a hook in
http://perl.apache.org/docs/2.0/user/handlers/http.html#PerlFixupHandler to
replace all + characters with the correct escape sequence, %2B. I
know mod_python can hook into Apache too. I don't know if nginx has
a similar capability.
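To make the idea concrete, here is a minimal C++ sketch of the rewrite such a hook would perform. It is not Apache, mod_perl, or mod_python code. The function name `escape_plus_in_path` is hypothetical, and limiting the rewrite to the path (everything before the first `?`) is an assumption, made so that form-encoded query strings keep their meaning.

```cpp
#include <string>

// Hypothetical helper (not part of Apache, nginx, or radosgw):
// rewrite every literal '+' in the path part of a request target to "%2B".
// Anything from the first '?' onward is copied through unchanged.
std::string escape_plus_in_path(const std::string& target) {
  const std::string::size_type query_start = target.find('?');
  const std::string::size_type path_end =
      (query_start == std::string::npos) ? target.size() : query_start;

  std::string out;
  out.reserve(target.size() + 8);
  for (std::string::size_type i = 0; i < path_end; ++i) {
    if (target[i] == '+') {
      out += "%2B";  // escaped form of a literal plus sign
    } else {
      out += target[i];
    }
  }
  // Append the query string (if any) untouched.
  out.append(target, path_end, std::string::npos);
  return out;
}
```

With this, a request for "/test/test+file" would reach the backend as "/test/test%2Bfile", which decodes back to a literal plus whether or not the backend treats + as a space.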
As with all dirty hacks, if you actually implement it, you'll want
to watch the release notes. Once you work around a bug, someone
will fix the bug and break your hack.
On Thu, Jun 26, 2014 at 8:54 AM, Brian Rak <b...@gameservers.com> wrote:
Going back to my first post, I linked to this
http://stackoverflow.com/questions/1005676/urls-and-plus-signs
Per the definition of application/x-www-form-urlencoded:
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1
"Control names and values are escaped. Space characters are
replaced by `+', and then reserved characters are escaped as
described in [RFC1738]
<http://www.w3.org/TR/html401/references.html#ref-RFC1738>,"
The whole +=space thing is only for the query portion of the
URL, not the filename.
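As a rough illustration of that distinction, here is a minimal C++ sketch of a decoder that only maps + to a space when the caller says it is decoding a form-encoded query string. This is not the radosgw implementation: `url_decode_sketch`, `hex_value`, and the `plus_as_space` flag are hypothetical names, and rejecting malformed %XX escapes is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if it is not a hex digit.
static int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Percent-decode src into dest.
// '+' becomes a space only when plus_as_space is true (query strings);
// for a path, pass false so a literal '+' is preserved.
// Returns false on a truncated or malformed %XX escape.
bool url_decode_sketch(const std::string& src, std::string& dest,
                       bool plus_as_space) {
  dest.clear();
  for (std::size_t i = 0; i < src.size(); ++i) {
    const char c = src[i];
    if (c == '%') {
      if (i + 2 >= src.size()) return false;  // not enough characters left
      const int hi = hex_value(src[i + 1]);
      const int lo = hex_value(src[i + 2]);
      if (hi < 0 || lo < 0) return false;
      dest.push_back(static_cast<char>(hi * 16 + lo));
      i += 2;  // skip the two hex digits just consumed
    } else if (c == '+' && plus_as_space) {
      dest.push_back(' ');
    } else {
      dest.push_back(c);
    }
  }
  return true;
}
```

Called with plus_as_space set to false on the path and true on the query string, "test%20file" still decodes to "test file", while "test+file" in the path stays "test+file".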
I've done some testing with nginx, and this is how it behaves:
On the server, somewhere in the webroot:
echo space > "test file"
Then, from a client:
$ wget --spider "http://example.com/test/test file"
Spider mode enabled. Check if remote file exists.
--2014-06-26 11:46:54-- http://example.com/test/test%20file
Connecting to example.com:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6 [application/octet-stream]
Remote file exists.
$ wget --spider "http://example.com/test/test+file"
Spider mode enabled. Check if remote file exists.
--2014-06-26 11:46:57-- http://example.com/test/test+file
Connecting to example.com:80... connected.
HTTP request sent, awaiting response... 404 Not Found
Remote file does not exist -- broken link!!!
These tests were done just with the standard filesystem. I
wasn't using radosgw for this. Feel free to repeat with the web
server of your choice; you'll find the same thing happens.
Decoding + to a space in the path is not the correct behavior.
On 6/26/2014 11:36 AM, Sylvain Munaut wrote:
Hi,
Based on the debug log, radosgw is definitely the software that's
incorrectly parsing the URL. For example:
2014-06-25 17:30:37.383134 7f7c6cfa9700 20
REQUEST_URI=/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all.deb
2014-06-25 17:30:37.383199 7f7c6cfa9700 10
s->object=ubuntu/pool/main/a/adduser/adduser_3.113 nmu3ubuntu3_all.deb
s->bucket=ubuntu
I'll dig into this some more, but it definitely looks like radosgw is the
one that's decoding the + character here. How else would it be receiving
the request_uri with the + in it, but then a little bit later the request
has a space in it instead?
Note that AFAIK, in fastcgi, REQUEST_URI is _supposed_ to be a
URL-encoded version and should be URL-decoded by the fastcgi handler. So
converting the + to ' ' seems valid to me.
Cheers,
Sylvain
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com