String concatenation, formatting, comparison, and indexing a dictionary* all work with mixed types, unless the unicode string has to be forced to a byte string and the conversion can't be done with the current codec, in which case you get a suitable exception.
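For what it's worth, the failure case can be reproduced in isolation. This is only a minimal sketch: it uses modern Python syntax rather than the 2.5 from the thread, `force_to_bytes` is a hypothetical helper, and the Cyrillic directory name is made up because the real one isn't shown.

```python
# Hypothetical path: the real directory name is not shown in the thread,
# so a made-up Cyrillic name stands in for it.
path = u'/home/www/test-django/public_html/\u043f\u0430\u043f\u043a\u0430'


def force_to_bytes(text, codec):
    """Encode text with codec; return (ok, encoded bytes or error message)."""
    try:
        return True, text.encode(codec)
    except UnicodeEncodeError as exc:
        return False, str(exc)


# The ascii codec cannot represent the Cyrillic characters, so this fails.
ascii_ok, ascii_result = force_to_bytes(path, 'ascii')

# A codec that covers those characters converts the same string without error.
utf8_ok, utf8_result = force_to_bytes(path, 'utf-8')

print(ascii_ok, ascii_result)
print(utf8_ok, len(utf8_result))
```

The error message in the first case should have the same shape as the one in the traceback quoted below; only the reported positions depend on the path.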
Besides, this wouldn't explain why the value of subdir_path is coming up u'http' when his url is ...?subdir=public_html.

* (Officially, attribute names, though used to do lookups in __dict__, are not permitted to have non-ASCII characters, according to the PEP that Karen often quotes.)

On Wed, Apr 7, 2010 at 2:56 PM, Daniel Roseman <dan...@roseman.org.uk> wrote:
> On Apr 7, 9:40 am, Alexey Vlasov <ren...@renton.name> wrote:
>> Hi.
>>
>> There's a simple code in urls.py:
>> ==============
>> def ls (request):
>>     import os
>>
>>     out_html = ''
>>     home_path = '/home/www/test-django'
>>     # subdir_path = request.GET.get ('subdir')
>>     subdir_path = 'public_html'
>>
>>     for root, dirs, files in os.walk (os.path.join (home_path, subdir_path)):
>>         out_html += "%s<br/>\n" % root
>>
>>     return HttpResponse (out_html)
>> ==============
>>
>> There's a catalogue in "home_path/subdir_path" which name
>> includes cyrillic symbols ( ):
>> $ pwd
>> /home/www/test-django/public_html
>> $ ls -la
>> drwx---r-x 4 test-django test-django 111 Apr 6 20:26 .
>> drwx--x--- 13 test-django test-django 4096 Apr 6 20:26 ..
>> -rw-r--r-- 1 test-django test-django 201 Apr 6 17:43 .htaccess
>> -rwxr-xr-x 1 test-django test-django 911 Apr 6 16:38 index.fcgi
>> lrwxrwxrwx 1 test-django test-django 66 Mar 28 17:34 media -> ../python/lib64/python2.5/site-packages/django/contrib/admin/media
>> drwxr-xr-x 2 test-django test-django 6 Apr 6 15:48
>>
>> My code works correctly, here's the result:
>> $ curl -s http://test-django.example.com/ls/
>> /home/www/test-django/public_html <br/>
>> /home/www/test-django/public_html/ <br/>
>>
>> But if I change "subdir_path = 'public_html'" to
>> "subdir_path = request.GET.get ('subdir')" then the request:
>> $ curl -s http://test-django.example.com/ls/\?subdir=public_html
>> leads to an error:
>>
>> Request Method: GET
>> Request URL: http://test-django.example.com/ls/
>> Django Version: 1.0.2 final
>> Python Version: 2.5.2
>> Installed Applications:
>> ['django.contrib.auth',
>>  'django.contrib.contenttypes',
>>  'django.contrib.sessions',
>>  'django.contrib.sites']
>> Installed Middleware:
>> ('django.middleware.common.CommonMiddleware',
>>  'django.contrib.sessions.middleware.SessionMiddleware',
>>  'django.contrib.auth.middleware.AuthenticationMiddleware')
>>
>> Traceback:
>> File "/home/www/test-django/python/lib64/python2.5/site-packages/django/core/handlers/base.py" in get_response
>>   86. response = callback(request, *callback_args, **callback_kwargs)
>> File "/home/www/test-django/django/demo/urls.py" in ls
>>   40. for root, dirs, files in os.walk (os.path.join (home_path, subdir_path)):
>> File "/usr/lib64/python2.5/os.py" in walk
>>   293. if isdir(join(top, name)):
>> File "/usr/lib64/python2.5/posixpath.py" in isdir
>>   195. st = os.stat(path)
>>
>> Exception Type: UnicodeEncodeError at /ls/
>> Exception Value: 'ascii' codec can't encode characters in position 45-48: ordinal not in range(128)
>>
>> I don't understand why "subdir_path", getting the very same value, works
>> perfectly in one case and fails in the other.
>>
>> Django runs following the instructions
>> http://docs.djangoproject.com/en/dev/howto/deployment/fastcgi/#runnin...h-apache
>>
>> --
>> BRGDS. Alexey Vlasov.
>
> I think I know the reason for the difference in hard-coding the string
> vs getting it from request.GET.
>
> Django always uses unicode strings internally, and this includes GET
> parameters. So your 'public_html' string is actually being converted
> to u'public_html', as you can see if you print the contents of
> request.GET. But your hard-coded string is a bytestring. If you used
> the unicode version - subdir_path = u'public_html' - you would see
> the same result as with the GET version.
>
> As to why this is causing a problem when combined with os.walk and
> os.path.join, this is because of the rather strange behaviour of the
> functions in the os module. If you pass a unicode path parameter to
> them, they return results in unicode. But if you pass a bytestring
> parameter, the results are bytestrings. And since you have not
> declared a particular encoding, Python assumes it is ascii - and of
> course your Cyrillic filenames are not valid in ASCII.
>
> The problem should go away if you are careful to define *all* your
> strings as unicode - it is the mixture of unicode and bytestrings that
> is causing the problem. This means:
>     out_html = u''
>     home_path = u'/home/www/test-django'
>     ...
>     out_html += u"%s<br/>\n" % root
>
> --
> DR
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To post to this group, send email to django-us...@googlegroups.com.
> To unsubscribe from this group, send email to
> django-users+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-users?hl=en.
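
The behaviour Daniel describes, where the os functions hand back whatever string type they were given, can be checked with a small script. This is a sketch under assumptions, not the poster's setup: it is written for Python 3, where the two types are `str` and `bytes` rather than 2.5's `unicode` and `str`; `walk_result_types` is a hypothetical helper; and the temporary directory tree is invented for the test.

```python
import os
import shutil
import tempfile


def walk_result_types(top):
    """Collect the type names of every root and entry name os.walk yields."""
    seen = set()
    for root, dirs, files in os.walk(top):
        seen.add(type(root).__name__)
        for name in dirs + files:
            seen.add(type(name).__name__)
    return seen


# Throwaway tree with ASCII-only names (hypothetical, not the poster's layout).
base = tempfile.mkdtemp()
try:
    os.mkdir(os.path.join(base, 'public_html'))
    open(os.path.join(base, 'public_html', 'index.fcgi'), 'w').close()

    # Text path in: every root and name comes back as text.
    text_types = walk_result_types(base)
    # Byte path in: every root and name comes back as bytes.
    byte_types = walk_result_types(os.fsencode(base))
finally:
    shutil.rmtree(base)

print(text_types, byte_types)
```

Because the result type follows the argument type, whichever kind of path first goes into os.walk decides what every later join and stat call receives.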