Xah Lee wrote:
> Thanks. Is it true that any unicode chars can also be used inside regex
> literally?
>
> e.g.
> re.search(ur' +',mystring,re.U)
>
> I tested this case and apparently I can.
Yes. In fact, whether you write u"\u2003" or type the em-space character
literally makes no difference to re.search. Either way you get a Unicode
string object.
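To make that concrete, here is a small sketch in Python 3 syntax (the thread
above uses Python 2 u'...' literals, but the point is identical); the sample
string is made up:

import re

# The \u escape and the character typed directly denote the same string,
# so the regex engine cannot tell them apart.
escaped = "\u2003+"
literal = " +"        # that first character is an actual em space (U+2003)
assert escaped == literal

m = re.search(escaped, "columns\u2003\u2003here")
print(repr(m.group(0)))   # '\u2003\u2003' -- the matched run of em spaces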
Xah Lee wrote:
> how to represent the unicode "em space" in regex?
You will have to pass a Unicode literal as the regular expression,
e.g.
fracture=re.split(u'(?u)\u2003*\\|\u2003*',myline)
Notice that, in raw Unicode literals, you can still use \u to
escape characters, e.g.
fracture=re.split(ur'(?u)\u2003*\|\u2003*',myline)
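For completeness, a runnable version with made-up sample data (Python 3
syntax). One caveat: re.split's third positional argument is maxsplit rather
than flags, which is why the Unicode flag is written inline in the patterns
above; from Python 2.7/3.x onward it can also be passed as the flags= keyword:

import re

# Invented sample line: fields separated by '|' and padded with em spaces.
myline = "alpha\u2003|\u2003beta\u2003\u2003|gamma"

fracture = re.split("\u2003*\\|\u2003*", myline, flags=re.UNICODE)
print(fracture)   # ['alpha', 'beta', 'gamma']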
Steve Horsley wrote:
It is my understanding that the BOM (U+FEFF) is actually the Unicode
character "zero-width no-break space".
My understanding is that this used to be the case. According to
http://www.unicode.org/faq/utf_bom.html#38
the application should now specify specific processing,
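Current Pythons expose that distinction through the codecs: the plain 'utf-8'
codec leaves U+FEFF in the decoded text as an ordinary character, while
'utf-8-sig' treats a leading one as a signature and drops it. A small sketch
(the sample bytes are invented):

import codecs

data = codecs.BOM_UTF8 + b"hello"        # UTF-8 bytes with a leading BOM

print(repr(data.decode("utf-8")))        # '\ufeffhello' -- BOM kept as a character
print(repr(data.decode("utf-8-sig")))    # 'hello'       -- BOM treated as a signature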
Mike Brown wrote:
Very strange how it only shows up after the 1st import attempt seems to
succeed, and it doesn't ever show up if I run the code directly or run the
code in the command-line interpreter.
The reason for that is that the Python byte code stores the Unicode
literal in UTF-8. The firs
Francis Girard wrote:
Well, no, text files can't be concatenated! Sooner or later, someone will use
"cat" on the text files your application generated. That will be a lot of
fun for the new Unicode-aware "super-cat".
Well, no. For example, Python source code is not typically concatenated,
nor
Francis Girard wrote:
If I understand well, into the UTF-8 unicode binary representation, some
systems add at the beginning of the file a BOM mark (Windows?), some don't.
(Linux?). Therefore, the exact same text encoded in the same UTF-8 will
result in two different binary files, and of a slightl
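That byte-level difference is easy to see with the codecs in current Pythons
(Python 3 syntax; 'utf-8-sig' prepends the BOM the way many Windows tools do,
plain 'utf-8' does not; the sample text is made up):

text = "Grüße"
with_bom    = text.encode("utf-8-sig")   # BOM prepended, Notepad-style
without_bom = text.encode("utf-8")       # bare UTF-8, as is typical on Linux

print(with_bom)                  # b'\xef\xbb\xbfGr\xc3\xbc\xc3\x9fe'
print(without_bom)               # b'Gr\xc3\xbc\xc3\x9fe'
print(with_bom == without_bom)   # False -- same text, two different byte sequences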
Kent Johnson wrote:
Could this be handled with a try / except in unicode()? Something like
this:
Perhaps. However, this would cause a significant performance hit, and
possibly undesired side effects. So due process would require specifying the
interface of __unicode__ first, and then changing the actual
Steven Bethard wrote:
Yeah, I agree it's weird. I suspect if someone supplied a patch for
this behavior it would be accepted -- I don't think this should break
backwards compatibility (much).
Notice that the "right" thing to do would be to pass encoding and errors
to __unicode__. If the string o
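To make the proposal concrete, here is a purely hypothetical Python 2 sketch
of what an encoding-aware __unicode__ could look like; none of this is
existing behaviour, and the class and sample bytes are invented:

class Tagged:
    """Toy class wrapping a byte string of unknown encoding."""
    def __init__(self, raw):
        self.raw = raw

    # Hypothetical extended protocol: today __unicode__ takes no arguments;
    # the idea above is that unicode(obj, encoding, errors) would forward
    # both parameters to it.
    def __unicode__(self, encoding='ascii', errors='strict'):
        return self.raw.decode(encoding, errors)

t = Tagged('caf\xc3\xa9')
print repr(t.__unicode__('utf-8'))   # u'caf\xe9' -- works when called directly
# Under the proposal, unicode(t, 'utf-8') would be equivalent to the call above.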
[EMAIL PROTECTED] wrote:
Do you know this for a fact?
I'm going by newsgroup messages from around the time that I was
proposing to put together a standard block cipher module for Python.
Ah, newsgroup messages. Anybody could respond, whether they have insight
or not.
The PSF does comply with the
Luis P. Mendes wrote:
From your experience, do you think that if this wrong XML code is meant to be
read only by some kind of Microsoft parser, the error will not occur?
This is very unlikely. MSXML would never do this incorrectly.
Regards,
Martin
Ricardo Bugalho wrote:
Thanks for the information. But what I was really looking for was
information on when and why Python started doing it (previously, it always
used sys.getdefaultencoding()) and why it was done only for 'print' when
stdout is a terminal instead of always.
It does that since 2.
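The two encodings being contrasted can be inspected directly; the values
shown in the comments are only illustrative, since they depend on the Python
version, terminal and locale:

import sys

print(sys.getdefaultencoding())   # 'ascii' on the 2.x interpreters discussed here, 'utf-8' today
print(sys.stdout.encoding)        # taken from the terminal/locale when stdout is a tty
print(sys.stdout.isatty())        # when this is False (e.g. piped output), the behaviour differs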
Torsten Mohr wrote:
Does something like that also work in Python?
Not directly. It is customary for functions that produce results
(return values) to do so via return:
def vokale(string):
    result = [c for c in string if c in "aeiou"]
    return "".join(result)
x = "Hallo, Welt"
x = vokale(x)
If you need more