Using "textwrap" package for unwrappable languages (Japanese)

2023-08-30 Thread c.buhtz--- via Python-list

Hi,

I use the "textwrap" package to wrap longer text passages. It works well
with English.
But the source string used is translated via gettext before it is 
wrapped.


Using languages like Japanese or Chinese would IMHO result in unwrapped
text. Japanese rules allow breaking a line nearly wherever you
want.


How can I handle it with "textwrap"?

At runtime I don't know which language is really used. So I'm not able
to decide between using "textwrap" and just inserting "\n" every 65 characters.
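The fallback mentioned here, inserting "\n" every 65 characters, might look like the minimal sketch below. `hard_wrap` is a hypothetical helper name. It counts code points rather than display columns and ignores word boundaries, so it is only reasonable for text in which a break is allowed anywhere.

```python
def hard_wrap(text: str, width: int = 65) -> str:
    """Insert a newline after every `width` characters (code points).

    Hypothetical helper: word boundaries and display width are ignored.
    """
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))


if __name__ == "__main__":
    # 150 characters become lines of 65, 65 and 20 characters.
    for line in hard_wrap("あ" * 150).splitlines():
        print(len(line))
```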


Another approach would be to let the translators handle the line breaks. 
But I would like to avoid it because some of them don't know what "\n" 
means and they don't know the length rule (in my case 65 characters).


Any ideas about it?

Kind
Christian
--
https://mail.python.org/mailman/listinfo/python-list


Re: Using "textwrap" package for unwrappable languages (Japanese)

2023-08-30 Thread Peter J. Holzer via Python-list
On 2023-08-30 11:32:02 +, c.buhtz--- via Python-list wrote:
> I do use "textwrap" package to wrap longer texts passages. Works well with
> English.
> But the source string used is translated via gettext before it is wrapped.
> 
> Using languages like Japanese or Chinese would IMHO result in unwrapped
> text. Japanese rules do allow to break a line nearly where ever you want.
> 
> How can I handle it with "textwrap"?
> 
> At runtime I don't know which language is really used. So I'm not able to
> decide using "textwrap" or just inserting "\n" every 65 characters.

I don't have a solution but want to add another caveat: Japanese
characters are usually double-width. So (unless your line length is 130
characters for English) you would want to add that line break every 32
characters. (unicodedata.east_asian_width() seems to be the canonical
way to find the width of a character, but it returns a code (like 'W'
or 'Na'), not a number.)
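Because east_asian_width() returns a code rather than a number, a small mapping is needed to get a column count. This is a minimal sketch, assuming that 'W' (wide) and 'F' (fullwidth) characters take two columns and everything else, including the ambiguous 'A' class, takes one. `char_width` and `display_width` are hypothetical names. Combining and control characters are not handled.

```python
import unicodedata

# Assumption: 'W' and 'F' occupy two columns; 'Na', 'H', 'N' and the
# ambiguous 'A' class are treated as one column.
_WIDE_CODES = {"W", "F"}


def char_width(ch: str) -> int:
    """Return the assumed display width (1 or 2) of a single character."""
    return 2 if unicodedata.east_asian_width(ch) in _WIDE_CODES else 1


def display_width(text: str) -> int:
    """Sum the assumed display widths of all characters in `text`."""
    return sum(char_width(ch) for ch in text)


if __name__ == "__main__":
    print(display_width("Hello"))       # 5
    print(display_width("こんにちは"))  # 10
```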

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Using "textwrap" package for unwrappable languages (Japanese)

2023-08-30 Thread c.buhtz--- via Python-list

Dear Peter,

thanks for your reply. That is a very interesting topic.

I was a bit wrong. I realized that textwrap.wrap() does insert line breaks
when "words" are too long. So even a string without any blank space will
get wrapped.
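This is easy to check. With its default break_long_words=True, textwrap.wrap() splits a run of characters longer than `width` into pieces. Note that `width` counts characters (code points), not display columns, so wide characters still need the halved width Peter mentioned.

```python
import textwrap

# A Japanese-like string with no spaces at all (100 hiragana characters).
text = "あ" * 100

# break_long_words=True is the default; it is spelled out here for clarity.
lines = textwrap.wrap(text, width=32, break_long_words=True)

for line in lines:
    print(len(line))  # 32, 32, 32, 4 (one per line)
```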


On 30.08.2023 14:07, Peter J. Holzer via Python-list wrote:
> another caveat: Japanese
> characters are usually double-width. So (unless your line length is 130
> characters for English) you would want to add that line break every 32
> characters.


I don't get your calculation here. Original line length is 130 but for
"double-width" characters you would break at 32 instead of 65?

Then I will do something like this

unicodedata.east_asian_width(mystring[0])

W is "wide". But there is also "F" (full-width).
What is the difference between "wide" and "full-width"?

My application currently supports 46 languages, including Simplified
and Traditional Chinese, Vietnamese, Korean, Japanese, and Cyrillic-script languages.
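The plan of inspecting mystring[0] could be sketched as follows. Looking only at the first character is fragile (a translated string may start with a Latin word or a digit), so this sketch swaps in a majority vote over all non-space characters; that is an assumption, not something proposed in the thread. `pick_width` and `wrap_translated` are hypothetical names, and 65/32 are the widths discussed here.

```python
import textwrap
import unicodedata


def pick_width(text: str, latin_width: int = 65) -> int:
    """Hypothetical heuristic: halve the width if most characters are wide.

    Counts characters whose east_asian_width is 'W' or 'F'; whitespace is
    ignored so spaces do not dilute the vote.
    """
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return latin_width
    wide = sum(1 for ch in chars if unicodedata.east_asian_width(ch) in ("W", "F"))
    return latin_width // 2 if wide * 2 > len(chars) else latin_width


def wrap_translated(text: str) -> list[str]:
    """Wrap `text` with a width chosen by pick_width()."""
    return textwrap.wrap(text, width=pick_width(text))
```

Mixed text still gets a single width for the whole paragraph, which is the main limitation of any per-string heuristic.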



Re: Using "textwrap" package for unwrappable languages (Japanese)

2023-08-30 Thread Peter J. Holzer via Python-list
On 2023-08-30 13:18:25 +, c.buhtz--- via Python-list wrote:
> On 30.08.2023 14:07, Peter J. Holzer via Python-list wrote:
> > another caveat: Japanese characters are usually double-width. So
> > (unless your line length is 130 characters for English) you would
> > want to add that line break every 32 characters.
> 
> I don't get your calculation here. Original line length is 130 but for
> "double-width" characters you would break at 32 instead of 65?

No, I wrote "*unless* your original line length was 130 characters".

I assumed that you want your line to be 65 latin characters wide since
this is what fits nicely on an A4 (or letter) page with a bit of a
margin on both sides. Or on an 80 character terminal screen or window.
And it's also generally considered to be a good line length for
readability.

But Asian "full width" or "wide" characters are twice as wide, so you
can fit only half as many in a single line. Hence 65 // 2 = 32.
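Instead of halving the character count, a wrapper could count display columns directly. This is a minimal greedy sketch, assuming 'W'/'F' characters take two columns and allowing a break between any two characters (which Japanese mostly permits, but which would split Latin words). `wrap_columns` is a hypothetical name; Japanese line-breaking (kinsoku) rules, such as not starting a line with 。, are ignored.

```python
import unicodedata


def _width(ch: str) -> int:
    # Assumption: wide ('W') and fullwidth ('F') characters use two columns.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def wrap_columns(text: str, columns: int = 65) -> list[str]:
    """Greedily split `text` into lines no wider than `columns` display columns.

    Breaks may occur between any two characters; word boundaries and
    kinsoku rules are not considered.
    """
    lines: list[str] = []
    current: list[str] = []
    used = 0
    for ch in text:
        w = _width(ch)
        if current and used + w > columns:
            lines.append("".join(current))
            current, used = [], 0
        current.append(ch)
        used += w
    if current:
        lines.append("".join(current))
    return lines


if __name__ == "__main__":
    # 40 wide characters: 32 fit in 64 columns, the remaining 8 wrap.
    for line in wrap_columns("あ" * 40):
        print(len(line), line)
```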

But that was only my assumption. I considered it possible that you
started with 130 characters per line (many terminals back in the day had
a 132 character mode, and that's also approximately the line length in
landscape mode or when using a compressed typeface - so 132 is also a
common length limit, although rarely for text (too wide to read
comfortably) and more for code, tables, etc.), divided that by two and
arrived at 65 Japanese characters per line that way. So I mentioned that
to indicate that I had considered the possibility but concluded that it
probably wasn't what you meant.

(And as usual when I write a short sentence to clarify something
I wind up writing 4 paragraphs clarifying the clarification :-/)

> Then I will do something like this
> 
> unicodedata.east_asian_width(mystring[0])
> 
> W is "wide". But there is also "F" (full-width).
> What is the difference between "wide" and "full-width"?

I'm not an expert on Japanese typography by any means. But they have
some full-width variants of Latin characters and half-width variants of
katakana characters. I assume that the categories 'F' and 'H' are for
those, while "normal" Japanese characters are "W":

>>> unicodedata.east_asian_width("\N{DIGIT ONE}")
'Na'
>>> unicodedata.east_asian_width("\N{FULLWIDTH DIGIT ONE}")
'F'
>>> unicodedata.east_asian_width("\N{KATAKANA LETTER ME}")
'W'
>>> unicodedata.east_asian_width("\N{HALFWIDTH KATAKANA LETTER ME}")
'H'

hp



f-string error message

2023-08-30 Thread Rob Cliffe via Python-list

I am currently using Python 3.11.4.
First I want to say: f-strings are great!  I use them all the time, 
mostly but by no means exclusively for debug messages.  And in 3.12 they 
will get even better.
And the improved error messages in Python (since 3.9) are great too!  
Keep up the good work.
However the following error message confused me for a while when it 
happened in real code:

>>> import decimal
>>> x=42
>>> f"{x:3d}"
' 42'
>>> x=decimal.Decimal('42')
>>> f"{x:3d}"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid format string

I understand that this is an error: I'm telling the f-string to expect 
an integer when in fact I'm giving it a Decimal.

And indeed f"{x:3}" gives ' 42' whether x is an int or a Decimal.
However, to my mind it is not the format string that is invalid, but the 
value supplied to it.
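Rob's observation is easy to confirm. The last line shows one workaround, an editorial suggestion rather than something proposed in the thread: convert the Decimal explicitly when an integer presentation type is wanted.

```python
import decimal

d = decimal.Decimal("42")

# Without a presentation type, int and Decimal both right-align to width 3.
print(repr(f"{42:3}"), repr(f"{d:3}"))  # ' 42' ' 42'

try:
    f"{d:3d}"  # Decimal.__format__ does not accept the integer type 'd'
except ValueError as exc:
    print("ValueError:", exc)

# Workaround: convert explicitly. int() truncates a non-integral Decimal.
print(repr(f"{int(d):3d}"))  # ' 42'
```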

Would it be possible to have a different error message, something like

ValueError: int expected in format string but decimal.Decimal found

Or am I missing something?
Best wishes
Rob Cliffe



Re: f-string error message

2023-08-30 Thread Random832 via Python-list
On Sun, Aug 27, 2023, at 17:19, Rob Cliffe via Python-list wrote:
> I understand that this is an error: I'm telling the f-string to expect 
> an integer when in fact I'm giving it a Decimal.
> And indeed f"{x:3}" gives ' 42' whether x is an int or a Decimal.
> However, to my mind it is not the format string that is invalid, but the 
> value supplied to it.
> Would it be possible to have a different error message, something like
>
> ValueError: int expected in format string but decimal.Decimal found
>
> Or am I missing something?

It's up to the type what format strings are valid for it, so you can't really 
go "int expected". However, a more detailed error string like "invalid format 
string '3d' for object Decimal('42')" might be useful.
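Until the built-in messages change, a caller could approximate the more detailed message in user code. This is a sketch; `format_or_explain` is a hypothetical helper that re-raises with the spec and the value's repr and chains the original exception. Note that it turns a TypeError into a ValueError, which may or may not be what you want.

```python
import decimal


def format_or_explain(value: object, spec: str) -> str:
    """Hypothetical wrapper around format() with a more detailed error.

    Re-raises ValueError/TypeError as ValueError mentioning the spec and
    the value's repr; the original exception is kept as __cause__.
    """
    try:
        return format(value, spec)
    except (ValueError, TypeError) as exc:
        raise ValueError(
            f"invalid format string {spec!r} for object {value!r}"
        ) from exc


if __name__ == "__main__":
    try:
        format_or_explain(decimal.Decimal("42"), "3d")
    except ValueError as exc:
        print(exc)  # invalid format string '3d' for object Decimal('42')
```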

Right now we have some inconsistencies:
- float object [same for str, int, etc.]:
    ValueError: Unknown format code 'd' for object of type 'float'
      [if it thinks it's identified a single-letter 'code' in the usual microlanguage]
    ValueError: Invalid format specifier '???' for object of type '[type]'
- arbitrary object that doesn't override __format__, ipaddress:
    TypeError: unsupported format string passed to [type].__format__
- datetime, decimal:
    ValueError: Invalid format string

None of them shows the value of the offending object, at most its type.
Incidentally, ipaddress and object don't support the usual field width,
padding, etc. specifiers.

[int supports code 'f' just fine, by the way, but has the same message as float 
if you give it 's']
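Several of the cases listed above can be reproduced with a short script. This is a sketch for exploring the messages on your own Python version. Wording may differ between versions and between the C and pure-Python decimal implementations, so it prints the messages rather than relying on them. The value/spec pairs are chosen for illustration.

```python
import decimal

# (value, format spec) pairs mirroring some of the cases listed above.
cases = [
    (1.5, "d"),                     # float with an integer type code
    (42, "s"),                      # int with a string type code
    (object(), "3"),                # object without its own __format__
    (decimal.Decimal("42"), "3d"),  # Decimal with an integer type code
]

results = []
for value, spec in cases:
    try:
        format(value, spec)
    except (ValueError, TypeError) as exc:
        results.append((type(value).__name__, spec, type(exc).__name__, str(exc)))

# Print the value's type, the spec, the exception type and its message.
for type_name, spec, exc_name, message in results:
    print(f"{type_name:8} {spec!r:5} {exc_name}: {message}")

print(format(42, "f"))  # int accepts 'f': 42.000000
```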

Going beyond that, it *might* be viable to have some sort of "guess what 
numeric type the format string was intended for", shared across at least all 
numeric types of objects. Alternatively, we could require all numeric types to 
support all numeric formats, even ones that don't make a lot of sense.
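In the spirit of the "guess what numeric type the format string was intended for" idea, a user-side stopgap could coerce an integral Decimal to int when the spec ends in an integer-only presentation type. `format_numeric` is a hypothetical helper, not a proposal from the thread, and it handles only Decimal.

```python
import decimal

# Integer-only presentation types ('n' is omitted because Decimal accepts it).
_INT_TYPES = frozenset("bcdoxX")


def format_numeric(value: object, spec: str) -> str:
    """Hypothetical helper: format an integral Decimal as int for integer specs.

    Only finite, integral Decimals are converted; anything else is passed to
    format() unchanged and may still raise ValueError.
    """
    if (
        isinstance(value, decimal.Decimal)
        and spec
        and spec[-1] in _INT_TYPES
        and value.is_finite()
        and value == value.to_integral_value()
    ):
        value = int(value)
    return format(value, spec)


if __name__ == "__main__":
    print(repr(format_numeric(decimal.Decimal("42"), "3d")))  # ' 42'
    print(format_numeric(decimal.Decimal("255"), "#x"))       # 0xff
```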