[issue37760] Refactor makeunicodedata.py: dedupe parsing, use dataclass

Greg Price Sun, 04 Aug 2019 20:56:36 -0700


New submission from Greg Price <[email protected]>:


I spent some time yesterday on #18236, and I have a patch for it.

Most of that work happens in the script Tools/unicode/makeunicodedata.py, and 
along the way I made several changes there that made it somewhat nicer to work 
on, and that I think will help other people reading that script too.  I'd 
like to try to merge those improvements first.

The main changes are:

 * As the script has grown over the years, it's gained many copies and 
reimplementations of logic to parse the standard format of the Unicode 
character database.  I factored those out into a single place, which makes the 
parsing code shorter and the interesting parts stand out more easily.

 * The main per-character record type in the script's data structures is a 
length-18 tuple.  Using the magic of dataclasses, I converted this so that e.g. 
the code says `record.numeric_value` instead of `record[8]`.
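
As a rough illustration of the two changes above (a sketch only, not the 
actual patch): one shared helper parses the semicolon-delimited format, and a 
dataclass gives the fields names.  The names `parse_ucd_lines` and 
`CharRecord` are hypothetical, and the sample data is cut down to four fields 
for brevity; the real record has many more.

```python
import dataclasses
from typing import Iterable, Iterator, List


def parse_ucd_lines(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the stripped fields of each data line.

    Assumes the usual UCD conventions: fields separated by ';',
    '#' starting a comment, and blank lines ignored.
    """
    for line in lines:
        line = line.split('#', 1)[0].strip()
        if not line:
            continue
        yield [field.strip() for field in line.split(';')]


@dataclasses.dataclass
class CharRecord:
    # Hypothetical, trimmed-down record; field names are illustrative.
    codepoint: str
    name: str
    general_category: str
    numeric_value: str


# Hypothetical excerpt in the style of UnicodeData.txt, reduced to four fields.
SAMPLE = """\
# comment lines and blank lines are skipped

0031;DIGIT ONE;Nd;1
00BD;VULGAR FRACTION ONE HALF;No;1/2
"""

records = [CharRecord(*fields) for fields in parse_ucd_lines(SAMPLE.splitlines())]

# Named access instead of a positional index into a tuple.
print(records[1].numeric_value)
```

With a tuple, the last line would have to be written as an index such as 
`records[1][3]`, which says nothing about what the field holds.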

There's no radical restructuring or rewrite here; this script has served us 
well.  I've kept these changes focused on places where the ratio of value (in 
future ease of development) to cost (in a reviewer's effort as well as mine) 
is high.

I'll send PRs of my changes shortly.

----------
components: Unicode
messages: 349020
nosy: Greg Price, ezio.melotti, vstinner
priority: normal
severity: normal
status: open
title: Refactor makeunicodedata.py: dedupe parsing, use dataclass
type: enhancement
versions: Python 3.9

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue37760>
_______________________________________