2012/7/8 David Kastrup <d...@gnu.org>:
> Thomas Morley <thomasmorle...@googlemail.com> writes:
>
>> Hi,
>>
>> together with Arnold I worked on a method to compress or stretch a
>> text, limiting the change to the space between characters, i.e. the
>> letters themselves shouldn't be scaled.
>> (Comes out of a discussion at the German LilyPond forum:
>> http://www.lilypondforum.de/index.php?topic=1152.0 )
>>
>> The difficulty is to achieve a functionality which turns a string into
>> a list of single strings and works with accented letters, German
>> umlauts, non-European fonts etc.
>> e.g.:
>> "áèçäöüテスト" → '("á" "è" "ç" "ä" "ö" "ü" "テ" "ス" "ト")
>>
>> We've come up with the attached code.
>>
>> Problems:
>> Unicode keeps growing, so the code needs updating from time to time.
>> Once LilyPond uses guile 2.0 the situation may be completely
>> different. (I haven't a clue about guile 2.0)
>>
>> What do you think?
>> Or let me ask differently: Are there any objections to turning it
>> into a patch?
>
> Several observations:
>
> a) guilev2 is going to become a definite issue this year.  We may either
> decide to support both guilev1 and guilev2, or ditch guilev1 support
> completely.
>
> So it does not make sense to design a solution that is not easy to
> support with guilev2.
>
> b) LilyPond's lexer goes to considerable lengths to not let any invalid
> utf8 pass into strings.  It would be reasonably straightforward, if
> required, to make sure that this also holds for embedded Scheme.  In
> that case, the only way to arrive at invalid utf-8 would be
> synthesizing strings in Scheme from bytes.  So I'd not bother about
> invalid utf-8.  This means that, diacriticals apart, you can just
> split the string before any byte outside the range 80-bf.
>
> This can basically be done using charsets.  I tried doing this with
> regexps, but curiously enough, in contrast to Guile proper, those appear
> to be already utf-8 aware, so
>
> #(use-modules (ice-9 regex))
>
> #(define (utf8-substrings str)
>    (define char-pat (make-regexp "."))
>    (map match:substring (list-matches char-pat str)))
>
> #(write (utf8-substrings "áèçäöüテスト"))
>
> works just fine (if you overlook the fact that write misbehaves, writing
> some byte codes quoted as \xhh inside of a string and others literally).
>
> --
> David Kastrup
>
>
> _______________________________________________
> lilypond-devel mailing list
> lilypond-devel@gnu.org
> https://lists.gnu.org/mailman/listinfo/lilypond-devel

Wow!
Following your suggestion I managed to drop about 300 lines, reducing
it to a quarter of the original. You definitely should earn more money!!

Of course I had to redefine `string-list->string'. I used recursion,
which was the best I could think of.
(`string-list->string' isn't used here, but I need it elsewhere.)

Would you agree if I turned it into a patch?
I think `string->string-list' and `string-list->string' are very useful
tools, and `char-space' might be of interest, too.

Thanks a lot,
  Harm

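For readers without the attachment, here is a minimal sketch of the two helpers named above. Only the names `string->string-list' and `string-list->string' come from this message; the bodies are an assumption, built on the regexp approach quoted above and on plain recursion, and may differ from what the attached file does. It is written as plain Guile; in a .ly file each top-level form would need the usual `#' prefix.

```scheme
(use-modules (ice-9 regex))

;; Assumed implementation: split STR into a list of one-character
;; strings by matching "." repeatedly, as in the quoted suggestion.
;; Whether "." matches a whole multi-byte character depends on the
;; Guile version and the locale in effect.
(define (string->string-list str)
  (let ((char-pat (make-regexp ".")))
    (map match:substring (list-matches char-pat str))))

;; Assumed implementation: recursively concatenate a list of strings
;; back into a single string.
(define (string-list->string ls)
  (if (null? ls)
      ""
      (string-append (car ls)
                     (string-list->string (cdr ls)))))
```

Under these assumptions, (string-list->string (string->string-list str)) gives back a string equal to str, which makes for a convenient sanity check of the pair.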
utf-8-strings-rev-02.ly
Description: Binary data