Hi Alexander,
thank you for that pointer. This made my day!
On 12.03.19 at 14:54, Alexander Kobel wrote:
Hi,
On 12.03.19 10:43, Urs Liska wrote:
On 12.03.19 at 01:14, Aaron Hill wrote:
On 2019-03-11 3:40 pm, David Kastrup wrote:
Urs Liska <li...@openlilylib.org> writes:
[...]
Also, I should have been clear before. David's code should work for
most cases. I was just being pedantic that /./ would not work if
the input has combining characters. For instance, if you type
U+0308 (Combining Diaeresis) after an 'a', you'll get an ä. But
the simple regex would not treat that as a single grapheme. The
result would be "T a ̈ s t".
I did understand it that way, and it would not be an issue in the
project I'm working on. There it's just some umlauts.
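To illustrate the combining-character caveat, here is a minimal sketch in plain Guile Scheme that works on lists of code points rather than on strings. The names `combining-mark?` and `interleave-graphemes` are hypothetical and do not appear in this thread. The check covers only the Combining Diacritical Marks block (U+0300 to U+036F), which is an assumption made for brevity and is not full grapheme segmentation.

```scheme
;; Hypothetical helper: true for code points in the
;; Combining Diacritical Marks block (U+0300..U+036F) only.
(define (combining-mark? cp)
  (and (>= cp #x300) (<= cp #x36F)))

;; Hypothetical helper: insert `sep` before every code point that is
;; neither the first element nor a combining mark, so that a base
;; character stays together with the marks that follow it.
(define (interleave-graphemes sep lst)
  (let loop ((rest lst) (first? #t))
    (cond ((null? rest) '())
          ((or first? (combining-mark? (car rest)))
           (cons (car rest) (loop (cdr rest) #f)))
          (else
           (cons sep (cons (car rest) (loop (cdr rest) #f)))))))

;; "Täst" typed as T, a, U+0308, s, t
(interleave-graphemes 32 '(84 97 #x308 115 116))
;; → (84 32 97 776 32 115 32 116)
```

With this grouping the diaeresis stays attached to the 'a' instead of being separated from it by a space.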
given that Aaron, my undisputed hero of Lily-UTFxy-workarounds, is
active in this thread: I'm surprised to see no mention of his
wonderful example of such a workaround from
https://lists.gnu.org/archive/html/lilypond-user/2018-10/msg00473.html
IIRC, the essence of the approach is to encode stuff as UTF-32
(more or less brute force), and handle individual characters as chunks
of 4 consecutive bytes / 0..255-integers in a list.
It's not the ultimate solution to all imaginable troubles with
encodings, but should be good enough for almost every *practical* use
case.
In his modified center-lyrics-ignoring-punctuation.ily from that
thread, you'll find the two main utility functions as string->utf32
and utf32->string. I presume you could call string->utf32, slice in a
'(0 0 0 32) after each 4 entries, convert back via utf32->string, et
voilà.
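The slicing step suggested above could be sketched roughly as follows in plain Guile Scheme. This assumes each character is represented as a single integer code point (as the `string->utf32` function quoted further down in this message produces) rather than as four separate bytes, and `interleave` is a hypothetical name that is not part of the referenced file.

```scheme
;; Hypothetical helper: put `sep` between the elements of `lst`,
;; without a leading or trailing separator.
(define (interleave sep lst)
  (cond ((null? lst) '())
        ((null? (cdr lst)) lst)
        (else (cons (car lst)
                    (cons sep (interleave sep (cdr lst)))))))

;; Code points for "Täst": T, U+00E4, s, t
(interleave 32 '(84 228 115 116))
;; → (84 32 228 32 115 32 116)
```

Converting the resulting list back to a string would then be the job of `utf32->string`.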
For future reference here is my solution:
\version "2.19.50" %% and higher
% Coded by Aaron Hill
%
% Referenced from
% https://lists.gnu.org/archive/html/lilypond-user/2019-03/msg00150.html
%
%% UTF8 workaround - see the following...
%% http://lists.gnu.org/archive/html/lilypond-user/2018-10/msg00468.html
#(define (utf8->utf32 lst)
   "Converts a list of UTF8-encoded characters into UTF32."
   (if (null? lst) '()
       (let ((ch (char->integer (car lst))))
         (cond
          ;; Characters 0x00-0x7F
          ((< ch #b10000000) (cons ch (utf8->utf32 (cdr lst))))
          ;; Characters 0x80-0x7FF
          ((eqv? (logand ch #b11100000) #b11000000)
           (cons (let ((ch2 (char->integer (cadr lst))))
                   (logior (ash (logand ch #b11111) 6)
                           (logand ch2 #b111111)))
                 (utf8->utf32 (cddr lst))))
          ;; Characters 0x800-0xFFFF
          ((eqv? (logand ch #b11110000) #b11100000)
           (cons (let ((ch2 (char->integer (cadr lst)))
                       (ch3 (char->integer (caddr lst))))
                   (logior (ash (logand ch #b1111) 12)
                           (ash (logand ch2 #b111111) 6)
                           (logand ch3 #b111111)))
                 (utf8->utf32 (cdddr lst))))
          ;; Characters 0x10000-0x10FFFF
          ;; (lead byte 11110xxx, so the mask covers the top five bits)
          ((eqv? (logand ch #b11111000) #b11110000)
           (cons (let ((ch2 (char->integer (cadr lst)))
                       (ch3 (char->integer (caddr lst)))
                       (ch4 (char->integer (cadddr lst))))
                   (logior (ash (logand ch #b111) 18)
                           (ash (logand ch2 #b111111) 12)
                           (ash (logand ch3 #b111111) 6)
                           (logand ch4 #b111111)))
                 (utf8->utf32 (cddddr lst))))
          ;; Ignore orphaned continuation characters
          ((eqv? (logand ch #b11000000) #b10000000) (utf8->utf32 (cdr lst)))
          ;; Error on all else
          (else (error "Unexpected character:" ch))))))
#(define (utf32->utf8 lst)
   "Converts a list of UTF32-encoded characters into UTF8."
   (if (null? lst) '()
       (let ((ch (car lst)))
         (append (cond
                  ;; Characters 0x00-0x7F
                  ((< ch #x80) (list (integer->char ch)))
                  ;; Characters 0x80-0x7FF
                  ((< ch #x800) (list
                                 (integer->char (logior #b11000000 (logand (ash ch -6) #b11111)))
                                 (integer->char (logior #b10000000 (logand ch #b111111)))))
                  ;; Characters 0x800-0xFFFF
                  ((< ch #x10000) (list
                                   (integer->char (logior #b11100000 (logand (ash ch -12) #b1111)))
                                   (integer->char (logior #b10000000 (logand (ash ch -6) #b111111)))
                                   (integer->char (logior #b10000000 (logand ch #b111111)))))
                  ;; Characters 0x10000-0x10FFFF
                  (else (list
                         (integer->char (logior #b11110000 (logand (ash ch -18) #b111)))
                         (integer->char (logior #b10000000 (logand (ash ch -12) #b111111)))
                         (integer->char (logior #b10000000 (logand (ash ch -6) #b111111)))
                         (integer->char (logior #b10000000 (logand ch #b111111))))))
                 (utf32->utf8 (cdr lst))))))
#(define (string->utf32 s) (utf8->utf32 (string->list s)))
#(define (utf32->string l) (list->string (utf32->utf8 l)))
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \letterspaced markup command
% simply inserting spaces between letters
#(define-markup-command (letterspaced layout props text) (string?)
   (let* (;; Convert the string to a list of UTF-32 code points
          (utf-chars (string->utf32 text))
          ;; Interleave spaces (32) between the characters
          (spaced-list
           (let ((result '()))
             (for-each
              (lambda (elt)
                (set! result (cons elt result))
                (set! result (cons 32 result)))
              utf-chars)
             ;; The conses built the list in reverse, so the surplus space
             ;; is now at its head: strip it, then reverse the list
             (reverse (if (null? result) result (cdr result)))))
          (spaced-text (utf32->string spaced-list)))
     (interpret-markup layout props
                       (markup spaced-text))))
%%%%%%%%%%%
% Example
{
s1 ^\markup { Foo \letterspaced "Hey łäſ" bar }
}
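As a worked check of the bit arithmetic in the multi-byte branches of `utf8->utf32`, the following sketch decodes two known byte sequences in plain Guile Scheme. The helper names `decode-2-bytes` and `decode-3-bytes` are hypothetical and only mirror the masks and shifts used above; they take byte values as integers instead of characters.

```scheme
;; Hypothetical helper: 110xxxxx 10xxxxxx -> code point
(define (decode-2-bytes b1 b2)
  (logior (ash (logand b1 #b11111) 6)
          (logand b2 #b111111)))

;; Hypothetical helper: 1110xxxx 10xxxxxx 10xxxxxx -> code point
(define (decode-3-bytes b1 b2 b3)
  (logior (ash (logand b1 #b1111) 12)
          (ash (logand b2 #b111111) 6)
          (logand b3 #b111111)))

(decode-2-bytes #xC3 #xA4)       ; → 228, i.e. U+00E4 (ä)
(decode-3-bytes #xE2 #x82 #xAC)  ; → 8364, i.e. U+20AC (€)
```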