On 26/04/2015 23:02, Karljurgen Feuerherm wrote:
> b) assuming a certain level of Xe(La)TeX competence at the 'presenting'
> level, what recommendations would experts on this list make to 'upping the
> ante' I.e. progress toward a more insider understanding of the software?

OK, I can answer this but I shall add a few other scattered thoughts as well. :-)

First I also want to thank Khaled for his fantastic work in keeping xetex maintained, bugs fixed, and users happy. Until recently I presumed that Khaled was a professional software and/or typography person; I didn't realise that xetex was essentially a free time project for him. I was even more amazed by his dedication and professionalism.

Bit by bit, I seem to have picked up how TeX and xetex work, and I wish I could help with the maintainership, but I just don't have the time at present. (And given that I think the future of TeX is spelt "SILE" I don't think it's appropriate for me to either. ;-) )

How to get to know xetex? I think the first step in moving from a user to understanding the mechanics has to be Victor Eijkhout's TeX By Topic. Either buy a hard copy or download it from https://bitbucket.org/VictorEijkhout/tex-by-topic/src and read it over four or five times. It's by far the best introduction to how the TeX engine works. After that you should be able to work your way through the TeXbook; read that until you can understand the double dangerous-bend sections.

From there, there are two directions you need to go in: the WEB program for TeX, and the xetex extensions and all the related font handling code and libraries that it uses.

As Joseph and Phillip have mentioned, WEB is not an easy thing to work with, and WEB2C doesn't make it any better. But in a way there's nothing you can do about that; TeX is the WEB source. A lot of the design constraints of TeX in the early 80s don't apply any more; most of the unpleasantness around WEB comes from the fact that memory is allocated statically and that structures are hand-rolled with pointers and offsets.
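To make "hand-rolled with pointers and offsets" a bit more concrete, here is a toy sketch in Perl (not TeX's actual code). One fixed-size array stands in for the statically allocated memory, a "pointer" is just an index into it, and a field is reached as pointer plus offset. The offsets 1, 2 and 3 mirror width_offset, depth_offset and height_offset as I read them in tex.web; the array size, the node size and the names new_box and box_field are made up for illustration.

```perl
use strict;
use warnings;

# Illustrative only: one fixed-size array plays the role of TeX's big
# statically allocated memory; a "pointer" is just an index into it.
my $MEM_SIZE  = 64;                  # hypothetical size, fixed at start-up
my @mem       = (0) x $MEM_SIZE;
my $next_free = 1;                   # index 0 is kept as the null pointer

# Hand-rolled record layout: a field lives at "base pointer plus offset".
my %OFFSET        = (width => 1, depth => 2, height => 3);
my $BOX_NODE_SIZE = 4;               # simplified; real box nodes have more fields

# Carve a new box node out of the array and fill in its dimensions.
sub new_box {
    my ($width, $depth, $height) = @_;
    die "memory capacity exceeded\n"
        if $next_free + $BOX_NODE_SIZE > $MEM_SIZE;
    my $p = $next_free;
    $next_free += $BOX_NODE_SIZE;
    $mem[$p + $OFFSET{width}]  = $width;
    $mem[$p + $OFFSET{depth}]  = $depth;
    $mem[$p + $OFFSET{height}] = $height;
    return $p;
}

# Read one field of the node that starts at index $p.
sub box_field {
    my ($p, $field) = @_;
    return $mem[$p + $OFFSET{$field}];
}

my $box = new_box(100, 2, 7);
print "box at $box: width ", box_field($box, 'width'), "\n";
```

Nothing checks that an index really points at a box node, and the capacity is fixed when the program starts; that is the kind of unpleasantness I mean.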
Rewriting the whole thing in another language wouldn't be a crazy idea (I've done it) and for long-term maintainability I think it's essential - we can't go on with statically-allocated PASCAL code for ever - but it would be a major operation outside of the bounds of maintaining the current *TeX projects.

But the upside of that is that there's very little of the WEB code that you actually need to mess with. Most of it Just Works and is never going to need to change, and most of the time you can assume that if there's a problem, it's with the xetex-specific bits, rather than with Knuth's TeX.

So after you have a conceptual understanding of how TeX works, the next step is to run weave on source/texk/web2c/xetexdir/xetex.web [1] and start reading. You can skim over parts 1-19, read the rest normally, and focus most of your attention on parts 37-46. In particular, you want to read over the parts which deal with native word nodes, which are (basically) hboxes containing native font characters. Look up "native_word_node" in the index at the back and read those sections.

Many of the XeTeX extensions call out from Pascal into C; these are defined in the xetex.defines file. This is a bit tricky to match up because WEB2C (I think) strips the underscores from the names in the WEB file. So set_native_metrics in xetex.web gets turned into setnativemetrics, which is defined by xetex.h as measure_native_node, which you will find defined in XeTeX_ext.c. This is the key function: it takes a Pascal memory region representing a native word node (a bunch of Unicode characters), calls the font shaping functions on it, and writes the height, width, and depth of that node back into Pascal so TeX can run its algorithms on it. Start your exploration of the C sources from that routine, and follow all the function calls until you understand what it's doing.
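The name matching described above (set_native_metrics to setnativemetrics to measure_native_node) can be sketched as a two-step lookup. This is only an illustration of the idea: the helper names web2c_name and c_function_for are mine, and the %header_alias table is a hypothetical one-entry excerpt standing in for whatever xetex.h actually defines.

```perl
use strict;
use warnings;

# Step 1: WEB2C (as far as I can tell) drops the underscores from
# identifiers in the WEB file.
sub web2c_name {
    my ($web_name) = @_;
    (my $c_name = $web_name) =~ tr/_//d;
    return $c_name;
}

# Step 2: xetex.h may map the stripped name onto the real C function.
# Hypothetical excerpt with the single mapping mentioned above.
my %header_alias = (setnativemetrics => 'measure_native_node');

# Return the C function to search for, given a name from xetex.web.
sub c_function_for {
    my ($web_name) = @_;
    my $stripped = web2c_name($web_name);
    return $header_alias{$stripped} // $stripped;
}

print c_function_for('set_native_metrics'), "\n";
```

In practice you would do the same thing by hand: strip the underscores, search xetex.h for the result, then search the C sources for whatever it maps to.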
At some point you will follow it down to the harfbuzz interface in XeTeXLayoutInterface.cpp and the FontConfig interface in XeTeXFontMgr_FC.cpp. (My feeling is that the AAT/Mac-specific stuff is dead now, and at any rate it's easier to understand FontConfig/harfbuzz anyway.) Once you get to that layer, you may be perplexed by the lack of documentation for both harfbuzz and FontConfig. Hopefully my article at http://www.simon-cozens.org/content/duffers-guide-fontconfig-and-harfbuzz might help with this.

Finally, about the future of TeX. Obviously my view that a complete rewrite is a good idea is going to be a minority report for a while yet, so I'll stick to XeTeX and LuaTeX. I don't know as much as I should about LuaTeX. For me, the point of xetex is not just that it's a Unicode-compatible TeX, but also that it handles native OS fonts in a simple way and supports shaping of complex scripts (the harfbuzz bit). Hans has stated that LuaTeX will not include external font shaping in the core, and Graham Douglas has done some experiments in calling out to harfbuzz in Lua for shaping, but I don't think it's a done thing yet. So if I want to use complex scripts and native fonts (and I do), then right now xetex is where it's at for me.

But in a way that's beside the point. I've come to think that the future of a software project has less to do with its features than with the size and dedication of its community. If xetex (or luatex) doesn't have a good core of people who can improve, debug and maintain it, then no matter how good it is, it isn't going to be the future of TeX. So I hope we can make that happen by building more understanding of the XeTeX internals, and I'm happy to mentor anyone who wants to understand the code better. If you have any questions about the code, ask on this list and I will try to answer.

[1] Giving up on the change file idea was one of the smartest moves the xetex people made...

Incidentally I attach weave-html.pl, which I thought was a standard TeX thing but turns out to be something I cooked up myself. It turns WEB files into pretty linked hypertext.

# weave-html.pl: turn a WEB file into linked HTML.
# Usage: perl weave-html.pl xetex.web > xetex.html

use strict;
use HTML::Entities;
use Data::Dumper;       # unused here; handy when debugging @nodes

my $state = "limbo";    # part of a module being read: limbo, doc, defs, format, pascal
my @sections;
my @sect_titles;        # titles of starred modules, indexed by section number
my $sectno = 0;
my $nodeno = 0;         # module ("node") counter; node 0 collects the limbo text
my @nodes;
my %nodeparts;          # module name => every node that contributes code to it
my %nodedefs;           # module name => most recent node defining it (link target)
my %nodenames;          # every full module name seen so far

my $data = do { local $/; <> };    # slurp the whole WEB file
$| = 1;

# First pass: chop the input into nodes and record module names
while (length $data) {
    #print(substr($data,0,5)."\n");
    if ($data =~ s/^((
            [^\@]+                      |
            \@\@                        |
            \@[\:\.\^](.*?)\@>\s*       |
            \@!\@(.)(.*?)\@>\s*         |
            \@"[a-fA-F0-9]+             |
            \@'[0-7]+                   |
            \@([\\\$+!\?;,\#\&\/\|])    |
            \@([t=])(.*?)\@>            |
            \@\{(.*?)\@\}
        )+)//sx) {
        # Ordinary text and inline control codes belong to the current part
        $nodes[$nodeno]{$state} .= $1;
    } elsif ($data =~ s/^\@d\s*//s) {
        $state = "defs";
    } elsif ($data =~ s/^\@\*\s*(.*?)\.\s*//s) {
        # Starred module: starts a new titled section
        $state = "doc";
        my $sectname = $1;
        $sectname =~ s/\\\[\d+\]\s*//;
        $sect_titles[++$sectno] = $sectname;
        print STDERR "[$sectno] ";
        ++$nodeno;
        $nodes[$nodeno]{title}  = $sectname;
        $nodes[$nodeno]{sectno} = $sectno;
    } elsif ($data =~ s/^\@\s+//s) {
        # "@" followed by whitespace starts an ordinary module
        $state = "doc";
        ++$nodeno;
    } elsif ($data =~ s/^\@p\s*//s) {
        $state = "pascal";
    } elsif ($data =~ s/^\@f\s*//s) {
        $state = "format";
    } elsif ($data =~ s/^\@<\s*([^\@]+)\s*\@>=//s) {
        # "@<name@>=" starts the code of a named module
        my $nname = $1;
        $nname =~ s/\s+/ /g;    # normalise before the name is used as a key
        $nname =~ s/ $//;
        if ($nname =~ /\.\.\.$/) { $nname = resolve($nname); }
        $nodes[$nodeno]{defines} = $nname;
        #warn "Storing |$nname|";
        push @{$nodeparts{$nname}}, $nodeno;
        $nodedefs{$nname}  = $nodeno;
        $nodenames{$nname} = 1;
        $state = "pascal";
    } elsif ($data =~ s/^(\@<\s*(.*?)\s*\@>)//s) {
        # Reference to a named module: keep it in the text for the second pass
        my $orig  = $1;
        my $nname = $2;
        $nname =~ s/\s+/ /g;
        if ($nname =~ /\.\.\.$/) { $nname = resolve($nname); }
        #warn "Storing |$nname|";
        #push @{$nodeparts{$nname}}, $nodeno;
        $nodenames{$nname} = 1;
        $nodes[$nodeno]{$state} .= $orig;
    } else {
        die "Unparsable " . substr($data, 0, 25) . " in node " . $nodeno;
    }
}

# Second pass: emit the HTML page
print q{<html>
<head>
<script src="http://code.jquery.com/jquery-1.10.1.min.js"></script>
<link rel="stylesheet" href="http://netdna.bootstrapcdn.com/bootstrap/3.0.3/css/bootstrap.min.css">
<!-- Optional theme -->
<link rel="stylesheet" href="http://netdna.bootstrapcdn.com/bootstrap/3.0.3/css/bootstrap-theme.min.css">
<link href="http://alexgorbatchev.com/pub/sh/current/styles/shCore.css" rel="stylesheet" type="text/css" />
<link href="http://alexgorbatchev.com/pub/sh/current/styles/shThemeDefault.css" rel="stylesheet" type="text/css" />
<script src="http://alexgorbatchev.com/pub/sh/current/scripts/shCore.js" type="text/javascript"></script>
<script src="http://alexgorbatchev.com/pub/sh/current/scripts/shBrushDelphi.js" type="text/javascript"></script>
<!-- Latest compiled and minified JavaScript -->
<script src="http://netdna.bootstrapcdn.com/bootstrap/3.0.3/js/bootstrap.min.js"></script>
</head>
<body>
<div class="container">
<div class="row">
<div class="col-md-3">
<div class="bs-sidebar hidden-print affix" role="complementary" id="sidebar">
<ul class="nav" id="mynav">
};

# Sidebar: one link per starred section
for (1..$#sect_titles) {
    print qq{ <li><a href="#s$_" class=""> $sect_titles[$_] </a></li> }
}
print qq{ </ul></div> </div> <div class="col-md-9" role="main"> };

for my $n (1..$nodeno) {
    my $node = $nodes[$n];
    next if ($node->{limbo});
    if ($node->{title}) {
        print "<a name=\"s$node->{sectno}\"><h1>$node->{title}</h1></a>\n";
    }
    print "<p><b><a name=\"sect-$n\">" . $n . "</a></b> ";
    if ($node->{doc}) {
        print map "$_</p><p>\n", split /\n\n/, detex($node->{doc});
        print "</p>";
    }
    if ($node->{pascal}) {
        if ($node->{defines}) {
            print "<b>" . detex($node->{defines}) . " = </b><br>";
        }
        print "<pre class=\"brush: pascal\">";
        print depascal($node->{pascal});
        print "</pre>";
        # List all parts of a module that is defined in more than one node
        my @parts = $node->{defines} ? @{ $nodeparts{$node->{defines}} || [] } : ();
        if (@parts > 1) {
            print "<p><small>See also ",
                join ", ", map qq{<a href="#sect-$_">$_</a>}, @parts;
            print "</small></p>";
        }
    }
    print "\n\n";
}
print "<script></script>";
print "</div></div></div></body></html>\n";

# Strip control codes that have no HTML equivalent
sub common_filtering {
    my $data = shift;
    $data =~ s/\@t.*?\@>|\@\///sg;
    # Screw indexing
    $data =~ s/\@[\:\.\^](.*?)\@>\s*|\@!\@(.)(.*?)\@>\s*|\@!//sg;
    return $data;
}

# Convert the TeX documentation part of a module to rough HTML
sub detex {
    my $data = shift;
    1 while $data =~ s/(\@<\s*(.*?)\s*\@>)/link_to(resolve($2))/es;
    $data =~ s/~/ /g;
    $data =~ s/\$\$.*?\$\$//gs;    # Urgh
    $data =~ s/\\\.\{(.*?)\}/<span class="sc">$1<\/span>/g;
    $data =~ s/\\(TeX|XeTeX|PASCAL)\\?/$1/g;
    #$data =~ s/\@"([a-fA-F0-9]+)/<code>0x$1<\/code>/g;
    #$data =~ s/\@'([0-7]+)/<code>0$1<\/code>/g;
    $data =~ s/\|([^\|]+)\|/"<code>".depascal($1)."<\/code>"/eg;
    return common_filtering($data);
}

# Convert the Pascal part of a module to HTML, one token at a time
sub depascal {
    my $data = shift;
    #$data =~ s/\@,|\@+/ /g;
    my $new = '';
    while (length $data) {    # length, so that a lone "0" is not dropped
        if ($data =~ s/^(\@<\s*(.*?)\s*\@>)//s) { $new .= link_to(resolve($2)); next }
        if ($data =~ s/^(\@t.*?\@>|\@[\/;\+\|\$\?])//s) { next }
        if ($data =~ s/^\@,//s) { $new .= " "; next }
        if ($data =~ s/^\@\{.*?\@\}//s) { next }    # Bye bye comments
        if ($data =~ s/^(\@[\:\.\^](.*?)\@>\s*|\@!\@(.)(.*?)\@>\s*|\@!)//) { next }
        if ($data =~ s/^\@"([a-fA-F0-9]+)//s) { $new .= hex($1); next }
        if ($data =~ s/^\@'([0-7]+)//s) { $new .= oct($1); next }
        if ($data =~ s/^([^\@]+)//) { $new .= encode_entities($1); next }
        if ($data =~ s/^(?:\@#|\@\\)//) { $new .= "<br>"; next }
        if ($data =~ s/^\@\@//) { $new .= '@'; next }
        die substr($data, 0, 5);
    }
    return $new;
}

# HTML link to the node that defines the named module
sub link_to {
    my $nname = shift;
    return qq{<span class="notcode"><a href="#sect-@{[ $nodedefs{$nname} ]}">$nname</a></span>};
}

# Map a module name, possibly abbreviated as "prefix...", to a full name
sub resolve {
    my $portion = shift;
    $portion =~ s/\s+/ /g;
    $portion =~ s/\.\.\.$//;
    return $portion if $nodenames{$portion};    # already a full name
    for (sort keys %nodenames) {    # Urgh
        #warn $_ . " -> $portion ";
        return $_ if index($_, $portion) == 0;
    }
    #warn $_ for sort keys %nodeparts;
    #warn $nodeno;
    die "Cannot resolve module name '$portion'";
}

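One WEB convention the script has to cope with is that a module name may be abbreviated to a prefix followed by "...". A minimal standalone sketch of that lookup, with an ambiguity check, might look like the following. The function name expand_module_name and the sample names are made up for illustration; this is not a copy of the script's own resolve routine.

```perl
use strict;
use warnings;

# Expand an abbreviated module name such as "Glob..." to the single full
# name that starts with the given prefix. Names without a trailing "..."
# are returned unchanged (after whitespace normalisation).
sub expand_module_name {
    my ($name, @full_names) = @_;
    $name =~ s/\s+/ /g;
    return $name unless $name =~ s/\.\.\.$//;
    my @hits = grep { index($_, $name) == 0 } @full_names;
    die "no module name starts with '$name'\n" unless @hits;
    die "'$name...' is ambiguous\n"            if @hits > 1;
    return $hits[0];
}

# Sample data, invented for this example.
my @names = (
    'Global variables',
    'Glue specification',
    'Set initial values of key variables',
);
print expand_module_name('Glob...', @names), "\n";
```

Dying on an ambiguous prefix is a design choice for the sketch; it surfaces a bad abbreviation instead of silently linking to the wrong section.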
--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex