Re: [XeTeX] Future *TeX [was: XeTeX maintenance]

Simon Cozens Sun, 26 Apr 2015 18:36:08 -0700

On 26/04/2015 23:02, Karljurgen Feuerherm wrote:
> b) assuming a certain level of Xe(La)TeX competence at the 'presenting'
> level, what recommendations would experts on this list make to 'upping the
> ante' I.e. progress toward a more insider understanding of the software?


OK, I can answer this but I shall add a few other scattered thoughts as
well. :-)

First I also want to thank Khaled for his fantastic work in keeping
xetex maintained, bugs fixed, and users happy. Until recently I presumed
that Khaled was a professional software and/or typography person; I
didn't realise that xetex was essentially a free time project for him. I
was even more amazed by his dedication and professionalism.

Bit by bit, I seem to have picked up how TeX and xetex work, and I wish
I could help with the maintainership, but I just don't have the time at
present. (And given that I think the future of TeX is spelt "SILE" I
don't think it's appropriate for me to either. ;-) )

How to get to know xetex? I think the first step in moving from a user
to understanding the mechanics has to be Victor Eijkhout's TeX By Topic.
Either buy a hard copy or download it from
https://bitbucket.org/VictorEijkhout/tex-by-topic/src and read it over
four or five times. It's by far the best introduction to how the TeX
engine works.

After that you should be able to work your way through the TeXBook; read
that until you can understand the double-arrow sections.

From there, there are two directions you need to go in: the WEB program
for TeX, and the xetex extensions and all the related font handling code
and libraries that it uses.

As Joseph and Phillip have mentioned, WEB is not an easy thing to work
with, and WEB2C doesn't make it any better. But in a way there's nothing
you can do about that; TeX is the WEB source. A lot of the design
constraints of TeX in the early 80s don't apply any more; most of the
unpleasantness around WEB comes from the fact that memory is allocated
statically and that structures are hand-rolled with pointers and
offsets. Rewriting the whole thing in another language wouldn't be a
crazy idea (I've done it) and for long-term maintainability I think it's
essential - we can't go on with statically-allocated PASCAL code for
ever - but it would be a major operation outside of the bounds of
maintaining the current *TeX projects.
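
To make "hand-rolled with pointers and offsets" concrete, here is a
rough sketch, not taken from the TeX sources: one fixed-size array plays
the role of main memory, a "pointer" is just an index into it, and a
record's fields are reached by adding an offset. The array size, the
field offsets, and the names new_box, width and height are all
assumptions made up for this illustration.

```perl
use strict;
use warnings;

# One statically sized array stands in for TeX's main memory.
# The size and the field offsets below are illustrative assumptions.
my $MEM_MAX   = 1000;
my @mem       = (0) x ($MEM_MAX + 1);
my $next_free = 1;                  # naive bump allocator (hypothetical)

# A "record" is a run of consecutive words; its fields are offsets.
my ($WIDTH, $DEPTH, $HEIGHT) = (1, 2, 3);
my $BOX_SIZE = 4;

# Allocate a box record and return its "pointer" (an array index).
sub new_box {
    my ($w, $h, $d) = @_;
    my $p = $next_free;
    die "memory overflow\n" if $p + $BOX_SIZE - 1 > $MEM_MAX;
    $next_free += $BOX_SIZE;
    @mem[$p + $WIDTH, $p + $HEIGHT, $p + $DEPTH] = ($w, $h, $d);
    return $p;
}

# Field accessors: nothing but pointer-plus-offset lookups.
sub width  { $mem[$_[0] + $WIDTH] }
sub height { $mem[$_[0] + $HEIGHT] }

my $box = new_box(10, 7, 2);
print "box at $box: width=", width($box), " height=", height($box), "\n";
```

Nothing in the array says where one record ends and the next begins,
which is why this style is hard to read and to change safely.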

But the upside of that is that there's very little of the WEB code that
you actually need to mess with. Most of it Just Works and is never going
to need to change, and most of the time you can assume that if there's
a problem, it's with the xetex-specific bits, rather than with Knuth's TeX.

So after you have a conceptual understanding of how TeX works, the next
step is to run weave on source/texk/web2c/xetexdir/xetex.web [1] and
start reading. You can skim over parts 1-19, read the rest normally, and
focus most of your attention on parts 37-46. In particular, you want to
read over the parts which deal with native word nodes, which are
(basically) hboxes containing native font characters. Look up
"native_word_node" in the index at the back and read those sections.
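
If you would rather get a reading list before weaving anything, a small
script can report which sections mention an identifier. This is only a
sketch under a stated assumption: that every section starts on a line
beginning with "@ ", "@*", or a bare "@", so that counting those lines
reproduces weave's numbering. The function name modules_mentioning and
the sample lines are made up for the example.

```perl
use strict;
use warnings;

# Return the numbers of the WEB sections whose text mentions $ident.
# Assumption: a section starts at a line beginning with "@ ", "@*",
# or a bare "@"; everything before the first such line is limbo.
sub modules_mentioning {
    my ($ident, @lines) = @_;
    my $modno = 0;
    my %seen;
    my @hits;
    for my $line (@lines) {
        $modno++ if $line =~ /^\@(?:[\s*]|\z)/;
        next if $modno == 0;                  # still in limbo
        if ($line =~ /\b\Q$ident\E\b/ and !$seen{$modno}++) {
            push @hits, $modno;
        }
    }
    return @hits;
}

# Made-up sample standing in for a WEB file; for the real thing, read
# the lines of xetex.web into @sample instead.
my @sample = (
    '\def\title{demo}',                       # limbo
    '@* Intro.',
    'Nothing relevant here.',
    '@ A |native_word_node| holds text.',
    '@d native_word_node=40',
    '@ Unrelated section.',
);
print join(", ", modules_mentioning("native_word_node", @sample)), "\n";
```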

Many of the XeTeX extensions call out from Pascal into C; these are
defined in the xetex.defines file. This is a bit tricky to match up
because WEB2C (I think) strips the underscores from the names in the WEB
file. So set_native_metrics in xetex.web gets turned into
setnativemetrics, which is defined by xetex.h as measure_native_node,
which you will find defined in XeTeX_ext.c - this is the key function
which takes a Pascal memory region representing a native word node (a
bunch of Unicode characters), calls the font shaping functions on it,
and fills in the height, width, and depth of that node back into Pascal
so TeX can run its algorithms on it. Start your exploration of the C
sources from that routine, and follow all the function calls until you
understand what it's doing. At some point you will follow it down to the
harfbuzz interface in XeTeXLayoutInterface.cpp and the FontConfig
interface in XeTeXFontMgr_FC.cpp. (My feeling is that the
AAT/Mac-specific stuff is dead now, and at any rate it's easier to
understand FontConfig/harfbuzz anyway.)
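
The name matching described above can be sketched as a two-step lookup.
This assumes, as guessed above, that the only change WEB2C makes to
these names is dropping the underscores. The %c_function_for table is a
hypothetical stand-in for the #define lines in xetex.h, and only the one
pair named in this message is filled in.

```perl
use strict;
use warnings;

# Assumption: WEB2C's only change to these names is dropping "_".
sub web2c_name {
    my ($web_name) = @_;
    (my $c_name = $web_name) =~ tr/_//d;
    return $c_name;
}

# Hypothetical lookup table standing in for the #define lines in
# xetex.h; only the pair mentioned in the text is listed.
my %c_function_for = (
    setnativemetrics => 'measure_native_node',
);

# Map a name as written in xetex.web to the C function to search for.
sub c_function {
    my ($web_name) = @_;
    my $macro = web2c_name($web_name);
    return $c_function_for{$macro} // $macro;
}

print c_function('set_native_metrics'), "\n";
```

In practice the same two steps are done by hand: search xetex.h for the
stripped name, then search the C sources for whatever it expands to.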

Once you get to that layer, you may be perplexed by the lack of
documentation for both harfbuzz and FontConfig. Hopefully my article at
http://www.simon-cozens.org/content/duffers-guide-fontconfig-and-harfbuzz might
help with this.

Finally, about the future of TeX. Obviously my view that a complete
rewrite is a good idea is going to be a minority report for a while yet,
so I'll stick to XeTeX and LuaTeX.

I don't know as much as I should about LuaTeX. For me, the point of
xetex is not just that it's a Unicode-compatible TeX, but also that it
both handles native OS fonts in a simple way and supports shaping of
complex scripts (the harfbuzz bit).
Hans has stated that LuaTeX will not include external font shaping in
the core, and Graham Douglas has done some experiments in calling out to
harfbuzz in Lua for shaping, but I don't think it's a done thing yet.

So if I want to use complex scripts and native fonts (and I do), then
right now xetex is where it's at for me.

But in a way that's beside the point. I've come to thinking that the
future of a software project has less to do with its features than with
the size and dedication of its community. If xetex (or luatex) doesn't
have a good core of people who can improve, debug and maintain it, then
no matter how good it is, it isn't going to be the future of TeX.

So I hope we can make that happen by building more understanding of the
XeTeX internals, and I'm happy to mentor anyone who wants to understand
the code better. If you have any questions about the code, ask on this
list and I will try to answer.

[1] Giving up on the change file idea was one of the smartest moves the
xetex people made... Incidentally I attach weave-html.pl, which I
thought was a standard TeX thing but turns out to be something I cooked
up myself. It turns WEB files into pretty linked hypertext.

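use strict;

# Parser state: which part of a WEB section is currently being collected.
my $state = "limbo";
my @sections;
my @sect_titles;
my $sectno = 0;
my $nodeno = 0;
my @nodes;
my %nodeparts;
my %nodedefs;
my %nodenames;
# Slurp the whole WEB file (named on the command line, or stdin).
my $data = do { local $/; <> };
$| = 1;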
# First pass
while (length $data) {
	#print(substr($data,0,5)."\n");
 	if ($data =~ s/^((
 			[^\@]+|
 			\@\@|
 			\@[\:\.\^](.*?)@>\s*|
 			\@!\@(.)(.*?)@>\s*|
			\@"[a-fA-F0-9]+|
			\@'[0-7]+|
			\@([\\\$+!\?;,\#\&\/\|])|
			\@([t=])(.*?)\@>|
			\@{(.*?)\@}
 		)+)//sx) {
		$nodes[$nodeno]{$state} .= $1;
	} elsif ($data =~ s/^\@d\s*//s) {
		$state = "defs";			
	} elsif ($data =~ s/^\@\*\s*(.*?)\.\s*//s) {
		$state = "doc";

		my $sectname = $1;
		$sectname =~ s/\\\[\d+\]\s*//;
		$sect_titles[++$sectno] = $sectname;
		print STDERR "[$sectno] ";
		++$nodeno;
		$nodes[$nodeno]{title} = $sectname;
		$nodes[$nodeno]{sectno} = $sectno;
	} elsif ($data =~ s/^\@ \s*//s) {
		$state = "doc";
		++$nodeno;
	} elsif ($data =~ s/^\@p\s*//s) {
		$state = "pascal";
	} elsif ($data =~ s/^\@f\s*//s) {
		$state = "format";

	} elsif ($data =~ s/^\@<\s*([^@]+)\s*@>=//s) {
		my $nname = $1;
		if ($nname =~/\.\.\.$/) {
			$nname = resolve($nname);
		}
		$nodes[$nodeno]{defines} = $nname;
		$nname =~ s/\s+/ /gsm;
		#warn "Storing |$nname|";
		push @{$nodeparts{$nname}}, $nodeno;
		$nodedefs{$nname} = $nodeno;

		$nodenames{$nname} = 1;
		$state = "pascal";
	} elsif ($data =~ s/^(\@<\s*(.*?)\s*\@>)//s) {
		my $orig = $1;
		my $nname = $2;
		$nname =~ s/\s+/ /gsm;
		if ($nname =~/\.\.\.$/) {
			$nname = resolve($nname);
		}
		#warn "Storing |$nname|";
		#push @{$nodeparts{$nname}}, $nodeno;
		$nodenames{$nname} = 1;
		$nodes[$nodeno]{$state} .= $orig;

	} else {
		die "Unparsable ". substr($data,0,25). " in node ".$nodeno;
	}
}
use Data::Dumper;
print q{<html>
	<head>
<script src="http://code.jquery.com/jquery-1.10.1.min.js"></script>
<link rel="stylesheet" href="http://netdna.bootstrapcdn.com/bootstrap/3.0.3/css/bootstrap.min.css">

<!-- Optional theme -->
<link rel="stylesheet" href="http://netdna.bootstrapcdn.com/bootstrap/3.0.3/css/bootstrap-theme.min.css">
<link href="http://alexgorbatchev.com/pub/sh/current/styles/shCore.css" rel="stylesheet" type="text/css" />
<link href="http://alexgorbatchev.com/pub/sh/current/styles/shThemeDefault.css" rel="stylesheet" type="text/css" />
<script src="http://alexgorbatchev.com/pub/sh/current/scripts/shCore.js" type="text/javascript"></script>
<script src="http://alexgorbatchev.com/pub/sh/current/scripts/shBrushDelphi.js" type="text/javascript"></script>

<!-- Latest compiled and minified JavaScript -->
<script src="http://netdna.bootstrapcdn.com/bootstrap/3.0.3/js/bootstrap.min.js"></script>
</head>
<body>
<div class="container">
    <div class="row">
        <div class="col-md-3">
            <div class="bs-sidebar hidden-print affix" role="complementary" id="sidebar">
            <ul class="nav" id="mynav">
        };
for (1..$#sect_titles){
	print qq{
                    <li><a href="#s$_" class="">
                          $sect_titles[$_]
                        </a></li>
    }
}

print qq{        
			</ul></div>
		</div>
		<div class="col-md-9" role="main">	
};


for (1..$nodeno) {
	my $node = $nodes[$_];
	next if ($node->{limbo});
	if ($node->{title}) {
		print "<a name=\"s$node->{sectno}\"><h1>$node->{title}</h1></a>\n";
	}
	print "<p><b><a name=\"sect-$_\">".$_."</a></b> ";	
	if ($node->{doc}) {
		print map "$_</p><p>\n", split /\n\n/, detex($node->{doc});
	print "</p>";
	}
	if ($node->{pascal}) {
		if ($node->{defines}) { print "<b>".detex($node->{defines})." = </b><br>"}
		print "<pre class=\"brush: pascal\">";
		print depascal($node->{pascal});
		print "</pre>";
		if ($node->{defines} and @{$nodeparts{$node->{defines}}} > 1) {	# defined in more than one section
			print "<p><small>See also ", join ", ", map {qq{<a href=\"#sect-$_\">$_</a>} } @{$nodeparts{$node->{defines}}};
			print "</small></p>"
		}
	}
	print "\n\n";
}
print "<script></script>";
print "</div></div></div></body></html>\n";
use HTML::Entities;
sub common_filtering {
	my $data = shift;
	$data =~ s/\@t.*?@>|@\///sg;
	# Screw indexing
	$data =~ s/\@[\:\.\^](.*?)@>\s*|\@!\@(.)(.*?)@>\s*|@!//sg;
	return $data;
}

sub detex {
	my $data = shift;
	1 while $data =~ s/(\@<\s*(.*?)\s*\@>)/link_to(resolve($2))/es;
	$data =~ s/~/ /g;
	$data =~ s/\$\$.*?\$\$//gs; # Urgh
	$data =~ s/\\\.\{(.*?)\}/<span class="sc">$1<\/span>/g;
	$data =~ s/\\(TeX|XeTeX|PASCAL)\\?/$1/g;
	#$data =~ s/\@"([a-fA-F0-9]+)/<code>0x$1<\/code>/g;
	#$data =~ s/\@'([0-7]+)/<code>0$1<\/code>/g;
	$data =~ s/\|([^\|]+)\|/"<code>".depascal($1)."<\/code>"/eg;
	return common_filtering($data);
}

sub depascal {
	my $data = shift;
	#$data =~ s/\@,|\@+/ /g;
	my $new = "";
	while (length $data) {	# not "while ($data)": a lone "0" is false in Perl
		if ($data=~ s/^(\@<\s*(.*?)\s*\@>)//s) { $new .= link_to(resolve($2)); next }
		if ($data =~ s/^(\@t.*?@>|\@[\/;\+\|\$\?])//s) { next; }
		if ($data =~ s/^\@,//s) { $new .= " "; next ;}
		if ($data =~ s/^\@\{.*?\@\}//s) { next; } # Bye bye comments
		if ($data =~ s/^(\@[\:\.\^](.*?)@>\s*|\@!\@(.)(.*?)@>\s*|@!)//) { next; }
		if ($data=~ s/^\@"([a-fA-F0-9]+)//s) { $new .= hex($1); next;}
		if ($data=~ s/^\@'([0-7]+)//s) { $new .= oct($1); next }
		if ($data =~ s/^([^@]+)//) {$new .= encode_entities($1); next }
		if ($data =~ s/^(?:\@#|\@\\)//) {$new .= "<br>"; next }	# anchor both alternatives
		if ($data =~ s/^\@\@//) { $new .= "@"; next;}	# anchored: only a leading "@@"
		die "Unparsable Pascal: " . substr($data, 0, 5);
	}
	return $new;

}
sub link_to {
	my $nname = shift;
	return qq{<span class="notcode"><a href="#sect-@{[ $nodedefs{$nname} ]}">$nname</a></span>};
}
sub resolve {
	my $portion = shift;
		$portion =~ s/\s+/ /gsm;

	$portion=~ s/\.\.\.$//;
	for (keys %nodenames) { # Urgh
		#warn $_. " -> $portion ";
		if (/\Q$portion\E/) { return $_ }
	}
	#warn $_ for sort keys %nodeparts;
	#warn $nodeno;
	die "Cannot resolve section name: $portion";
}


--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex
