Re: Language-neutral interface specifications (research)

David Holland Wed, 13 Jul 2022 14:17:25 -0700

On Mon, Jul 11, 2022 at 08:45:09PM -0700, Jakob Stoklund Olesen wrote:
 > I saw this project on the wiki, and it reminds me of a problem I have
 > been trying to understand better:
 >     https://wiki.netbsd.org/projects/project/language-neutral-interfaces/
 >
 > [...] This means I have to think about the difference between API
 > and ABI on POSIX platforms.


Indeed.

 > As I see it, there are three levels of interface definition to consider:

So, unlike Mouse, I'd agree that there are three levels and that it's
useful to think in those terms, but I'd divide them somewhat
differently.

There's the standards level (your level 1) - some structure exists and
has at least certain members, but not in any fixed order, there might
be other fields, the values of manifest constants are not specified,
etc.

There's the C implementation level (between that and your level 2)
where these details are all filled in; e.g. in NetBSD struct stat has
a specific list of members in a specific order, and they have specific
C data types even though the concrete representation of those types
may still vary.

Then there's the ABI level, which is what you get when you run the
implementation level through a compiler.
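
To make the last two levels concrete, here is a minimal sketch (not
from the project; print_stat_layout is a name made up for
illustration). It assumes a POSIX system where st_mode and st_size
are ordinary members of struct stat. The header supplies the
implementation-level declaration; the offsets and sizes the compiler
reports are ABI-level facts and will differ between platforms.

```c
#include <stddef.h>     /* offsetof */
#include <stdio.h>
#include <sys/stat.h>   /* struct stat, as this platform declares it */

/*
 * Report where two POSIX-required members of struct stat ended up
 * after this platform's headers went through this compiler.
 * Hypothetical helper; returns 0 on success, -1 if writing failed.
 */
int
print_stat_layout(FILE *out)
{
	struct stat sb;	/* used only as a sizeof operand, never read */

	if (fprintf(out, "sizeof(struct stat) = %zu\n", sizeof(sb)) < 0)
		return -1;
	if (fprintf(out, "st_mode: offset %zu, size %zu\n",
	    offsetof(struct stat, st_mode), sizeof(sb.st_mode)) < 0)
		return -1;
	if (fprintf(out, "st_size: offset %zu, size %zu\n",
	    offsetof(struct stat, st_size), sizeof(sb.st_size)) < 0)
		return -1;
	return 0;
}
```

Calling print_stat_layout(stdout) on two different ports, or with two
different compilation models on the same machine, is a quick way to
see the same source-level declaration produce different ABIs.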

"API" generally means the standards level, because it's about what
applications (or in general, client software) can assume at the source
level. But there may be multiple standards-level manifestations of
notionally the same interface (e.g. C99 <stdio.h> and POSIX <stdio.h>
are not the same; the latter has more stuff but also nails down some
things that are not specified in C99) not to mention multiple versions
(e.g. snprintf isn't in C89) so in most cases you need to be specific
about exactly what you're talking about.
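
As a small illustration of "which standard, which version", the
following sketch reports what a translation unit was compiled against.
__STDC_VERSION__ and _POSIX_VERSION are the standard macros; the two
function names are made up for this example, and the code assumes a
system that provides <unistd.h>.

```c
#include <unistd.h>	/* defines _POSIX_VERSION on POSIX systems */

/*
 * Version of the C standard the compiler claims for this translation
 * unit, or 0 if __STDC_VERSION__ is not defined (as in C89/C90).
 */
long
c_standard_version(void)
{
#ifdef __STDC_VERSION__
	return __STDC_VERSION__;
#else
	return 0;
#endif
}

/*
 * POSIX revision the headers claim to implement, or -1 if
 * <unistd.h> does not define _POSIX_VERSION.
 */
long
posix_version(void)
{
#ifdef _POSIX_VERSION
	return _POSIX_VERSION;
#else
	return -1;
#endif
}
```

Both values are fixed at compile time; they describe the source-level
contract the code was built against, not anything about the binary
interface.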

However, ABIs only arise from compiling specific implementations of
APIs. In principle you could construct an abstract implementation and
compile it in a way that makes no binary-level assumptions that aren't
in the specification-level API contract, but nobody does that. (Partly
because it's hard, and partly because it doesn't seem generally
useful.)

That's a digression though.

I think your level 2 and my "C implementation level" are possibly
trying to describe the same thing, though.

That is, these are the declarations you need to grind to figure out
what the machine-level ABI is and therefore how to interact with
functions or data structures defined by the C API.

In general, if nobody's done anything else, you can get these by
parsing the C header files, and you can only get them by parsing the C
header files. This is why the typical interpreter's foreign function
interface involves glue modules written in C; they can include
whatever C headers and export functions and values in terms of the
interpreter's internal datatypes. If you've ever looked in Python or
Perl bindings for external libraries you've seen this.

If you want to do more than this (assuming your language can represent
C types and functions well enough to interact with them directly), you
generally need to parse the C headers and digest them into your own
language's form. Doing this on the fly when compiling is a major
headache: even if you're willing to fork an external cpp, you still
need a full C parser. And if you do it once when you build the
compiler and cache the results, those results can go out of date; it's
hard to know when they need to be regenerated, and hard to make sure
end-user installs actually do regenerate them when needed.

The goal of the language-neutral interfaces project is to at least
eliminate the cpp and C parser aspect of this by providing definitions
in something easily parsed, and providing some system-management
guarantees about what you see being up to date.

The way the project's been envisioned is that the language-neutral
interface definitions would become the canonical description (that is,
the source) and the C headers would be generated from them.
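
As a rough sketch of that direction only: the field_desc table and
emit_c_struct below are entirely hypothetical, standing in for
whatever description format the project eventually settles on. The
point is just that going from a simple, already-parsed description to
a C declaration is easy, whereas the reverse requires a C parser.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical in-memory form of one field of a neutral description. */
struct field_desc {
	const char *c_type;	/* C spelling of the type, e.g. "mode_t" */
	const char *name;	/* member name, e.g. "st_mode" */
};

/*
 * Write "struct <tag> { ... };" into buf.
 * Returns 0 on success, -1 if buf is too small or formatting fails.
 */
int
emit_c_struct(char *buf, size_t buflen, const char *tag,
    const struct field_desc *fields, size_t nfields)
{
	size_t used, i;
	int n;

	n = snprintf(buf, buflen, "struct %s {\n", tag);
	if (n < 0 || (size_t)n >= buflen)
		return -1;
	used = (size_t)n;

	for (i = 0; i < nfields; i++) {
		n = snprintf(buf + used, buflen - used, "\t%s %s;\n",
		    fields[i].c_type, fields[i].name);
		if (n < 0 || (size_t)n >= buflen - used)
			return -1;
		used += (size_t)n;
	}

	n = snprintf(buf + used, buflen - used, "};\n");
	if (n < 0 || (size_t)n >= buflen - used)
		return -1;
	return 0;
}
```

A real generator would also have to handle constants, function
prototypes, and per-port differences; none of that is attempted here.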

Reading the C headers to generate the language-neutral interfaces is
also possible, but a lot more work, since you need to implement the
aforementioned C parser. The only advantage of doing things that way
is that the tool is easier to merge, and remains useful even if it
isn't merged. For your specific use case, since you want something to
support your compiler, you might want it this way so you can use it
everywhere and not just on NetBSD. And if the generator tool ends up
being something that doesn't belong in NetBSD base, and therefore
can't be merged, this is a way to keep it useful.

 > Ada actually has a standardized foreign function interface that can be
 > used to interface with C and Fortran. The problem is that it interfaces
 > to the Pure-C level, not the API level. I can't use it to call stat() in
 > a portable way.

I would expect that you could define Ada-level constants and an
Ada-level struct stat, and use the FFI to call a glue function that
includes both sets of definitions and translates between them (as
Python modules do) but this is still a nuisance and it also introduces
a lot of overhead.
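
A minimal sketch of the C side of such a glue function follows. The
struct neutral_stat layout and the name neutral_stat_path are
assumptions made up for this example, not an existing interface; the
Ada side would declare a matching record with fixed-width fields and
import the function through the FFI.

```c
#include <stdint.h>
#include <sys/stat.h>

/*
 * Hypothetical fixed-layout record the foreign language can mirror
 * exactly: only fixed-width types, ordered so no padding is needed
 * on common ABIs.
 */
struct neutral_stat {
	int64_t  size;		/* from st_size */
	int64_t  mtime_sec;	/* from st_mtime, seconds only */
	uint32_t mode;		/* raw st_mode bits */
	uint32_t is_regular;	/* 1 if S_ISREG(st_mode), else 0 */
};

/*
 * Glue: call the platform's stat() and translate the result into the
 * fixed-layout record. Returns 0 on success, -1 on failure (errno is
 * left as set by stat()).
 */
int
neutral_stat_path(const char *path, struct neutral_stat *out)
{
	struct stat sb;

	if (stat(path, &sb) == -1)
		return -1;

	out->size = (int64_t)sb.st_size;
	out->mtime_sec = (int64_t)sb.st_mtime;
	out->mode = (uint32_t)sb.st_mode;
	out->is_regular = S_ISREG(sb.st_mode) ? 1 : 0;
	return 0;
}
```

This shows both costs mentioned above: every wrapped call needs
hand-written glue like this, and every call pays for an extra copy
and translation step.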

 > I haven't done anything concrete yet, and I agree that it is a good idea
 > to research prior art. There is a lot of it:
 > [...]

Unless you can find a nice survey article/paper (possible), or you
want to write one, my suggestion would be to restrict yourself to
those that are actually intended for use with language-level bindings,
as opposed to say RPC systems. That will cut out 90-95% of the noise.

Rust's thing might be a good place to start, if it has any kind of
whitepaper or technical article, depending on who wrote it; while
there are elements of the Rust community that are, let's say, not
entirely interested in learning from anyone else's experience, there
are also people in there who know what they're doing.

 > I am not looking to do a GSOC project or anything like that, but I
 > would like to research this some more, and perhaps learn from your
 > expertise.

Is that project even tagged as a GSOC project? Offhand I'd say it's
too big for one :-)

The project list is there to feed GSOC, but it's far from the case
that everything on it is meant as a GSOC project. Ultimately it's
meant as a better repository for things we want to do than forgotten
tickets in the bug database.

 > In terms of existing IDLs, do you have anything in mind that you think
 > could work?

If I did, I'd have mentioned it in the project description.

We already have rpcgen(1) (the SunRPC interface builder) in base but
it's definitely not suitable.

-- 
David A. Holland
dholl...@netbsd.org
