On Mon, Jul 11, 2022 at 08:45:09PM -0700, Jakob Stoklund Olesen wrote:
> I saw this project on the wiki, and it reminds me of a problem I have
> been trying to understand better:
> https://wiki.netbsd.org/projects/project/language-neutral-interfaces/
>
> [...] This means I have to think about the difference between API
> and ABI on POSIX platforms.

Indeed.

> As I see it, there are three levels of interface definition to consider:

So, unlike Mouse I'd agree that there are three levels and it's useful
to think about three levels, but I think I'd divide them somewhat
differently.

There's the standards level (your level 1): some structure exists and
has at least certain members, but not in any fixed order, there might
be other fields, the values of manifest constants are not specified,
etc.

There's the C implementation level (between that and your level 2)
where these details are all filled in; e.g. in NetBSD struct stat has a
specific list of members in a specific order, and they have specific C
data types even though the concrete representation of those types may
still vary.

Then there's the ABI level, which is what you get when you run the
implementation level through a compiler.

"API" generally means the standards level, because it's about what
applications (or in general, client software) can assume at the source
level. But there may be multiple standards-level manifestations of
notionally the same interface (e.g. C99 <stdio.h> and POSIX <stdio.h>
are not the same; the latter has more stuff but also nails down some
things that are not specified in C99), not to mention multiple versions
(e.g. snprintf isn't in C89), so in most cases you need to be specific
about exactly what you're talking about.

However, ABIs only arise from compiling specific implementations of
APIs. In principle you could construct an abstract implementation and
compile it in a way that makes no binary-level assumptions that aren't
in the specification-level API contract, but nobody does that. (Partly
because it's hard, and partly because it doesn't in general seem
useful.) That's a digression though.

I think your level 2 and "C implementation level" are possibly trying
to describe the same thing, though.
That is, these are the declarations you need to grind through to figure
out what the machine-level ABI is and therefore how to interact with
functions or data structures defined by the C API.

In general, if nobody's done anything else, you can get these by
parsing the C header files, and you can only get them by parsing the C
header files. This is why the typical interpreter's foreign function
interface involves glue modules written in C; they can include whatever
C headers they need and export functions and values in terms of the
interpreter's internal datatypes. If you've ever looked in Python or
Perl bindings for external libraries you've seen this.

If you want to do more than this, that is, assuming your language can
represent C types and functions well enough to interact with them
directly, generally you need to parse the C headers and digest them
into your own language's form. Doing this on the fly when compiling is
a major headache (even if you're willing to fork an external cpp, you
still need a full C parser). And if you do it once when you build the
compiler and cache the results, those results can go out of date; it's
hard to know when they need to be regenerated and hard to make sure
end-user installs actually do regenerate them when needed.

The goal of the language-neutral interfaces project is to at least
eliminate the cpp and C parser aspect of this by providing definitions
in something easily parsed, and providing some system-management
guarantees about what you see being up to date.

The way the project's been envisioned is that the language-neutral
interface definitions would become the canonical description (that is,
the source) and the C headers would be generated from them. Reading the
C headers to generate the language-neutral interfaces is also possible,
but a lot more work to implement, since you need to implement the
aforementioned C parser.
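
A rough sketch of the generation direction described above, written in
Python purely for illustration. The description format, the
`example_stat` structure and its fields, and the helper names
`emit_c_header` and `make_ctypes_struct` are all assumptions made up
for this example; nothing here is the project's actual design or
NetBSD's real struct stat. It shows one easily parsed description
producing both C header text and a binding for another language, with
no cpp or C parser involved.

```python
import ctypes

# Hypothetical, easily parsed description of one structure. The format
# and the field list are invented for this sketch.
EXAMPLE_STAT = {
    "name": "example_stat",
    "fields": [
        ("st_mode", "uint32"),
        ("st_nlink", "uint32"),
        ("st_size", "int64"),
    ],
}

# Abstract type names mapped to C spellings and to ctypes types.
C_TYPES = {"uint32": "uint32_t", "int64": "int64_t"}
CTYPES_TYPES = {"uint32": ctypes.c_uint32, "int64": ctypes.c_int64}


def emit_c_header(desc):
    """Generate C declaration text from the description."""
    lines = ["#include <stdint.h>", "", "struct %s {" % desc["name"]]
    for field, typ in desc["fields"]:
        lines.append("\t%s %s;" % (C_TYPES[typ], field))
    lines.append("};")
    return "\n".join(lines) + "\n"


def make_ctypes_struct(desc):
    """Build a ctypes Structure with the same members in the same order."""
    class Struct(ctypes.Structure):
        _fields_ = [(field, CTYPES_TYPES[typ]) for field, typ in desc["fields"]]

    Struct.__name__ = desc["name"]
    return Struct


if __name__ == "__main__":
    # The C side: header text generated from the description.
    print(emit_c_header(EXAMPLE_STAT), end="")
    # The non-C side: offsets and sizes as ctypes lays them out on this host.
    struct = make_ctypes_struct(EXAMPLE_STAT)
    for field, _ in EXAMPLE_STAT["fields"]:
        member = getattr(struct, field)
        print(field, member.offset, member.size)
```

The offsets that ctypes reports are ABI-level facts for the host the
script runs on. A real tool would also have to cover manifest constants
and function prototypes, and deal with keeping the generated output up
to date.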
The only advantage of doing things that way (generating from the C
headers) is that it becomes easier to merge and/or is still useful even
when not merged. For your specific use case, since you want something
to support your compiler, you might want it this way so you can use it
everywhere and not just on NetBSD. Or if the generator tool ends up
being something that doesn't belong in NetBSD base, and therefore can't
be merged, this is a way to make it still useful.

> Ada actually has a standardized foreign function interface that can be
> used to interface with C and Fortran. The problem is that it interfaces
> to the Pure-C level, not the API level. I can't use it to call stat() in
> a portable way.

I would expect that you could define Ada-level constants and an
Ada-level struct stat, and use the FFI to call a glue function that
includes both sets of definitions and translates between them (as
Python modules do), but this is still a nuisance and it also introduces
a lot of overhead.

> I haven't done anything concrete yet, and I agree that it is a good idea
> to research prior art. There is a lot of it:
> [...]

Unless you can find a nice survey article/paper (possible), or you want
to write one, my suggestion would be to restrict yourself to those that
are actually intended for use with language-level bindings, as opposed
to, say, RPC systems. That will cut out 90-95% of the noise.

Rust's thing might be a good place to start, if it has any kind of
whitepaper or technical article, depending on who wrote it; while there
are elements of the Rust community that are, let's say, not entirely
interested in learning from anyone else's experience, there are also
people in there who know what they're doing.

> I am not looking to do a GSOC project or anything like that, but I
> would like to research this some more, and perhaps learn from your
> expertise.

Is that project even tagged as a GSOC project?
Offhand I'd say it's too big for one :-)

The project list is there to feed GSOC, but it's far from the case that
everything on it is meant as a GSOC project. Ultimately it's meant as a
better repository for things we want to do than forgotten tickets in
the bug database.

> In terms of existing IDLs, do you have anything in mind that you think
> could work?

If I did, I'd have mentioned it in the project description. We already
have rpcgen(1) (the SunRPC interface builder) in base but it's
definitely not suitable.

--
David A. Holland
dholl...@netbsd.org