Again, another "modules" proposal, but in more steps.

(sorry, this is a very long post).


===


TL;DR: the final goal (in a long time, possibly) is to be able to do something like this:

```
<?php declare(module=1);
// some_file.php

export class SomeClass {
    public function someMessage() {
        myInternalFunction();
    }
}

function myInternalFunction() {
    echo "Hello world!";
}
```

```
<?php
// index.php
import SomeClass from 'some_file.php';

$o = new SomeClass();
$o->someMessage(); // Works.
myInternalFunction(); // Fatal error: undefined function "myInternalFunction".
```


===


Trivia:


Recently, we had proposals for "Friend classes" and "Pure PHP files". These suggestions aren't new at all, but they show a wish in the PHP ecosystem to bring PHP closer to other programming languages.

On my side, I've reviewed some of the discussions regarding "modules". They are quite messy because there are lots of different views and opinions, but I think, maybe too optimistically, that there might be a way to reach "PHP modules" without a huge change to the entire engine.


I think we can implement actual "modules" in two steps:

- Implement *definition files* first, so they can be handled in the compilation step and have no runtime effect (see later).
- Implement "*modules*" with an "import" keyword, designed so that a "module" is only a definition file that can "export" some of its defined structures and/or "import" structures from other modules, with a completely enclosed and standalone scope/context.


===


First step: *definition files*.

One of the proposals for "modules" involved files with a different PHP extension, to make them easily distinguishable from other files, and the recent "pure files" suggestion follows the same idea: removing the "<?php" tag so that PHP doesn't need it anymore.

However, these discussions had certain conclusions that I agree with:

- Making the extensions different will profoundly change how PHP includes files. Extensions like ".inc", ".php5" and the like were discouraged for specific reasons, and nowadays most PHP handlers (Apache, nginx, Caddy, and possibly others) default to ".php" files, so adding a new extension means changing both the engine and the whole ecosystem. The overall conclusion is that changing the extension gives no benefit and only brings disadvantages.
- "Pure" files, as in "no open/close tags", bring no real value: having "<?php" is similar to having a shebang line in many other file types (and PHP files themselves can contain a shebang line), so it's already a nice indicator, and ALL tooling around PHP code needs it to distinguish PHP code from "non-PHP" whatever-they-are characters.

On my side, when I first read the "pure" proposal, I was thinking mostly about "pure" as in "has no side-effect".

Which brings another idea: what if include-ing a PHP file actually had zero side-effect, apart from its compilation process?

That's where I'm coming from with this idea: *definition files*.

> Notes:
> - I will often refer to "built-in" in here, and "built-in" means "built in PHP or one of the enabled extensions", which implies "accessible at compile-time".
> - When I say "global scope", I also imply "global namespace scope" for every namespace defined in the file.

 * A PHP "definition" file is a file that has *ZERO global state,
   calls, or mutable statements*.
 * It is declared with a `declare(def=1);` statement at the top of it.
 * It can only contain *declarations*: `(include|require)(_once)?`,
   `const`, `function`, `namespace`, `class`, `return`, `interface`,
   `trait`, `use`, `enum`, etc.
 * It can *include/require* a file, as long as the argument of this
   statement is a *string literal* (or a built-in constant)
 * Since the file must not contain statements, the global scope of the
   file must not refer to any variable, and must not define variables
   either. Even superglobals.
 * `if`/`else`/`elseif`, `switch` or `match` statements can also be
   allowed, only if they respect the previous points. This way, you can
   still define functions/constants/classes depending on PHP versions
   (see next points about constants). I'm not sure about loops
   (`for`, `while`, `do...while` or `foreach`), because I see no proper
   use-case, but they can still be allowed if they imply no global call
   statements, though it seems very unlikely anyway.
 * `try/catch/finally` are useless if global scope has no calls, so
   they can be safely forbidden.
 * Statements like `break`, `continue` are not allowed either, because
   they implicitly expect a "parent context".
 * `new` is allowed for built-in classes only, and can only receive
   literals (because it cannot refer to variables or user-defined
   constants).
 * `exit/die` are also forbidden in the global scope, and should be
   replaced with exceptions.
 * `throw` is allowed, but can only throw instances of built-in
   exception classes. User-created exceptions are not allowed.
 * Global scope can never contain the closing tag `?>`, as a safety
   against potential "echo" calls. It /might/ use it as only last
   characters of said file, but IMO it's much easier to handle the "no
   closing tag" case than "possible as last statement of the file". The
   problem is that having a closing tag at the end can mess with IDEs
   that add whitespace at the end of every file: PHP swallows a single
   newline right after `?>`, but anything beyond that is compiled as an
   `echo` statement.
 * The file's global scope can refer to constants defined internally by
   PHP or its extensions. This means every constant that isn't in the
   `user` key when calling `get_defined_constants(true)` in PHP, as
   well as magic constants like `__DIR__`. Only constants that are
   always available at compile-time will be checked. This way, it can't
   accidentally trigger an `Undefined constant` warning for userland,
   but it can trigger one if a native extension isn't enabled or
   doesn't have said constant, which can also be detected at compile-time.

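To make the rules above more concrete, here is a minimal sketch of what a definition file could look like. It is only an illustration under assumptions: `declare(def=1)` does not exist in any released PHP version, so it appears as a comment to keep the snippet runnable today, and the namespace, constant and function names are hypothetical.

```php
<?php
// Hypothetical: under the proposal, `declare(def=1);` would be the first
// statement of this file. It is left out so the file runs on current PHP.

namespace App\Util;

// Allowed: a constant declared from a literal.
const DEFAULT_GREETING = 'Hello';

// Allowed: a function declaration; its body is not restricted.
function greet(string $name): string
{
    return DEFAULT_GREETING . ", {$name}!";
}

// Each commented line below is the kind of global-scope statement that the
// rules would reject at compile time:
// $counter = 0;             // defines a global variable
// echo greet('world');      // call in the global scope
// require $_SERVER['LIB'];  // include target is neither a literal nor a built-in constant
// exit(1);                  // exit/die are forbidden in the global scope
```

Including this file only registers the constant and the function; nothing is executed until another file calls `\App\Util\greet()`.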

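The last rule relies on telling built-in constants apart from userland ones. As a rough userland approximation (the real check would live in the compiler, and `isBuiltInConstant()` is a hypothetical helper name), the distinction can be sketched with `get_defined_constants(true)`:

```php
<?php
declare(strict_types=1);

/**
 * Returns true when $name is provided by PHP core or a loaded extension,
 * i.e. it appears in any category other than "user".
 * Magic constants such as __DIR__ are not listed by
 * get_defined_constants() and would need separate handling.
 */
function isBuiltInConstant(string $name): bool
{
    foreach (get_defined_constants(true) as $category => $constants) {
        if ($category === 'user') {
            continue;
        }
        if (array_key_exists($name, $constants)) {
            return true;
        }
    }

    return false;
}

define('MY_APP_FLAG', 1); // hypothetical userland constant

var_dump(isBuiltInConstant('PHP_VERSION_ID')); // bool(true)
var_dump(isBuiltInConstant('MY_APP_FLAG'));    // bool(false)
```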
This concept brings more advantages than previous proposals:

 * Your file is still normal PHP, compatible as usual.
 * Can still be interpreted by all IDEs that support PHP code
 * Doesn't need a different file extension
 * The fact that it's a definition file is explicitly visible at the
   beginning of the file when you open it
 * It can still be included/required by any other file without the file
   itself knowing that it actually includes a "definition file", and is
   therefore fully eligible to be compatible with all current autoload
   setups, including Composer
 * Still allows things that frameworks do for conditional
   function/class declaration (if it relies on the PHP version
   constant, for example). It will not be able to use
   `version_compare()`, but there are workarounds.
 * Potential compile-time built-in constant optimization (if not
   already done by the compiler, I didn't search for this yet)
 * Everything that is not global/namespace-scope (functions, classes,
   etc.) can still contain whatever code they need, and theoretically
   it can even contain the PHP closing tag, since it's compiled as an
   "ECHO" statement.
 * All potential errors when including/requiring such file will be
   compile-time errors, therefore if the file is "correct", compiling
   it definitely means that it has its place in the opcache for a very
   long time as no runtime can alter its global context.
 * Having no actual call statements in the global/namespaced scope
   ensures no "echo", but overall has absolutely zero runtime impact
   other than compile-time errors, since there cannot be notice/warning
   errors that might also pollute the current buffer. (I might have
   forgotten what else can throw a notice/warning, but feel free to
   correct me if I do).

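One possible workaround for the `version_compare()` limitation mentioned in the list above is to branch on the built-in `PHP_VERSION_ID` constant, which needs no function call in the global scope. The sketch below assumes such a comparison would be accepted in a definition file; the `contains()` helper is a hypothetical name, and the `declare(def=1)` line is omitted so the snippet runs on current PHP.

```php
<?php
// Hypothetical: `declare(def=1);` would open this file under the proposal.

// Only a built-in constant and an integer literal are used in the condition,
// so the global scope contains no call.
if (\PHP_VERSION_ID >= 80000) {
    function contains(string $haystack, string $needle): bool
    {
        return \str_contains($haystack, $needle);
    }
} else {
    function contains(string $haystack, string $needle): bool
    {
        // Same semantics as str_contains(): an empty needle always matches.
        return $needle === '' || \strpos($haystack, $needle) !== false;
    }
}
```

The calls to `str_contains()` and `strpos()` sit inside function bodies, which the proposal leaves unrestricted.
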
There are only a few drawbacks to this (from what I've thought about so far):

 * All definition files will have to begin with `<?php declare(def=1);`
   (fair enough IMO, since some static analysers are already capable of
   adding `declare(strict_types=1)` automatically...)
 * Potential tiny compile-time performance drop and/or memory
   consumption, because all global statements would have to be checked
   and analysed. And maybe a bit more if constants are also validated.


For end-users, a "definition" file has one single advantage: it has no runtime impact when it is loaded, only when its defined structures are used. This is a guarantee of trust that can benefit all frameworks and libraries.

But this advantage paves the way to "modules" in a very interesting manner: all modules *must* be definition files in the first place.


===


Second step: *PHP modules*.

> TL;DR: loading a "module" is similar to an "include/require"_once, but at the compiler-level instead of the engine/runtime-level.

The concept of "module" in my mind in PHP is the following:

A *PHP Module* is a normal PHP file that is, at first, a *definition file*, but instead starts with `declare(module=1);`. This declaration automatically implies `declare(def=1);`, and the wrong combo `declare(module=1,def=0);` can throw a compile-time error.

On the Module side:

 * It has access to new keywords: `import` and `export`, as well as
   `import ... as ...` and `export ... as ...`.
 * Module names are useless, since the module is the file itself.
 * A module can `export` whatever is declared in said file: constants,
   functions, classes, enums, interfaces..., as long as it's only a
   declaration and not an actual call.
 * The `export` keyword must implicitly be in the global namespace,
   even if it is written inside another namespace.
   This implies that the three variants below have to be considered
   strictly equivalent:
   ```
   <?php declare(module=1);
   // Variant 1: unbracketed namespace
   namespace My\Namespace;
   export class MyClass {};

   <?php declare(module=1);
   // Variant 2: bracketed namespace
   namespace My\Namespace {
       export class MyClass {};
   }

   <?php declare(module=1);
   // Variant 3: export statement outside the namespace block
   namespace My\Namespace {
       class MyClass {};
   }
   export MyClass;
   ```
 * Composer can add a new package type named "php-module" that accepts
   only one single PHP file as input, that file must be a module.
 * A module can import other modules.
 * Conditional, or encapsulated `import` can also be resolved at
   compile-time, and since they can only contain compiler-accessible
   statements (string literals or built-in constants), the behavior
   will be similar to an "include" statement at runtime on a definition
   file anyway.
 * Unused imports can be detected at compile-time and throw a notice
   message
 * (my opinion, so definitely optional) Two exports must never have the
   same name. By this, I mean that you could use `import someFunction,
   SomeClass, SOME_CONSTANT from 'file.php';` freely without having to
   specify *what* you are trying to import. This is of personal taste,
   to enforce users not to use the same names to avoid confusion in
   general. I see no proper use-case to allow constants and classes to
   have the same name, for example, but some people might dislike this.
   On the engine-side it will still properly define the expected
   structures from the module, and on the userland, errors will be
   thrown if said structure is used improperly anyway. To me,
   considering how big this feature is, this is just a way to
   "opinionate for better naming" :) (and it avoids ugly things like
   `import const SOME_CONST as SOME_CONST_ALIAS, class SomeClass as
   SomeClassAlias, function someFunction as someFunctionAlias from
   'file.php';`, right?)

And on the userland-side:

 * The `import` and `import ... as ...` keywords also become
   accessible to ANY other PHP file, whether it is a definition, a
   module, or a regular PHP file.
 * Just like in definition files, the `import` keyword can only refer
   to string literals and/or built-in constants. This restriction is
   only applied for the new `import` keyword, and not to the rest of
   the file.
 * The `import` keyword will explicitly make all imported structures
   accessible in the current file, just like if the module was loaded
   with `include`, but only `export`-ed structures are accessible.
 * Imports can be placed anywhere in the file, and will be resolved at
   compile-time, since their only drawback is "adding more structures
   in memory".
 * An import can define an alias that will only be accessible to the
   importer-file, like `import SomeClass as MyAlias from 'file.php';`.


Internally, there are other interesting things that happen:

 * All definitions from inside a module will have to be prefixed in the
   symbols table with a hash corresponding to the current module file's
   hash. It can be similar to how anonymous classes are registered
   internally. The goal is to make them inaccessible (as much as
   possible) from the global scope.
 * All calls to module file's internal definitions will use this hash
   prefix to refer to said structures.
 * When using "import", it will do 3 things:
     o Analyse `import`-ed statements, to retrieve only the structures
       that are asked by the end-user
     o Load the file (from file, or from opcache, if not already in memory)
     o Create the modules definitions (if not already in memory) with
       internal hashes as previously described, and tree-shake unused
       structures as of the list of `import`-ed ones. Can be done at
       importer-compile-time too.
   This way, it would behave similarly to `(require|include)_once` but
   at a more granular level: with only the structures that were
   imported by the current file. Any subsequent call to `import` for
   the same file will do the same thing, and since these files have no
   runtime impact and only contain definitions, it should have close to
   no loading impact. Subsequent imports with the same structures will
   load only the ones that are already in memory (because they can be
   referenced with the hash-prefix), and if a "new structure" is found
   that has not been loaded already in the global space, it will create
   it in the global scope at runtime. This makes sure that module files
   with 100 exports will not load /all/ structures in memory when the
   file is `import`ed.
 * Modules have no impact on autoload, since they don't function the
   same way.
 * Since internal functions/constants/etc. are hash-prefixed, they will
   never conflict with other internal structures. This means that a
   module could define the `str_contains` function, if it wanted. And
   it could even reuse the native function by using `use function
   str_contains as base_str_contains`. Would also work for
   object-oriented structures (class, interface, etc.) as well as
   constants too.
 * We can create a *`ReflectionModule`* class, whose constructor
   accepts a file name, and throws an exception if the file is not a
   module. This class would expose the list of exported structures from
   said module. Maybe it can also contain the processed hash/prefix and
   the internal structures too, but having these available kinda
   defeats the purpose of having internal structures in the first
   place... But the exported structures would be ReflectionClass,
   ReflectionConstant, etc, with the "module" flag explained in next point:
 * Other Reflection classes will contain an internal "defined in
   module" flag, as well as a nullable string corresponding to the path
   to the module file if the structure is effectively defined in a module.
 * FQCNs will always resolve to the "global public name", and never to
   the internal hash-prefixed name.
 * A new global function can be used: `spl_register_module($prefix,
   $filePath);`. This way, we could definitely imagine a flexible
   prefix to resolve to a module path for `import` statements. It
   allows Composer-package-compatible syntax like `import Request from
   '@symfony/http-foundation/Request.php';` being registered with
   something like `spl_register_module('@symfony/http-foundation',
   __DIR__.'/vendor/symfony/http-foundation/');`
 * The `spl_register_module` function can refer to structures directly,
   if needed: `spl_register_module('@symfony/http-foundation/Request',
   __DIR__.'/vendor/symfony/http-foundation/Request.php');`.
   This allows for no-extension imports like `import Request from
   '@symfony/http-foundation/Request';` (but this is just for the fancy
   looks)
   The function itself will register the input as a prefix or as a
   module path based on whether the specified path is a directory or a
   file, checked at runtime.
 * Multiple paths can be used for the same prefix, as long as they are
   all directories.
 * If a module is registered as a file instead of a directory, there
   can be only one.
 * (yeah, I know, this concept looks like a reinvention of
   include_paths, but hey, it's modules now!)
 * Autoload-like features can be used for projects using Composer:
     o The `composer.json` file can contain new fields: "modules" and
       "modules_dev".
     o Each field would contain a key=>value list of prefix=>file_path
       items, that Composer will register through the aforementioned
       new spl_register_module() function.
     o Composer will (sorry folks) need a way to make sure two PHP
       packages don't contain the same module prefixes resolving to two
       paths, regardless of them being directories or files. Maybe
       Packagist (sorry again) will need this too. This is important to
       avoid vendor name squatting in modules.
 * These module-autoload rules would not change anything at existing
   autoload, but they would mostly be here to map a PHP package with
   its exposed API.
 * This also makes sure that any PHP package can say "All API exposed
   as a module is covered by BC policy, all the rest is not". Easier
   for maintainers to keep their internal stuff, and a bit easier to
   make the Open/Closed principle available at a package-level instead
   of a class-level.

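The hash-prefix point in the list above compares the mechanism to the way anonymous classes are registered. For readers unfamiliar with that, the snippet below shows what the engine-generated name of an anonymous class looks like today. It illustrates the existing behaviour only; the exact prefix format a module would use is not defined by this proposal.

```php
<?php
declare(strict_types=1);

$object = new class {
};

$name = get_class($object);

// The generated name starts with "class@anonymous", followed by a NUL byte
// and information about where the class was declared.
var_dump(strpos($name, 'class@anonymous') === 0); // bool(true)
var_dump(strpos($name, "\0") !== false);          // bool(true)

// Because of the NUL byte, the name cannot be written as a plain identifier
// in source code, and the short form alone does not resolve to any class.
var_dump(class_exists('class@anonymous'));        // bool(false)
```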

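The prefix resolution described for `spl_register_module()` can be approximated in userland. The sketch below is only an illustration of the rules listed above (several directories per prefix, a single path when a file is registered, directory-versus-file decided at runtime); the `ModuleRegistry` class and its method names are hypothetical and nothing here exists in PHP.

```php
<?php
declare(strict_types=1);

final class ModuleRegistry
{
    /** @var array<string, list<string>> prefix => directories */
    private array $directories = [];

    /** @var array<string, string> exact specifier => file path */
    private array $files = [];

    public function register(string $prefix, string $path): void
    {
        if (is_dir($path)) {
            // Several directories may share the same prefix.
            $this->directories[$prefix][] = rtrim($path, '/\\');
            return;
        }

        // Anything that is not a directory is treated as a single module file.
        if (isset($this->files[$prefix])) {
            throw new LogicException("Module \"{$prefix}\" is already registered as a file.");
        }
        $this->files[$prefix] = $path;
    }

    /** Returns the resolved file path, or null when nothing matches. */
    public function resolve(string $specifier): ?string
    {
        if (isset($this->files[$specifier])) {
            return $this->files[$specifier];
        }

        foreach ($this->directories as $prefix => $dirs) {
            $length = strlen($prefix) + 1;
            if (strncmp($specifier, $prefix . '/', $length) !== 0) {
                continue;
            }
            $relative = substr($specifier, $length);
            foreach ($dirs as $dir) {
                $candidate = $dir . '/' . $relative;
                if (is_file($candidate)) {
                    return $candidate;
                }
            }
        }

        return null;
    }
}
```

With a registration such as `$registry->register('@symfony/http-foundation', __DIR__ . '/vendor/symfony/http-foundation/')`, resolving `'@symfony/http-foundation/Request.php'` would return the path of `Request.php` inside that directory, if the file exists.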
===


I already worked on the first step to "PHP definition files" and made a PR of it: https://github.com/php/php-src/compare/master...Pierstoval:php-src:defs

With my tiny knowledge of PHP internals, I needed the help of Cursor for that. I added a lot of `.phpt` test files to ensure the basics are covered, then built the project and ran all the tests on my LMDE 7 (Debian) machine multiple times with different configs (embed, fpm, debug, etc.); everything works so far. Apart from the tests, the PR seems quite light, but it obviously needs thorough review (or a rewrite...) before even being converted into an RFC. It just had the advantage of being fully ready and thoroughly tested (hopefully I didn't forget anything) in less than a day...

> Note: I did NOT use any LLM to write this message, it was only used for some bits of code in the above PR, nothing more.

If you have read everything down to this line, thank you very much! It's the fruit of quite some work!


Now to you folks, it's yours to take and talk :)


--
Alex "Pierstoval" Rock
Polydisciplinary professional web development and training
