On Wed, 28 Jan 2009, Francesco Saverio Giudice wrote:
>> I think the .hrb modules aren't loaded on a per thread basis, so it's not 
>> like several of them could be loaded in parallel.
> This should be Przemek to clear. What happens in this case ? Are they MT  
> safe ? In uhttpd they *can be loaded* in parallel in different threads.
> One thing I may do is to serialize them to a single-specific thread that  
> can run hrb modules one after another and store them to avoid next file 
> load. Is this the way ?

We have a common public function list, so when one thread loads an .hrb
module, all public functions in this module become visible and accessible
to all other threads. When the module is unloaded, these functions are
removed. It's not safe to unload an .hrb module while other threads, or
even the same thread, are executing code from this module, so the
programmer should avoid such situations. This potential problem can be
triggered even in a single-threaded program, so it's not directly related
to MT mode.
When an .hrb module is loaded, HVM allocates a symbol table for this
module. This symbol table is never released, even if the module is
unloaded. In such a case the symbol table is only marked as unused and
will be reused, instead of allocating a new one, if the same module is
loaded again or another module using exactly the same symbols is loaded.
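
To illustrate the recycling described above, here is a minimal sketch in
plain C. It is only a simplified model and not the real HVM code: the
DemoTable structure, the fixed-size array and the demo_* function names
are hypothetical, and a module name stands in for the comparison of
symbols. It shows that unloading only marks the table as unused and that
a later load of the same module gets the same table address back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model; these are not the real HVM structures. */
#define DEMO_MAX_TABLES 8

typedef struct {
    char name[32];    /* module identity (stands in for the symbol set) */
    int  in_use;      /* 0 = unloaded, but kept for recycling */
    int  load_count;  /* how many times this table has been (re)used */
} DemoTable;

static DemoTable s_tables[DEMO_MAX_TABLES];
static size_t    s_count = 0;

/* Load a module: recycle a matching unused table, else take a new slot. */
DemoTable *demo_module_load(const char *name)
{
    size_t i;
    DemoTable *t;

    assert(name != NULL);
    for (i = 0; i < s_count; ++i) {
        t = &s_tables[i];
        if (!t->in_use && strcmp(t->name, name) == 0) {
            t->in_use = 1;          /* same address as before */
            t->load_count++;
            return t;
        }
    }
    if (s_count == DEMO_MAX_TABLES)
        return NULL;                /* the model has no room left */
    t = &s_tables[s_count++];
    strncpy(t->name, name, sizeof(t->name) - 1);
    t->name[sizeof(t->name) - 1] = '\0';
    t->in_use = 1;
    t->load_count = 1;
    return t;
}

/* Unload a module: the table is only marked as unused, never released. */
void demo_module_unload(DemoTable *t)
{
    if (t != NULL)
        t->in_use = 0;
}
```

In this model, code which stored the table address before the unload
still points to valid memory after the module is loaded again.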
It's possible to implement full module unloading, but it would be hard
to introduce now due to backward compatibility. Simply put, Harbour was
not originally designed for such a situation, and addresses of function
symbols were used as constant values by different (mostly 3rd party) code.
If we implement module unloading then such code will have to be updated.
In fact only dynamic symbol table addresses (PHB_DYNS) are really constant
and it's guaranteed that they will never change during HVM execution.
So I decided to implement only symbol table recycling, but I'm open to
changing it in the future. Anyhow, the current behavior also has a few
interesting features.
When an .hrb module that uses static variables is loaded, an area for
its new statics is allocated during module loading and attached to the
symbol table. This area is also never freed and is reused together with
the symbol table when the module is loaded again. Static variables are
initialized only once, when the module is loaded for the first time.
Sometimes this can be nice and really helpful behavior, e.g. a module can
store some data in static variables and then access it when it's loaded
again, or it can pass a reference to a static variable to other code
which updates it while the module is unloaded, and then retrieve the
result from it after the next load.
But sometimes people may expect full reinitialization. Now it's not
possible. If we want to have both behaviors then we will have to
introduce some flags for HB_HRBLOAD() to control it, similar to the flags
passed to hb_threadStart() which control memvar inheritance, e.g.:
   HB_REINIT_STATIC
But please remember that reusing the module symbol table and static
variables without reinitialization has yet another very important
consequence. If an .hrb module defines new classes, then when the module
is unloaded and loaded again the same class definitions are also reused.
Otherwise a new class would be allocated, because we keep class
references in static variables inside the class function.

The next thing which can be controlled by such flags is the action taken
for duplicated public function symbols. Now, when an .hrb module is loaded
and it contains a public function which already exists in HVM, this
function is not registered and all references to it in the .hrb module
are replaced by the function which already exists in HVM, so the
module's own function is simply not visible or accessible at all. Here we
can introduce the following actions:
   HB_KEEP_LOCAL_REFERENCES
It means that local references to a public function will not be replaced
by the function with the same name already registered in HVM. So other
code and the macro compiler will still access the public HVM function, but
code executed from the .hrb module will access the function defined in
this module.
   HB_OVERLOAD_FUNCTIONS
When a public function name conflict appears, the function defined in
the .hrb module will overload the one that existed before. This is a
somewhat dangerous feature which should not be enabled in code like an
HTTP server. In practice it will be usable for users who want to
upgrade/patch their programs using .hrb modules. It's important and
interesting functionality but rather limited to local usage. It also
produces other problems, e.g. how to restore functions overloaded by a
few modules when they are unloaded in a different order. I do not know if
I want to fight deeply with such a problem. If I implement such
functionality then it will probably be a very basic version, and after
unloading, the previous functions will not be restored.
If a module is unloaded and some other code tries to execute a function
or method which was in this module, e.g. using a function reference, then
it simply receives an RT error that the function or method does not exist.
When the module is loaded again and the symbol table is reused, all such
references automatically become valid again.
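
The three ways of handling a duplicated public function name discussed
above (the current behavior and the two proposed actions) can be compared
with a small model in plain C. Everything here is hypothetical and only
for illustration: the DemoPolicy values mirror the proposed flag names,
and DemoBinding just records which function the rest of the program sees
and which one the module's own calls resolve to.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*DemoFunc)(void);

/* Hypothetical policies mirroring the actions proposed above. */
typedef enum {
    DEMO_DEFAULT,                /* current behavior: HVM function wins */
    DEMO_KEEP_LOCAL_REFERENCES,  /* module calls its own function */
    DEMO_OVERLOAD_FUNCTIONS      /* module function replaces the old one */
} DemoPolicy;

typedef struct {
    DemoFunc global;  /* what other code and the macro compiler see */
    DemoFunc local;   /* what calls made inside the module resolve to */
} DemoBinding;

/* Decide the bindings for one public function defined in a module.
   'existing' is NULL when there is no function with this name yet. */
DemoBinding demo_bind(DemoFunc existing, DemoFunc from_module,
                      DemoPolicy policy)
{
    DemoBinding b;

    assert(from_module != NULL);
    if (existing == NULL) {          /* no conflict: simply register it */
        b.global = from_module;
        b.local  = from_module;
        return b;
    }
    switch (policy) {
    case DEMO_KEEP_LOCAL_REFERENCES:
        b.global = existing;
        b.local  = from_module;
        break;
    case DEMO_OVERLOAD_FUNCTIONS:
        b.global = from_module;
        b.local  = from_module;
        break;
    default:                         /* DEMO_DEFAULT */
        b.global = existing;
        b.local  = existing;
        break;
    }
    return b;
}

/* Two functions with the "same name": one in HVM, one in the module. */
int demo_hvm_foo(void) { return 1; }
int demo_hrb_foo(void) { return 2; }
```

The model does not try to restore the previous function on unload, which
matches the basic version of overloading described above.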

Now let's return to MT mode.
All of the above operations are protected by HVM and do not need any
external protection. There are only two exceptions:
The 1st I pointed out above: never unload code which is currently being
executed. This applies also to ST programs and has to be resolved by the
user.
The 2nd is much more complicated because it cannot be easily resolved by
the user. It's caused by modules which use static variables. When such a
module is loaded for the first time, it's necessary to allocate an area
for its new statics. We keep all static variables in a single array. It's
one contiguous memory block and reallocation may change its address. It
means that another thread may access a static variable at the same moment
when this block changes its address, and this is not thread safe. It may
cause a GPF. We have to resolve it in HVM, but here a simple mutex does
not help at all. We have only two choices:
   1. Stop all threads in a known place to be sure that they do not access
      any static variables. It's not a 100% safe solution because some C
      code may keep a static variable address and we will not know about
      it. It can be triggered even by very simple code like:
         HB_FUNC( FOO ) { hb_itemReturn( hb_param( 1, HB_IT_ANY ) ); }
      It's enough that the user calls FOO( @s_var ) and the return item
      holds an object variable with a destructor. The destructor's
      execution will be stopped at our interrupt point and another thread
      will reallocate the static table, so the address returned by
      hb_param( 1, HB_IT_ANY ) will no longer be valid during hb_itemMove()
      execution inside hb_itemReturn().
      I know that the risk is small but it exists and can happen.
   2. Redesign the data structures used to keep static variables to
      eliminate reallocation when new statics are added. This is still on
      my TODO list. I sent this list here in the past. AFAIR this is the
      last MT problem I have not addressed yet.
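
One possible shape for the second choice is to keep statics in fixed-size
chunks reached through a directory which never moves, so that adding new
statics allocates a new chunk instead of reallocating one block. The
sketch below is plain C and only an assumption about how this could look.
It is not the HVM implementation, it uses int in place of real items, all
names (DemoStatics, demo_statics_*) are hypothetical, and it is single
threaded: it shows only that addresses stay stable, while adding items
would still need its own synchronization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_CHUNK_ITEMS 64
#define DEMO_MAX_CHUNKS  256

typedef struct {
    int   *chunks[DEMO_MAX_CHUNKS];  /* fixed directory, never reallocated */
    size_t count;                    /* number of items added so far */
} DemoStatics;

/* Add one zero-initialized item. Existing items are never moved, so
   pointers returned earlier stay valid. Returns NULL on failure. */
int *demo_statics_add(DemoStatics *s)
{
    size_t chunk, slot;

    assert(s != NULL);
    chunk = s->count / DEMO_CHUNK_ITEMS;
    slot  = s->count % DEMO_CHUNK_ITEMS;
    if (chunk >= DEMO_MAX_CHUNKS)
        return NULL;
    if (s->chunks[chunk] == NULL) {
        s->chunks[chunk] = (int *) calloc(DEMO_CHUNK_ITEMS, sizeof(int));
        if (s->chunks[chunk] == NULL)
            return NULL;
    }
    s->count++;
    return &s->chunks[chunk][slot];
}

/* Return the address of an item by its index. */
int *demo_statics_at(DemoStatics *s, size_t index)
{
    assert(s != NULL && index < s->count);
    return &s->chunks[index / DEMO_CHUNK_ITEMS][index % DEMO_CHUNK_ITEMS];
}

/* Release all chunks (for the demo only). */
void demo_statics_free(DemoStatics *s)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < DEMO_MAX_CHUNKS; ++i) {
        free(s->chunks[i]);
        s->chunks[i] = NULL;
    }
    s->count = 0;
}
```

The cost of such a layout is one extra indirection on each access and a
fixed upper limit on the number of items in this simple form.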

So what should you do now?
I suggest ignoring this problem until I change the structures used
to hold static variables. The chance of a GPF is rather small. A side
effect of the array preallocation added by Mindaugas is a noticeably
reduced number of memory reallocations when new statics are added, so the
risk that the problem will be triggered is really small. Of course it
_MUST_ be fixed, but it should not be critical during uHTTPd development.
For now concentrate on the above information. I hope it helps to
understand how .hrb modules work.

BTW all of the above also applies, under exactly the same conditions, to
compiled .prg code loaded/unloaded from dynamic libraries (.dll, .so,
.sl, .dyn, ...).

best regards,
Przemek
_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
