> greetings, grand boss!
>
> slave boss has a red carpet rolled for him
>
> the vivisectees forced to be him are honored
this morning i looked a little bit at httptransformers
i seem to have stopped trying to port deepseekv3 and instead just sometimes try
to use my work with other models
where i left it off, i had some notes on resuming inference, which could
increase the fun factor
the notes are somewhat obscure, but the upshot is that the approaches i found
interesting mostly reduce to figuring out how to replace the forward() function
of torch modules, so that a single forward pass through a hierarchy of modules
can resume from an unknown resume state
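to make that concrete, here is roughly the shape such a forward() replacement
could take. this is an untested sketch rather than the code from the notes:
save_dir is a made-up name, it stores every submodule's output rather than only
the uppermost completed ones, and it only handles outputs torch.save can
round-trip.

import os
import torch

def make_resumable(module, save_dir):
    # record each submodule's output to disk as it completes; on a later run,
    # a submodule whose output is already on disk replays it instead of recomputing
    os.makedirs(save_dir, exist_ok=True)
    for name, mod in module.named_modules():
        if name == "":
            continue  # leave the root module alone
        path = os.path.join(save_dir, name.replace(".", "_") + ".pt")
        original = mod.forward
        def forward(*args, _original=original, _path=path, **kwargs):
            if os.path.exists(_path):
                return torch.load(_path)         # replay a completed submodule
            output = _original(*args, **kwargs)  # otherwise run it normally
            torch.save(output, _path)            # and record it for a future resume
            return output
        mod.forward = forward
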
# we can make a torch model resumable when execution is interrupted,
# by recording the outputs between layers and replaying them on next run.
# (see the check hook in test_nettensors.py)
# this is efficient because only the uppermost completed layers need be stored.
# meanwhile, resuming a generation call in transformers means simple exchanges
# like the shorter inputs for the longer recorded ones.
# it ends up being conditions and wrappers or such for however the user caused
# the model call.
# code could recognize the untokenized string, for example, based on initial
# inputs inside a resuming context manager.
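
# a rough sketch of that generation-level exchange (assumed names, untested,
# not the notes' code): record the longer input_ids from an interrupted run,
# and when the same short prompt comes back through a transformers-style
# generate(input_ids=...), hand over the recorded ids so generation continues
# from where it stopped.
import torch

class ResumeGenerate:
    def __init__(self, model):
        self.model = model
        self.recorded = None  # longer input_ids saved from an interrupted run
    def __enter__(self):
        self._original_generate = self.model.generate
        def generate(input_ids=None, **kwargs):
            if input_ids is not None and self.recorded is not None and torch.equal(
                self.recorded[:, : input_ids.shape[1]], input_ids
            ):
                # exchange the shorter inputs for the longer recorded ones
                input_ids = self.recorded
            return self._original_generate(input_ids=input_ids, **kwargs)
        self.model.generate = generate
        return self
    def __exit__(self, *exc):
        self.model.generate = self._original_generate
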
# I WORKED ON THE BELOW FOR FUN WHEN IT WAS REALLY HARD
# FEEL FREE TO RESTART / WIPE
class ResumeModule:
    # there is 1 resume state which is where the pass is.
    # it is updated in every pass after it
    # and referenced for data in every pass prior.
    def __init__(self, module, path, seconds=60*5):
        self.path = path
        self.seconds = seconds  # not used yet
        self.state = {}
        self.handles = []
        for name, mod in module.named_modules():
            self.handles.append(mod.register_forward_hook(self.__hook))
            mod._resume_name = name
            #self.state [mod.
    def __hook(self, mod, args, output):
        # record each submodule's output as it completes so a later run can replay it
        self.state[mod._resume_name] = output
    def __enter__(self):
        return self
    def __exit__(self, *params, **kwparams):
        # drop the hooks when the resuming context ends
        for handle in self.handles:
            handle.remove()
        self.handles = []
    def _forward_store_resume(self, modid, wrapped, *params, **kwparams):
        # two cases: we are performing a resume,
        # or we are storing resume state.
        # the second is more common. the first could have its own hooks.
        result = wrapped(*params, **kwparams)
        return result
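
# a hypothetical usage sketch (made-up toy model and path, not from the notes):
# wrap one forward pass in the resuming context so each submodule's output
# lands in .state keyed by its name.
import torch
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())
with ResumeModule(model, path="resume_state.pt") as resumer:
    output = model(torch.randn(1, 4))
print(list(resumer.state.keys()))  # e.g. ['0', '1', ''] once the pass completes
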
# ideas for replacing modules within torch spec:
# - register_[module_]forward_pre_hook could change the module's behavior,
#   and then another hook put it back after
# - manual replacement of forward function, might be similar
# - inspection of __call__ source to see default space clearly
#   __call__ calls _call_impl which uses local variables like forward_call.
#   _call_impl(self, *args, **kwargs)
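
regarding that last point: if i'm reading _call_impl right, it binds
forward_call = self.forward to a local before the pre-hooks run, so a pre-hook
that reassigns .forward only takes effect on later calls, while returning a
value from an ordinary forward hook does replace the result of the current
call. a tiny untested sketch of that (the toy Linear and hook names are just
for illustration):

import torch

lin = torch.nn.Linear(2, 2)

def pre_hook(module, args):
    # too late for the current call: forward_call is already bound to the old forward
    module.forward = lambda *a, **kw: "swapped"

def post_hook(module, args, output):
    # this does change what the current call returns
    return "replaced output"

h1 = lin.register_forward_pre_hook(pre_hook)
h2 = lin.register_forward_hook(post_hook)
print(lin(torch.randn(1, 2)))  # "replaced output": the post-hook rewrote the result
h1.remove(); h2.remove()
print(lin(torch.randn(1, 2)))  # "swapped": the forward assigned by the pre-hook, one call late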