> > > > > > > > > > [so how did they get deepseek running on this thing? the
> > > > > > > > > > page on it has a link to their OS image ...
> > > > > > > > >
> > > > > > > > > https://www.eswincomputing.com/en/bocupload/2024/06/19/17187920991529ene8q.pdf
> > > > > > > > > indicates that there are GPU drivers for normal python-based
> > > > > > > > > frameworks for this CPU ("Pytorch, Tensorflow, PaddlePaddle,
> > > > > > > > > ONNX, etc")
> > > > > > > >
> > > > > > > > i wonder if the lesson is that if you port a mainstream
> > > > > > > > language model to a small chip and then sell it, everybody
> > > > > > > > will buy it.
> > > > > > >
> > > > > > > OOPS
> > > > > >
> > > > > > not today it seems
> > > > > >
> > > > > > there's still some interest in image generation, but it mostly
> > > > > > assumes that there's already a local way to do this.
> > > > > >
> > > > > > i'm on windows right now, hopefully temporarily. i use wsl2
> > > > > > ubuntu :s :s :s
> > > > >
> > > > > so maybe network servic--- [becau--
> > > >
> > > > what if we used a super tiny model? maybe that's interesting
> > > > or music synthesis or something !
> > >
> > > pulling away from httptransformer could help sort things out a little.
> > > it's definitely never been designed for diffusion models
> >
> > [machine learning in general is designed in opposition to the things
> > people like me try to do; the design choices are based around lots of
> > infrastructure and minimal algorithmic research
>
> maybe let's try a tiny diffusion model or something
>
> the huggingface diffusers architecture seems more flexible than the
> transformers architecture; they [seem to] kind of parameterize their
> pipelines to load submodels and wire them together, of course it also
> looks hardcoded into constructor classes,
ran into a problem that needs further exploration around
torch.unsqueeze(NetTensor): sometimes this returned a NetTensor for me,
other times a tensor(device=meta), which is a bug. placing a breakpoint
in __torch_function__, my handler was not getting called in the cases
where tensor(device=meta) was returned.

a normal next step for me would be to step into the torch source (which
i have) and figure out where and when __torch_function__ is called. if
torch.unsqueeze() can't be hooked via the normal api then i could kind
of polyfill something in. a simpler approach would be to ensure this
tensor is fetched in advance, or to patch the code using it, but that
wouldn't address the problem in other cases
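as a sanity check on the dispatch path, __torch_function__ can be
exercised in isolation with a minimal subclass. LoggedTensor below is a
hypothetical stand-in for NetTensor, not the real class: it just records
which torch functions were routed through the handler, so you can
confirm whether torch.unsqueeze dispatches through it at all.

```python
import torch

# hypothetical stand-in for NetTensor: a minimal subclass whose
# __torch_function__ logs every torch-level op routed through it
class LoggedTensor(torch.Tensor):
    seen = []  # class-level log of intercepted function names

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        cls.seen.append(getattr(func, "__name__", str(func)))
        # fall back to the default implementation, which runs the op
        # and re-wraps the result in this subclass
        return super().__torch_function__(func, types, args, kwargs)

t = torch.zeros(3).as_subclass(LoggedTensor)
u = torch.unsqueeze(t, 0)

print(type(u).__name__)              # subclass should survive dispatch
print("unsqueeze" in LoggedTensor.seen)  # handler should have seen the call
```

if the real NetTensor sometimes bypasses a handler like this, comparing
the failing call site against this minimal case (e.g. whether the input
is still an instance of the subclass at that point) may narrow down
where dispatch is being lost.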