On Sun, Jun 22, 2025 at 07:34:58PM -0400, Martin Blais wrote:
> Today's models are pretty amazing actually.
> You can say things like "output the text under the white cat" and that
> would likely work.
> I'm blown away every moment of the day these days using these.

Yeah, but especially for vision models, it seems to me that the quality gap between self-hostable open-weight models and remote proprietary ones is still pretty big.

The last time I tried vision models locally for OCR in the context of personal finance, the results weren't great (= not usable yet), but that was ~1 year ago and things move fast.

If someone on this list has concrete experience with self-hostable vision models that work well for this, I'd love to hear the specifics (which model, system prompt, etc.).

Cheers
-- 
Stefano Zacchiroli . [email protected] . https://upsilon.cc/zack _. ^ ._
Full professor of Computer Science o o o \/|V|\/
Télécom Paris, Polytechnic Institute of Paris o o o </> <\>
Co-founder & CSO Software Heritage o o o o /\|^|/\
Mastodon: https://mastodon.xyz/@zacchiro '" V "'
