Hi Ihor,

> - We then went a bit into tangent - gptel (LLM UI for Emacs)
>   - My usage of whisper.cpp is combined with gptel
>     - I have whisper output directly fed into gptel to clean up inaccuracies
>     - I use @voice preset for this:
>       https://github.com/yantar92/emacs-config/blob/master/config.org#voice-input
>     - I recently got some errors with that preset, but karthink (the author
>       of gptel) said that nothing changed in the APIs I am using to implement
>       the preset.
>       [2025-12-22 Mon] I narrowed this down to tools being enabled. Not sure
>       what happened with tool calls, but disabling those fixed the problem.

This is off-topic for Org, but I see the problem with your gptel
callback now and wanted to respond in context.

This is from your configuration:

(gptel-request prompt
      :system
      "You will receive an output of whisper.cpp speech recognition.
  The output might contain typical speech irregularities and irrelevant words.
  More importantly, the recognized text may contain some incorrectly recognized words,
  especially for abbreviations or special terminology.
  Cleanup the text, replacing unrecognized words with words that would be more suitable
  according to the current chat context and the overall meaning of the output text.
  Also, remove repetitions and spacer words.   DO NOT remove words like @keyword.
  Output the modified cleaned up text without answering it."
      :callback
      (lambda (response inner-info)
        (cond
         ((stringp response)
          (with-current-buffer buffer
            (goto-char (point-max))
            (text-property-search-backward 'gptel 'response)
            (delete-region (point) (point-max))
            (insert "The prompt recognized from speech will be given below...")
            (insert response)))
         ;; Do nothing otherwise
         (t nil))
        (funcall callback)))

You are calling the (outer) callback when the (inner) :callback is
called, irrespective of whether the response is a string.  The (inner)
:callback is called by gptel-request for many reasons, including
confirming tool calls, returning tool results, and providing "reasoning"
text.  What you want instead is to call the (outer) callback only when
the response is a string:

      :callback
      (lambda (response inner-info)
        (cond
         ((stringp response)
          (with-current-buffer buffer
            (goto-char (point-max))
            (text-property-search-backward 'gptel 'response)
            (delete-region (point) (point-max))
            (insert "The prompt recognized from speech will be given below...")
            (insert response))
          (funcall callback))
         ;; Do nothing otherwise
         (t nil))))

It's not a full fix, because if for some reason the model calls a tool
that requires confirmation, the state machine will halt at that point
and the outer callback will never be called.  So disabling tools is
required as well, but this should make it more robust.
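
For the tools part, here is a minimal sketch of the combined version.
It assumes that gptel-use-tools is consulted at the time gptel-request
is called, so that let-binding it around the call is enough.  The
function name and the shortened system message are placeholders, not
taken from your configuration, and the closure over buffer and
callback needs lexical-binding:

    ;;; -*- lexical-binding: t; -*-
    (require 'gptel)

    ;; Hypothetical wrapper, for illustration only.
    (defun my/voice-cleanup (prompt buffer callback)
      "Clean up PROMPT, insert the result in BUFFER, then call CALLBACK."
      ;; Assumption: binding this to nil keeps tools out of this request.
      (let ((gptel-use-tools nil))
        (gptel-request prompt
          :system "Clean up this whisper.cpp output."  ; placeholder text
          :callback
          (lambda (response _info)
            ;; Only a string response is the final cleaned-up text.
            (when (stringp response)
              (with-current-buffer buffer
                (goto-char (point-max))
                (insert response))
              (funcall callback))))))

Note that in this sketch the outer callback is still not called when
the request fails, since response is not a string in that case.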

I'm trying to figure out if I can provide some syntax to declaratively
thread gptel-request calls (in any acyclic graph), which should make
this kind of nested use (@voice) very simple to write.  I looked at your
example as a test and found this bug.

Karthik
