i've made these changes, however ...

diff --git a/app/config.py b/app/config.py
index 51356a0..4c1f224 100644
--- a/app/config.py
+++ b/app/config.py
@@ -24,6 +24,7 @@ class LLMSettings(BaseModel):
         None,
         description="Maximum input tokens to use across all requests (None for 
unlimited)",
     )
+    hf_tokenizer_id: Optional[str] = Field(None, description="HuggingFace 
model_id to apply the chat template locally.")
     temperature: float = Field(1.0, description="Sampling temperature")
     api_type: str = Field(..., description="AzureOpenai or Openai")
     api_version: str = Field(..., description="Azure Openai version if 
AzureOpenai")
diff --git a/app/llm.py b/app/llm.py
index 18a13af..c15a7e4 100644
--- a/app/llm.py
+++ b/app/llm.py
@@ -71,6 +71,11 @@ class LLM:
             except KeyError:
                 # If the model is not in tiktoken's presets, use cl100k_base as default
                 self.tokenizer = tiktoken.get_encoding("cl100k_base")
+            if llm_config.hf_tokenizer_id is not None:
+                import transformers
+                self.hf_tokenizer = transformers.AutoTokenizer.from_pretrained(llm_config.hf_tokenizer_id)
+            else:
+                self.hf_tokenizer = None
 
             if self.api_type == "azure":
                 self.client = AsyncAzureOpenAI(
@@ -252,6 +257,12 @@ class LLM:
                 "model": self.model,
                 "messages": messages,
             }
+            if self.hf_tokenizer is None:
+                client_api_completions = self.client.chat.completions
+            else:
+                params["prompt"] = self.hf_tokenizer.apply_chat_template(messages, tokenize=False)
+                del params["messages"]
+                client_api_completions = self.client.completions
 
             if self.model in REASONING_MODELS:
                 params["max_completion_tokens"] = self.max_tokens
@@ -265,7 +276,7 @@ class LLM:
                 # Non-streaming request
                 params["stream"] = False
 
-                response = await self.client.chat.completions.create(**params)
+                response = await client_api_completions.create(**params)
 
                 if not response.choices or not response.choices[0].message.content:
                     raise ValueError("Empty or invalid response from LLM")
@@ -279,7 +290,7 @@ class LLM:
             self.update_token_count(input_tokens)
 
             params["stream"] = True
-            response = await self.client.chat.completions.create(**params)
+            response = await client_api_completions.create(**params)

i don't expect them to work. i expect the two api endpoints to be too different, 
plus maybe a couple of other concerns
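
roughly the mismatch i mean, sketched against the openai python client (the 
model name, endpoint, and rendered_prompt here are placeholders, not from the 
openmanus code):

import asyncio
from openai import AsyncOpenAI

async def main():
    # placeholder endpoint + key; rendered_prompt stands in for the apply_chat_template output
    client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    rendered_prompt = "<|User|>hi<|Assistant|>"

    # chat endpoint: structured messages in, a message object out
    chat = await client.chat.completions.create(
        model="deepseek-chat",  # placeholder
        messages=[{"role": "user", "content": "hi"}],
    )
    print(chat.choices[0].message.content)

    # legacy completions endpoint: one rendered prompt string in, raw text out
    comp = await client.completions.create(
        model="deepseek-chat",  # placeholder
        prompt=rendered_prompt,
    )
    print(comp.choices[0].text)

asyncio.run(main())

so the existing response.choices[0].message.content checks in llm.py wouldn't 
hold on the completions side, never mind tool calls.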

i'm thinking it makes sense to configure an endpoint and run it; that gives a 
direct look at the issue without having to cross-reference things
1622
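
for reference, configuring an endpoint would just mean something like this in 
config.toml (every value below is a placeholder, and i'm assuming the usual 
[llm] block that the LLMSettings model above gets loaded from):

[llm]
model = "deepseek-ai/DeepSeek-V3"            # placeholder model id
base_url = "http://localhost:8000/v1"        # placeholder endpoint
api_key = "EMPTY"
api_type = "Openai"
api_version = ""
temperature = 1.0
hf_tokenizer_id = "deepseek-ai/DeepSeek-V3"  # the new field from the diff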

the last working api i used is hardcoded in one of the zinc binaries. i'd pull 
it out of there and put it in openmanus

1629 ummmm ummm ummm i'm out of space to install all its gpu dependencies for 
local evaluation. gotta disable those.

1650
(Pdb) print(params["prompt"])
<|begin▁of▁sentence|>You are OpenManus, an all-capable AI assistant, aimed at 
solving any task presented by the user. You have various tools at your disposal 
that you can call upon to efficiently complete complex requests. Whether it's 
programming, information retrieval, file processing, or web browsing, you can 
handle it all.<|User|>hi<|User|>You can interact with the computer using 
PythonExecute, save important content and information files through FileSaver, 
open browsers with BrowserUseTool, and retrieve information using GoogleSearch.

PythonExecute: Execute Python code to interact with the computer system, data 
processing, automation tasks, etc.

FileSaver: Save files locally, such as txt, py, html, etc.

BrowserUseTool: Open, browse, and use web browsers.If you open a local HTML 
file, you must provide the absolute path to the file.

WebSearch: Perform web information retrieval

Terminate: End the current interaction when the task is complete or when you 
need additional information from the user. Use this tool to signal that you've 
finished addressing the user's request or need clarification before proceeding 
further.

Based on user needs, proactively select the most appropriate tool or 
combination of tools. For complex tasks, you can break down the problem and use 
different tools step by step to solve it. After using each tool, clearly 
explain the execution results and suggest the next steps.

Always maintain a helpful, informative tone throughout the interaction. If you 
encounter any limitations or need more details, clearly communicate this to the 
user before terminating.
<|Assistant|>
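
that string is just what the tokenizer's chat template renders, and it can be 
poked at outside openmanus entirely; a minimal sketch, with a placeholder 
tokenizer id and the messages abbreviated:

from transformers import AutoTokenizer

# placeholder id; whatever hf_tokenizer_id points at
tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3")

messages = [
    {"role": "system", "content": "You are OpenManus, an all-capable AI assistant, ..."},
    {"role": "user", "content": "hi"},
    {"role": "user", "content": "You can interact with the computer using PythonExecute, ..."},
]

# tokenize=False returns the rendered string instead of token ids;
# add_generation_prompt controls the trailing assistant-turn marker
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

note the back-to-back <|User|> turns in the output above: the next-step prompt 
gets appended as a second user message.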

....................... 1840
well i'm not finding any information on the tool calling formats for deepseek.
their public api has tool calling with a caveat that it doesn't work well, and 
a handful of libraries have patched it on with prompt tuning, but it's not 
direct
my sambanova 405b access is limited despite being temporarily free

some interest in figuring out deepseek's tool calling prompt by probing their 
api, maybe not the thing atm

oops! _would need a free tool calling model to make most task agents 
plug-and-play!_

now maybe the quickest way to work around this would be to look at what format 
the system expects, and just tell a model to output in that format
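
i.e. spell out the expected json in the system prompt, then parse whatever comes 
back into the openai-style tool_call shape the agent loop already consumes. a 
rough sketch of the parsing half (names here are made up, and the exact shape 
openmanus wants would need checking):

import json
import re

TOOL_CALL_INSTRUCTION = (
    "When you want to use a tool, reply with only a JSON object like "
    '{"name": "<tool name>", "arguments": {...}} and nothing else.'
)

def extract_tool_call(text):
    """Pull a JSON object out of a model reply (assumes the reply is basically
    just the object) and reshape it into an openai-style tool call dict;
    returns None if nothing parses."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return {
        "id": "call_0",  # synthetic id; the real api would generate one
        "type": "function",
        "function": {
            "name": call.get("name"),
            "arguments": json.dumps(call.get("arguments", {})),
        },
    }

print(extract_tool_call('{"name": "Terminate", "arguments": {"status": "done"}}'))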
