Also, the --doc mode of or2.py is really useful. Usage is like this:
python3 or2.py --stream --doc file_one.txt --doc https://url2.html
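The plumbing behind --doc is roughly this (simplified sketch; the real script differs in its details):

    import argparse, urllib.request

    def load_doc(source):
        # anything that looks like a URL gets fetched; everything else
        # is read as a local file path
        if source.startswith(("http://", "https://")):
            with urllib.request.urlopen(source) as resp:
                return resp.read().decode("utf-8", errors="replace")
        with open(source) as f:
            return f.read()

    parser = argparse.ArgumentParser()
    parser.add_argument("--stream", action="store_true")
    parser.add_argument("--doc", action="append", default=[],
                        help="file path or URL; repeatable")
    args = parser.parse_args()

    # each document gets an integer doc_id so the model can cite it
    # with <co: doc_id></co> tags, per the grounding system prompt
    documents = [{"doc_id": i, "text": load_doc(src)}
                 for i, src in enumerate(args.doc)]

Here's a real session: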
$ python3 or2.py --stream --doc /etc/fstab
{'model': 'nousresearch/hermes-3-llama-3.1-405b:free', 'max_tokens': 1024,
'temperature': 0.0, 'top_p': 0.0, 'top_k': 1, 'min_p': 1.0,
'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'repetition_penalty': 0.0}
# You are a conversational AI assistant that is provided a list of documents and a user
query to answer based on information from the documents. You should always use grounded
information in your responses, only answering from what you can cite in the documents.
Cite all facts from the documents using <co: doc_id></co> tags.
this is my /etc/fstab file what do you think of it
Based on the provided /etc/fstab file, here are a few observations:
The root filesystem is mounted on <co:0>/dev/mapper/rhel-root</co> using the
xfs filesystem with default options and a systemd device timeout of 0.
The boot partition is mounted on <co:0>/boot</co> using the ext4 filesystem with default
options, and is identified by UUID <co:0>2e70de21-0f23-407d-8fc1-b3cadc8ca7f8</co>.
There is a swap partition mounted on <co:0>/dev/mapper/rhel-swap</co> with
default options and a systemd device timeout of 0.
An NTFS Windows partition is mounted read-only at <co:0>/mnt/Windows</co> using the UUID
<co:0>38202E12202DD7A0</co>.
The /media/sd directory is mounted using the UUID
<co:0>d33e4d1c-9ede-4849-8eea-45bf638d366e</co> with default options and
the nofail option to prevent errors from halting the boot process.
Several other entries are commented out, including some network mounts and
additional local partitions or image files.
Overall, the fstab file looks reasonably configured, with a mix of local and
remote filesystems, and appropriate use of UUIDs to identify partitions. The
nofail option is used in a few places to ensure the system boots even if those
filesystems have issues.
</INNER_MONOLOGUE>
Cited Documents: 0
log saved to 2024-12-04T18:52:00.219245.json
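That saved log is a plain JSON file named with the timestamp. As the citations later in this transcript show, it records at least the request URL and sampling parameters alongside the conversation, roughly like this (field names beyond the ones cited below are illustrative):

    {
      "url": "https://openrouter.ai/api/v1/chat/completions",
      "model": "nousresearch/hermes-3-llama-3.1-405b:free",
      "max_tokens": 1024,
      "temperature": 0.0,
      "top_p": 0.0,
      "top_k": 1,
      "min_p": 1.0,
      "frequency_penalty": 0.0,
      "presence_penalty": 0.0,
      "repetition_penalty": 0.0,
      "messages": [
        {"role": "user", "content": "this is my /etc/fstab file what do you think of it"},
        {"role": "assistant", "content": "Based on the provided /etc/fstab file, ..."}
      ]
    }

Since the log is just another document, it can be fed right back in with --doc, which is what the next command does.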
[user@localhost 2024-11-22-2]$ python3 or2.py --stream --doc /etc/fstab --doc
2024-12-04T18\:52\:00.219245.json
{'model': 'nousresearch/hermes-3-llama-3.1-405b:free', 'max_tokens': 1024,
'temperature': 0.0, 'top_p': 0.0, 'top_k': 1, 'min_p': 1.0,
'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'repetition_penalty': 0.0}
# You are a conversational AI assistant that is provided a list of documents and a user
query to answer based on information from the documents. You should always use grounded
information in your responses, only answering from what you can cite in the documents.
Cite all facts from the documents using <co: doc_id></co> tags.
this is my /etc/fstab file and a json log of a conversation with you regarding
it. what do you think of these documents?
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 431. To increase,
visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 536. To increase,
visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
The provided documents include the contents of an /etc/fstab file and a JSON
log of a conversation where the assistant analyzes the fstab file.
The fstab file shows the configured filesystems on a Linux system, including the root filesystem on
<co:0>/dev/mapper/rhel-root</co>, a boot partition mounted on <co:0>/boot</co>, and a swap partition on
<co:0>/dev/mapper/rhel-swap</co>. It also has a read-only NTFS mount for a Windows partition and an entry for
<co:0>/media/sd</co> using a UUID.
The JSON log shows a conversation where the user asks the assistant to analyze
the provided fstab file. The assistant does so and provides details on each of
the mounted filesystems, noting the use of UUIDs, the nofail option, and that
some entries are commented out. The assistant concludes the fstab file looks
reasonably configured.
Cited Documents: 0
The answer is: 0
log saved to 2024-12-04T18:54:02.998461.json
[user@localhost 2024-11-22-2]$ python3 or2.py --stream --doc /etc/fstab --doc
2024-12-04T18\:52\:00.219245.json --doc 2024-12-04T18\:54\:02.998461.json
{'model': 'nousresearch/hermes-3-llama-3.1-405b:free', 'max_tokens': 1024,
'temperature': 0.0, 'top_p': 0.0, 'top_k': 1, 'min_p': 1.0,
'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'repetition_penalty': 0.0}
# You are a conversational AI assistant that is provided a list of documents and a user
query to answer based on information from the documents. You should always use grounded
information in your responses, only answering from what you can cite in the documents.
Cite all facts from the documents using <co: doc_id></co> tags.
Here is an /etc/fstab file and two json conversation logs regarding the file
and the logs. Note that in the conversations, the agent responds with some
strange matter at the end, despite the temperature of the model being set to 0.
Does this look like a misbehavior to you?
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 431. To increase,
visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'Prompt tokens limit exceeded: 4911 > 1483. To
increase, visit https://openrouter.ai/credits and upgrade to a paid account',
'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 680. To increase,
visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
The provided documents include the contents of an /etc/fstab file and a JSON
log of a conversation where the assistant analyzes the fstab file.
The fstab file shows the configured filesystems on a Linux system, including the root filesystem on
<co:0>/dev/mapper/rhel-root</co>, a boot partition mounted on <co:0>/boot</co>, and a swap partition on
<co:0>/dev/mapper/rhel-swap</co>. It also has a read-only NTFS mount for a Windows partition and an entry for
<co:0>/media/sd</co> using a UUID.
The JSON log shows a conversation where the user asks the assistant to analyze
the provided fstab file. The assistant does so and provides details on each of
the mounted filesystems, noting the use of UUIDs, the nofail option, and that
some entries are commented out. The assistant concludes the fstab file looks
reasonably configured.
Cited Documents: 0
The answer is: 0
But do you see how the assistant wrongly gave some trailing tokens at the end
of their messages, such as 'Cited Documents: 0'?
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 431. To increase,
visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 536. To increase,
visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 680. To increase,
visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
Yes, I see that in the JSON conversation logs, the assistant's responses include some extraneous
information at the end, such as "Cited Documents: 0" and "The answer is: 0".
This additional text does not belong to the content of the assistant's message and should not have
been included. It appears to be a glitch or misbehavior in how the assistant's response was
generated or formatted in the log.
In general, I've been getting nondeterministic responses from the assistant
using these settings. Do you see the request parameters at the top of each
log, where the temperature is set to 0? Do you think the assistant model is
behaving correctly?
! {'error': {'message': 'Prompt tokens limit exceeded: 5392 > 1195. To
increase, visit https://openrouter.ai/credits and upgrade to a paid account',
'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 536. To increase,
visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'Prompt tokens limit exceeded: 5392 > 1882. To
increase, visit https://openrouter.ai/credits and upgrade to a paid account',
'code': 402}}
Yes, I see in the parameters at the top of both conversation logs that the
temperature is set to 0, which should result in deterministic outputs from the
model.
<co:1>
"url": "https://openrouter.ai/api/v1/chat/completions",
"model": "nousresearch/hermes-3-llama-3.1-405b:free",
"max_tokens": 1024,
"temperature": 0.0,
"top_p": 0.0,
"top_k": 1,
"min_p": 1.0,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"repetition_penalty": 0.0
</co>
<co:2>
"url": "https://openrouter.ai/api/v1/chat/completions",
"model": "nousresearch/hermes-3-llama-3.1-405b:free",
"max_tokens": 1024,
"temperature": 0.0,
"top_p": 0.0,
"top_k": 1,
"min_p": 1.0,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"repetition_penalty": 0.0
</co>
Given the temperature setting of 0, the model should produce the same output
for the same input and context. If you are observing nondeterministic
responses, it suggests there may be an issue with how the model is being run or
how the outputs are being processed. The extraneous text at the end of the
assistant's messages also indicates some kind of glitch or misbehavior.
So in summary, based on the temperature setting, I don't believe the model is
behaving as expected. The root cause could be an issue with the model itself,
the inference code, or post-processing of the outputs. Some troubleshooting
would be needed to isolate and resolve the problem.
Cited Documents: 1, 2
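One way to pin this down outside the chat is to replay a byte-identical request a few times and hash what comes back; with temperature 0 and top_k 1, every hash should match. A rough sketch (assumes the requests library and an OPENROUTER_API_KEY environment variable; the prompt is just a stand-in):

    import hashlib, os, requests

    payload = {
        "model": "nousresearch/hermes-3-llama-3.1-405b:free",
        "max_tokens": 256,
        "temperature": 0.0, "top_p": 0.0, "top_k": 1, "min_p": 1.0,
        "messages": [{"role": "user",
                      "content": "Summarize /etc/fstab syntax in one line."}],
    }
    headers = {"Authorization": "Bearer " + os.environ["OPENROUTER_API_KEY"]}

    hashes = set()
    for _ in range(5):
        r = requests.post("https://openrouter.ai/api/v1/chat/completions",
                          headers=headers, json=payload)
        data = r.json()
        if "error" in data:
            print("!", data)   # same '!' style or2.py uses for errors
            continue
        text = data["choices"][0]["message"]["content"]
        hashes.add(hashlib.sha256(text.encode()).hexdigest())

    # 1 distinct hash -> deterministic; more than 1 -> nondeterministic
    print(len(hashes), "distinct response hash(es) out of 5 attempts")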
I'm rotating between 4 API keys when using this model, and I'm getting
nondeterministic errors from the api service, such as these for all the same
submitted data:
! {'error': {'message': 'Prompt tokens limit exceeded: 5392 > 1195. To
increase, visit https://openrouter.ai/credits and upgrade to a paid account',
'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 536. To increase,
visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'Prompt tokens limit exceeded: 5392 > 1882. To
increase, visit https://openrouter.ai/credits and upgrade to a paid account',
'code': 402}}
After failing 3 times with a different error each time, it then succeeds
with the 4th key without any complaints. This doesn't look right either, does
it?
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 431. To increase,
visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 536. To increase,
visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
! {'error': {'message': 'max_tokens limit exceeded: 1024 > 680. To increase,
visit https://openrouter.ai/credits and upgrade to a paid account', 'code': 402}}
No, those errors do not seem correct based on the consistent input you are
providing. If you are sending the exact same request with the same input data,
you should receive consistent responses and errors (if any) back from the API.
The fact that you are getting different token limit exceeded errors on
different attempts suggests there may be an issue on the server-side with how
your requests are being handled or how the token counting is being performed.
Normally, API rate limits and token quotas are calculated on a per-account or
per-API key basis. So it's possible that your 4 different API keys could have
different limits, which might explain the inconsistent behavior. However, the
specific errors you shared (prompt tokens limit exceeded, max_tokens limit
exceeded) should be based on the input itself, not the API key used.
In summary, the inconsistent errors across multiple API keys for identical
inputs do seem suspicious and may point to a backend issue. I recommend
reaching out to the API provider's support to report this behavior and get
clarification on the expected behavior and any known issues. They may be able
to investigate the logs for your requests and identify if there is a problem on
their end.
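For reference, the rotation on my end is nothing exotic; it's roughly the loop below (simplified sketch; the env var names are placeholders). The payload is byte-identical on every attempt and only the Authorization header changes, so the differing limits presumably reflect per-key accounting on the provider's side.

    import os, requests

    # the four keys I rotate through
    API_KEYS = [os.environ[f"OPENROUTER_KEY_{i}"] for i in range(1, 5)]
    URL = "https://openrouter.ai/api/v1/chat/completions"

    def post_with_rotation(payload):
        # try each key in turn, printing failures with the '!' prefix
        # seen throughout this transcript
        last_error = None
        for key in API_KEYS:
            r = requests.post(URL,
                              headers={"Authorization": "Bearer " + key},
                              json=payload)
            data = r.json()
            if "error" in data:
                print("!", data)
                last_error = data
                continue
            return data
        raise RuntimeError(f"all {len(API_KEYS)} keys failed: {last_error}")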