Thank you all for your helpful replies. It seems pretty clear that the recommended Notmuch usage
for someone who wants to incorporate a script that classifies a batch of messages is either to
write the whole script to use Notmuch from the beginning, with the messages specified as a
list of "id:" IDs or even a general Notmuch query; or, if you are using an existing
script that accepts a list of files, to try to extract the message ID from each file at the end so
that the new tags can be communicated from the script back to Notmuch. Both options seem a little
hacky, especially since it is rather common to receive multiple distinct messages with the same
ID, for example when someone replies to a mailing list post and Cc's me, and I would want these to
be separately viewable (they are linked together with an "=" in the Mutt thread view) for
security reasons. If Notmuch is meant to function as an abstraction layer over message files stored
on the file system, then why doesn't it provide a standard way to go from file paths to Notmuch
messages?
As for why it would be a security issue to ignore new messages with duplicate
message IDs, consider that one can apparently play the following game on a
Notmuch user. (1) Send a private email that the user will never see because it
contains spam keywords. (2) Send a public email to a mailing list with the same
ID. The Notmuch user will not see the second email, and everyone will think he
is unable to reply to the allegations it presumably contains, and that he is
therefore guilty and should be arrested.
My recommendation would be to split the Notmuch project into three teams: one to work on
the source code, another on the documentation, and a third on test cases. There should be
separate Git repositories for each team, so that I can for example run current test cases
against a fork of the source repo, or use recent manual pages with an older version of
the source. This way, the documentation team will be able to document deficiencies in
various source releases as well as standardizing proposed new features or syntax. Or,
someone would submit a pull request to the source team, that would then be discussed on
the mailing list or in the issue tracker, and someone on another team would then use that
discussion to write documentation or test cases before the PR is accepted. The teams
would have a "checks and balances" relationship, like with the three branches
of government. (I think that all software projects should be run this way, so please
don't be offended.)
I wrote some Perl scripts a long time ago, which work together to tag mail and
put links to each message in a tag-specific directory for each of its tags. The
script would add headers to the message, however, and it rewrote the Message-ID
if it wasn't unique. It did not create a full-text index like Notmuch does. It
did seem fairly reliable. I am trying to adapt it to send the tags to Notmuch.
I am having to use Notmuch because of a third piece of software that depends on
it. It is somewhat perplexing to me that no one else has had my use case before.
Best wishes,
Frederick
On Sat, Sep 21, 2024 at 11:38:18AM +0200, Michael J Gruber wrote:
Am Sa., 21. Sept. 2024 um 05:23 Uhr schrieb Frederick Eaton <frede...@ofb.net>:
Thank you for your response, Pengji.
On Sat, Sep 21, 2024 at 08:25:10AM +0800, Pengji Zhang wrote:
>Hi Frederick,
>
>Frederick Eaton <frede...@ofb.net> writes:
>
>>I am trying to figure out how to adapt a script I wrote for
>>filtering messages, to apply notmuch tags to each message. A
>>difficulty is that the messages are already in the Notmuch database,
>>because another tool has delivered them to a maildir and run
>>"notmuch new".
>>
>>Now, Notmuch can provide me with the paths of all the new
>>(unfiltered) messages, which I can give to my script. The question I
>>have is, once the filter is done, how can the script tell Notmuch
>>which message to apply the tags to?
>
>
>I am not sure if I understand you correctly. If the problem here is to
>distinguish existing messages and new messages, would the config
>option 'new.tags' work? For example, use
>
> notmuch config set new.tags new
>
>to give all new messages a 'new' tag.
No, I already have that configuration. The first sentence described what I
already know how to do, the second sentence is what I'm trying to do.
It seems that we're still guessing at what your script is doing or
trying to do. Do you mind sharing a trimmed-down version?
It might be useful for the reasons I stated, namely in case the Message-ID does
not exist or is not unique.
This is probably at the heart of the problem. Within notmuch, a
"message" is something identified by a message-id (mid), and all
information in the notmuch database is tied to a mid.
When you speak about a message, you probably mean the content of an
individual "message file" - which is a natural, but different notion.
A "path:" refers to a message file, a "mid:" to a message id.
When "notmuch new" encounters a new message file, it
- checks if it contains a valid "Message-ID" header
- uses that as mid or generates a mid using a sha1 checksum of the message file
- checks whether that mid (!) is in the database already
- adds the path to the existing db entry, or creates a new db entry
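The first two steps above can be approximated in shell. This is only a rough sketch, not notmuch's actual code: the function name `mid_for_file` is hypothetical, the header parsing is deliberately naive (it ignores folded headers and does not validate the id the way notmuch's MIME parser would), and the `notmuch-sha1-` prefix for generated ids is an assumption that should be checked against your notmuch version. It also assumes `sha1sum` is installed.

```shell
# Hypothetical helper: print the id notmuch would probably use for one file.
mid_for_file() {
    # Look only at the header block (everything before the first blank line)
    # and take the first Message-ID header, stripping the angle brackets.
    mid=$(awk '
        /^\r?$/ { exit }
        tolower($0) ~ /^message-id:/ {
            sub(/^[^:]*:[ \t]*/, "")
            gsub(/[<>\r]/, "")
            sub(/[ \t]+$/, "")
            print
            exit
        }' "$1")
    if [ -n "$mid" ]; then
        printf '%s\n' "$mid"
    else
        # Assumption: ids generated by notmuch look like notmuch-sha1-<hex digest>.
        printf 'notmuch-sha1-%s\n' "$(sha1sum < "$1" | cut -d' ' -f1)"
    fi
}
```

Two files that print the same value here would end up under one db entry, which is exactly the situation described next.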
So, you may have several files (path entries) for the same mid, and
which one is used for indexing purposes depends on the order of
arrival (or, in the case of reindexing, probably on file system
ordering). notmuch assumes that this makes no difference - same mid
same "message". This assumption can break, for example for list
copies, different headers on sent versus received etc.
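To find out which messages are affected by this, one could count the files behind each mid. A minimal sketch, assuming `notmuch count --output=files` prints the number of files matching a query; the function name `ids_with_several_files` is made up for illustration. Files sharing a mid may also be exact copies, so this only flags candidates.

```shell
# Read message ids ("id:...") on stdin, print those backed by more than one file.
ids_with_several_files() {
    while IFS= read -r mid; do
        # </dev/null keeps notmuch away from the id list on our stdin.
        n=$(notmuch count --output=files "$mid" </dev/null) || continue
        if [ "${n:-0}" -gt 1 ]; then
            printf '%s\n' "$mid"
        fi
    done
}

# Possible use:
#   notmuch search --output=messages tag:new | ids_with_several_files
```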
I'm elaborating on this because we have to guess about your script -
what is a "new message" for your script, and which kind of information
does it want to process?
Typical processing would be done in a notmuch post-hook, and it would:
- check for new messages (tag:new)
- get their file paths from `notmuch search --output=files mid:XYZ` or such
- do whatever it needs using the file if you really need to parse that yourself
I guess most of us have some sort of script running on new messages as
part of a hook, be it `afew` or something homegrown, and this
typically clears the new tag afterwards.
Michael
On Tue, Sep 24, 2024 at 11:09:26AM +0200, Michael J Gruber wrote:
Am Sa., 21. Sept. 2024 um 18:24 Uhr schrieb Panayotis Manganaris
<panos.mangana...@gmail.com>:
...
notmuch search --output=messages 'tag:new' > /tmp/msgs
notmuch search --output=files 'tag:new' |\
bogofilter -o0.7,0.7 -bt |\
paste - /tmp/msgs |\
awk '$1 ~ /S/ { print "-new +spam", "--", $3 }' |\
notmuch tag --batch
...
This script operates on the assumption that the order of results from notmuch
queries is always the same, which is fortunately true.
It also operates under the assumption that you receive no duplicate
messages with the same message-id (such as list copies,
sent/received), or else `paste` will have a hard time matching lines.
Note that you can loop over the msgs, treat them individually, and
still collect input for `notmuch tag --batch`, which solves both the
problem with duplicate messages and potential ordering instability
while keeping batch efficiency.
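Such a loop might look roughly like the following. It is a sketch under several assumptions: the function names are hypothetical, `bogofilter` is taken to exit with status 0 for spam, the ids are assumed not to need extra quoting inside a query, and the policy of tagging a message as spam only when every one of its files is classified as spam is just one possible choice.

```shell
# Assumption: bogofilter exits 0 when it classifies the message as spam.
is_spam() {
    bogofilter -o0.7,0.7 < "$1"
}

# Succeed only if the message has at least one file and every file is spam.
all_copies_spam() {
    files=$(notmuch search --output=files "$1" </dev/null) || return 1
    [ -n "$files" ] || return 1
    printf '%s\n' "$files" | while IFS= read -r f; do
        is_spam "$f" || exit 1   # leaves the pipeline subshell with status 1
    done
}

# Read message ids ("id:...") on stdin, write lines for `notmuch tag --batch`.
batch_lines() {
    while IFS= read -r mid; do
        if all_copies_spam "$mid"; then
            printf '%s\n' "-new +spam -- $mid"
        fi
    done
}

# Possible use:
#   notmuch search --output=messages tag:new | batch_lines | notmuch tag --batch
```

Each id is handled on its own, so nothing depends on two separate queries returning results in the same order, and a message with several files yields exactly one batch line.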
Your instinct to use batch tagging and id: queries is correct. I collect my new
message ids in /tmp/msgs. These ids are unique, they are definitely unique enough
to be used to tag individual messages on a daily basis.
I'm sorry, but either they're unique or not. What's unique enough? I'm
insisting on this because part of the OP's problem is being clear
about the notion of message, which is uniquely identified by a message
id in the notmuch db. I tried to clear that up in my previous answer
in this thread.
> It might be useful for the reasons I stated, namely in case the Message-ID
> does not exist or is not unique.
I think mail that is successfully transmitted through a mail host necessarily
obtains a message id, but I might be wrong. I believe notmuch indexes on both its
own unique thread ids and the message ids, thereby further decreasing the already
minuscule chance of message id collisions.
No. Messages can arrive without a mid. In that case, notmuch creates one
(without altering the message file) and uses it for indexing.
Thread ids are something completely different from message ids. They
do not identify a message uniquely, but rather a thread of messages
"joined" by references, albeit indirectly (such as "root message of the
thread", assuming one root).
Cheers
Michael
_______________________________________________
notmuch mailing list -- notmuch@notmuchmail.org
To unsubscribe send an email to notmuch-le...@notmuchmail.org