* Zororg via "General discussions about Org-mode." <emacs-orgmode@gnu.org>
[2025-03-14 08:17]:
> NLP and LLM models might be truly beneficial.
> Although scary, it's a good idea for personal knowledge management.

Is it scary? Interesting to see the reaction. Though I don't know why it would 
be scary.

I can understand it as scary when a robot gets guns or is otherwise
programmed to do harm, including when it is unintentionally badly
programmed. The current state of Large Language Models (LLMs) is that
they generate bullshit, so one cannot trust them to be precise in
every decision.

ChatGPT is bullshit | Ethics and Information Technology:
https://link.springer.com/article/10.1007/s10676-024-09775-5

Let's say for notes:

I am constantly using speech recognition and transcription, including
for this sentence right now, and I can no longer live without speech
recognition on the computer. Why should I write if I can speak? And
what is really great: I can press a button, say a function key, or
click a specific icon, and then everything I spoke gets written out,
corrected, and even translated into any language.
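
Here is a minimal sketch of such a button in Emacs Lisp, assuming
whisper-cli (from whisper.cpp) and the ALSA arecord tool are
installed; the model path, the recording format, and the function
names are my assumptions, not a finished implementation:

(require 'subr-x)

(defvar my-whisper-model "~/models/ggml-base.en.bin"
  "Path to a whisper.cpp model file; adjust to your setup.")

(defun my-speech-insert (seconds)
  "Record SECONDS of audio, transcribe it, and insert the text at point."
  (interactive "nRecord seconds: ")
  (let ((wav (make-temp-file "speech" nil ".wav")))
    ;; Record 16 kHz mono WAV, the input format whisper.cpp expects.
    (call-process "arecord" nil nil nil
                  "-f" "S16_LE" "-r" "16000" "-c" "1"
                  "-d" (number-to-string seconds) wav)
    ;; -nt drops timestamps, so stdout is the plain transcript.
    (insert (with-temp-buffer
              (call-process "whisper-cli" nil '(t nil) nil
                            "-m" (expand-file-name my-whisper-model)
                            "-f" wav "-nt")
              (string-trim (buffer-string))))
    (delete-file wav)))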

Now imagine writing something: we all make errors and mistakes, so
writing text can easily be challenging, no matter whether you are
writing in your native language, and many people use multiple
languages. Why not use a large language model to correct the notes I
am writing? Another issue: why not use it to search for notes? There
are so many applications in life. We are actually missing that
feature of the computer. That is what the computer is meant to do:
help humans minimize their effort.
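
For illustration, a note-correcting command could look roughly like
the sketch below, assuming a local OpenAI-compatible chat endpoint
(such as a llama.cpp-style server) listening on localhost:8080; the
URL, the "phi-4" model name, and the prompt wording are all
assumptions:

(require 'json)

(defun my-llm-correct-region (beg end)
  "Send the region to a local LLM and replace it with the corrected text."
  (interactive "r")
  (let* ((text (buffer-substring-no-properties beg end))
         (payload (json-encode
                   `((model . "phi-4")  ; hypothetical model name
                     (messages . [((role . "user")
                                   (content . ,(concat "Correct grammar and \
spelling; return only the corrected text:\n\n" text)))]))))
         (reply (with-temp-buffer
                  ;; POST to a local OpenAI-compatible server (assumed URL).
                  (call-process "curl" nil '(t nil) nil "-s"
                                "http://localhost:8080/v1/chat/completions"
                                "-H" "Content-Type: application/json"
                                "-d" payload)
                  (goto-char (point-min))
                  (alist-get 'content
                             (alist-get 'message
                                        (aref (alist-get 'choices (json-read))
                                              0))))))
    (delete-region beg end)
    (insert reply)))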

> > And all that in natural language manner.
> 
> That is indeed very sophisticated. I remember using Google
> Assistant back then for reminders and notes, which helped me a lot.

I don't see it as particularly sophisticated; rather, it feels like
a necessity, one that is bound to be integrated everywhere very soon.
Why would you type if you can talk? Or delve into technical details
when working at a higher level suffices?

Computers are meant for conversation with users, as envisioned back
in the movie 2001: A Space Odyssey, and now we have personalized,
fully free software that extends even to note-taking.

Take my Emacs Dired mode: why wouldn't I use speech transcription if
it's available?

https://gnu.support/images/2025/03/2025-03-17/2025-03-17-22:16:43.mp4

It is great to have:

- phone call recording feature in the phone
- files with those recordings
- possibility to save or export chat recordings
- then to summarize or provide full transcripts by using Large
  Language Models (LLMs)

Great for searching, great for understanding and decision making.

With hundreds of audio files, recordings and phone calls, I can
inject them into an Org-like system for automatic transcription. This
applies not only there but everywhere: you index a list of audio
files, transcribe them automatically, see the results instantly, and
then search through your audio effortlessly.
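
A minimal sketch of such an index, reusing the my-whisper-model
variable from the earlier sketch, might look like this; the function
name and the Org layout are my assumptions:

(defun my-index-audio (dir org-file)
  "Transcribe every .wav file under DIR into ORG-FILE as Org notes."
  (interactive "DAudio directory: \nFOrg index file: ")
  (with-temp-file org-file
    (dolist (wav (directory-files-recursively dir "\\.wav\\'"))
      ;; One heading and file link per recording, then the transcript.
      (insert (format "* %s\n[[file:%s]]\n\n"
                      (file-name-nondirectory wav) wav))
      (call-process "whisper-cli" nil '(t nil) nil
                    "-m" (expand-file-name my-whisper-model)
                    "-f" wav "-nt")
      (insert "\n\n"))))

Running M-x my-index-audio over a directory then gives one Org
heading, file link, and transcript per recording, all searchable
with the usual Org tools.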

This isn't merely technical; it's fundamentally useful! We users
don't want to grapple with tech problems; we seek ease and life
improvement from our computers. Isn't that clear?

The more we debate Org mode intricacies, the less progress is made,
because Org mode is too laden with details for what most people need:
simplicity, not complexity.

There might be pleasure in mastering every detail, like a piano
player, but the majority of people don't want to play the instrument;
they just enjoy listening to music. Why shouldn't interacting with
computers and retrieving information from them be just as
straightforward?

If I wish my computer could talk back, it should do so effortlessly
when prompted with simple commands: "please perform this task."

The concept of digital assistants is fantastic on Android phones but
absent from our personal computing experience, which is a disaster!
Whatever we write or speak can already serve as a note. Imagine having
such an integrated system where everything spoken becomes recorded.

Is every piece critical? Perhaps not, but one day it might become
invaluable. With computer storage so affordable now (I'm using about
1.6 terabytes out of two), there's no real cost in keeping these
records safe for future use.

Today I had an issue while solving a problem: three people were sent
to Tanzania for this project, and they turned out to be almost
illiterate in terms of basic organizational skills; regardless of
their university education, they struggled fundamentally. To make
matters worse, there was a heap of information that lacked any
semblance of structure. Org-style organization is precisely what it
was missing! Org means having everything organized.

There were a number of photographs for which it was difficult to
determine the specific location within different areas or sites
across Tanzania.

I've organized the images within my Dynamic Knowledge Repository and
marked those needing attention for further processing. While it's
unclear at this point whether there is a way to perform similar
actions directly in Emacs Org mode, exporting the pertinent details
as hyperlinks or notes would serve well enough by giving me access
through Org files. This method ensures that I can seamlessly retrieve
the necessary links when required within my workflow.

After sorting the photos, I promptly labeled everything and swiftly
created a temporary folder containing symbolic links to each picture.
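
In Emacs Lisp that step could be sketched as below; Dired's built-in
dired-do-symlink (bound to S) does much the same interactively when
you give it a target directory:

(defun my-symlink-marked (target-dir)
  "Symlink all Dired-marked files into TARGET-DIR."
  (interactive "DLink into directory: ")
  (dolist (file (dired-get-marked-files))
    (make-symbolic-link
     file
     (expand-file-name (file-name-nondirectory file) target-dir)
     t)))  ; t: overwrite an existing link of the same name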

When you open this directory in the JOSM software, it displays the
images at their specific geographical locations, using their latitude
and longitude coordinates. This correlation of photos with places
simplifies the process of generating reports.
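
The coordinates JOSM reads come from the images' EXIF data; a sketch
that lists them, assuming the exiftool program is installed, could
be:

(require 'subr-x)

(defun my-photo-coords (dir)
  "List the latitude/longitude of every geotagged .jpg under DIR."
  (interactive "DPhoto directory: ")
  (dolist (jpg (directory-files-recursively dir "\\.jpe?g\\'"))
    ;; -n prints numeric coordinates; -p formats one line per file.
    (message "%s: %s" jpg
             (string-trim
              (shell-command-to-string
               (format "exiftool -n -p '$GPSLatitude $GPSLongitude' %s"
                       (shell-quote-argument jpg)))))))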

Important to note: the whole process takes just a few seconds for
hundreds of images spread across different directories.

By enabling people to interact with computers through voice
recordings or audio notes, I can generate coherent and professional
reports by transcribing these audios into text. While individuals may
make numerous errors while speaking, such as hesitations, repetitions,
or grammatical mistakes, the Large Language Model (LLM), specifically
Microsoft Phi-4 in this instance, excels at correcting the resulting
transcripts to ensure clarity and accuracy.

References:

microsoft/phi-4 · Hugging Face:
https://huggingface.co/microsoft/phi-4

About Dynamic Knowledge Repositories (DKR):
https://www.dougengelbart.org/content/view/190/163/

I was dictating while writing this email, using this script:

rcd-llm-speech-single-input.sh (LLM-Helpers on Gitea):
https://gitea.com/gnusupport/LLM-Helpers/src/branch/main/bin/rcd-llm-speech-single-input.sh

Demo:
https://www.youtube.com/watch?v=84iS3atFQdI

Now imagine giving the computer commands: Emacs listening and doing
what you want, and you teaching it in the moment of doing, without
delays.

How about telling Emacs in Org mode:

- Insert a primary header 
- Start with top-level headers
- make major heading
- title the main topic

and then Emacs doing:

*

and asking you: What would be the title?

And you say: make it "About notes" and you get:

* About notes

Then you start talking and insert the transcription. Then you can correct the 
text automatically.

- make second level heading

You talk, and then it gives it to you.
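
A minimal sketch of such a dispatcher is below; the phrase table and
function name are purely my assumptions, and the transcribed phrase
would come from the speech pipeline described above:

(defun my-org-voice-command (phrase)
  "Carry out a spoken PHRASE such as \"insert a primary header\"."
  (interactive "sSpoken command: ")
  (cond
   ;; Several spoken variants map to the same top-level heading.
   ((string-match-p
     "primary header\\|top-level\\|major heading\\|main topic" phrase)
    (goto-char (point-max))
    (insert "\n* " (read-string "What would be the title? ") "\n"))
   ((string-match-p "second level heading" phrase)
    (goto-char (point-max))
    (insert "\n** " (read-string "What would be the title? ") "\n"))
   (t (message "Unrecognized command: %s" phrase))))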

I am not far from that point. Today I implemented whisper-cli, as it
is fully free software for transcribing speech audio into text.

Interacting with the computer with less effort is, of course, always
more desirable.

And I am surprised how perfectly it works.

As you can see, I can just talk and it gets transcribed straight into Emacs:
https://gnu.support/images/2025/03/2025-03-17/2025-03-17-22:41:29.mp4

-- 
Jean Louis
