* Zororg via "General discussions about Org-mode." <emacs-orgmode@gnu.org> [2025-03-14 08:17]:
> NLP and LLM models might be truly beneficial.
> Although scary, its a good idea for personal knowledge management.
Is it scary? Interesting to see the reaction, though I don't know why it would be scary. I can understand it as scary when a robot gets guns or is otherwise programmed to do harm, including when it is unintentionally badly programmed. The current state of Large Language Models (LLM) is that they generate bullshit, so one cannot trust them to be precise in every decision:

ChatGPT is bullshit | Ethics and Information Technology:
https://link.springer.com/article/10.1007/s10676-024-09775-5

Let's take notes as an example: I am constantly using speech recognition and transcription, including for this sentence right now, and I can no longer live without speech recognition on the computer. Why should I write if I can speak? And what is really great: I can press a button, say a function key, or click a specific icon, and everything I spoke gets written out, corrected, and even translated into any language (see the first sketch below).

Now imagine writing something: we all make errors and mistakes, so writing text can easily be challenging, whether or not you are writing in your native language, and many people use multiple languages. Why not use a large language model to correct the notes I am writing? Another question: why not use it to search the notes? There are so many applications in life. We are actually missing that feature of the computer. That is what the computer is meant to be: something that helps the human minimise effort.

> > And all that in natural language manner.
>
> That is indeed very much sophisticated. I remember using google
> assisstant back then for reminding and noting which helped me alot.

I don't see it as particularly sophisticated; rather, it feels like a necessity, and everything in our future is bound to be integrated very soon. Why would you type if you can talk? Or delve into technical details when working at a higher level suffices? Computers are meant for conversation with users, as envisioned back in the movie 2001: A Space Odyssey, and now we have personalized, fully free software that extends even to note-taking. Take my Emacs Dired mode: why wouldn't I use speech transcription if it's available?

https://gnu.support/images/2025/03/2025-03-17/2025-03-17-22:16:43.mp4

It is great to have:

- a phone call recording feature in the phone
- files with those recordings
- the possibility to save or export chat recordings
- and then summaries or full transcripts produced by Large Language
  Models (LLM)

Great for searching, great for understanding and decision making.

With hundreds of audio files, recordings and phone calls, I can inject them into an Org-like system for automatic transcription. This applies not only there but everywhere: you index a list of audio files, transcribe them automatically, see the results instantly, and then search through your audio effortlessly (see the second sketch below).

This isn't merely technical, it's fundamentally useful! We users don't want to grapple with technical problems; we seek ease and life improvement from our computers. Isn't that clear? The more we debate Org mode intricacies, the less progress is made, because Org is too laden with details for what most people need: simplicity, not complexity. There might be pleasure in mastering every detail, like a piano player, but the majority of people don't want to play the instrument; they just enjoy listening to music. Why shouldn't interacting with computers and retrieving information be just as straightforward?
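To make the dictation idea concrete, here is a first, minimal sketch in Emacs Lisp. It is not my actual script (that one is linked further below); it assumes whisper-cli from whisper.cpp and ALSA's arecord are installed, and the model path and fixed five-second recording window are placeholders to adjust:

(require 'subr-x)  ; for `string-trim'

;; Sketch only: record five seconds of speech with arecord,
;; transcribe with whisper-cli, insert the text at point.
(defun my-dictate-at-point ()
  "Record speech, transcribe it with whisper-cli, insert at point."
  (interactive)
  (let ((wav (make-temp-file "dictation" nil ".wav"))
        (model (expand-file-name "~/models/ggml-base.en.bin"))) ; placeholder
    ;; whisper.cpp expects 16 kHz mono 16-bit WAV input.
    (call-process "arecord" nil nil nil
                  "-f" "S16_LE" "-r" "16000" "-c" "1" "-d" "5" wav)
    (insert (string-trim
             (shell-command-to-string
              (format "whisper-cli -m %s -f %s --no-timestamps 2>/dev/null"
                      (shell-quote-argument model)
                      (shell-quote-argument wav)))))
    (delete-file wav)))

Bind that to a function key and you have the press-a-button, speak, get-text cycle described above.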
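And here is the second sketch, for the audio-archive idea, under the assumption that the recordings are already 16 kHz WAV files (whisper.cpp's expected input; other formats would need converting with ffmpeg first) and with the same placeholder model path:

;; Sketch: walk a directory of recordings and write a .txt transcript
;; next to each WAV file that has none yet.
(defun my-transcribe-directory (dir)
  "Transcribe every .wav file in DIR that has no transcript yet."
  (interactive "DDirectory of recordings: ")
  (dolist (wav (directory-files dir t "\\.wav\\'"))
    (let ((txt (concat (file-name-sans-extension wav) ".txt")))
      (unless (file-exists-p txt)
        (with-temp-file txt
          (insert (shell-command-to-string
                   (format "whisper-cli -m %s -f %s --no-timestamps 2>/dev/null"
                           (shell-quote-argument
                            (expand-file-name "~/models/ggml-base.en.bin"))
                           (shell-quote-argument wav)))))))))

After one pass like that, M-x rgrep over the directory searches the whole spoken archive as plain text.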
I wish my computer could talk back; it should do so effortlessly when prompted with simple commands: "please perform this task." The concept of digital assistants is fantastic on Android phones but absent from our personal computing experience, which is a disaster! Whatever we write or speak can already serve as a note. Imagine having an integrated system where everything spoken becomes recorded. Is every piece critical? Perhaps not, but one day it might become invaluable. With computer space so affordable now (I'm using about 1.6 terabytes out of two), there is no real cost in keeping these records safe for future use.

Today I had a problem to solve: three people were sent to Tanzania for a project, and they turned out to be almost illiterate in terms of basic organizational skills; despite their university education, they struggled fundamentally. To make matters worse, there was a heap of information that lacked any semblance of structure; Org-style organization is precisely what it was missing! Org means having everything organized. There were a number of photographs whose specific locations within different areas or sites across Tanzania were difficult to determine.

I organized the images within my Dynamic Knowledge Repository and marked those needing attention for further processing. While it is unclear at this point whether similar actions can be performed directly in Emacs Org mode, exporting the pertinent details as hyperlinks or notes would serve well enough by giving me access through Org files. This way I can seamlessly retrieve the necessary links when required within my workflow.

After sorting the photos, I labeled everything and quickly created a temporary directory containing symbolic links to each picture (see the Dired sketch below). When you open this directory in the JOSM software, it displays the images at their specific geographical locations, using the latitude and longitude coordinates. This correlation of photos simplifies the process of generating reports. It is important to note that the whole process takes just a few seconds for hundreds of images spread over different directories.

By enabling people to interact with computers through voice recordings or audio notes, I can generate coherent and professional reports by transcribing these audios into text. While individuals may make numerous errors while speaking, such as hesitations, repetitions, or grammatical mistakes, a Large Language Model (LLM), Microsoft Phi-4 in this instance, excels at correcting the resulting transcripts to ensure clarity and accuracy (see the correction sketch below).

References:

microsoft/phi-4 · Hugging Face:
https://huggingface.co/microsoft/phi-4

About Dynamic Knowledge Repositories (DKR):
https://www.dougengelbart.org/content/view/190/163/

I was dictating while writing this email, using:

LLM-Helpers/rcd-llm-speech-single-input.sh at main - LLM-Helpers - Gitea: Git with a cup of tea:
https://gitea.com/gnusupport/LLM-Helpers/src/branch/main/bin/rcd-llm-speech-single-input.sh

Demo:
https://www.youtube.com/watch?v=84iS3atFQdI

Now imagine giving commands to the computer: Emacs listening and doing what you want, and you teaching it in the moment of doing, without delays. How about telling Emacs in Org mode:

- insert a primary header
- start with top-level headers
- make major heading
- title the main topic

and then Emacs doing:

*

and asking you: What would be the title? And you say: make it "About notes", and you get:

* About notes

Then you start talking and the transcription is inserted.
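Here is a minimal sketch of that dialogue, assuming the spoken command has already been transcribed into a string (for example by the dictation sketch earlier); the phrasings it matches are just the ones listed above:

;; Sketch: map a few transcribed phrasings onto Org heading
;; insertion.  COMMAND is assumed to come from a speech
;; transcription step.
(defun my-org-voice-command (command)
  "Interpret a transcribed COMMAND such as \"insert a primary header\"."
  (interactive "sCommand: ")
  (cond
   ((string-match-p "primary\\|top-level\\|major\\|main" command)
    (unless (bolp) (insert "\n"))
    (insert "* " (read-string "What would be the title? ") "\n"))
   ((string-match-p "second level" command)
    (unless (bolp) (insert "\n"))
    (insert "** " (read-string "What would be the title? ") "\n"))
   (t (message "Did not understand: %s" command))))

A real assistant would hand the question back through speech as well, but even this much gives the exchange shown above.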
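The correction step could look like the following sketch. It assumes an OpenAI-compatible server on localhost, for example llama.cpp's llama-server running a Phi-4 model; the port, the model name, and the system prompt are all placeholders, and there is no error handling:

(require 'json)
(require 'url)

;; Sketch: send the region to a local LLM server and replace it with
;; the corrected text from the reply.
(defun my-llm-correct-region (beg end)
  "Replace the region with an LLM-corrected version of its text."
  (interactive "r")
  (let* ((text (buffer-substring-no-properties beg end))
         (url-request-method "POST")
         (url-request-extra-headers '(("Content-Type" . "application/json")))
         (url-request-data
          (encode-coding-string
           (json-encode
            `((model . "phi-4")  ; placeholder model name
              (messages
               . [((role . "system")
                   (content . "Correct grammar, spelling and punctuation. Return only the corrected text."))
                  ((role . "user") (content . ,text))])))
           'utf-8))
         (corrected
          (with-current-buffer
              (url-retrieve-synchronously
               "http://localhost:8080/v1/chat/completions") ; assumed endpoint
            (goto-char (point-min))
            (re-search-forward "^\r?\n")  ; skip the HTTP headers
            (alist-get 'content
                       (alist-get 'message
                                  (aref (alist-get 'choices (json-read)) 0))))))
    (delete-region beg end)
    (goto-char beg)
    (insert corrected)))

Mark a raw transcript, run it, and the hesitations and grammatical slips come back cleaned up.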
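And going back to the photographs, the symbolic-link step could look like this Dired sketch; JOSM itself then reads the GPS coordinates from the EXIF data, so nothing here is specific to my setup beyond the assumption that the marked files are photos:

(require 'dired)

;; Sketch: collect the files marked in Dired into a fresh temporary
;; directory as symbolic links, so JOSM (or any viewer) can open the
;; selection as one group.
(defun my-link-marked-photos ()
  "Symlink all Dired-marked files into a new temporary directory."
  (interactive)
  (let ((dir (make-temp-file "photo-selection" t)) ; t = create a directory
        (files (dired-get-marked-files)))
    (dolist (file files)
      (make-symbolic-link file
                          (expand-file-name (file-name-nondirectory file)
                                            dir)))
    (message "Linked %d photos into %s" (length files) dir)))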
Then you can correct the text automatically, along the lines of the correction sketch above. Or you say: "make second level heading"; you talk, and Emacs gives it to you.

I am not far from that point. Today I set up whisper-cli, as it works with fully free models for transcribing text from speech audio. Interacting with the computer with less effort is of course always more desirable, and I am surprised how well it works. As you can see, I can just talk and it gets transcribed straight into Emacs:

https://gnu.support/images/2025/03/2025-03-17/2025-03-17-22:41:29.mp4

--
Jean Louis