Wikimedia,

Hello. After receiving and considering the feedback from our previous 
discussion, I have revised the Wikianswers proposal: 
https://meta.wikimedia.org/wiki/Wikianswers . I would also like to call your 
attention to its technical discussion section: 
https://meta.wikimedia.org/wiki/Wikianswers#Technical_discussion . The current 
version of this section is included below.

Per the feedback, the revised proposal includes, in addition to an option for a 
sister project at a new domain, e.g., https://en.wikianswers.org , an option 
for integration into the search systems of Wikipedia, Wikidata, and Commons. 
With respect to this latter option, AI systems' (LLMs') responses to end-users' 
questions would still be URL-addressed, human-editable content, e.g.: 
https://en.wikipedia.org/qa/2b106ea8-4d1b-441f-9dc8-4555a9999ae9 .

Thank you for checking out the revised proposal and for any feedback.

Technical discussion
Overview
Relevant artificial intelligence topics include retrieval-augmented generation, 
retrieval-augmented generation with guardrails, and agent-based approaches.
As presently considered, those parts of the question-and-answer data which 
could be human-editable include: (1) the template of the prompts, (2) the task, 
(3) the retrieved context data, (4) the questions, and (5) the answers.
The template is the overall structure of the prompts to the LLM. It includes 
some natural language and slots where the other parts will be placed. The 
template should be locked so as to be editable only by administrators. Editing 
it would invalidate every cached and unlocked answer, meaning that every 
unlocked answer would be updated, refreshed, or regenerated.
The task is an instruction, e.g., "You are a helpful system which will answer 
the user's question using the following information". The task should be locked 
so as to be editable only by administrators. Editing it would invalidate every 
dependent cached and unlocked answer, meaning that every such answer would be 
updated, refreshed, or regenerated.
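
As a rough illustration of how a template with slots might be combined with the task, the retrieved context, and a question, here is a minimal Python sketch. The template wording, the slot names, and the build_prompt helper are assumptions made for illustration; only the task sentence is taken from the text above.

```python
from string import Template

# Hypothetical template text; the proposal does not specify its wording.
# $task, $context, and $question are the slots for the other parts.
PROMPT_TEMPLATE = Template(
    "$task\n\n"
    "Context:\n$context\n\n"
    "Question: $question\n"
    "Answer:"
)

# The example task instruction quoted in the proposal.
TASK = (
    "You are a helpful system which will answer the user's question "
    "using the following information"
)


def build_prompt(task: str, chunks: list[str], question: str) -> str:
    """Fill the template's slots with the task, numbered chunks, and question."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return PROMPT_TEMPLATE.substitute(task=task, context=context, question=question)


if __name__ == "__main__":
    print(build_prompt(TASK, ["First excerpt.", "Second excerpt."], "Who wrote it?"))
```

Keeping the template and the task as separate values is what would let each one be locked and edited independently, as described above.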
The retrieved context data are chunks or excerpts, e.g., of Wikipedia articles, 
which inform the answer to a particular question. Users could edit them, which 
would cascade invalidations to dependent cached and unlocked answers. In terms 
of user experience, editors might click on a displayed chunk or excerpt to 
navigate to its location in the source page and edit it there. Such updates to 
the underlying pages would then update the chunks and the dependent unlocked 
answers.
Editing the questions would be unusual, except to correct typographical 
errors.
The answers, abstractly, result from processing the other ingredients. These 
could be edited by humans but, as described above, they could subsequently be 
revised by the system through cascading updates, refreshes, or regenerations. 
In some cases, editors might want to edit an answer and then lock it against 
subsequent revisions by the system.
In conclusion, as presently considered, users would ordinarily tend to edit 
the retrieved chunks of content drawn from Wikipedia pages, which augment the 
prompts to the LLMs. These page revisions would then cascade, automatically 
updating the dependent unlocked answers.
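
The cascading behavior described in this overview could be sketched roughly as follows. This is only an in-memory illustration under stated assumptions: the Answer and AnswerStore classes, their field names, and the "stale" flag are hypothetical, and a real system would presumably track these dependencies in the database.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    answer_id: str
    text: str
    chunk_ids: set[str]      # chunks this answer's prompt was built from
    locked: bool = False     # locked answers are never revised by the system
    stale: bool = False      # marked for update, refresh, or regeneration


@dataclass
class AnswerStore:
    answers: dict[str, Answer] = field(default_factory=dict)

    def add(self, answer: Answer) -> None:
        self.answers[answer.answer_id] = answer

    def _mark(self, candidates) -> list[str]:
        """Mark every unlocked candidate as stale and return the affected ids."""
        hit = []
        for answer in candidates:
            if not answer.locked:
                answer.stale = True
                hit.append(answer.answer_id)
        return sorted(hit)

    def invalidate_chunk(self, chunk_id: str) -> list[str]:
        """A chunk was edited: only answers depending on it are affected."""
        return self._mark(a for a in self.answers.values() if chunk_id in a.chunk_ids)

    def invalidate_all(self) -> list[str]:
        """The template was edited: every unlocked answer is affected."""
        return self._mark(self.answers.values())


if __name__ == "__main__":
    store = AnswerStore()
    store.add(Answer("a1", "draft", {"c1"}))
    store.add(Answer("a2", "human-edited", {"c1", "c2"}, locked=True))
    print(store.invalidate_chunk("c1"))  # only the unlocked dependent answer
```

In this sketch a locked answer keeps its human-edited text even when a chunk it depends on changes, which matches the locking behavior described for edited answers.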
Database schemas
Wikianswers database schemas would include one or more tables with vector 
columns for embedding vectors. A project goal, then, would be to efficiently 
combine into a database schema the existing concepts of revision tables, page 
tables, and text tables with the newer concepts of embedding vectors and vector 
databases. Relevant tools include pgvector, an open-source extension which adds 
vector-similarity search to PostgreSQL.
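
To make the idea of a vector column alongside page and revision identifiers more concrete, here is a small Python sketch. The table name, the column names, and the three-dimensional embeddings are assumptions for illustration only. The DDL string is not executed; it relies on pgvector's `vector` type. The nearest_chunks function is a brute-force, in-memory stand-in for the kind of cosine-distance ordering that pgvector offers through its `<=>` operator.

```python
import math
from dataclasses import dataclass

# Hypothetical DDL, shown for orientation only and not executed here.
# The `vector` type comes from pgvector; all names are assumptions.
CHUNK_TABLE_DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE chunk (
    chunk_id   bigserial PRIMARY KEY,
    page_id    bigint NOT NULL,   -- page the excerpt was drawn from
    rev_id     bigint NOT NULL,   -- revision the excerpt reflects
    chunk_text text   NOT NULL,
    embedding  vector(3)          -- real embeddings have far more dimensions
);
"""


@dataclass(frozen=True)
class Chunk:
    chunk_id: int
    page_id: int
    rev_id: int
    text: str
    embedding: tuple[float, ...]


def cosine_distance(a, b) -> float:
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest_chunks(query, chunks, k=2):
    """Brute-force stand-in for a similarity search over the vector column."""
    return sorted(chunks, key=lambda c: cosine_distance(query, c.embedding))[:k]


if __name__ == "__main__":
    chunks = [
        Chunk(1, 10, 100, "first excerpt", (1.0, 0.0, 0.0)),
        Chunk(2, 10, 100, "second excerpt", (0.0, 1.0, 0.0)),
        Chunk(3, 11, 200, "third excerpt", (0.9, 0.1, 0.0)),
    ]
    for chunk in nearest_chunks((1.0, 0.0, 0.0), chunks):
        print(chunk.chunk_id, chunk.text)
```

Storing the revision identifier with each chunk is one possible way to tell when a page edit has made a chunk, and hence its dependent answers, out of date.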
URL-addressability
Instead of requiring a new domain, e.g., https://en.wikianswers.org/ , 
Wikianswers features could be integrated into the search systems of Wikipedia, 
Wikidata, and Commons. In this case, human-editable responses could still be 
URL-addressable, e.g.: 
https://en.wikipedia.org/qa/2b106ea8-4d1b-441f-9dc8-4555a9999ae9 .
Datetime encoding
Some questions have impermanent answers and others are volatile, meaning that 
their answers could vary each time the question is asked. For such questions, 
date and time data could be encoded into URLs in a human-readable manner, e.g., 
https://en.wikipedia.org/qa/2023/09/21/21/29/00/2b106ea8-4d1b-441f-9dc8-4555a9999ae9 . 
Some questions and answers might involve different granularities of time. 
For example, a natural-language question "Which teams are in the Super Bowl?" 
might have a number of URLs, one for each year, e.g., 
https://en.wikipedia.org/qa/2022/40a7338d-fe75-4897-aee6-ec87141020a6 and 
https://en.wikipedia.org/qa/2021/40a7338d-fe75-4897-aee6-ec87141020a6 .
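
The URL patterns above could be generated along the following lines. This is a sketch under assumptions: the qa_url function and its granularity names are hypothetical, the base URL is taken from the examples, and the question of which time zone the path segments use (UTC seems a natural choice) is left open here as it is in the text.

```python
from datetime import datetime
from typing import Optional
from uuid import UUID

BASE = "https://en.wikipedia.org/qa"  # taken from the examples in the text

# Granularities from coarsest to finest; each one adds a path segment.
GRANULARITIES = ("year", "month", "day", "hour", "minute", "second")


def qa_url(
    question_id: UUID,
    when: Optional[datetime] = None,
    granularity: str = "second",
) -> str:
    """Build a question URL, optionally prefixed by date and time segments."""
    if when is None:
        return f"{BASE}/{question_id}"
    depth = GRANULARITIES.index(granularity) + 1
    fields = (when.year, when.month, when.day, when.hour, when.minute, when.second)
    # Four digits for the year, two digits for every finer segment.
    segments = [f"{fields[0]:04d}"] + [f"{value:02d}" for value in fields[1:depth]]
    return "/".join([BASE, *segments, str(question_id)])


if __name__ == "__main__":
    super_bowl = UUID("40a7338d-fe75-4897-aee6-ec87141020a6")
    print(qa_url(super_bowl, datetime(2022, 1, 1), granularity="year"))
    print(qa_url(super_bowl, datetime(2021, 1, 1), granularity="year"))
```

With this layout, the same question identifier appears under each year, so the Super Bowl question above keeps one identifier while its answer varies by year.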
User experience
In the approach where Wikianswers features are integrated into Wikipedia, 
Wikidata, and Commons search, user experiences could utilize the existing text 
search boxes atop pages. Perhaps the "magnifying glass" icon in those search 
boxes could be accompanied by a "question mark" icon. End-users would select, 
or activate, one of these two icons, toggling between the existing 
keyword-based content search and the described Wikianswers human-editable 
question-answering subsystem. Still under consideration is whether and how 
end-users could specify that their current page, or selections thereof, should 
be the focus when their question is answered.




Best regards,

Adam Sobieski

_______________________________________________
Wikimedia-l mailing list -- [email protected], guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/JEABZM5G6LY4CZPHTGXS2FGU5V5ZJWBV/
To unsubscribe send an email to [email protected]
