@Yufei,

Regarding:

> Love the following example. Not sure if Vale can catch this and provide
> suggestions. It may be only possible with LLM.
>
>> Replace this: If you're ready to purchase Office 365 for your
>> organization, contact your Microsoft account representative.
>> With this: Ready to buy? Contact us.
>
>
Think of Vale as a spelling and grammar checker that runs at build time,
much like a syntax linter. The general idea is that it applies regexes to
check for patterns in the prose and flags any matches. Each flag indicates
which rule(s) were violated and, depending on the settings, either fails
the build with a non-zero exit code or prints a warning at build time.
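
To make the idea concrete, here is a minimal Python sketch of a regex-based
prose linter. It is not Vale's implementation: the rule names, patterns,
messages, and the fail_on parameter are all hypothetical and only
illustrate the flag-then-exit-code flow described above.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str            # hypothetical rule identifier
    pattern: re.Pattern  # regex that detects the issue
    message: str         # text shown to the writer
    level: str           # "warning" or "error"


# Hypothetical rules for illustration; not copied from any real Vale style.
RULES = [
    Rule("Wordy.Purchase", re.compile(r"\bpurchase\b", re.IGNORECASE),
         "Prefer 'buy' over 'purchase'.", "warning"),
    Rule("Terms.Spark", re.compile(r"\bspark\b"),
         "Capitalize the proper noun 'Spark'.", "error"),
]


def lint(text):
    """Return (line, column, rule) for every pattern match in the text."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule in RULES:
            for match in rule.pattern.finditer(line):
                findings.append((line_no, match.start() + 1, rule))
    return findings


def exit_code(findings, fail_on="error"):
    """Return 1 if any finding is at or above the fail_on level, else 0."""
    order = {"warning": 0, "error": 1}
    failed = any(order[rule.level] >= order[fail_on]
                 for _, _, rule in findings)
    return 1 if failed else 0


if __name__ == "__main__":
    sample = "If you're ready to purchase Office 365, run it on spark."
    results = lint(sample)
    for line_no, col, rule in results:
        print(f"{line_no}:{col} {rule.level} {rule.name}: {rule.message}")
    raise SystemExit(exit_code(results))
```

With fail_on="error", warnings are printed but do not fail the build,
while raising a rule's level to "error" turns a finding into a non-zero
exit.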

Vale uses various abstractions like styles
<https://vale.sh/docs/topics/styles/>, vocabulary
<https://vale.sh/docs/topics/vocab/>, and packaging to pull together the
list of rules/regexes that flag these issues. One common and concrete
example is the guidance against passive voice
<https://learn.microsoft.com/en-us/style-guide/grammar/verbs#active-and-passive-voice>.
This rule is encoded in the community-driven Vale package for the
Microsoft style guide
<https://github.com/errata-ai/Microsoft/blob/master/Microsoft/Passive.yml> and
will flag sentences like "Apache Iceberg 1.4.2 was released on November 2,
2023". It is then up to you to rephrase the sentence in active voice to
clear that message: "The Iceberg community released Apache Iceberg 1.4.2
on November 2, 2023".
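
As a rough approximation of how a passive-voice check can work, the
Python sketch below looks for a form of "to be" followed by a word that
looks like a past participle. The word lists and the find_passive helper
are assumptions made for illustration; the pattern in the real
Passive.yml differs in its details.

```python
import re

# Forms of "to be" that commonly introduce a passive construction.
BE_VERBS = r"(?:am|are|is|was|were|be|been|being)"

# A small, incomplete sample of irregular past participles (assumption).
IRREGULAR = r"(?:built|done|given|known|made|run|seen|sent|shown|taken|written)"

# A be-verb followed by a regular "-ed" word or an irregular participle.
PASSIVE = re.compile(rf"\b{BE_VERBS}\s+(?:\w+ed|{IRREGULAR})\b", re.IGNORECASE)


def find_passive(sentence):
    """Return the phrases in the sentence that look like passive voice."""
    return [match.group(0) for match in PASSIVE.finditer(sentence)]


if __name__ == "__main__":
    passive = "Apache Iceberg 1.4.2 was released on November 2, 2023"
    active = ("The Iceberg community released Apache Iceberg 1.4.2 "
              "on November 2, 2023")
    print(find_passive(passive))
    print(find_passive(active))
```

A check like this only detects the pattern. It cannot produce the
rewritten sentence, which is why the rephrasing is left to the writer.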

I'm not yet comfortable using LLMs to provide knowledge while our existing
documentation lacks a lot of context and has an inconsistent tone. As we
grow a quality corpus around Iceberg and its ecosystem, I am definitely
interested in building tools or integrating with GitHub AI tools to help
generate documentation and PR messaging that the engineer will later
tweak. One step at a time though.


On Wed, Nov 1, 2023 at 5:14 PM Yufei Gu <flyrain...@gmail.com> wrote:

> +1 Love the following example. Not sure if Vale can catch this and provide
> suggestions. It may be only possible with LLM.
>
>> Replace this: If you're ready to purchase Office 365 for your
>> organization, contact your Microsoft account representative.
>> With this: Ready to buy? Contact us.
>
>
> Yufei
>
>
> On Wed, Nov 1, 2023 at 12:20 PM Ryan Blue <b...@tabular.io> wrote:
>
>> +1
>>
>> On Wed, Nov 1, 2023 at 6:38 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi Brian
>>>
>>> I like the proposal, it sounds like a good way to "align" our
>>> documentation.
>>>
>>> Thanks !
>>> Regards
>>> JB
>>>
>>> On Wed, Nov 1, 2023 at 8:20 AM Brian Olsen <bitsondata...@gmail.com>
>>> wrote:
>>> >
>>> > Hey Iceberg Nation, As I've gone through the Iceberg docs, I've
>>> noticed a lot of inconsistencies with terminology, grammar, and style. As a
>>> distributed community, we have a lot of non-native English speakers reading
>>> and writing our documentation. I propose we adopt the Microsoft Style Guide
>>> to improve the communication and consistency of the docs. Common rules like
>>> defaulting to use present tense not only make the documentation consistent
>>> but also more accessible for those who struggle to understand complex
>>> conjugations. Then there are examples like making sure to capitalize proper
>>> nouns like (Spark, Flink, Trino, Apache Software Foundation, etc...). You
>>> may think, that's great Brian, but good luck getting everyone reading the
>>> project and following that. I also want to propose adding a prose linter
>>> called Vale, that will enable us to add the existing rules for the
>>> Microsoft Style Guide, and our own custom rules to ensure consistent style
>>> with documentation changes.
>>> > Let's discuss this in the sync tomorrow! Bits
>>>
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
