@Yufei, Regarding:
> Love the following example. Not sure if Vale can catch this and provide
> suggestions. It may be only possible with LLM.
>
>> Replace this: If you're ready to purchase Office 365 for your
>> organization, contact your Microsoft account representative.
>> With this: Ready to buy? Contact us.

Think of Vale as a spelling and grammar checker that runs at compile time, much like a syntax linter. The general idea is that it applies regexes that check for patterns in the language and flags them. Each flag indicates which rule(s) were violated and, depending on the settings, Vale will either exit with a non-zero status or print a warning at build time. Vale uses various abstractions like styles <https://vale.sh/docs/topics/styles/>, vocabulary <https://vale.sh/docs/topics/vocab/>, and packaging to pull together the list of rules/regexes that flag these issues.

One common, concrete example is the use of passive voice <https://learn.microsoft.com/en-us/style-guide/grammar/verbs#active-and-passive-voice>. This Vale rule is encoded in the community-driven Microsoft style guide package <https://github.com/errata-ai/Microsoft/blob/master/Microsoft/Passive.yml> and will flag sentences like "Apache Iceberg 1.4.2 was released on November 2, 2023". It is then up to you to rephrase the sentence in active voice to clear that message, for example: "The Iceberg community released Apache Iceberg 1.4.2 on November 2, 2023".

I'm not yet comfortable using LLMs to provide knowledge while our existing documentation lacks a lot of context and has an inconsistent tone. As we grow a quality corpus around Iceberg and its ecosystem, I am definitely interested in building tools or integrating with GitHub AI tools to help generate documentation and PR messaging that the engineer will later tweak. One step at a time though.

On Wed, Nov 1, 2023 at 5:14 PM Yufei Gu <flyrain...@gmail.com> wrote:

> +1 Love the following example. Not sure if Vale can catch this and provide
> suggestions. It may be only possible with LLM.
>
>> Replace this: If you're ready to purchase Office 365 for your
>> organization, contact your Microsoft account representative.
>> With this: Ready to buy? Contact us.
>
>
> Yufei
>
>
> On Wed, Nov 1, 2023 at 12:20 PM Ryan Blue <b...@tabular.io> wrote:
>
>> +1
>>
>> On Wed, Nov 1, 2023 at 6:38 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi Brian
>>>
>>> I like the proposal, it sounds like a good way to "align" our
>>> documentation.
>>>
>>> Thanks !
>>> Regards
>>> JB
>>>
>>> On Wed, Nov 1, 2023 at 8:20 AM Brian Olsen <bitsondata...@gmail.com>
>>> wrote:
>>> >
>>> > Hey Iceberg Nation, As I've gone through the Iceberg docs, I've
>>> noticed a lot of inconsistencies with terminology, grammar, and style. As a
>>> distributed community, we have a lot of non-native English speakers reading
>>> and writing our documentation. I propose we adopt the Microsoft Style Guide
>>> to improve the communication and consistency of the docs. Common rules like
>>> defaulting to use present tense not only make the documentation consistent
>>> but also more accessible for those who struggle to understand complex
>>> conjugations. Then there are examples like making sure to capitalize proper
>>> nouns like (Spark, Flink, Trino, Apache Software Foundation, etc...). You
>>> may think, that's great Brian, but good luck getting everyone reading the
>>> project and following that. I also want to propose adding a prose linter
>>> called Vale, that will enable us to add the existing rules for the
>>> Microsoft Style Guide, and our own custom rules to ensure consistent style
>>> with documentation changes.
>>> > Let's discuss this in the sync tomorrow! Bits
>>>
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
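
To illustrate the regex-based flagging described in the reply at the top of this thread, here is a minimal Python sketch. It is not how Vale or the Microsoft Passive.yml rule is actually implemented; the pattern, the Flag class, and the lint and exit_code functions are all hypothetical names made up for this example. It only shows the general shape: match a pattern, record which rule fired, and let the configured level decide whether the build fails.

```python
import re
from dataclasses import dataclass

# Hypothetical, heavily simplified passive-voice pattern: a form of "to be"
# followed by a word ending in "ed" or one of a few irregular participles.
# This is an assumption for illustration, not the pattern Vale ships.
PASSIVE = re.compile(
    r"\b(am|are|is|was|were|be|been|being)\s+(\w+ed|built|written|done|made)\b",
    re.IGNORECASE,
)


@dataclass
class Flag:
    rule: str   # name of the rule that fired
    level: str  # "warning" or "error"
    line: int   # 1-based line number in the checked text
    match: str  # the text that triggered the rule


def lint(text: str, level: str = "warning") -> list[Flag]:
    """Return one Flag per passive-voice match found in the text."""
    flags = []
    for number, line in enumerate(text.splitlines(), start=1):
        for m in PASSIVE.finditer(line):
            flags.append(Flag("Passive", level, number, m.group(0)))
    return flags


def exit_code(flags: list[Flag]) -> int:
    """Non-zero only when at least one flag is at the "error" level."""
    return 1 if any(f.level == "error" for f in flags) else 0


if __name__ == "__main__":
    sentences = [
        "Apache Iceberg 1.4.2 was released on November 2, 2023",
        "The Iceberg community released Apache Iceberg 1.4.2 on November 2, 2023",
    ]
    for sentence in sentences:
        found = lint(sentence, level="error")
        print(sentence, "->", [f.match for f in found], "exit", exit_code(found))
```

A pattern this naive will miss many passive constructions and flag some false positives, which is one reason to rely on the maintained community rules rather than hand-rolled regexes.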