Thanks all for the responses. Let me try to address everything.

> the programming guides are also different between versions since features are being added, configs are being added/removed/changed, defaults are being changed, etc.

I agree that this is the case. But I think it's fine to mention which version a feature is available in. In fact, I would argue that mentioning the improvement a version brings motivates users to upgrade more than restricting docs improvements to "new releases to keep the community updating". Users should upgrade to get a better Spark, not better Spark documentation.

> having a programming guide that refers to features or API methods that do not exist in that version is confusing and detrimental

I don't think that we'd do this. Again, programming guides should teach fundamentals that do not change from version to version. TypeScript <https://www.typescriptlang.org/docs/handbook/typescript-from-scratch.html> (which has one of the best DXs and docs) does this exceptionally well. Their guides are refined, versionless pages; new features are elaborated upon in release notes (analogous to our version-specific docs); and the occasional version-specific caveat is called out in the guides.

I agree with Wenchen's 3 points. I don't think we need to say that readers *have* to go to the old page, only that they can if they want to.

Neil

On Wed, Jun 5, 2024 at 12:04 PM Wenchen Fan <cloud0...@gmail.com> wrote:

> I agree with the idea of a versionless programming guide. But one thing we
> need to make sure of is we give clear messages for things that are only
> available in a new version. My proposal is:
>
> 1. Keep the old versions' programming guide unchanged. For example,
>    people can still access
>    https://spark.apache.org/docs/3.3.4/quick-start.html
> 2. In the new versionless programming guide, we mention at the
>    beginning that for Spark versions before 4.0, go to the versioned doc site
>    to read the programming guide.
> 3. Revisit the programming guide of Spark 4.0 (compare it with the one
>    of 3.5), and adjust the content to mention version-specific changes (API
>    change, new features, etc.)
>
> Then we can have a versionless programming guide starting from Spark 4.0.
> We can also revisit programming guides of all versions and combine them
> into one with version-specific notes, but that's probably too much work.
>
> Any thoughts?
>
> Wenchen
>
> On Wed, Jun 5, 2024 at 1:39 AM Martin Andersson <martin.anders...@kambi.com> wrote:
>
>> While I have no practical knowledge of how documentation is maintained in
>> the Spark project, I must agree with Nimrod. For users on older versions,
>> having a programming guide that refers to features or API methods that do
>> not exist in that version is confusing and detrimental.
>>
>> Surely there must be a better way to allow updating documentation more
>> often?
>>
>> Best Regards,
>> Martin
>>
>> ------------------------------
>> *From:* Nimrod Ofek <ofek.nim...@gmail.com>
>> *Sent:* Wednesday, June 5, 2024 08:26
>> *To:* Neil Ramaswamy <n...@ramaswamy.org>
>> *Cc:* Praveen Gattu <praveen.ga...@databricks.com.invalid>; dev <dev@spark.apache.org>
>> *Subject:* Re: [DISCUSS] Versionless Spark Programming Guide Proposal
>>
>> Hi Neil,
>>
>> While you wrote you don't mean the API docs (of course), the programming
>> guides are also different between versions since features are being added,
>> configs are being added/removed/changed, defaults are being changed, etc.
>>
>> I know of "backport hell" - which is why I wrote that once a version is
>> released it's frozen and the documentation will be updated for the new
>> version only.
>>
>> I think of it as facing forward and keeping older versions but focusing
>> on the new releases to keep the community updating.
>> While Spark has a support window of 18 months until EOL, we can have only a
>> 6-month support cycle until EOL for documentation; there are no major
>> security concerns for documentation...
>>
>> Nimrod
>>
>> On Wed, Jun 5, 2024, 08:28, Neil Ramaswamy <n...@ramaswamy.org> wrote:
>>
>> Hi Nimrod,
>>
>> Quick clarification: my proposal will not touch API-specific
>> documentation for the specific reasons you mentioned (signatures, behavior,
>> etc.). It just aims to make the *programming guides* versionless.
>> Programming guides should teach fundamentals of Spark, and the fundamentals
>> of Spark should not change between releases.
>>
>> There are a few issues with updating documentation multiple times after
>> Spark releases. First, fixes that apply to all existing versions'
>> programming guides need backport PRs. For example, this change
>> <https://github.com/apache/spark/pull/46797/files> applies to all the
>> versions of the Structured Streaming (SS) programming guide, but is likely
>> to be fixed only in Spark 4.0. Additionally, any such update within a Spark
>> release will require re-building the static sites in the spark repo, and
>> copying those files to spark-website via a commit in spark-website. Making
>> a typo fix like the one I linked would then require
>> <number of versions we want to update> + 1 PRs, as opposed to 1 PR in the
>> versionless programming guide world.
>>
>> Neil
>>
>> On Tue, Jun 4, 2024 at 1:32 PM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>>
>> Hi,
>>
>> While I think that the documentation needs a lot of improvement and
>> important details are missing - and detaching the documentation from the
>> main project can help us iterate faster on documentation-specific tasks - I
>> don't think we can or should move to versionless documentation.
>>
>> Documentation is version specific: parameters are added and removed, new
>> features are added, behaviours sometimes change, etc.
>>
>> I think the documentation should be version specific, but separate from
>> the Spark release cadence, so that it can be updated multiple times after
>> a Spark release.
>> The way I see it, the documentation should be updated only for the
>> latest version; some time before a new release, it should be archived, and
>> the updated documentation should then reflect the new version.
>>
>> Thanks,
>> Nimrod
>>
>> On Tue, Jun 4, 2024, 18:34, Praveen Gattu <praveen.ga...@databricks.com.invalid> wrote:
>>
>> +1. This helps for greater velocity in improving docs. However, we might
>> still need a way to provide version-specific information, i.e. which
>> features are available in which version.
>>
>> On Mon, Jun 3, 2024 at 3:08 PM Neil Ramaswamy <n...@ramaswamy.org> wrote:
>>
>> Hi all,
>>
>> I've written up a proposal to migrate all the Apache Spark programming
>> guides to be versionless. You can find the proposal here
>> <https://docs.google.com/document/d/1OqeQ71zZleUa1XRZrtaPDFnJ-gVJdGM80o42yJVg9zg/>.
>> Please leave comments, or reply in this DISCUSS thread.
>>
>> TLDR: by making the programming guides versionless, we can make updates
>> to them whenever we'd like, instead of at the Spark release cadence. This
>> increased update velocity will enable us to make gradual improvements,
>> including breaking up the Structured Streaming programming guide into
>> smaller sub-guides. The proposal does not break *any* existing URLs, and
>> it does not affect our versioned API docs in any way.
>>
>> Thanks!
>> Neil