I'm a big +1 on this proposal. It should make it much easier to keep
improving the quality of the programming guides.

> Move the programming guide to the spark-website repo, to allow faster
iterations and releases

This is a great idea, and it should work for the Structured Streaming
programming guide. PySpark's user guides (
https://spark.apache.org/docs/latest/api/python/user_guide/index.html) are
generated by Sphinx, however, so I am not sure whether they can also be
moved to spark-website.

> I think the documentation should be version specific- but separate from
spark release cadence - and can be updated multiple times after spark
release.

@Nimrod, here is the discussion thread to separate the Spark docs releases
from the Spark releases:
https://lists.apache.org/thread/1675rzxx5x4j2x03t9x0kfph8tlys0cx. This will
allow us to keep improving version-specific docs as well.



On Tue, Jun 11, 2024 at 4:00 PM serge rielau.com <se...@rielau.com> wrote:

> I think some of the issues raised here are not really common.
> Examples should follow best practice.
> It would be odd to have an example that exploits ansi.enabled=false to
> e.g. overflow an integer.
> Instead an example that works with ansi mode will typically work perfectly
> fine in an older version, especially at the level of discussion here, which
> is towards starter guides.
> What can happen of course is that best practice in a new version is
> different from best practice in an older version.
> But even then we want to bias towards the new version to bring people
> along.
> The old "workaround" and the new best practice can be shown with a
> disclaimer regarding the version they apply to. (I.e., we version WITHIN
> the page.)
> Note that for e.g. built-in functions we already do this: we state when a
> function was introduced.
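The overflow example above can be made concrete with a plain-Python sketch
of the two behaviors being contrasted. This is illustrative only, not Spark
code; the function names are made up, and the wraparound logic just mimics
what signed 32-bit arithmetic does when overflow checking is off:

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def legacy_add(a, b):
    # Sketch of non-ANSI behavior: wrap to signed 32 bits,
    # like Java int arithmetic
    result = (a + b) & 0xFFFFFFFF
    return result - 0x100000000 if result > INT32_MAX else result

def ansi_add(a, b):
    # Sketch of ANSI-mode behavior: overflow raises an error
    # instead of silently wrapping
    result = a + b
    if not (INT32_MIN <= result <= INT32_MAX):
        raise ArithmeticError("integer overflow")
    return result
```

An example relying on `legacy_add`'s wraparound would indeed be an odd
thing to teach in a starter guide, which is the point being made here.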
>
> IMHO the value of a unified doc tree cannot be overstated when it comes
> to searchability (SEO).
>
>
> On Jun 11, 2024, at 11:37 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>
> Shall we decouple these two decisions?
>
>    - Move the programming guide to the spark-website repo, to allow
>    faster iterations and releases
>    - Make programming guide version-less
>
> I think the downside of moving the programming guide to the spark-website
> repo is almost negligible: you may need to have PRs in both the Spark and
> spark-website repo for major features that need to be mentioned in the
> programming guide. The release process may need more steps to build the doc
> site.
>
> We can have more discussions on version-less. Today we upload the full doc
> site for each maintenance release, which is a big waste as the content is
> almost the same as the previous maintenance release. As a result, git
> operations on the spark-website repo are quite slow today, as this repo is
> too big. I think we should at least have a single programming guide for
> each feature release.
>
>
> On Tue, Jun 11, 2024 at 10:36 AM Neil Ramaswamy <n...@ramaswamy.org>
> wrote:
>
>> There are two issues and one main benefit that I see with versioned
>> programming guides:
>>
>>    - *Issue 1*: We often retroactively realize that code snippets have
>>    bugs and explanations are confusing (see examples: dropDuplicates
>>    <https://github.com/apache/spark/pull/46797>,
>>    dropDuplicatesWithinWatermark
>>    
>> <https://stackoverflow.com/questions/77512507/how-exactly-does-dropduplicateswithinwatermark-work>).
>>    Without backporting to older guides, I don't think that users can have, as
>>    Mridul says, "reasonable confidence that features, functionality and
>>    examples mentioned will work with that released Spark version". In this
>>    sense, I definitely disagree with Nimrod's position of "working on updated
>>    versions and not working with old versions anyway." To have confidence in
>>    versioned programming guides, we *must* have a system for backporting
>>    and re-releasing.
>>    - *Issue 2*: If programming guides live in the Spark repo, you now
>>    need maintenance releases in Spark to get those changes to production
>>    (i.e. spark-website). Historically, Spark does *not* create maintenance
>>    releases frequently, especially not just for a docs change. So, we'd need
>>    to break precedent (this would create potentially dozens of maintenance
>>    releases, far more than what we do today), and the person making docs
>>    changes needs to rebuild the docs site and create one PR in
>>    spark-website for *every* version they change. Fixing a code typo in 4
>>    versions? You need 4 maintenance releases, and 4 more PRs.
>>    - *Benefit 1*: versioned docs don't have to caveat what features are
>>    available in prose.
>>
>>
>> Personally, I think it's fine to caveat what features are available in
>> prose. For the rare case where we have *completely* incompatible Spark
>> code (which should be exceedingly rare), we can provide different code
>> snippets. As Wenchen points out, if we *do* have 100 mutually
>> incompatible versions, we have an issue, but the ANSI SQL default might be
>> one of these rare examples.
>>
>> (Note: version-specific commentary is already present in the Structured
>> Streaming Programming Guide, our most popular
>> <https://analytics.apache.org/index.php?module=CoreHome&action=index&date=yesterday&period=day&idSite=40#?idSite=40&period=day&date=yesterday&category=General_Actions&subcategory=General_Pages>
>> guide. It flows nicely: for example, we talk about state, and then we say,
>> "hey, if you have Spark 4.0, state is more easily debuggable because of the
>> state reader." The prose focuses on the stable concept of state—which has
>> been unchanged since 2.0.0—and then mentions a feature that can
>> encourage upgrade.)
>>
>> However, I do see one path forward with versioned guides: 1) guide
>> changes do not constitute a maintenance release, 2) we create automation
>> to backport docs changes to old branches, and 3) once merged in Spark,
>> the automation rebuilds all the static sites and creates PRs in
>> spark-website. The downside is that backport merge conflicts *will* force
>> developers to backport changes themselves. While I do not want to sign up
>> for that work, is this something people are more comfortable with?
>>
>> Neil
>>
>>
>> On Tue, Jun 11, 2024 at 8:47 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>>> Just FYI, the Hive language manual is also version-less:
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual
>>>
>>> It's not a strong data point as this doc is not actively updated, but my
>>> personal feeling is that it's nice to see the history of a feature: when
>>> it was introduced and when it changed, with the JIRA tickets linked.
>>>
>>> One potential issue is that if a feature has been changed 100 times in
>>> history, it's too verbose to document all 100 different behaviors for
>>> different versions. If that happens, I think we can make each major version
>>> have its own programming guide, assuming we won't change a feature 100
>>> times in Spark 4 :)
>>>
>>> On Mon, Jun 10, 2024 at 1:08 PM Nimrod Ofek <ofek.nim...@gmail.com>
>>> wrote:
>>>
>>>> My personal opinion is that having the documents per version (current
>>>> and previous), without fixing previous versions - just keeping them as a
>>>> snapshot in time of the current documentation once the new version was
>>>> released, should be good enough.
>>>>
>>>> Since Neil would like to change the documentation (personally I think
>>>> it's very much needed and a great thing to do), there will be a big gap
>>>> between the old documents and the new ones...
>>>> If, after rewriting and rearranging the documents, someone feels it
>>>> would be beneficial to port the documentation back to some of the older
>>>> versions as a one-time thing, that's possible as well, of course...
>>>>
>>>> I find this solution to be the best of all worlds - versioned, so you
>>>> can read documents relevant to the version you use (though I am in
>>>> favour of working on updated versions and not working with old versions
>>>> anyway), while the documentation can be updated many times after the
>>>> release and independently from the actual release of Spark.
>>>>
>>>> I think that keeping one document to support all versions will soon
>>>> become hard to read and understand with little benefit of having updated
>>>> documentation for old versions.
>>>>
>>>>
>>>> Regarding SEO and deranking, AFAIK updating the documentation more
>>>> frequently should only improve ranking, so the latest documentation
>>>> should always rank high in Google search, but maybe I'm missing
>>>> something.
>>>>
>>>> Nimrod
>>>>
>>>>
>>>>
>>>> On Mon, Jun 10, 2024, 21:25 Nicholas Chammas <
>>>> nicholas.cham...@gmail.com> wrote:
>>>>
>>>>> I will let Neil and Matt clarify the details because I believe they
>>>>> understand the overall picture better. However, I would like to emphasize
>>>>> something that motivated this effort and which may be getting lost in the
>>>>> concerns about versioned vs. versionless docs.
>>>>>
>>>>> The main problem is that some of the guides need major overhauls.
>>>>>
>>>>> There are people like Neil who are interested in making significant
>>>>> contributions to the guides. What is holding them back is that major
>>>>> changes to the web docs can trigger wholesale deranking of our site by
>>>>> Google. Since versioned docs are tied to Spark releases, which are
>>>>> infrequent, that means potentially being nuked in the search rankings for
>>>>> months.
>>>>>
>>>>> Versionless docs allow for rapid iteration on the guides, which can be
>>>>> driven in part by search rankings.
>>>>>
>>>>> In other words, there is a problem chain here that leads to
>>>>> versionless docs:
>>>>>
>>>>> 1. Several guides need major improvements.
>>>>> 2. We cannot make such improvements because a) that would risk site
>>>>> deranking, and b) we are constrained by Spark's release schedule.
>>>>> 3. Versionless guides allow for incremental improvements, which
>>>>> addresses problems 2a and 2b.
>>>>>
>>>>> This is my understanding of the big picture as described to me by Neil
>>>>> and Matt. I defer to them to elaborate on the details, especially in
>>>>> relation to Google site rankings. If this concern is not valid or not that
>>>>> serious, then we can just iterate slowly on the docs with Spark’s existing
>>>>> release schedule and there is less need for versionless docs.
>>>>>
>>>>> Nick
>>>>>
>>>>>
>>>>> On Jun 10, 2024, at 1:53 PM, Mridul Muralidharan <mri...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>>   Versioned documentation has the benefit that users can have
>>>>> reasonable confidence that features, functionality and examples mentioned
>>>>> will work with that released Spark version.
>>>>> A versionless guide runs into potential issues with deprecation,
>>>>> behavioral changes and new features.
>>>>>
>>>>> My concern is not just around features highlighting their supported
>>>>> versions, but examples which reference other features in Spark.
>>>>>
>>>>> For example, SQL differences between Hive QL and ANSI SQL when we flip
>>>>> the default in 4.0: we would have 4.x example snippets for some feature
>>>>> (say UDAF) which would not work for 3.x, and vice versa.
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>>
>>>>> On Mon, Jun 10, 2024 at 12:03 PM Hyukjin Kwon <gurwls...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I am +1 on this but as you guys mentioned, we should really be clear
>>>>>> on how to address different versions.
>>>>>>
>>>>>> On Wed, 5 Jun 2024 at 18:27, Matthew Powers <
>>>>>> matthewkevinpow...@gmail.com> wrote:
>>>>>>
>>>>>>> I am a huge fan of the Apache Spark docs and I regularly look at the
>>>>>>> analytics on this page
>>>>>>> <https://analytics.apache.org/index.php?module=CoreHome&action=index&date=yesterday&period=day&idSite=40#?period=day&date=yesterday&category=Dashboard_Dashboard&subcategory=1>
>>>>>>> to see how well they are doing.  Great work to everyone that's 
>>>>>>> contributed
>>>>>>> to the docs over the years.
>>>>>>>
>>>>>>> We've been chipping away with some improvements over the past year
>>>>>>> and have made good progress.  For example, lots of the pages were 
>>>>>>> missing
>>>>>>> canonical links.  A canonical link is a special tag that is
>>>>>>> extremely important for any site with duplicate content.  Versioned
>>>>>>> documentation sites have lots of duplicate pages, so getting these
>>>>>>> canonical links added was important.  It wasn't easy to make this
>>>>>>> change, though.
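For reference, a canonical link is just a tag in a page's `<head>`. On a
versioned copy of a guide it might look like the following (the URL shown
is an illustrative example, not necessarily the exact tag that was added):

```html
<!-- On e.g. /docs/3.2.1/structured-streaming-programming-guide.html,
     pointing search engines at the authoritative "latest" copy -->
<link rel="canonical"
      href="https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html"/>
```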
>>>>>>>
>>>>>>> The current site is confusing Google a bit.  If you do a "spark
>>>>>>> rocksdb" Google search for example, you get the Spark 3.2 Structured
>>>>>>> Streaming Programming Guide as the first result (because Google isn't
>>>>>>> properly indexing the docs).  You need to Control+F and search for
>>>>>>> "rocksdb" to navigate to the relevant section which says: "As of
>>>>>>> Spark 3.2, we add a new built-in state store implementation...",
>>>>>>> which is what you'd expect in a versionless docs site in any case.
>>>>>>>
>>>>>>> There are two different user experiences:
>>>>>>>
>>>>>>> * Option A: push Spark 3.1 Structured Streaming users to the Spark
>>>>>>> 3.1 Structured Streaming Programming guide that doesn't mention RocksDB
>>>>>>> * Option B: push Spark Structured Streaming users to the latest
>>>>>>> Structured Streaming Programming guide, which mentions RocksDB, but
>>>>>>> caveat that this feature was added in Spark 3.2
>>>>>>>
>>>>>>> I think Option B provides Spark 3.1 users a better experience
>>>>>>> overall.  It's better to let users know they can access RocksDB by
>>>>>>> upgrading than hiding this info from them IMO.
>>>>>>>
>>>>>>> Now if we want Option A, then we'd need to give users a reasonable
>>>>>>> way to actually navigate to the Spark 3.1 docs.  From what I can tell, 
>>>>>>> the
>>>>>>> only way to navigate from the latest Structured Streaming
>>>>>>> Programming Guide
>>>>>>> <https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html>
>>>>>>> to a different version is by manually updating the URL.
>>>>>>>
>>>>>>> I was just skimming over the Structured Streaming Programming guide
>>>>>>> and noticing again how lots of the Python code snippets aren't PEP 8
>>>>>>> compliant.  It seems like our current docs publishing process would 
>>>>>>> prevent
>>>>>>> us from improving the old docs pages.
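As a made-up illustration of the kind of cleanup meant here (neither
snippet is taken from the actual guide; the function names are invented):

```python
# Hypothetical before/after of a PEP 8 cleanup.

# Before: camelCase name, cryptic single-letter variables
def getWordCounts(lines):
    return [w for l in lines for w in l.split(" ")]

# After: snake_case name, descriptive variable names
def get_word_counts(lines):
    return [word for line in lines for word in line.split(" ")]
```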
>>>>>>>
>>>>>>> In this conversation, let's make sure we distinguish between
>>>>>>> "programming guides" and "API documentation".  API docs should be 
>>>>>>> versioned
>>>>>>> and there is no question there.  Programming guides are higher level
>>>>>>> conceptual overviews, like the Polars user guide
>>>>>>> <https://docs.pola.rs/>, and should be relevant across many
>>>>>>> versions.
>>>>>>>
>>>>>>> I would also like to point out that the current programming guides
>>>>>>> are not consistent:
>>>>>>>
>>>>>>> * The Structured Streaming programming guide
>>>>>>> <https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html>
>>>>>>> is one giant page
>>>>>>> * The SQL programming guide
>>>>>>> <https://spark.apache.org/docs/latest/sql-programming-guide.html>
>>>>>>> is split on many pages
>>>>>>> * The PySpark programming guide
>>>>>>> <https://spark.apache.org/docs/latest/api/python/getting_started/index.html>
>>>>>>> takes you to a whole different URL structure and makes it so you can't 
>>>>>>> even
>>>>>>> navigate to the other programming guides anymore
>>>>>>>
>>>>>>> I am looking forward to collaborating with the community and
>>>>>>> improving the docs to 1. delight existing users and 2. attract new 
>>>>>>> users.
>>>>>>> Docs are a "website problem" and we're big data people, but I'm 
>>>>>>> confident
>>>>>>> we'll be able to work together and find a good path forward here.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 5, 2024 at 3:22 PM Neil Ramaswamy <n...@ramaswamy.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks all for the responses. Let me try to address everything.
>>>>>>>>
>>>>>>>> > the programming guides are also different between versions since
>>>>>>>> features are being added, configs are being added/ removed/ changed,
>>>>>>>> defaults are being changed etc.
>>>>>>>>
>>>>>>>> I agree that this is the case. But I think it's fine to mention
>>>>>>>> what version a feature is available in. In fact, I would argue that
>>>>>>>> mentioning an improvement that a version brings motivates users to 
>>>>>>>> upgrade
>>>>>>>> more than keeping docs improvement to "new releases to keep the 
>>>>>>>> community
>>>>>>>> updating". Users should upgrade to get a better Spark, not better Spark
>>>>>>>> documentation.
>>>>>>>>
>>>>>>>> > having a programming guide that refers to features or API methods
>>>>>>>> that does not exist in that version is confusing and detrimental
>>>>>>>>
>>>>>>>> I don't think that we'd do this. Again, programming guides should
>>>>>>>> teach fundamentals that do not change version-to-version.
>>>>>>>> TypeScript
>>>>>>>> <https://www.typescriptlang.org/docs/handbook/typescript-from-scratch.html>
>>>>>>>>  (which
>>>>>>>> has one of the best DX's and docs) does this exceptionally well.
>>>>>>>> Their guides are refined, versionless pages, new features are 
>>>>>>>> elaborated
>>>>>>>> upon in release notes (analogous to our version-specific docs), and 
>>>>>>>> for the
>>>>>>>> occasional caveat for a version, it is called out in the guides.
>>>>>>>>
>>>>>>>> I agree with Wenchen's 3 points. I don't think we need to say that
>>>>>>>> they *have* to go to the old page, but that if they want to, they
>>>>>>>> can.
>>>>>>>>
>>>>>>>> Neil
>>>>>>>>
>>>>>>>> On Wed, Jun 5, 2024 at 12:04 PM Wenchen Fan <cloud0...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I agree with the idea of a versionless programming guide. But one
>>>>>>>>> thing we need to make sure of is we give clear messages for things 
>>>>>>>>> that are
>>>>>>>>> only available in a new version. My proposal is:
>>>>>>>>>
>>>>>>>>>    1. keep the old versions' programming guide unchanged. For
>>>>>>>>>    example, people can still access
>>>>>>>>>    https://spark.apache.org/docs/3.3.4/quick-start.html
>>>>>>>>>    2. In the new versionless programming guide, we mention at the
>>>>>>>>>    beginning that for Spark versions before 4.0, go to the versioned 
>>>>>>>>> doc site
>>>>>>>>>    to read the programming guide.
>>>>>>>>>    3. Revisit the programming guide of Spark 4.0 (compare it with
>>>>>>>>>    the one of 3.5), and adjust the content to mention 
>>>>>>>>> version-specific changes
>>>>>>>>>    (API change, new features, etc.)
>>>>>>>>>
>>>>>>>>> Then we can have a versionless programming guide starting from
>>>>>>>>> Spark 4.0. We can also revisit programming guides of all versions and
>>>>>>>>> combine them into one with version-specific notes, but that's 
>>>>>>>>> probably too
>>>>>>>>> much work.
>>>>>>>>>
>>>>>>>>> Any thoughts?
>>>>>>>>>
>>>>>>>>> Wenchen
>>>>>>>>>
>>>>>>>>> On Wed, Jun 5, 2024 at 1:39 AM Martin Andersson <
>>>>>>>>> martin.anders...@kambi.com> wrote:
>>>>>>>>>
>>>>>>>>>> While I have no practical knowledge of how documentation is
>>>>>>>>>> maintained in the Spark project, I must agree with Nimrod. For
>>>>>>>>>> users on older versions, having a programming guide that refers to
>>>>>>>>>> features or API methods that do not exist in that version is
>>>>>>>>>> confusing and detrimental.
>>>>>>>>>>
>>>>>>>>>> Surely there must be a better way to allow updating documentation
>>>>>>>>>> more often?
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>> ------------------------------
>>>>>>>>>> *From:* Nimrod Ofek <ofek.nim...@gmail.com>
>>>>>>>>>> *Sent:* Wednesday, June 5, 2024 08:26
>>>>>>>>>> *To:* Neil Ramaswamy <n...@ramaswamy.org>
>>>>>>>>>> *Cc:* Praveen Gattu <praveen.ga...@databricks.com.invalid>; dev <
>>>>>>>>>> dev@spark.apache.org>
>>>>>>>>>> *Subject:* Re: [DISCUSS] Versionless Spark Programming Guide
>>>>>>>>>> Proposal
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Neil,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> While you wrote you don't mean the api docs (of course), the
>>>>>>>>>> programming guides are also different between versions since 
>>>>>>>>>> features are
>>>>>>>>>> being added, configs are being added/ removed/ changed, defaults are 
>>>>>>>>>> being
>>>>>>>>>> changed etc.
>>>>>>>>>>
>>>>>>>>>> I know of "backport hell" - which is why I wrote that once a
>>>>>>>>>> version is released it's frozen, and the documentation will be
>>>>>>>>>> updated for the new version only.
>>>>>>>>>>
>>>>>>>>>> I think of it as facing forward and keeping older versions but
>>>>>>>>>> focusing on the new releases to keep the community updating.
>>>>>>>>>> While Spark has a support window of 18 months until EOL, we can
>>>>>>>>>> have only a 6-month support cycle for documentation - there are no
>>>>>>>>>> major security concerns for documentation...
>>>>>>>>>>
>>>>>>>>>> Nimrod
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 5, 2024, 08:28 Neil Ramaswamy <
>>>>>>>>>> n...@ramaswamy.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Nimrod,
>>>>>>>>>>
>>>>>>>>>> Quick clarification—my proposal will not touch API-specific
>>>>>>>>>> documentation for the specific reasons you mentioned (signatures, 
>>>>>>>>>> behavior,
>>>>>>>>>> etc.). It just aims to make the *programming guides* versionless.
>>>>>>>>>> Programming guides should teach fundamentals of Spark, and the 
>>>>>>>>>> fundamentals
>>>>>>>>>> of Spark should not change between releases.
>>>>>>>>>>
>>>>>>>>>> There are a few issues with updating documentation multiple times
>>>>>>>>>> after Spark releases. First, fixes that apply to all existing 
>>>>>>>>>> versions'
>>>>>>>>>> programming guides need backport PRs. For example, this change
>>>>>>>>>> <https://github.com/apache/spark/pull/46797/files> applies to
>>>>>>>>>> all the versions of the SS programming guide, but is likely to be 
>>>>>>>>>> fixed
>>>>>>>>>> only in Spark 4.0. Additionally, any such update within a Spark
>>>>>>>>>> release will require re-building the static sites in the spark
>>>>>>>>>> repo, and copying those files to spark-website via a commit in
>>>>>>>>>> spark-website. Making a typo fix like the one I linked would then
>>>>>>>>>> require <number of versions we want to update> + 1 PRs, as opposed
>>>>>>>>>> to 1 PR in the versionless programming guide world.
>>>>>>>>>>
>>>>>>>>>> Neil
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 4, 2024 at 1:32 PM Nimrod Ofek <ofek.nim...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> While I think that the documentation needs a lot of improvement
>>>>>>>>>> and important details are missing - and detaching the
>>>>>>>>>> documentation from the main project can help us iterate faster on
>>>>>>>>>> documentation-specific tasks - I don't think we can or should move
>>>>>>>>>> to versionless documentation.
>>>>>>>>>>
>>>>>>>>>> Documentation is version specific: parameters are added and
>>>>>>>>>> removed, new features are added, behaviours sometimes change etc.
>>>>>>>>>>
>>>>>>>>>> I think the documentation should be version specific- but
>>>>>>>>>> separate from spark release cadence - and can be updated multiple 
>>>>>>>>>> times
>>>>>>>>>> after spark release.
>>>>>>>>>> The way I see it, the documentation should be updated only for
>>>>>>>>>> the latest version; some time before a new release it should be
>>>>>>>>>> archived, and the updated documentation should then reflect the
>>>>>>>>>> new version.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Nimrod
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 4, 2024, 18:34 Praveen Gattu
>>>>>>>>>> <praveen.ga...@databricks.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>> +1. This helps for greater velocity in improving docs. However,
>>>>>>>>>> we might still need a way to provide version-specific information,
>>>>>>>>>> i.e. which features are available in which version, etc.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 3, 2024 at 3:08 PM Neil Ramaswamy <n...@ramaswamy.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I've written up a proposal to migrate all the Apache Spark
>>>>>>>>>> programming guides to be versionless. You can find the proposal
>>>>>>>>>> here
>>>>>>>>>> <https://docs.google.com/document/d/1OqeQ71zZleUa1XRZrtaPDFnJ-gVJdGM80o42yJVg9zg/>.
>>>>>>>>>> Please leave comments, or reply in this DISCUSS thread.
>>>>>>>>>>
>>>>>>>>>> TLDR: by making the programming guides versionless, we can make
>>>>>>>>>> updates to them whenever we'd like, instead of at the Spark release
>>>>>>>>>> cadence. This increased update velocity will enable us to make 
>>>>>>>>>> gradual
>>>>>>>>>> improvements, including breaking up the Structured Streaming 
>>>>>>>>>> programming
>>>>>>>>>> guide into smaller sub-guides. The proposal does not break *any*
>>>>>>>>>> existing URLs, and it does not affect our versioned API docs in
>>>>>>>>>> any way.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Neil
>>>>>>>>>>
>>>>>>>>>>
