Hi all,

A few months ago, I started a thread about migrating our programming guides to be versionless. I had a POC, and the mostly positive reception on the thread encouraged me to implement it for real.
I did that recently here <https://github.com/neilramaswamy/spark-website/pull/2>, but I ran into a few critical issues: some guides (like MLlib) reference code examples in the apache/spark repo itself, and the SQL reference directly references the generated API reference using a Jekyll Liquid tag called include_api_gen. I think these issues make versionless guides a non-starter unless there is significant community interest.

One of the motivations for versionless guides was to be able to iterate quickly and avoid large, SEO-impacting changes. However, given the challenges that versionless guides pose, I think it's better to simply break apart the large guides, like the Structured Streaming one, and hope that they rank well in Spark 4.0.0+.

To that end, I've broken apart the Structured Streaming Programming Guide; it now resembles the MLlib and SQL reference guides. Critically, I have not changed *any* content. This work should make it easier for us to paginate and structure our Structured Streaming docs in the future, which will make them easier for our users to consume. This is especially important because similar tools like Flink do a much nicer job of organizing content.

You can view the changes on my personal site here <https://nr-spark-site.vercel.app/streaming/index.html>, and you can see the code changes here <https://github.com/neilramaswamy/nr-spark/pull/6>. Please let me know what you think; if there's no major objection, I will create a ticket and submit the PR.

Best,
Neil