Hi all,

A few months ago, I started a thread about migrating our programming guides
to be versionless. I had a POC, and the mostly-positive reception on the
thread encouraged me to implement it for real.

I did that recently here
<https://github.com/neilramaswamy/spark-website/pull/2>, but there were a
few critical issues: some guides (like MLlib) reference code examples in
the apache/spark repo itself, and the SQL reference directly references the
generated API reference using a Jekyll Liquid tag called include_api_gen. I
think these are non-starters unless there is significant community interest.

One of the motivations for versionless guides was to be able to iterate
quickly and avoid large, SEO-impacting changes. However, given the
challenges that versionless guides pose, I think it's better to break apart
the large guides, like the Structured Streaming one, and hope that they
rank well in Spark 4.0.0+.

To that end, I've broken apart the Structured Streaming Programming
Guide; it now resembles the MLlib and SQL reference guides. Critically, I
have not changed *any* content. This work should make it easier for us to
better paginate and structure our Structured Streaming docs in the future,
which will make it easier for our users to consume. This is especially
important because similar tools like Flink do a much nicer job of
organizing content.

You can view the changes on my personal site here
<https://nr-spark-site.vercel.app/streaming/index.html>, and you can see
the code changes here <https://github.com/neilramaswamy/nr-spark/pull/6>.
Please let me know what you think; if there's no major objection, I will
create a ticket and submit the PR.

Best,
Neil
