I'm glad you think it's generally a good idea! I will mention, though, that with the improved docs I've almost finished, I'm hoping that Structured Streaming will no longer be a specialist topic that requires "trench warfare." With good pedagogy, I think it's very approachable. The Knowledge Sharing Hub could be useful for end-to-end, real-world use cases, but I think that operator semantics, stream configurations, etc. have a better home in the official documentation.
Thanks for your engagement, Mich. Looking forward to hearing others' opinions.

Neil

On Mon, Mar 25, 2024 at 2:50 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> Your intended work on improving the Structured Streaming documentation is
> great! Clear and well-organized instructions are important for everyone
> using Spark, beginners and experts alike.
>
> Having said that, Spark Structured Streaming, much like other specialist
> topics within Spark (say, k8s) or otherwise, cannot be mastered by
> documentation alone. These topics require a considerable amount of practice
> and trench warfare, so to speak, to master them. Suffice to say that I agree
> with the proposal of adding examples. However, it is an area that many try
> to master but fail (judging by typical issues brought up in the user group
> and otherwise). Perhaps using a section such as the proposed "Knowledge
> Sharing Hub" may become more relevant. Moreover, the examples have to
> reflect real-life scenarios; otherwise they will be of limited use.
>
> HTH
>
> Mich Talebzadeh,
> Technologist | Data | Generative AI | Financial Fraud
> London
> United Kingdom
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner Von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>
> On Mon, 25 Mar 2024 at 21:19, Neil Ramaswamy <n...@ramaswamy.org> wrote:
>
>> Hi all,
>>
>> I recently started an effort to improve the Structured Streaming
>> documentation. I thought that the current documentation, while very
>> comprehensive, could be improved in terms of organization, clarity, and
>> presence of examples.
>>
>> You can view the repo here
>> <https://github.com/neilramaswamy/structured-streaming>, and you can see
>> a preview of the site here <https://structured-streaming.vercel.app/>.
>> It's almost at full parity with the programming guide, and it also has
>> additional content, like a guide on unit testing and an in-depth
>> explanation of watermarks. I think it's at a point where we can bring this
>> to completion if it's something that the community wants.
>>
>> I'd love to hear feedback from everyone: is this something that we would
>> want to move forward with? As it borrows certain parts from the programming
>> guide, it has an Apache License, so I'd be more than happy if it is adopted
>> by an official Spark repo.
>>
>> Best,
>> Neil