I'm +1 as long as Slack's TOS allow it. We already have full public archives of the mailing list, and I see Slack as just an extension of the mailing list.
On Tue, Aug 8, 2023 at 4:18 PM Brian Olsen <bitsondata...@gmail.com> wrote:
> Hey Iceberg Nation,
>
> I wanted to propose making the public Apache Iceberg Slack
> <https://apache-iceberg.slack.com/> chat and user data available for the
> community to use as a public data source. I have a couple of specific use
> cases in mind for it, which is what brought me to ask about it.
>
> The main problem I want to address for the community is the lack of
> persistence of the answers we're generating in Slack. Slack is on a free
> version that only retains the last 60 days of valuable threads happening
> there. Questions are repeatedly asked, and this takes up time for everyone
> in the community to answer the same questions multiple times. If we publish
> the public chat and user data (i.e. no emails or user info outside of
> what's displayed in Slack), then we can address this in the following ways:
>
> 1. We can use this as a getting-started tutorial featuring pyIceberg:
> pull this dataset into a Python or SQL ecosystem to learn about Iceberg,
> and also to discover old conversations that no longer appear on Slack.
> We can also take the raw data and push it into a local chatbot for folks
> to ask questions locally, build analytics projects, etc.
> 2. For those who are less interested in building their own chatbot or
> data pipeline, once this data is available, Tabular could use it to build
> and maintain a Discourse Forum <https://discourse.org/> (not to be
> confused with Discord). There are many reasons to add this on top of
> Slack, like persistence, discoverability via Google, curation and
> organization into wiki-style, to-the-point answers, and gamification.
> The goal is that it's not just Tabular moderating this, but that the
> community takes over as they build trust, similar to Stack Overflow. Of
> course, once we have the initial community working together there, we can
> use Slack for faster messaging and migrate specific valuable
> conversations to Discourse once they are done.
> 3. Another idea would be to use the Discourse forum as one of the inputs
> to create some sort of chatbot experience, either in Slack or nested in
> the docs. This would likely outperform training directly on Slack data,
> as answers in Slack aren't verified and curated to the most concise form
> possible.
> 4. The Slack and Tabular Discourse forum would be public to read, so
> this would allow other companies in the space to build their own
> solutions.
>
> The idea is that we would run a daily job that would export the Slack logs
> to some public dumping ground (GitHub or something) to store this dataset.
> Again, only public data that you could see if you signed up and logged into
> Slack would be exposed.
>
> How does this sound to everyone? Let me know if you have any questions or
> other ideas!
>
> Bits
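
For illustration, the filtering step of the daily export job proposed above could be sketched roughly as follows. This is only a sketch under assumptions, not an agreed design: the allow-listed fields (`user`, `text`, `ts`, `thread_ts`), the one-file-per-day layout, and the function names `sanitize_message` and `export_days` are all hypothetical. Fetching messages from Slack and pushing files to GitHub are left out; the sketch only shows dropping everything that isn't on the allow-list (such as emails) and grouping messages by UTC day.

```python
import json
from datetime import datetime, timezone

# Hypothetical allow-list: which fields count as "public" is an assumption
# that would need to be checked against Slack's TOS and the community's wishes.
PUBLIC_MESSAGE_FIELDS = ("user", "text", "ts", "thread_ts")


def sanitize_message(message):
    """Return a copy of a message dict containing only allow-listed fields."""
    return {k: message[k] for k in PUBLIC_MESSAGE_FIELDS if k in message}


def group_by_day(messages):
    """Group sanitized messages by the UTC date derived from their 'ts' field."""
    days = {}
    for message in messages:
        # 'ts' is assumed to be a string of epoch seconds, e.g. "1691510400.000100".
        stamp = datetime.fromtimestamp(float(message["ts"]), tz=timezone.utc)
        days.setdefault(stamp.date().isoformat(), []).append(
            sanitize_message(message)
        )
    return days


def export_days(messages):
    """Build {filename: JSON text}, one file per UTC day, ready to be published."""
    return {
        f"{day}.json": json.dumps(day_messages, indent=2, sort_keys=True)
        for day, day_messages in sorted(group_by_day(messages).items())
    }
```

A daily job could call `export_days` on the messages fetched since the previous run and commit the resulting files to the public repository. An allow-list is used rather than a deny-list so that any new field appearing in the data is excluded by default.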