Good point. It looks like section 3.3 of Slack's TOS <https://www.salesforce.com/content/dam/web/en_us/www/documents/legal/Salesforce_MSA.pdf?_gl=1*1u5n6fj*_ga*MTU2MzM4Mjk5OC4xNjgyNTM4NjIz*_ga_QTJQME5M5D*MTY5MTUzMDE1Mi40Mi4xLjE2OTE1MzA4MzIuMjkuMC4w> points us to Salesforce's External Facing Services Policy <https://www.salesforce.com/content/dam/web/en_us/www/documents/legal/Agreements/policies/ExternalFacing_Services_Policy.pdf>. The main things that policy addresses are consent for businesses under NDAs on public or shared channels, private conversations or PII being exported without consent, and a bunch of other clearly illegal stuff we're not doing.

I think since this data is public in the sense that anyone with the publicly available invite can join and read/see display names, we are fine. Slack has nothing in there about a PMC admin running an export to get access to data that's owned by the ASF. So I believe that as long as we get consent from the community, the PMC is okay with it, and the export doesn't include private information like emails or private chats, we should be fine from a legal standpoint.

On Tue, Aug 8, 2023 at 4:53 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> I'm +1 as long as Slack TOS are ok with it. We already have full public
> archives of the mailing list and I see slack as just an extension of the
> mailing list.
>
> On Tue, Aug 8, 2023 at 4:18 PM Brian Olsen <bitsondata...@gmail.com>
> wrote:
>
>> Hey Iceberg Nation,
>>
>> I wanted to propose making the public Apache Iceberg Slack
>> <https://apache-iceberg.slack.com/> chat and user data available for the
>> community to use as a public data source. I have a couple of specific use
>> cases in mind that I would like to use it for, which is what brought me to
>> ask about it.
>>
>> The main problem I want to address for the community is the lack of
>> persistence of the answers we're generating in Slack. Slack is on a free
>> version that only retains the last 60 days of valuable threads happening
>> there. Questions are repeatedly asked, and this takes up time for everyone
>> in the community to answer the same questions multiple times. If we publish
>> the public chat and user data (i.e. no emails or user info outside of
>> what's displayed in Slack), then we can address this in the following ways:
>>
>>    1. We can use this in a getting started tutorial featuring PyIceberg
>>    that pulls this dataset into a Python or SQL ecosystem to learn about
>>    Iceberg, but also to discover old conversations that no longer appear
>>    on Slack.
>>    We can also take the raw data and push it into a local chatbot for
>>    folks to ask questions locally, build analytics projects, etc.
>>    2. For those that are less interested in building your own chatbot or
>>    data pipeline, once this data is available, Tabular could use it to
>>    build and maintain a Discourse Forum <https://discourse.org/> (not to
>>    be confused with Discord). There are many reasons to add this on top
>>    of Slack, like persistence, discoverability via Google, curation and
>>    organization into wiki-style, to-the-point answers, and gamification.
>>    The goal is that it's not just Tabular moderating this, but that the
>>    community takes over as they build trust, similar to Stack Overflow.
>>    Of course, once we have the initial community working together there,
>>    we can both use Slack for faster messaging and migrate specific
>>    valuable conversations to Discourse once it is done.
>>    3. Another idea would be to use the Discourse forum as one of the
>>    inputs to create some sort of chatbot experience, either in Slack or
>>    nested in the docs. This would likely outperform training directly on
>>    Slack data, as answers in Slack aren't verified and curated to the
>>    most concise form possible.
>>    4. The Slack and Tabular Discourse forum would be public to read, so
>>    this would allow other companies in the space to build their own
>>    solutions.
>>
>>
>> The idea is that we would run a daily job that would export the Slack
>> logs to some public dumping ground (GitHub or something) to store this
>> dataset. Again, only public data that you could see if you signed up and
>> logged into Slack would be exposed.
>>
>> How does this sound to everyone? Let me know if you have any questions or
>> other ideas!
>>
>> Bits
>>
>