Had you considered using the ASF's Slack? That keeps history.

On Tue, Aug 8, 2023 at 3:05 PM Brian Olsen <bitsondata...@gmail.com> wrote:
> Good point. It looks like Slack's TOS
> <https://www.salesforce.com/content/dam/web/en_us/www/documents/legal/Salesforce_MSA.pdf?_gl=1*1u5n6fj*_ga*MTU2MzM4Mjk5OC4xNjgyNTM4NjIz*_ga_QTJQME5M5D*MTY5MTUzMDE1Mi40Mi4xLjE2OTE1MzA4MzIuMjkuMC4w>
> in section 3.3 points us to Salesforce's External Facing Services Policy
> <https://www.salesforce.com/content/dam/web/en_us/www/documents/legal/Agreements/policies/ExternalFacing_Services_Policy.pdf>,
> and the main thing that policy addresses is consent for businesses under
> NDAs on public or shared channels, or private conversations or PII being
> exported without consent, and a bunch of other clearly illegal stuff we're
> not doing.
>
> I think since this data is public in the sense that anyone with the
> publicly available invite can join and read/see display names, we are fine.
> Slack has nothing in there about a PMC admin running an export to get
> access to the data that's owned by the ASF. So I believe as long as we get
> consent from the community and the PMC is okay with it, we should be
> fine from a legal standpoint, provided we don't export private information
> like emails or private chats.
>
> On Tue, Aug 8, 2023 at 4:53 PM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> I'm +1 as long as Slack TOS are ok with it. We already have full public
>> archives of the mailing list and I see Slack as just an extension of the
>> mailing list.
>>
>> On Tue, Aug 8, 2023 at 4:18 PM Brian Olsen <bitsondata...@gmail.com>
>> wrote:
>>
>>> Hey Iceberg Nation,
>>>
>>> I wanted to propose making the public Apache Iceberg Slack
>>> <https://apache-iceberg.slack.com/> chat and user data available for the
>>> community to use as a public data source. I have a couple of specific use
>>> cases in mind that I would like to use it for, which is what brought me
>>> to ask about it.
>>>
>>> The main problem I want to address for the community is the lack of
>>> persistence of the answers we're generating in Slack.
>>> Slack is on a free version that only retains the last 60 days of the
>>> valuable threads happening there. Questions are repeatedly asked, and
>>> everyone in the community spends time answering the same questions
>>> multiple times. If we publish the public chat and user data (i.e. no
>>> emails or user info outside of what's displayed in Slack), then we can
>>> address this in the following ways:
>>>
>>>    1. We can use this as a getting-started tutorial featuring PyIceberg:
>>>    pull this dataset into a Python or SQL ecosystem to learn about
>>>    Iceberg, but also to discover old conversations that no longer appear
>>>    on Slack. We can also take the raw data and push it into a local
>>>    chatbot for folks to ask questions locally, build analytics projects,
>>>    etc.
>>>    2. For those that are less interested in building their own chatbot
>>>    or data pipeline, once this data is available, Tabular could use it to
>>>    build and maintain a Discourse Forum <https://discourse.org/> (not
>>>    to be confused with Discord). There are many reasons to add this on
>>>    top of Slack, like persistence, discoverability via Google, curation
>>>    and organization into wiki-style, to-the-point answers, and
>>>    gamification, with the goal that it's not just Tabular moderating
>>>    this, but that the community takes over as they build trust, similar
>>>    to Stack Overflow. Of course, once we have the initial community
>>>    working together there, we can both use Slack for faster messaging
>>>    and migrate specific valuable conversations to Discourse once it is
>>>    done.
>>>    3. Another idea would be to use the Discourse forum as one of the
>>>    inputs to create some sort of chatbot experience, either in Slack or
>>>    nested in the docs. This would likely outperform training directly
>>>    on Slack data, as answers in Slack aren't verified and curated to
>>>    the most concise form possible.
>>>    4. The Slack and Tabular Discourse forum would be public to read, so
>>>    this would allow for other companies in the space to build their own
>>>    solutions.
>>>
>>>
>>> The idea is that we would run a daily job that would export the Slack
>>> logs to some public dumping ground (GitHub or something) to store this
>>> dataset. Again, only public data that you could see if you signed up and
>>> logged into Slack would be exposed.
>>>
>>> How does this sound to everyone? Let me know if you have any questions
>>> or other ideas!
>>>
>>> Bits
>>>
>>
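
For what it's worth, the "only public data" rule in the proposal could be sketched as a small filter that the daily job runs before publishing anything. This is only a rough illustration, not an agreed design: it assumes the export arrives as JSON lists shaped like Slack's standard workspace export (users carrying `id`, `name`, and a `profile` with `display_name`; messages carrying `type`, `user`, `text`, `ts`, and an optional `subtype` on system events). The function names and the `PUBLIC_MESSAGE_FIELDS` allowlist are hypothetical.

```python
import json

# Hypothetical allowlist: only fields a logged-in member can already see.
PUBLIC_MESSAGE_FIELDS = ("ts", "thread_ts", "text")


def display_names(users):
    """Map user id -> display name, ignoring emails and other profile fields."""
    names = {}
    for user in users:
        profile = user.get("profile", {})
        # Fall back to the handle if no display name is set (assumed field names).
        names[user["id"]] = profile.get("display_name") or user.get("name", "unknown")
    return names


def scrub_messages(messages, names):
    """Keep only allowlisted fields of plain user messages."""
    cleaned = []
    for msg in messages:
        # Assumption: system events (joins, bot posts) carry a "subtype".
        if msg.get("type") != "message" or "subtype" in msg or "user" not in msg:
            continue
        row = {key: msg[key] for key in PUBLIC_MESSAGE_FIELDS if key in msg}
        row["author"] = names.get(msg["user"], "unknown")
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    users = [{"id": "U1", "name": "bits",
              "profile": {"display_name": "Bits", "email": "bits@example.com"}}]
    messages = [{"type": "message", "user": "U1", "ts": "1.0",
                 "text": "How do I expire snapshots?",
                 "user_profile": {"email": "bits@example.com"}}]
    print(json.dumps(scrub_messages(messages, display_names(users)), indent=2))
```

An allowlist (rather than deleting known-private keys) means any field Slack adds later stays out of the published dataset by default. Note that this does not catch an email address someone typed into the message text itself; that would need a separate pass.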