[ Please note the cross-post and Reply-To ]

Hi folks,

As promised, here's a quick summary of what was discussed at the BoF
session in Portland. Apologies for the delay - it takes a while to
write these up... :-/

Thanks to the awesome efforts of our video team, the session is
already online [1]. I've taken a copy of the (partial!) Gobby notes
too, alongside my small set of slides for the session. [2]

We only had a small number of attendees at the session in person -
whether that's because of lack of interest or a clash with the other
sessions at the time I've no idea.

debian-www
==========

I didn't have a huge amount to talk about here, but felt it was worth
trying to start some discussion...

 * We're still using CVS for the website, which is a PITA. Git might
   work, but for a few (potential?) problems:

   + New way of working for our contributors, including translators
     who may not cope with learning a more complex tool

   + Space/time constraints working with a big repo - CVS supports
     partial checkouts better. I'm not convinced that this should
     matter any more, but... Later data comparisons tell us that CVS
     uses ~350M for a checkout, a git clone uses ~540M. An initial
     git clone can also be slow.

   + Our current work-flow (helper scripts, "page outdated" logic,
     translations) is built around CVS and would need a major revamp
     to fit git. Maybe po4a could help here?

Is anybody interested enough in switching that we'll find enough
manpower to make the change? CVS is *horrible* (IMHO), but it's a lot
of work to switch.

No other topics were brought up, so we moved on to the wiki...

debian-wiki
===========

Quick summary of the wiki status:

 * 12,203 pages (non-spam)
 * 12,565 registered user accounts (non-spam)
 * Using Moin 1.9.4 with some local patches (since upgraded to 1.9.7)

Brief discussion of how we've dealt with spammers - the problem is
*believed* to be just about solved now. To edit pages in the wiki, a
user must be logged in with an account.
To create an account, they must register using a valid email address
and we validate that email/account link by sending a URL that needs
to be visited. Whenever anybody attempts to sign up for an account,
our scripts attempt (based on heuristics and history) to detect and
block spam sign-ups.

There were questions about account setup; we clarified that account
holders in the wiki don't need to be DDs. Sign-ups are free for anyone
who cares - please join in!

Wiki anti-spam discussion
=========================

More in-depth explanation of how people appear "spammy" when
attempting to create an account. A typical spammer will have:

 * <random alphanumerics>@hotmail.com for email
 * <a totally unrelated set of random characters> for a username
 * an IP on a random Chinese mobile broadband network or known
   spam-haven

The anti-spam checks will score all the information on a sign-up
attempt and will refuse to create an account if the total score is
too high. If people attempt to sign up too many times in succession
for an account from the same spammy-looking email or IP, the IP will
be blacklisted.

The blacklist is not just for blocking account sign-ups - spammers
are clearly not interested in Debian and are just looking to spam. We
block the IP so they can't access any of our pages. Too many obvious
spam sign-up attempts from the same network address range will also
result in us blacklisting a network block, or even an entire ISP in
the case of known spam-havens.

We have tried in the past using Captchas on the Debian wiki, but it
didn't help much. There are a whole load of problems with Captchas
anyway (e.g. blocking blind people, privacy issues), but the biggest
problem is that the Captchas just did not solve the spam problem for
us! Most of the spam account sign-ups are already coming from botnets
where the spammers have broken Captchas to get free email accounts -
the one for the wiki is no harder for them!
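
As an aside, the scoring and blacklisting described earlier in this
section could be sketched roughly as below. This is purely
illustrative and is not the actual wiki code: the function names, the
weights, the threshold, the retry limit and the network range are all
made up for the example, and the "looks random" check is a deliberately
crude stand-in for real heuristics.

```python
import ipaddress
from collections import defaultdict

# Every value here is hypothetical, chosen only for illustration.
SPAM_THRESHOLD = 5          # refuse sign-up at or above this score
MAX_REJECTED_ATTEMPTS = 3   # blacklist the IP after this many refusals
SUSPECT_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # RFC 5737 test range

rejected_attempts = defaultdict(int)
blacklisted_ips = set()


def looks_random(text):
    """Crude guess: a longish mix of letters and digits with few vowels."""
    has_letters = any(c.isalpha() for c in text)
    has_digits = any(c.isdigit() for c in text)
    if len(text) < 8 or not has_letters or not has_digits:
        return False
    vowels = sum(c in "aeiou" for c in text.lower())
    return vowels / len(text) < 0.25


def score_signup(username, email, ip):
    """Add up a score from the three signals listed in the summary."""
    local, _, domain = email.lower().partition("@")
    score = 0
    if domain == "hotmail.com" and looks_random(local):
        score += 3  # random-looking free webmail address
    if looks_random(username) and username.lower() not in local:
        score += 2  # random username unrelated to the email address
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in SUSPECT_NETWORKS):
        score += 3  # IP inside a network we already distrust
    return score


def handle_signup(username, email, ip):
    """Return "accepted", "rejected" or "blocked" for one sign-up attempt."""
    if ip in blacklisted_ips:
        return "blocked"
    if score_signup(username, email, ip) >= SPAM_THRESHOLD:
        rejected_attempts[ip] += 1
        if rejected_attempts[ip] >= MAX_REJECTED_ATTEMPTS:
            blacklisted_ips.add(ip)
        return "rejected"
    return "accepted"
```

In this sketch, repeated calls to handle_signup() with the same
spammy-looking details are refused until the retry limit is reached,
after which that IP is blocked outright. White-listing an email
address after a false positive would be a separate check ahead of the
scoring.
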
Steve implemented Captcha support for Moin to try this all out, then
turned off that support on the Debian wiki after not very long.

There is a potential problem with Tor exit nodes being blacklisted
due to spammy-looking activity. We'd like to not block the nodes
themselves here - we'll need to work on this with the Tor folks.

Steve showed a small demo of the anti-spam stuff at work, using his
"console" on the wiki, and demonstrated some example spammers that
would be blocked.

There's no perfect solution here - we're having to work out spam/ham
on a small amount of information, and we can never be *100%* sure. In
the case that a user tries to sign up and is blocked as a
false-positive for spamming, they should mail the debian-www list or
the wiki admins and we can white-list email addresses in that
situation.

Gentoo/Arch wiki comparison
===========================

Both Gentoo and Arch have/had really good wikis full of great content
and excellent links to more information. It would be awesome if the
Debian wiki could be as good; this is down to the people supplying
and maintaining the content.

Freezing the wiki on a per-release basis?
=========================================

This has been suggested a few times in the past - freeze the content
in the wiki for each release and create new versions of pages for
future content. That way, it becomes easier to track out of date
content. I'm not convinced - lots of the wiki (not sure of the
split!) is *not* necessarily linked to a particular Debian release so
this wouldn't fit too well I think. This works very well for the
Apache folks for their documentation, but they have a very different
setup.

Paul has macros which could help solve this - allow page content to
know what the current release is and maybe show different content.
Maybe that could help, maybe it's solving a different problem.

We haven't spoken to the other distro folks about wiki setups
(e.g. comparing anti-spam).

Wiki engine choice?
===================

We've had the suggestion several times about maybe moving to a
different wiki engine. We're on moin and reasonably happy with the
setup so far. Moving to another wiki engine is difficult - very
intensive to translate markup, or you choose to start again with a
mostly empty wiki and risk not getting any content.

If anybody is interested in doing a migration, they would need a copy
of our data to work with. The wiki admins are happy to give dumps of
the wiki to anybody interested. Paul has even added a moin patch to
generate daily dumps like this (e.g. for offline use) and is
struggling to get it reviewed so far. Steve's patches have been
proposed and reviewed upstream and there are outstanding comments he
needs to resolve. Lack of time all round. :-(

Content is much more important than the wiki engine itself. Steve has
a script that can walk through the wiki and try to identify
appropriately-tagged pages and check to see if they're out of date,
mailing the most recent editors to ask for review. It's only a
proof-of-concept so far, might put it into place shortly as a trial.

Templates for wiki pages
========================

Some discussion about moin templates - we're using these already in a
few places (e.g. for BSP pages), please suggest more if you think
they would be useful.

HELP!
=====

We're *always* looking for more help in the wiki. A particular place
where people can help is in triaging the BTS for wiki.debian.org.

Special features in the wiki
============================

We have some cute extra macro features that people have added:

 * DebianBug()
 * Release name, version, dates
 * Message-ID search for mailing list lookup

Also:

 * The CategoryPermalink tag should be used on pages that are
   referenced externally, to make it obvious that they should not be
   moved/renamed/deleted.

Special sprint / BSP for web/wiki?
==================================

It might be very useful to have a specific get-together to work on
features and bugs.
A good example of this is the up-coming semi-planned switch to
single-sign-on for the wiki. We'd like to get away from the separate
accounts that everybody has. SSO is something we've been wanting to
do for ages, but short of manpower. We'll be working on migration
when we find some time.

Translations in the wiki
========================

The way we do this is not wonderful, with links to <LANG>/PageName.
Moin has better support for this for its own internal pages, but not
sure how we can use that better ourselves.

Wiki infrastructure and performance
===================================

We moved servers a couple of years back, from a dedicated i386
machine to an amd64 VM hosted by DSA with lots of memory. We're using
a heavily-threaded moin/wsgi setup on that machine and it seems to
cope now. As far as we can tell, we're one of the biggest Moin sites
on the planet. An example that proved this was the page save /
notification performance bug that hit us a couple of years back,
which turned out to be a scalability bug in moin. Overall,
performance is looking fine now.

Mentioned the break-in we had a few years back from the drawing
plugin security hole. We had to reset all the passwords, and a lot of
people with older accounts did not have working email addresses
attached to their accounts. Those people could not recover their
accounts automatically because of that, so would have been locked
out. If anybody is still in that situation, please contact the
admins!

The security breach also caused us to move to a new and better setup
with privilege separation to reduce the impact of potential future
attacks. DSA (weasel in particular) were awesome in terms of doing
the re-installation at that time, and fixing up the system security.
Thanks!

Why are the website updates so slow?
====================================

Rebuilds happen from cron rather than on every commit. Why? The
website build takes a very long time due to its size.
The cross-linking in wml is great for generating well-linked content
with macros etc., but on a very large site it takes a long time to
generate all the HTML. It's possible to just rebuild small parts of
the site, but that can be risky and can cause bugs.

Could we use po4a / gettext for www translation?
================================================

Maybe - we need people to work on this to see if we can make it work.

[1] http://meetings-archive.debian.net/Public/debian-meetings/2014/debconf14/webm/Web_and_wiki_BoF.webm
[2] http://www.einval.com/~steve/talks/Debconf14-web-wiki/

-- 
Steve McIntyre, Cambridge, UK.                    st...@einval.com
Welcome my son, welcome to the machine.