Hi WereSpielChequers,

I worked on the data analysis for the previous AFT versions, and I believe 
I've already answered, on a number of occasions, your questions about what we 
could and couldn't test in the previous phase. I'm happy to do so again here 
and to clarify the research plans for the next version.

Subjective ratings

We have definitely seen a lot of love/hate rating on popular articles (e.g. 
Lady Gaga, Justin Bieber). Teasing apart ratings of article quality from 
rater attitudes towards the article's topic is hard, because the average 
enwiki article gets very few ratings per day, and the articles that do get a 
sufficient number of ratings tend to attract particularly opinionated or 
polarized visitors.

To give you a measure of the problem: of the 3.7M articles in the main 
namespace of the English Wikipedia, only 40 (about 0.001%) receive 10 or more 
ratings per day. The vast majority of articles get no ratings for days or 
weeks at a time, or none at all. Finding ways to increase the volume of 
ratings per article is one of the issues we're discussing in the context of 
v.5.
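
For anyone who wants to check figures like these against the feedback data, 
here is a minimal Python sketch. The flat CSV export and the column names 
(article_id, timestamp) are assumptions for illustration, not the actual AFT 
schema:

    # Per-article mean daily rating volume (hypothetical flat export).
    import pandas as pd

    ratings = pd.read_csv("aft_ratings.csv", parse_dates=["timestamp"])
    span_days = max((ratings["timestamp"].max()
                     - ratings["timestamp"].min()).days, 1)
    per_day = ratings.groupby("article_id").size() / span_days
    busy = (per_day >= 10).sum()
    print(f"{busy} articles get >= 10 ratings/day "
          f"({busy / 3.7e6:.4%} of 3.7M mainspace articles)")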

The second problem is that we don't have enough observations of multiple 
ratings by the same user. Only 0.02% of unique raters rate more than one 
article, which means that on a single-article basis we cannot easily filter 
out users who only rated a topic they love or hate and still have enough good 
data to process. This is unfortunate: the more rating data we can get per 
rater, the better we can identify gaming or rating biases and control for 
them in public article feedback reports.
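
To illustrate what such a filter would look like if we did have enough 
multi-article raters, here is a rough sketch; again the flat export is 
assumed, and rater_id would in practice be a hashed token, since most raters 
are anonymous:

    # Keep only ratings from users who rated more than one article,
    # as a crude control for topic love/hate bias.
    import pandas as pd

    ratings = pd.read_csv("aft_ratings.csv")
    per_rater = ratings.groupby("rater_id")["article_id"].nunique()
    multi = per_rater[per_rater > 1].index
    filtered = ratings[ratings["rater_id"].isin(multi)]
    print(f"kept {len(filtered)} of {len(ratings)} ratings from "
          f"{len(multi)} multi-article raters")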

Effects of AFT on participation

I ran a number of pre/post analyses comparing editing activity before and 
after AFT was activated on a random sample of English Wikipedia articles, 
controlling for page views before and after activation, and found no 
statistically significant difference in the volume of edits. As I noted 
elsewhere, comparing two random samples of articles is problematic because we 
cannot easily control for the many factors that affect editing activity in 
independent samples, so any result from this coarse analysis would be 
questionable. I agree that this is a very important issue; the proper way to 
address it is to A/B test different AFT interfaces (including no AFT widget 
whatsoever) on the same articles and measure the effects on edit activity 
across different user groups. This is one of the plans we are considering for 
v.5.
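
For concreteness, the pre/post comparison boils down to something like the 
following sketch. The data here is synthetic stand-in data (the real analysis 
ran on production edit and page view logs), and normalising edits by page 
views is only one of several possible controls:

    # Paired pre/post test of edit activity, normalised by page views.
    import numpy as np
    from scipy.stats import wilcoxon

    rng = np.random.default_rng(0)               # stand-in data
    views_pre = rng.integers(100, 10_000, 500)
    views_post = rng.integers(100, 10_000, 500)
    edits_pre = rng.poisson(views_pre * 0.001)
    edits_post = rng.poisson(views_post * 0.001)

    rate_pre = edits_pre / views_pre             # edits per page view
    rate_post = edits_post / views_post
    stat, p = wilcoxon(rate_pre, rate_post)
    print(f"Wilcoxon signed-rank p = {p:.3f}")   # large p: no detectable shift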

Another important limitation of AFT v.4 is that we only collected aggregate 
event counts for calls to action and we didn't mark edits or new accounts 
created via AFT, which means that we couldn't directly study the effects of 
AFT as an on-ramping tool for new editors: how many readers does it convert 
to registered users, and what is the quality of the edits it generates? How 
many users who create an account via AFT calls to action actually end up 
becoming editors? What is their survival rate compared to users who create an 
account in the standard way? How many of the edits created via AFT are 
vandalism, and how many are good-faith tests that get reverted? These are all 
questions that we will be addressing as of v.5.
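
Once edits and account creations are tagged with their origin, the conversion 
and survival questions reduce to straightforward cohort comparisons, roughly 
like this (the column names are hypothetical, and "any edit in the second 
month" is just one crude survival definition):

    # Survival of AFT-recruited accounts vs. standard sign-ups.
    import pandas as pd

    users = pd.read_csv("new_accounts.csv")    # user_id, origin, edits_month_2
    survived = users["edits_month_2"] > 0      # crude one-month retention
    print(survived.groupby(users["origin"]).mean())  # rate per channel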

We'll still be working on analyzing the current AFT data to support the 
design of v.5. In particular, we will focus on (1) correlations between 
consistently low ratings and poor quality, vandalism, or the likelihood of an 
article being nominated for deletion, and (2) the relation between ratings 
and changes in other quality-related metrics on a per-article basis.
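
As a sketch of (1), correlating consistently low ratings with deletion 
nominations could look like the following; the per-article summary table and 
its columns are assumed, and the 10-rating threshold is arbitrary:

    # Does a low mean rating predict nomination for deletion?
    import pandas as pd
    from scipy.stats import pointbiserialr

    articles = pd.read_csv("article_rating_summary.csv")
    sample = articles[articles["n_ratings"] >= 10]  # require enough ratings
    r, p = pointbiserialr(sample["nominated_for_deletion"],
                          sample["mean_rating"])
    print(f"point-biserial r = {r:.2f} (p = {p:.3g})")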

I have also pitched the existing data to a number of external researchers 
interested in article quality measurement and/or rating systems, and I invite 
you to do the same.

Hope this helps. I look forward to a more in-depth discussion during the office 
hours.

Dario

On Oct 26, 2011, at 7:33 AM, WereSpielChequers wrote:

>> Date: Wed, 26 Oct 2011 11:11:57 +0100
>> From: Oliver Keyes <scire.fac...@gmail.com>
>> Subject: Re: [Foundation-l] Office Hours on the article feedback tool
>> To: Wikimedia Foundation Mailing List <foundation-l@lists.wikimedia.org>
>> 
>> No, the data will remain; you can find it at
>> http://toolserver.org/~catrope/articlefeedback/ (we really need to
>> advertise
>> that more widely, actually).
>> 
>> To be clear, we're not talking about junking the idea; we will still have
>> an
>> "Article Feedback Tool" that lets readers provide feedback to editors. The
>> goal is more to move away from a subjective rating system, and towards
>> something the editors can look at and go "huh, that's a reasonable
>> suggestion as to how to fix the article, I'll go do that" or "aw, that's
>> really nice! I'm glad they liked it so much"
>> 
>> O.
>> 
>> 
> As someone who was never exactly a fan of the Article Feedback Tool, I'm glad
> to hear that the current version is to be canned. The sort of subjective
> ratings it could produce were never going to be useful for improving
> articles, certainly not useful enough to justify the screen space. My fear
> was that it might divert people from improving articles to complaining about
> them. Since we skipped a key stage in the testing, we will never know whether
> it did that. I didn't realise at the time that it was going to abuse our
> readers' trust by collecting shedloads of data that we weren't going to use.
> 
> We took a big risk in implementing the Article Feedback Tool without first
> testing to see whether it would do more harm than good. It is hard to tell
> in hindsight whether it has been negative or neutral in effect. Yes,
> recruitment of new editors has fallen sharply - September's new editors on
> EN wiki are down to levels not seen since 2005
> http://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution but
> things were on the decline anyway so we don't know whether and to what
> extent the Article Feedback tool exacerbated the trend. My concern about
> turning it into something that collects more meaningful comments is that
> this could exacerbate the pernicious trend from improving articles to
> tagging them for others to improve. I appreciate that there are various
> competing theories as to why the community went off the boil circa 2007, but
> for me and anyone else who considers that the trend to template rather than
> improve articles has been a major cause of community decline, an "improved"
> version of the Article Feedback Tool is a worrying prospect.
> 
> Can we make sure that any new generation Article Feedback tool is properly
> tested, and that testing includes:
> 
>   1. Implementing it on a random group of articles and comparing them with
>   a control sample to see which group of articles had the more edits from
>   newbies;
>   2. Checking whether collecting feedback on ways to improve the article
>   generates additional comments or diverts some editors away from actually
>   fixing the article;
>   3. Seeing which group of articles recruited the most new editors to the
>   pedia.
> 
> Please don't implement it if the testing shows that it diverts people from
> fixing articles to pointing out things that others can fix.
> 
> On a broader note I suggested some time ago that for the community to give
> meaningful input into article development we need a process for the
> community to give feedback on the priority of various potential
> developments. Wikimania does something like that in the way the program is
> put together. The image filter "referendum" came close in that it asked
> people to rate the image filter for importance; unfortunately it didn't
> include other proposals, so people couldn't put them in order of relative
> importance (we also need a quite separate question on whether you think
> something is worth doing at all). In your new role as liaison between the
> community and the development team, please could you initiate something like
> that, so that those of us who would give a higher priority to global
> watchlists, or to enhancing Cat-a-lot so that it works on uncategorised
> articles, can say so?
> 
> Regards
> 
> WereSpielChequers

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
