Hi Yaroslav, 

That is what I would guess too. I was wondering whether anyone had gone to the 
trouble of analysing the situation more rigorously. It seems that it might be 
less easy to get useful and reliable data on which sources are available, and 
particularly which are not, by language, which would be quite a strong 
constraint on coverage. For example, one of the biggest historical constraints 
on coverage of women in science is lack of sources, so no matter how much we 
might want to improve coverage, it is just more difficult to do. This problem 
even applies to men in some lower-profile fields, and we have not yet plucked 
all the low-hanging fruit, even on English Wikipedia. Sometimes it looks like 
the research also mainly goes for the low-hanging fruit. After all, why 
wouldn’t they?

There is also the matter of free-access versus paywalled and paper-only 
sources. The latter can be quite difficult to get hold of if you are nowhere 
near a suitable library, which also puts Europe and North America ahead in the 
game. How many Wikipedians have access to any given source, by language and 
geographical distribution, would also be an interesting topic for analysis. 
All these things are barriers to contribution in some fields.

Cheers, 

Peter

 

From: Yaroslav Blanter [mailto:[email protected]] 
Sent: 09 June 2022 09:05
To: Wikimedia Mailing List
Subject: [Wikimedia-l] Re: Wikimedia Research Showcase June 15

 

Hi Peter,

 

From my experience with the Russian Wikipedia (admittedly more than ten years 
ago), articles are often translated from other language Wikipedias, with 
references just taken over and not independently checked (perhaps it is 
checked that an online reference is still available, but not its content). I 
also see my articles on the English Wikipedia being translated into other 
languages, even when they have Dutch or Russian references which I do not 
expect the translators to be able to read. This is anecdotal evidence, though.

 

In addition, there are very few projects where the main population is 
monolingual. In almost all cases the bulk of the editors also speak a major 
language that serves as a lingua franca (like Russian for the Chuvash 
Wikipedia, or perhaps Spanish for the Quechua Wikipedia). This makes the 
problem less acute.

 

Best

Yaroslav

 

On Wed, Jun 8, 2022 at 8:44 PM Peter Southwood <[email protected]> 
wrote:

Interesting research. Maybe I just missed it, but I didn’t notice any 
discussion of how the availability of reliable sources relates to coverage in 
different languages. In English Wikipedia we are not allowed to write about 
topics which are not covered by suitable sources, but there may also be more 
and a wider range of sources available in English, and English Wikipedia is 
also written by people with a wider range of languages, making more non-English 
sources available. Is there any research comparing this tendency across other 
languages? If there is no-one editing a Wikipedia who can read a source, no-one 
can write about its content. It can be very difficult to find sources for some 
topics, and it would be unsurprising if geographical topics in an area where a 
given language is not spoken are not covered in that language.

Cheers,

Peter

 

From: Emily Lescak [mailto:[email protected]] 
Sent: 08 June 2022 15:13
To: [email protected]; [email protected]; 
[email protected]
Subject: [Wikimedia-l] Wikimedia Research Showcase June 15

 

Hi all,

The next Research Showcase, Wikipedia's Languages, will be live-streamed 
Wednesday, June 15, at 4:00 AM PDT/11:00 AM UTC. View your local time here 
<https://zonestamp.toolforge.org/1655290800>.

YouTube stream: https://www.youtube.com/watch?v=AZQM1dtn3g0

You are welcome to ask questions via YouTube chat or on IRC at 
#wikimedia-research. 

This month's presentations: 

Quantifying knowledge synchronisation in the 21st century

By Jisung Yoon (Pohang University of Science and Technology)

Humans acquire and accumulate knowledge through language usage and eagerly 
exchange their knowledge for advancement. Although geographical barriers had 
previously limited communication, the emergence of information technology has 
opened new avenues for knowledge exchange. However, it is unclear which 
communication pathway is dominant in the 21st century. Here, we explore the 
dominant path of knowledge diffusion in the 21st century using Wikipedia, the 
largest communal dataset. We evaluate the similarity of shared knowledge 
between population groups, distinguished based on their language usage. When 
population groups are more engaged with each other, their knowledge structure 
is more similar, where engagement is indicated by socio-economic connections, 
such as cultural, linguistic, and historical features. Moreover, geographical 
proximity is no longer a critical requirement for knowledge dissemination. 
Furthermore, we integrate our data into a mechanistic model to better 
understand the underlying mechanism and suggest that the knowledge "Silk Road" 
of the 21st century is based online.




The Language Geography of Wikipedia

By Martin Dittus

Every language is a system of being, doing, knowing, and imagining. With over 
7,000 active languages in the world, how many languages are fully represented 
online? To answer this question, digital non-profit Whose Knowledge? initiated 
the first ever report on the State of the Internet's Languages. As part of this 
report, Martin Dittus and Mark Graham have investigated the languages of 
Wikipedia. Wikipedia began with a single English-language edition more than two 
decades ago, and now offers more than 300 language editions, which places it at 
the forefront of digital language support. However, this does not mean that 
speakers of these languages get access to the same content: Wikipedia’s 
language editions vary widely in scale. We further find that this inequality is 
also reflected in Wikipedia’s geographic coverage: not all places are captured 
in every language. Wikipedia's coverage often follows the global distribution 
of speakers of the respective language. Yet even when we account for the 
distribution of language populations, certain language communities are much 
more strongly represented on Wikipedia than others. As a consequence, we find 
that for many countries in Africa, Central and South America, and South Asia, 
most of the content about those countries is in a foreign language, often a 
European-colonial language. In other words, in many of these places, people may 
need to be able to speak a second (possibly foreign) language in order to 
access Wikipedia information about their own places. Why do we see these 
differences? And what can be done to improve things?

You can also watch our past research showcases here: 
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase

 

Emily, on behalf of the Research team

 

--

Emily Lescak (she / her)

Senior Research Community Officer

The Wikimedia Foundation

 


 

 

_______________________________________________
Wikimedia-l mailing list -- [email protected], guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/7TWLX36C5PSHFFSQGCXGMVR35QB7LRRV/
To unsubscribe send an email to [email protected]

