On Thu, Aug 27, 2009 at 11:35 AM, Gregory Maxwell <gmaxw...@gmail.com> wrote:
> On Wed, Aug 26, 2009 at 9:30 PM, John Vandenberg <jay...@gmail.com> wrote:
>> And yet ... this is what every successful wiki does. Wikipedia is
>> extremely structured. The writers are not always expected to know the
>> structure; gnomes do the tidying up.
>
> You must have an enormously different idea of extremely structured
> than I do. I once created software to extract lat/long from Wikitext
> on enwp and gave up when I got to the 100th or so distinct template
> invocation which did almost but not quite exactly the same thing.
>
> Go search the archives for some of my example bat-shit category linkage maps.
>
> It's extremely structures compared to complete anarchy, or perhaps
> "extremely structured" compared to the human body. It's not structured
> compared to normal sources of data. Not at all.

English Wikipedia is not "well" structured for many data mining tasks. Its problem domain is much larger and its content more dynamic, but there are also too many cooks, too many partially implemented ideas, and not enough concern for consistency and re-use.

The Creator and Author namespaces on Commons and Wikisource respectively are better examples of structured information that can be mined. Wikispecies pages carry a limited amount of information, and it is quite sensibly structured. I'd also bet that the Wikispecies community would be more accommodating of any proposals to increase standardisation of its content in order to allow mining.

-- 
John Vandenberg

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l