Thank you very much. That is exactly what I wanted to know.

2011/11/27 Bjoern Hoehrmann <[email protected]>
> * Khalida BEN SIDI AHMED wrote:
> >In the html code of a Wikipedia article how to recognise the
> >*first* sentence of this article?
>
> It's not marked up and probably differs among language versions. On the
> English version the first `p` child of a `mw-content-ltr` element is a
> good bet, as I pointed out earlier, to identify the first paragraph. It
> would then be necessary to find the full stop at the end of a sentence;
> criteria for that include that a space or the end of a paragraph follows
> and that it is not included in some nesting construct like parentheses;
> http://en.wikipedia.org/wiki/Sentence_boundary_disambiguation discusses
> some of the problems and includes pointers to some solutions.
> --
> Björn Höhrmann · mailto:[email protected] · http://bjoern.hoehrmann.de
> Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
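
For the archive, here is a rough sketch of the approach described in the quoted message, using only the Python standard library. The class name `mw-content-ltr` comes from the message; everything else (`FirstParagraphParser`, `first_paragraph`, `first_sentence`) is a hypothetical name made up for illustration. It assumes reasonably well-formed HTML, skips empty paragraphs, and treats a full stop as a sentence end only when it sits outside parentheses or brackets and is followed by whitespace or the end of the text. Abbreviations outside parentheses (e.g. "Dr.") will still fool it, which is the kind of problem the sentence boundary disambiguation article discusses.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FirstParagraphParser(HTMLParser):
    """Collect the text of the first non-empty <p> that is a direct child
    of an element whose class list contains "mw-content-ltr"."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []            # (tag, is_content_container) for open elements
        self.capture_depth = None  # stack depth of the <p> being captured
        self.chunks = []           # text pieces of the captured paragraph
        self.done = False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        parent_is_container = bool(self.stack) and self.stack[-1][1]
        self.stack.append((tag, "mw-content-ltr" in classes))
        if (tag == "p" and parent_is_container
                and not self.done and self.capture_depth is None):
            self.capture_depth = len(self.stack)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        if self.capture_depth is not None and len(self.stack) < self.capture_depth:
            self.capture_depth = None
            if "".join(self.chunks).strip():
                self.done = True   # found a paragraph with real text
            else:
                self.chunks = []   # empty paragraph: keep looking

    def handle_data(self, data):
        if self.capture_depth is not None:
            self.chunks.append(data)


def first_paragraph(html):
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup.
    return " ".join("".join(parser.chunks).split())


def first_sentence(text):
    depth = 0  # nesting level of parentheses/brackets
    for i, ch in enumerate(text):
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth = max(0, depth - 1)
        elif ch == "." and depth == 0:
            # A full stop counts only if whitespace or the end of text follows.
            if i + 1 == len(text) or text[i + 1].isspace():
                return text[: i + 1]
    return text  # no boundary found: return the whole paragraph
```

Calling `first_sentence(first_paragraph(html))` on the article HTML gives the candidate first sentence. Other language versions may use different markup, so the container class would need checking for each one.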
