Great research, Frederik!

*~~~~~~*
*Denis Carriere*
On Wed, Jul 5, 2017 at 5:05 PM, Frederik Ramm <[email protected]> wrote:

> Hi,
>
> > These spam changes do not need that complexity to detect.
>
> I've done some numbers, maybe it helps.
>
> I counted all users that only ever committed one changeset with one edit
> inside. This number is 140352.
>
> Then I discarded those where the changeset comment was shorter than 50
> characters or where the content had been redacted a long time ago, leaving
> me with 12173.
>
> Then I looked at the objects modified/created, and discarded all where
> the object had neither website, nor description, nor note tag. This left
> me with 3323 objects.
>
> Then I looked at the list and found a broad range of edits. Some, while
> having an advertising slant, seem a legit addition of someone's own
> business:
>
> user=Martin Merkur
> changeset=38362589
> comment=Our doors are always open. Come and visit, taste our coffee, see what we do
> object=node 4103514010
> addr:city=Berlin;addr:housenumber=38;addr:postcode=12435;addr:street=Elsenstraße;amenity=cafe;cuisine=coffee_shop;internet_access=no;name=passenger coffee;note=https://www.facebook.com/PassengerEspresso/;opening_hours=7:30-15:00 Uhr;smoking=outside;website=passenger-coffee.de
>
> or
>
> user=otheryan
> changeset=13150739
> comment=Added in West Town Bikes as it is at the same address and has enough of its own activity that it needs to be recognized on the map.
> object=node 1585399965
> addr:housenumber=2459;addr:postcode=60622;addr:street=W Division;name=Ciclo Urbano/West Town Bikes;shop=bicycle;website=http://ciclourbanochicago.com/
>
> some look more SEO-y
>
> user=northcarolinahealth
> changeset=43324244
> comment=Updated Osborne Insurance Services at Raleigh, NC
> object=node 4474950186
> addr:city=Raleigh;addr:housenumber=5316;addr:postcode=27609;addr:state=NC;addr:street=Six Forks Road;hours=Mon-Fri :8.00AM-6.00PM;name=Osborne Insurance Services;phone=919-845-9955;suite=110;website=http://northcarolinahealth.org
>
> or
>
> user=blakemanhart
> changeset=43027180
> comment=Updated State Farm - Blake Manhart at Springfield, VA
> object=node 4456153164
> addr:city=Springfield;addr:housenumber=8322;addr:postcode=22152;addr:state=VA;addr:street=Traford Ln #B;name=State Farm - Blake Manhart;Owner=Blake Manhart;phone=703-992-9664;website=http://blakemanhart.com
>
> I had a look at trying to automatically match website and user name; 457
> of them actually contain the user name in the web site, but that is too
> coarse a check. I fear that it might be necessary to look through the
> rest manually to detect the dodgy ones.
>
> Of the 3323, 208 have a highway tag. But here it bites me that I took
> everything that had either note or description or website, because some
> of the edits with highway=* are legit and have a description/note where
> the newbie mapper explained what they did. 170 of the 208 do have a
> website tag, and finally, they *all* seem dodgy. (Interestingly it was
> not all ways - some highway=traffic_signals too!)
>
> I've run a revert on these 170 but the majority had already been fixed
> by others!
>
> That leaves us with a good 3115 objects to investigate. Many do clearly
> violate our "no advertising" rules but then again we don't want to be
> too harsh with the cycle shop owner who maybe oversteps the line.
>
> I've put my interim results here
>
> http://www.remote.org/frederik/tmp/username-in-url.csv
>
> (for those where the username is in the URL) - do you think we should
> revert them all automatically? (Keep in mind many may have been reverted
> already - we'd only work on those where the spam version is still current.)
>
> and
>
> http://www.remote.org/frederik/tmp/other.csv
>
> for those where the username is not (fully) in the URL.
>
> Bye
> Frederik
>
> --
> Frederik Ramm ## eMail [email protected] ## N49°00'09" E008°23'33"
>
> _______________________________________________
> Talk-us mailing list
> [email protected]
> https://lists.openstreetmap.org/listinfo/talk-us
>
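For anyone who wants to reproduce the triage Frederik describes, here is a rough Python sketch of the filtering steps. It is not his actual script, which the message does not show. The `SingleEdit` record layout and all function names are hypothetical, the input is assumed to be already reduced to users with exactly one changeset containing one edit, and the username check is assumed to be a plain case-insensitive substring test (the message only says the user name is "in the web site").

```python
from dataclasses import dataclass, field

# Tag keys named in the message as the ones worth inspecting.
TEXT_TAGS = ("website", "description", "note")
MIN_COMMENT_LENGTH = 50  # threshold quoted in the message


@dataclass
class SingleEdit:
    """The only edit of a one-changeset user (hypothetical record layout)."""
    user: str
    changeset: int
    comment: str
    tags: dict = field(default_factory=dict)


def is_candidate(edit):
    """Keep edits with a long comment and a website, description or note tag."""
    if len(edit.comment) < MIN_COMMENT_LENGTH:
        return False
    return any(key in edit.tags for key in TEXT_TAGS)


def username_in_website(edit):
    """Assumed rule: user name appears in the website tag, ignoring case."""
    website = edit.tags.get("website", "").lower()
    return bool(website) and bool(edit.user) and edit.user.lower() in website


def highway_with_website(edit):
    """Flag the highway=* objects that also carry a website tag."""
    return "highway" in edit.tags and "website" in edit.tags


def triage(edits):
    """Split candidates into (username in URL, other), like the two CSV files."""
    in_url, other = [], []
    for edit in filter(is_candidate, edits):
        (in_url if username_in_website(edit) else other).append(edit)
    return in_url, other


# Two of the edits quoted in the message, with tags abbreviated.
edits = [
    SingleEdit("blakemanhart", 43027180,
               "Updated State Farm - Blake Manhart at Springfield, VA",
               {"name": "State Farm - Blake Manhart",
                "website": "http://blakemanhart.com"}),
    SingleEdit("Martin Merkur", 38362589,
               "Our doors are always open. Come and visit, taste our coffee, "
               "see what we do",
               {"amenity": "cafe", "website": "passenger-coffee.de"}),
]
in_url, other = triage(edits)
```

As the message points out, the substring test is coarse: it would only pre-sort the list, and the "other" bucket would still need a manual look.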

