Hi,

> These spam changes do not need that complexity to detect.

I've done some numbers, maybe it helps.

I counted all users that only ever committed one changeset with one edit inside. This number is 140352. Then I discarded those where the changeset comment was shorter than 50 characters or where the content had been redacted a long time ago, leaving me with 12173. Then I looked at the objects modified/created, and discarded all where the object had none of the website, description, or note tags. This left me with 3323 objects.

Then I looked at the list and found a broad range of edits. Some, while having an advertising slant, seem to be a legit addition of someone's own business:

user=Martin Merkur
changeset=38362589
comment=Our doors are always open. Come and visit, taste our coffee, see what we do
object=node 4103514010
addr:city=Berlin;addr:housenumber=38;addr:postcode=12435;addr:street=Elsenstraße;amenity=cafe;cuisine=coffee_shop;internet_access=no;name=passenger coffee;note=https://www.facebook.com/PassengerEspresso/;opening_hours=7:30-15:00 Uhr;smoking=outside;website=passenger-coffee.de

or

user=otheryan
changeset=13150739
comment=Added in West Town Bikes as it is at the same address and has enough of its own activity that it needs to be recognized on the map.
object=node 1585399965
addr:housenumber=2459;addr:postcode=60622;addr:street=W Division;name=Ciclo Urbano/West Town Bikes;shop=bicycle;website=http://ciclourbanochicago.com/

some look more SEO-y

user=northcarolinahealth
changeset=43324244
comment=Updated Osborne Insurance Services at Raleigh, NC
object=node 4474950186
addr:city=Raleigh;addr:housenumber=5316;addr:postcode=27609;addr:state=NC;addr:street=Six Forks Road;hours=Mon-Fri :8.00AM-6.00PM;name=Osborne Insurance Services;phone=919-845-9955;suite=110;website=http://northcarolinahealth.org

or

user=blakemanhart
changeset=43027180
comment=Updated State Farm - Blake Manhart at Springfield, VA
object=node 4456153164
addr:city=Springfield;addr:housenumber=8322;addr:postcode=22152;addr:state=VA;addr:street=Traford Ln #B;name=State Farm - Blake Manhart;Owner=Blake Manhart;phone=703-992-9664;website=http://blakemanhart.com

I had a look at trying to automatically match website and user name; 457 of them actually contain the user name in the website, but that is too coarse a check. I fear that it might be necessary to look through the rest manually to detect the dodgy ones.

Of the 3323, 208 have a highway tag. But here it bites me that I took everything that had either note or description or website, because some of the edits with highway=* are legit and have a description/note where the newbie mapper explained what they did. 170 of the 208 do have a website tag, and finally, they *all* seem dodgy. (Interestingly it was not all ways - some highway=traffic_signals too!) I've run a revert on these 170 but the majority had already been fixed by others!

That leaves us with a good 3115 objects to investigate. Many do clearly violate our "no advertising" rules but then again we don't want to be too harsh with the cycle shop owner who maybe oversteps the line.

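For anyone who wants to reproduce or refine these checks, they could be sketched roughly like this in Python. This is not the script used for the numbers above. The function names, the way user names are normalised before matching, and treating the 50-character cutoff as inclusive are all assumptions made for illustration.

```python
import re

# Tags whose presence kept an object in the review list (from the text above).
REVIEW_TAGS = ("website", "description", "note")


def is_candidate(comment, tags, min_comment_len=50):
    """Keep edits with a long changeset comment and a website/description/note tag.

    Assumption: comments of exactly min_comment_len characters are kept.
    """
    return len(comment) >= min_comment_len and any(k in tags for k in REVIEW_TAGS)


def _squash(text):
    """Lowercase and drop everything except letters and digits (assumed normalisation)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def username_in_website(user, website):
    """Coarse check: does the squashed user name occur in the squashed website value?"""
    name = _squash(user)
    return bool(name) and name in _squash(website)


def classify(user, comment, tags):
    """Sort one single-edit user into a hypothetical review bucket."""
    if not is_candidate(comment, tags):
        return "skip"
    website = tags.get("website", "")
    if "highway" in tags and website:
        return "highway+website"
    if website and username_in_website(user, website):
        return "username-in-url"
    return "other"


if __name__ == "__main__":
    tags = {"name": "State Farm - Blake Manhart", "website": "http://blakemanhart.com"}
    comment = "Updated State Farm - Blake Manhart at Springfield, VA"
    print(classify("blakemanhart", comment, tags))
```

As said above, the user-name match is coarse: it would miss a business owner who picked an unrelated user name, so the "other" bucket still needs a human look.
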
I've put my interim results here

http://www.remote.org/frederik/tmp/username-in-url.csv

(for those where the username is in the URL) - do you think we should revert them all automatically? (Keep in mind many may have been reverted already - we'd only work on those where the spam version is still current.)

and

http://www.remote.org/frederik/tmp/other.csv

for those where the username is not (fully) in the URL.

Bye
Frederik

--
Frederik Ramm  ##  eMail [email protected]  ##  N49°00'09" E008°23'33"

_______________________________________________
Talk-us mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/talk-us

