Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-11 Thread ????
On 10/12/2010 23:51, John Doe wrote: > I'm in the process of creating a cleanup tool that checks archive.org and > webcitation.org. If a URL is not archived, it checks whether it is live, and > if it is, I request that webcitation archive it on demand, and it fills in the > archiveurl parameter of cite

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-10 Thread WJhonson
In a message dated 12/10/2010 2:58:08 PM Pacific Standard Time, jamesmikedup...@googlemail.com writes: > my idea was that you will want to search pages that are referenced by > wikipedia already, in my work on kosovo, it would be very helpful > because there are lots of bad results on google, a

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-10 Thread WJhonson
In a message dated 12/10/2010 1:10:26 PM Pacific Standard Time, jamesmikedup...@googlemail.com writes: > My point is we should index them ourselves. We should have the pages > used as references first listed in an easy to use manner and if > possible we should cache them. If they are not cacheab

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-10 Thread WJhonson
In a message dated 12/10/2010 1:31:20 PM Pacific Standard Time, jamesmikedup...@googlemail.com writes: > If we prefer pages that can be cached and translated, and mark the > others that cannot, then by natural selection we will in the long term > replace the pages that are not allowed to be cached

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-10 Thread WJhonson
In a message dated 12/10/2010 2:12:44 PM Pacific Standard Time, jamesmikedup...@googlemail.com writes: > Well, lets backtrack. > The original question was, how can we exclude wikipedia clones from the > search. > my idea was to create a search engine that includes only refs from > wikipedia in

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-10 Thread John Doe
I'm in the process of creating a cleanup tool that checks archive.org and webcitation.org. If a URL is not archived, the tool checks whether it is live; if it is, I request that webcitation archive it on demand, and the tool fills in the archiveurl parameter of cite templates. John
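
The workflow described in this message can be sketched roughly as follows. This is a minimal illustration, not John's actual tool: the function names and the three injected callables (`find_archived`, `is_live`, `request_archive`) are hypothetical stand-ins for whatever lookups against archive.org and webcitation.org the real tool performs, and the sketch assumes that an already-archived URL is also written into `archiveurl`.

```python
from typing import Callable, Optional


def resolve_archive_url(
    url: str,
    find_archived: Callable[[str], Optional[str]],    # hypothetical: look up an existing archive copy
    is_live: Callable[[str], bool],                   # hypothetical: check that the URL still responds
    request_archive: Callable[[str], Optional[str]],  # hypothetical: ask for an on-demand archive
) -> Optional[str]:
    """Return an archive URL for `url`, or None if none can be obtained."""
    archived = find_archived(url)
    if archived is not None:
        return archived           # already archived: nothing to request
    if not is_live(url):
        return None               # dead and never archived: nothing can be done
    return request_archive(url)   # live but unarchived: archive on demand


def fill_archiveurl(template_params: dict, archive_url: Optional[str]) -> dict:
    """Return a copy of the cite-template parameters with archiveurl filled in.

    An existing non-empty archiveurl is left untouched (an assumption of this sketch).
    """
    updated = dict(template_params)
    if archive_url is not None and not updated.get("archiveurl"):
        updated["archiveurl"] = archive_url
    return updated
```

Passing the lookups in as callables keeps the decision logic testable without any network access; a real tool would plug in its own clients for the two archive services.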

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-10 Thread Mike Dupont
On Sat, Dec 11, 2010 at 12:02 AM, wrote: > In a message dated 12/10/2010 2:58:08 PM Pacific Standard Time, > jamesmikedup...@googlemail.com writes: > > > my idea was that you will want to search pages that are referenced by > wikipedia already, in my work on kosovo, it would be very helpful > bec

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-10 Thread Mike Dupont
On Fri, Dec 10, 2010 at 11:16 PM, wrote: > In a message dated 12/10/2010 2:12:44 PM Pacific Standard Time, > jamesmikedup...@googlemail.com writes: > > > Well, lets backtrack. > The original question was, how can we exclude wikipedia clones from the > search. > my idea was to create a search engi

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-10 Thread Mike Dupont
Well, let's backtrack. The original question was how we can exclude Wikipedia clones from the search. My idea was to create a search engine that includes only refs from Wikipedia. Then the idea was to make our own engine instead of only using Google. Let's agree that we first need a list of re

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-10 Thread Mike Dupont
I know all about the aspects of programming and copyright, I thought I answered the questions. Of course I can program this myself, and we can use open source indexing tools for that. the translations of course are a separate issue, they would be under the same restrictions as the source page. If

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-10 Thread Mike Dupont
On Fri, Dec 10, 2010 at 9:54 PM, wrote: > In a message dated 12/10/2010 12:48:31 PM Pacific Standard Time, > jamesmikedup...@googlemail.com writes: > > > I am not talking about books, just webpages. > > lets take ladygaga.com as example > > Wayback engine : > http://web.archive.org/web/*/http://w

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-10 Thread Mike Dupont
I am not talking about books, just webpages. Let's take ladygaga.com as an example. Wayback engine: http://web.archive.org/web/*/http://www.ladygaga.com Google cache: http://webcache.googleusercontent.com/search?q=cache:1720lEPHkysJ:www.ladygaga.com/+lady+gaga&cd=1&hl=de&ct=clnk&gl=de&client=firefox

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-10 Thread Mike Dupont
i mean google has copies, caches of items for searching. How can google cache this? Archive.org has copyrighted materials as well. We should be able to save backups of this material as well. mike On Fri, Dec 10, 2010 at 5:16 PM, wrote: > In a message dated 12/9/2010 11:06:30 PM Pacific Standard

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-10 Thread WJhonson
In a message dated 12/9/2010 11:06:30 PM Pacific Standard Time, jamesmikedup...@googlemail.com writes: > Google does it, archive.org (wayback machine) does it, we can copy > them for caching and searching i assume. we are not changing the > license, but just preventing the information from disap

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-09 Thread Mike Dupont
On Thu, Dec 9, 2010 at 6:02 PM, wrote: > In a message dated 12/9/2010 2:51:39 AM Pacific Standard Time, > jamesmikedup...@googlemail.com writes: > > >> yes it would be great. As i said, it could just include all pages >> listed as REF pages and that would allow people to review the results >> and

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-09 Thread WJhonson
In a message dated 12/9/2010 2:51:39 AM Pacific Standard Time, jamesmikedup...@googlemail.com writes: > yes it would be great. As i said, it could just include all pages > listed as REF pages and that would allow people to review the results > and find pages that should not belong. > > We also

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-09 Thread Mike Dupont
On Thu, Dec 9, 2010 at 12:52 PM, Fred Bauder wrote: >> On Thu, Dec 9, 2010 at 9:55 AM, Domas Mituzas >> wrote: >>> >>> On Dec 8, 2010, at 6:21 PM, Mike Dupont wrote: >>> Sounds like we need to have a notable search engine that includes only "approved and allowed" sources, that would be

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-09 Thread Fred Bauder
> On Thu, Dec 9, 2010 at 9:55 AM, Domas Mituzas > wrote: >> >> On Dec 8, 2010, at 6:21 PM, Mike Dupont wrote: >> >>> Sounds like we need to have a notable search engine that includes only >>> "approved and allowed" sources, that would be nice to have. >> >> Sounds like a great community project, W

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-09 Thread Mike Dupont
On Thu, Dec 9, 2010 at 9:55 AM, Domas Mituzas wrote: > > On Dec 8, 2010, at 6:21 PM, Mike Dupont wrote: > >> Sounds like we need to have a notable search engine that includes only >> "approved and allowed" sources, that would be nice to have. > > Sounds like a great community project, Wiki Search!

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-09 Thread Domas Mituzas
On Dec 8, 2010, at 6:21 PM, Mike Dupont wrote: > Sounds like we need to have a notable search engine that includes only > "approved and allowed" sources, that would be nice to have. Sounds like a great community project, Wiki Search! Domas

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-09 Thread Pascal Martin
ation Mailing List" Sent: Wednesday, December 08, 2010 7:58 PM Subject: Re: [Foundation-l] excluding Wikipedia clones from searching >I thought about this more, > It would be to extract a list of all pages that are included as <ref> > in the WP. We would use this for the search engi

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-08 Thread Mike Dupont
I thought about this more. The idea would be to extract a list of all pages that are included as <ref> in the WP. We would use this for the search engine. We should also make sure that all referenced pages (not linked ones) are stored in archive.org or someplace permanent. I wonder if there is some API to ext
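
As a rough illustration of the extraction step proposed here, the sketch below pulls URLs out of reference tags in raw wikitext and ignores links that appear outside them. It is an assumption-laden simplification, not an existing API: the function name `referenced_urls` and both regular expressions are made up for this example, and real wikitext (nested templates, named references reused elsewhere) needs a proper parser.

```python
import re

# Body of a <ref ...>...</ref> pair; self-closing <ref name="x" /> tags are skipped.
REF_BODY = re.compile(r"<ref\b[^>/]*>(.*?)</ref>", re.IGNORECASE | re.DOTALL)
# A deliberately loose URL pattern that stops at wikitext delimiters.
URL = re.compile(r"https?://[^\s|\]\[<>{}\"]+")


def referenced_urls(wikitext: str) -> list:
    """Return URLs found inside reference tags, in order, without duplicates."""
    seen = []
    for body in REF_BODY.findall(wikitext):
        for url in URL.findall(body):
            if url not in seen:
                seen.append(url)
    return seen
```

Run over every article, the collected lists would give the set of referenced pages that the message suggests indexing and archiving.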

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-08 Thread praveenp
On Wednesday 08 December 2010 05:16 PM, Amir E. Aharoni wrote: > I know that some Wikipedias customized Special:Search, adding other search > engines except Wikipedias built-in one. I tried to see whether any Wikipedia > added an ability to search using Google (or Bing, or Yahoo, or any other > sea

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-08 Thread Andrew Gray
On 8 December 2010 11:46, Amir E. Aharoni wrote: > For some time i used to fight this problem by adding > "-site:wikipedia.org -site:wapedia.mobi -site:miniwiki.org" etc. to my search queries, but i hit a > wall: Google limits the search string to 32 words, and today there are many > more than
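
The limitation quoted here can be made concrete with a small sketch that appends `-site:` exclusions to a query until a word budget is used up. The 32-word figure comes from the quoted message; treating each `-site:` operator as exactly one word is an assumption of this example, as are the names `build_query` and `WORD_LIMIT`.

```python
WORD_LIMIT = 32  # limit reported in the thread; counted here as whitespace-separated tokens


def build_query(terms, excluded_sites, word_limit=WORD_LIMIT):
    """Append -site: exclusions to the search terms until the word budget runs out.

    Returns the query string and the list of sites that did not fit.
    """
    tokens = list(terms)
    dropped = []
    for site in excluded_sites:
        if len(tokens) < word_limit:
            tokens.append(f"-site:{site}")
        else:
            dropped.append(site)  # no room left: this clone stays in the results
    return " ".join(tokens), dropped
```

With two search terms, only 30 clone domains fit under this counting, which is why a manually maintained exclusion list stops scaling once there are more clones than that.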

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-08 Thread Mike Dupont
Sounds like we need to have a notable search engine that includes only "approved and allowed" sources, that would be nice to have. On Wed, Dec 8, 2010 at 5:08 PM, David Gerard wrote: > On 8 December 2010 15:26, Amir E. Aharoni > wrote: > >> Yes, but that may also exclude sites that are useful a

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-08 Thread David Gerard
On 8 December 2010 15:26, Amir E. Aharoni wrote: > Yes, but that may also exclude sites that are useful and original, but > happen to mention Wikipedia. Add -"quoted sentence from article intro" to the search? - d.

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-08 Thread Amir E. Aharoni
On Wed, Dec 8, 2010 at 15:42, Fred Bauder wrote: > > If the copyright license has been followed -wikipedia should exclude all > clones. However, often, material is copied without crediting it to > Wikipedia. Yes, but that may also exclude sites that are useful and original, but happen to mention

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-08 Thread Fred Bauder
If the copyright license has been followed -wikipedia should exclude all clones. However, often, material is copied without crediting it to Wikipedia. Fred User:Fred Bauder > The "Google test" used to be a tool for checking the notability of a > subject > or to find sources about it. For some la

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-08 Thread Stephen Bain
On Wed, Dec 8, 2010 at 10:46 PM, Amir E. Aharoni wrote: > > For some time i used to fight this problem by adding > "-site:wikipedia.org -site:wapedia.mobi -site:miniwiki.org" etc. to my search queries, but i hit a > wall: Google limits the search string to 32 words, and today there are many > m

Re: [Foundation-l] excluding Wikipedia clones from searching

2010-12-08 Thread Nikola Smolenski
On 12/08/2010 12:46 PM, Amir E. Aharoni wrote: > The "Google test" used to be a tool for checking the notability of a subject > or to find sources about it. For some languages it may be also used for > other purposes - for example in Hebrew, the spelling of which is not > established so well, it is

[Foundation-l] excluding Wikipedia clones from searching

2010-12-08 Thread Amir E. Aharoni
The "Google test" used to be a tool for checking the notability of a subject or to find sources about it. For some languages it may be also used for other purposes - for example in Hebrew, the spelling of which is not established so well, it is very frequently used for finding the most common spell