Re: [R] 'R' Software Output Plagiarism

peter dalgaard Tue, 22 Sep 2015 13:09:12 -0700

Marc,

I don't think copyright/intellectual property issues factor into this. Urkund 
and similar tools are, to my knowledge, entirely about plagiarism. So the issue 
would seem to be that the R output is considered identical or nearly identical 
to R output in other published or otherwise submitted material.


What puzzles me (apart from how a document can be deemed 32% plagiarized when 
only 25% of the text is involved) is whether this includes the numbers and 
variable names. If those are somehow factored out, then any R regression could 
be pretty much identical to any other R regression. However, two analyses could 
have similar variable names if they are based on the same cookbook recipe, and 
analyses with similar numerical output could come from analyzing the same 
standard data. Such situations would not necessarily be considered plagiarism. 
(I mean: if you claim that you are analyzing data from experiments that you 
yourself have performed, and your numbers are exactly identical to something 
that has been previously published, then it would be suspect. If you analyze 
something from public sources, someone else might well have done the same thing.)
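The point about factoring out numbers and variable names can be made concrete with a rough sketch. Nothing in this thread says how Urkund actually normalizes text, so the `template()` helper below is purely hypothetical: it replaces variable names and numbers in printed `summary()` output with placeholders using regular expressions, then checks how many lines two unrelated regressions (on R's built-in `mtcars` and `swiss` datasets) have in common afterwards.

```r
# Two unrelated regressions on datasets that ship with R
fit_a <- lm(mpg ~ wt + hp, data = mtcars)
fit_b <- lm(Fertility ~ Education + Agriculture, data = swiss)

# Hypothetical, crude normalisation (NOT Urkund's method): blank out
# variable names, the data set name and all numbers, so that only the
# fixed layout of the printed summary() output remains
template <- function(fit) {
  txt <- capture.output(summary(fit))
  names_used <- c(all.vars(formula(fit)), deparse(fit$call$data))
  for (v in names_used) {
    txt <- gsub(v, "VAR", txt, fixed = TRUE)
  }
  txt <- gsub("[<>]? ?-?[0-9][0-9.e+-]*", "NUM", txt)  # numbers, incl. "< 2e-16"
  txt <- gsub("[ \t]+", " ", txt)                      # collapse column alignment
  txt <- gsub("[*.]+ *$", "", txt)                     # drop trailing significance stars
  trimws(txt)
}

tmpl_a <- template(fit_a)
tmpl_b <- template(fit_b)

# Share of lines in the first template that also occur in the second
overlap <- mean(tmpl_a %in% tmpl_b)
print(overlap)
```

If the overlap comes out high, a matcher that discounted numbers and names would see two quite different analyses as near-identical text, whereas one that keeps them would not. Plain string substitution of variable names is fragile (a short name such as `t` would also hit words like `Estimate`), so this is only an illustration.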

Like John Kane, I think it is necessary to know exactly which sources the text 
is claimed to be plagiarized from and/or which parts of the text are being 
matched by Urkund. If it turns out that Urkund is generating false positives, 
then this needs to be pointed out to them and to the people basing decisions 
on it.

-pd

> On 22 Sep 2015, at 18:24 , Marc Schwartz <marc_schwa...@me.com> wrote:
> 
> Hi,
> 
> With the usual caveat that I Am Not A Lawyer....and that I am not speaking on 
> behalf of any organization...
> 
> My guess is that they are claiming that the output of R, simply being copied 
> and pasted verbatim into your thesis, constitutes the use of copyrighted 
> output from the software.
> 
> It is not clear to me that R's output is copyrighted by the R Foundation (or 
> by other parties for CRAN packages), albeit the source code underlying R is, 
> along with other copyright owners as appropriate. There is some case law to 
> support the notion that the output alone is not protected in a similar 
> manner, but that may be country specific.
> 
> Did you provide any credit to R (see the output of citation()) in your 
> thesis and indicate that your analyses were performed using R?
> 
> If R is uncredited, I could see them raising the issue.
> 
> You might check with your institution's legal/policy folks to see if there is 
> any guidance provided for students regarding the crediting of software used 
> in this manner, especially if that guidance is at no cost to you.
> 
> Regards,
> 
> Marc Schwartz
> 
> 
>> On Sep 22, 2015, at 11:01 AM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>> 
>> 1. It is highly unlikely that we could be of help (unless someone else
>> has experienced this and knows what happened). You will have to
>> contact the Urkund people and ask them why their algorithms raised the
>> flags.
>> 
>> 2. But of course, the regression methodology is not "your own" -- it's
>> just a standard tool that you used in your work, which is entirely
>> legitimate of course.
>> 
>> Cheers,
>> Bert
>> 
>> 
>> Bert Gunter
>> 
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>>  -- Clifford Stoll
>> 
>> 
>> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
>> <oliver.barr...@skema.edu> wrote:
>>> 
>>> Dear 'R' community support,
>>> 
>>> 
>>> I am a student at Skema Business School and I have recently submitted my 
>>> MSc thesis/dissertation. This has been passed on to an external plagiarism 
>>> service provider, Urkund, who have scanned my document and returned a 
>>> plagiarism report to my professor stating that 32% plagiarism was detected.
>>> 
>>> 
>>> I have contacted Urkund about this issue, as I have committed no such 
>>> plagiarism, and they have told me that all the plagiarism detected in my 
>>> document comes from the last 25%, which consists only of 'R' regressions 
>>> like the one I have pasted below:
>>> 
>>> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>>>   Fed.t.4., data = OLS_CAR, x = TRUE)
>>> 
>>> Residuals:
>>>     Min        1Q    Median        3Q       Max
>>> -0.154587 -0.015961  0.001429  0.017196  0.110907
>>> 
>>> Coefficients:
>>>            Estimate Std. Error t value Pr(>|t|)
>>> (Intercept) -0.001630   0.001763  -0.925   0.3559
>>> Fed         -0.121595   0.165359  -0.735   0.4627
>>> Fed.t.1.     0.344014   0.140979   2.440   0.0153 *
>>> Fed.t.2.     0.026529   0.143648   0.185   0.8536
>>> Fed.t.3.     0.622357   0.142021   4.382 1.62e-05 ***
>>> Fed.t.4.     0.291985   0.158914   1.837   0.0671 .
>>> ---
>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>> 
>>> Residual standard error: 0.0293 on 304 degrees of freedom
>>> (20 observations deleted due to missingness)
>>> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
>>> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
>>> 
>>> I have produced all of these regressions myself and pasted them directly 
>>> from the 'R' software package. My regression methodology is entirely my own, 
>>> as are the sourcing and preparation of the data used to produce these 
>>> statistics.
>>> 
>>> I would be very grateful if you could provide me with some clarity as to 
>>> why this output from 'R' is reading as plagiarism.
>>> 
>>> I would like to thank you in advance,
>>> 
>>> Kind regards,
>>> 
>>> Oliver Barrett
>>> (+44) 7341 834 217
>>> 
>>> 
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
