On 11/03/2016 08:14 AM, Kevin RUE wrote:
Apologies for the additional spam, for two reasons:
* The diff files that I've previously sent had the base and modified
versions swapped. This new one fixes that.
* This new diff file (always relative to the code I cloned from
Bioconductor-mirror) also fixes a bug whereby the updated code would
fail on packages that do not break any of the three guidelines.
Hi Kevin -- thanks for these, I incorporated a version in 1.11.1.
The report is for the first 6 lines of offenders; the code for finding
all offenders is mentioned in the vignette
BiocCheck:::checkFormatting("path/to/YourPackage", nlines=Inf)
Marcel Ramos mentions the lintr (CRAN) and goodpractice (github only,
https://github.com/MangoTheCat/goodpractice) packages as useful
resources for these types of questions; also formatR (CRAN).
For the record, it was fun to work through your code. The basic pattern
was to read lines and flag those that violated a (usually simple) test,
add these to a data frame, then if after parsing all files the data
frame had any rows report these to the user.
I standardized the columns of the data frame across checks to facilitate
re-use, and imposed the standard by using a constructor -- non-exported
Context(). I took a little bit of pain to make the constructor work
(return a data frame with the correct columns but no rows) when there
are no lines of code flagged to be added to the context, so that I could
use it without have to insert a bunch of conditional `if (...)` in my
code that make it both harder to understand and harder to test. Also,
the constructor is vectorized, whereas your implementation was iterative.
For reporting to the user, I wrote a small helper handleContext() that I
could re-use across the different types of checks (and likely elsewhere
in the package) and also maintain the separation of what is reported to
the user (e.g., in handleContext()) from how it is reported to the user
-- .msg, .verbatim, etc in the utils.R file. Probably the reporting
could have been refactored further.
I feel guilty about not writing unit tests; the package does include
tests, including a handy package-generator function to make packages
that are invalid in various ways. It seems like a better way to go would
be to further refactor the body of the checkFormatting() function so
that one could for instance invoke a function checkLineLengths(pkg,
file, lines) and get back a data.frame created by Context(); it would
then be easy to test these for various 'lines' without having to figure
out how to make test packages. But this seemed of course like too much
additional work for this feature (I'll probably regret not expending the
effort at a future date).
I also found myself using the horrible 'copy-and-append' paradigm
ctxt = Context()
for (fl in files) {
...
ctxt = rbind(ctxt, Context(...)
}
which makes a copy of (the increasingly large) ctxt each time through
the loop and therefore scales quadratically (when each file has some
problems) with the number of files. I'm counting on the number of files
to be small (e.g., less than 10000, when the copy-and-append pattern
starts to be expensive even for trivial operations; mzR has ~8750 files,
though many of these are not checked for formatting) so that this
pattern does not become a bottleneck. It was still painful to use...
Martin
Best,
Kevin
On Thu, Nov 3, 2016 at 11:49 AM, Kevin RUE <kevinru...@gmail.com
<mailto:kevinru...@gmail.com>> wrote:
Hi all,
Please find attached the diff relative to the code that I cloned
from Bioconductor-mirror yesterday (please ignore the previous diff
file).
Basically three new features:
* As per previous email: display up to the first 6 lines that are
over 80 characters long
* *New*: display up to the first 6 lines that are not indented by
a multiple of 4 spaces
* *New*: display up to the first 6 lines that use TAB instead of 4
spaces for indentation
I also attach the output of the updated code
<https://github.com/kevinrue/BiocCheck>, to illustrate the changes.
Notes:
* For demonstration purpose, I indented a handful of lines from
the checks.R file itself with TAB characters. I assume that's
OK, as some lines were already longer than 80 characters and not
indented by a multiple of 4 spaces.
All the best,
Kevin
On Wed, Nov 2, 2016 at 10:00 PM, Kevin RUE <kevinru...@gmail.com
<mailto:kevinru...@gmail.com>> wrote:
Me again :)
Please find attached the first patch to print the first 6 lines
over 80 characters long. (I'll get to the tabulation offenders
next).
Note that all the offending lines are stored in the "df.length"
data.frame. How about an option like "fullReport=c(FALSE, TRUE)"
that print *all* the offending lines?
The data.frame also stores the content of the lines for the
record, but does not print them. I think Kasper is right:
filename and line should be enough to track down the line.
All the best,
Kevin
On Wed, Nov 2, 2016 at 8:08 PM, Kevin RUE <kevinru...@gmail.com
<mailto:kevinru...@gmail.com>> wrote:
Thanks for the feedback!
I also tend to prefer *all* the lines being reported (or to
be honest, that was really true when I had lots of them; a
problem that I largely mitigated by fixing all of them once
and subsequently paying more attention while developing).
Printing the content of the offending line somewhat helps me
spot the line faster (more so for tab issues). But I must
admit that showing the whole line is somewhat "overkill". I
just started thinking of a compromise being to only show the
first N characters of the line, with N being 80 minus the
number of characters necessary to print the filename and
line number.
Thanks Martin for pointing out the lines in BiocCheck. (Now
I feel bad for not having checked sooner.. hehe!)
I think the idea of BiocCheck showing the first 6 offenders
in BiocCheck quite nice, as I rarely have more since I use
using the RStudio "Tools > Global Options > Code > Display >
Show Margin > Margin column: 80" feature.
I'll give a go at both approaches (developing BiocCheck and
my own scripts)
Cheers,
Kevin
On Wed, Nov 2, 2016 at 7:41 PM, Kasper Daniel Hansen
<kasperdanielhan...@gmail.com
<mailto:kasperdanielhan...@gmail.com>> wrote:
I would prefer all line numbers reported, but on the
other hand I am indifferent wrt. the content of the
line, unless (say) TABs are marked up somehow.
Kasper
On Wed, Nov 2, 2016 at 3:17 PM, Martin Morgan
<martin.mor...@roswellpark.org
<mailto:martin.mor...@roswellpark.org>> wrote:
On 11/02/2016 02:49 PM, Kevin RUE wrote:
Dear all,
Just thought I'd share a handful of scripts that
I wrote to follow up on
certain NOTE messages thrown by R CMD BiocCheck.
https://github.com/kevinrue/BiocCheckTools
<https://github.com/kevinrue/BiocCheckTools>
They're very simple, but I occasionally find
them quite convenient.
Apologies if something similar already exists
somewhere :)
Maybe consider creating a diff against the source
code that, e.g., reported the first 6 offenders? The
relevant lines are near
https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081
<https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081>
Martin
All the best,
Kevin
[[alternative HTML version deleted]]
_______________________________________________
Bioc-devel@r-project.org
<mailto:Bioc-devel@r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
This email message may contain legally privileged
and/or...{{dropped:2}}
_______________________________________________
Bioc-devel@r-project.org
<mailto:Bioc-devel@r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
This email message may contain legally privileged and/or...{{dropped:2}}
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel