On 11/10/2016 05:36 AM, Kevin RUE wrote:
Hi Martin,
Thanks for spending the time and effort to refactor and integrate my
code into BiocCheck. I didn't intend to give you so much to think about !
For the record, I used the "checkFunctionLengths" method (now starting
at line 829 of checks.R) as a template to offer code somewhat homogenous
with what I could see in place already. I believe that function should
also be refactored using your new "Context" and "handleContext" helpers,
right?
yep
Side note:
Far from trying to compete with the BiocCheck Bioconductor package, I
couldn't help but update my "follow-up" "BiocCheckTools" Python code
I think the right place for this detailed stuff is actually in lintr and
/ or formatR.
<https://github.com/kevinrue/BiocCheckTools/blob/master/README.md> in
the meantime (sorry for unoriginal name, I'll take suggestions!) taking
Kasper's advice on board:
* Single master file which runs everything and gives a report.
o *Bonus*: each module can be run separately (line_chars; tab_width).
* All line numbers are reported
* Content of the line is not displayed
* I didn't mark up TABs, but I indicate how many spaces they are made
of (when not a multiple of 4)
All the best,
Kevin
On Wed, Nov 9, 2016 at 11:31 PM, Martin Morgan
<martin.mor...@roswellpark.org <mailto:martin.mor...@roswellpark.org>>
wrote:
On 11/03/2016 08:14 AM, Kevin RUE wrote:
Apologies for the additional spam, for two reasons:
* The diff files that I've previously sent had the base and
modified
versions swapped. This new one fixes that.
* This new diff file (always relative to the code I cloned from
Bioconductor-mirror) also fixes a bug whereby the updated
code would
fail on packages that do not break any of the three guidelines.
Hi Kevin -- thanks for these, I incorporated a version in 1.11.1.
The report is for the first 6 lines of offenders; the code for
finding all offenders is mentioned in the vignette
BiocCheck:::checkFormatting("path/to/YourPackage", nlines=Inf)
Marcel Ramos mentions the lintr (CRAN) and goodpractice (github
only, https://github.com/MangoTheCat/goodpractice
<https://github.com/MangoTheCat/goodpractice>) packages as useful
resources for these types of questions; also formatR (CRAN).
For the record, it was fun to work through your code. The basic
pattern was to read lines and flag those that violated a (usually
simple) test, add these to a data frame, then if after parsing all
files the data frame had any rows report these to the user.
I standardized the columns of the data frame across checks to
facilitate re-use, and imposed the standard by using a constructor
-- non-exported Context(). I took a little bit of pain to make the
constructor work (return a data frame with the correct columns but
no rows) when there are no lines of code flagged to be added to the
context, so that I could use it without have to insert a bunch of
conditional `if (...)` in my code that make it both harder to
understand and harder to test. Also, the constructor is vectorized,
whereas your implementation was iterative.
For reporting to the user, I wrote a small helper handleContext()
that I could re-use across the different types of checks (and likely
elsewhere in the package) and also maintain the separation of what
is reported to the user (e.g., in handleContext()) from how it is
reported to the user -- .msg, .verbatim, etc in the utils.R file.
Probably the reporting could have been refactored further.
I feel guilty about not writing unit tests; the package does include
tests, including a handy package-generator function to make packages
that are invalid in various ways. It seems like a better way to go
would be to further refactor the body of the checkFormatting()
function so that one could for instance invoke a function
checkLineLengths(pkg, file, lines) and get back a data.frame created
by Context(); it would then be easy to test these for various
'lines' without having to figure out how to make test packages. But
this seemed of course like too much additional work for this feature
(I'll probably regret not expending the effort at a future date).
I also found myself using the horrible 'copy-and-append' paradigm
ctxt = Context()
for (fl in files) {
...
ctxt = rbind(ctxt, Context(...)
}
which makes a copy of (the increasingly large) ctxt each time
through the loop and therefore scales quadratically (when each file
has some problems) with the number of files. I'm counting on the
number of files to be small (e.g., less than 10000, when the
copy-and-append pattern starts to be expensive even for trivial
operations; mzR has ~8750 files, though many of these are not
checked for formatting) so that this pattern does not become a
bottleneck. It was still painful to use...
Martin
Best,
Kevin
On Thu, Nov 3, 2016 at 11:49 AM, Kevin RUE <kevinru...@gmail.com
<mailto:kevinru...@gmail.com>
<mailto:kevinru...@gmail.com <mailto:kevinru...@gmail.com>>> wrote:
Hi all,
Please find attached the diff relative to the code that I cloned
from Bioconductor-mirror yesterday (please ignore the
previous diff
file).
Basically three new features:
* As per previous email: display up to the first 6 lines
that are
over 80 characters long
* *New*: display up to the first 6 lines that are not
indented by
a multiple of 4 spaces
* *New*: display up to the first 6 lines that use TAB
instead of 4
spaces for indentation
I also attach the output of the updated code
<https://github.com/kevinrue/BiocCheck
<https://github.com/kevinrue/BiocCheck>>, to illustrate the changes.
Notes:
* For demonstration purpose, I indented a handful of lines
from
the checks.R file itself with TAB characters. I assume
that's
OK, as some lines were already longer than 80 characters
and not
indented by a multiple of 4 spaces.
All the best,
Kevin
On Wed, Nov 2, 2016 at 10:00 PM, Kevin RUE
<kevinru...@gmail.com <mailto:kevinru...@gmail.com>
<mailto:kevinru...@gmail.com <mailto:kevinru...@gmail.com>>>
wrote:
Me again :)
Please find attached the first patch to print the first
6 lines
over 80 characters long. (I'll get to the tabulation
offenders
next).
Note that all the offending lines are stored in the
"df.length"
data.frame. How about an option like
"fullReport=c(FALSE, TRUE)"
that print *all* the offending lines?
The data.frame also stores the content of the lines for the
record, but does not print them. I think Kasper is right:
filename and line should be enough to track down the line.
All the best,
Kevin
On Wed, Nov 2, 2016 at 8:08 PM, Kevin RUE
<kevinru...@gmail.com <mailto:kevinru...@gmail.com>
<mailto:kevinru...@gmail.com
<mailto:kevinru...@gmail.com>>> wrote:
Thanks for the feedback!
I also tend to prefer *all* the lines being reported
(or to
be honest, that was really true when I had lots of
them; a
problem that I largely mitigated by fixing all of
them once
and subsequently paying more attention while
developing).
Printing the content of the offending line somewhat
helps me
spot the line faster (more so for tab issues). But I
must
admit that showing the whole line is somewhat
"overkill". I
just started thinking of a compromise being to only
show the
first N characters of the line, with N being 80
minus the
number of characters necessary to print the filename and
line number.
Thanks Martin for pointing out the lines in
BiocCheck. (Now
I feel bad for not having checked sooner.. hehe!)
I think the idea of BiocCheck showing the first 6
offenders
in BiocCheck quite nice, as I rarely have more since
I use
using the RStudio "Tools > Global Options > Code >
Display >
Show Margin > Margin column: 80" feature.
I'll give a go at both approaches (developing
BiocCheck and
my own scripts)
Cheers,
Kevin
On Wed, Nov 2, 2016 at 7:41 PM, Kasper Daniel Hansen
<kasperdanielhan...@gmail.com
<mailto:kasperdanielhan...@gmail.com>
<mailto:kasperdanielhan...@gmail.com
<mailto:kasperdanielhan...@gmail.com>>> wrote:
I would prefer all line numbers reported, but on the
other hand I am indifferent wrt. the content of the
line, unless (say) TABs are marked up somehow.
Kasper
On Wed, Nov 2, 2016 at 3:17 PM, Martin Morgan
<martin.mor...@roswellpark.org
<mailto:martin.mor...@roswellpark.org>
<mailto:martin.mor...@roswellpark.org
<mailto:martin.mor...@roswellpark.org>>> wrote:
On 11/02/2016 02:49 PM, Kevin RUE wrote:
Dear all,
Just thought I'd share a handful of
scripts that
I wrote to follow up on
certain NOTE messages thrown by R CMD
BiocCheck.
https://github.com/kevinrue/BiocCheckTools
<https://github.com/kevinrue/BiocCheckTools>
<https://github.com/kevinrue/BiocCheckTools
<https://github.com/kevinrue/BiocCheckTools>>
They're very simple, but I occasionally find
them quite convenient.
Apologies if something similar already
exists
somewhere :)
Maybe consider creating a diff against the
source
code that, e.g., reported the first 6
offenders? The
relevant lines are near
https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081
<https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081>
<https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081
<https://github.com/Bioconductor-mirror/BiocCheck/blob/master/R/checks.R#L1081>>
Martin
All the best,
Kevin
[[alternative HTML version deleted]]
_______________________________________________
Bioc-devel@r-project.org
<mailto:Bioc-devel@r-project.org>
<mailto:Bioc-devel@r-project.org
<mailto:Bioc-devel@r-project.org>> mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
<https://stat.ethz.ch/mailman/listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
This email message may contain legally
privileged
and/or...{{dropped:2}}
_______________________________________________
Bioc-devel@r-project.org
<mailto:Bioc-devel@r-project.org>
<mailto:Bioc-devel@r-project.org
<mailto:Bioc-devel@r-project.org>> mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
<https://stat.ethz.ch/mailman/listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
This email message may contain legally privileged and/or
confidential information. If you are not the intended recipient(s),
or the employee or agent responsible for the delivery of this
message to the intended recipient(s), you are hereby notified that
any disclosure, copying, distribution, or use of this email message
is prohibited. If you have received this message in error, please
notify the sender immediately by e-mail and delete this email
message from your computer. Thank you.
This email message may contain legally privileged and/or...{{dropped:2}}
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel