On 04/13/2015 02:48 AM, Thomas Maurel wrote:
Dear Martin,

I have investigated with our Web team and we believe that the command
attempts to open a number of concurrent sessions in order to download all of
the files. If that is the case then the problem is that our ftp server is
configured to limit the number of concurrent sessions per user in order to
prevent people using scripts to monopolise the server resources (and in some
cases accidentally DoS attack the server).

Hi Thomas -- thank you for trouble-shooting this.

The code used getURL(url, ...) without specifying a curl= argument. This causes a new CURLHandle to be constructed for each call to getURL(). These handles are closed only when the garbage collector runs, which is apparently too infrequent here, and running it explicitly is expensive.

I updated the code to include the argument

  curl=httr::handle_find(url)$handle

which re-uses httr's pool of URL-specific handles, hence limiting the number of simultaneously open connections. This seems to have been effective.
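
For illustration, the same idea can be sketched with plain RCurl by creating one handle and passing it to every getURL() call, so that repeated listings share a single connection instead of opening a new one each time. This is only a minimal sketch, not the code that was committed: the helper name list_ftp_dirs is hypothetical, and it assumes a single explicit handle from getCurlHandle() rather than httr's handle pool.

```r
library(RCurl)

# Hypothetical helper (not part of AnnotationHubData): list several FTP
# directories while re-using one CURLHandle for all requests.
list_ftp_dirs <- function(urls) {
    # One handle for the whole loop, instead of one per getURL() call.
    curl <- getCurlHandle()
    lapply(urls, function(url) {
        listing <- getURL(url, dirlistonly = TRUE, curl = curl)
        # The server returns one entry per line.
        strsplit(listing, "\n")[[1]]
    })
}

# Example (not run here; needs network access):
# list_ftp_dirs("ftp://ftp.ensembl.org/pub/release-78/gtf/ailuropoda_melanoleuca/")
```

Because the handle is created once per call to the helper, the number of open sessions no longer depends on how often the garbage collector happens to run.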

Thanks again,

Martin



Hope this helps,
Regards,
Thomas

On 10 Apr 2015, at 13:40, Thomas Maurel <mau...@ebi.ac.uk> wrote:

Hi Martin,

On 10 Apr 2015, at 13:23, Martin Morgan <mtmor...@fredhutch.org> wrote:

On 04/10/2015 04:34 AM, Rainer Johannes wrote:
hi Martin,

but if that's true, then I will never have a way to test whether the
recipe actually works, right?

I guess I don't really know what I'm talking about, and that insert=FALSE
is intended to not actually do the insertion so that the (immediate)
problem is not with AnnotationHubData.

From the traceback below it seems like the error occurs in calls like the
following

library(RCurl)
getURL("ftp://ftp.ensembl.org/pub/release-78/gtf/ailuropoda_melanoleuca/",
       dirlistonly=TRUE)

This seems to sometimes work and sometimes not

urls[1]
[1] "ftp://ftp.ensembl.org/pub/release-78/gtf/ailuropoda_melanoleuca/"
getURL(urls[1], dirlistonly=TRUE)
[1] "Ailuropoda_melanoleuca.ailMel1.78.gtf.gz\nCHECKSUMS\nREADME\n"
getURL(urls[1], dirlistonly=TRUE)
[1] "Ailuropoda_melanoleuca.ailMel1.78.gtf.gz\nCHECKSUMS\nREADME\n"
getURL(urls[1], dirlistonly=TRUE)
Error in function (type, msg, asError = TRUE)  : Access denied: 530

You are right, I've noticed the same thing. I will investigate and see if
there is something wrong with our FTP site machine.

Regards, Thomas



that's the full traceback:

updateResources(AnnotationHubRoot=getWd(), BiocVersion=biocVersion(),
    preparerClasses="EnsemblGtfToEnsDbPreparer", insert=FALSE,
    metadataOnly=TRUE)
INFO [2015-04-10 13:32:18] Preparer Class: EnsemblGtfToEnsDbPreparer
Ailuropoda_melanoleuca.ailMel1.78.gtf.gz
Anas_platyrhynchos.BGI_duck_1.0.78.gtf.gz
Anolis_carolinensis.AnoCar2.0.78.gtf.gz
Astyanax_mexicanus.AstMex102.78.gtf.gz
Bos_taurus.UMD3.1.78.gtf.gz
Caenorhabditis_elegans.WBcel235.78.gtf.gz
Callithrix_jacchus.C_jacchus3.2.1.78.gtf.gz
Error in function (type, msg, asError = TRUE)  : Access denied: 530
traceback()
17: fun(structure(list(message = msg, call = sys.call()),
        class = c(typeName, "GenericCurlError", "error", "condition")))
16: function (type, msg, asError = TRUE)
    {
        if (!is.character(type)) {
            i = match(type, CURLcodeValues)
            typeName = if (is.na(i)) character() else names(CURLcodeValues)[i]
        }
        typeName = gsub("^CURLE_", "", typeName)
        fun = (if (asError) stop else warning)
        fun(structure(list(message = msg, call = sys.call()),
            class = c(typeName, "GenericCurlError", "error", "condition")))
    }(67L, "Access denied: 530", TRUE)
15: .Call("R_curl_easy_perform", curl, .opts, isProtected, .encoding,
        PACKAGE = "RCurl")
14: curlPerform(curl = curl, .opts = opts, .encoding = .encoding)
13: getURL(url, dirlistonly = TRUE)
12: strsplit(getURL(url, dirlistonly = TRUE), "\n")
11: (function (url, filename, tag, verbose = TRUE)
    {
        df2 <- strsplit(getURL(url, dirlistonly = TRUE), "\n")[[1]]
        df2 <- df2[grep(paste0(filename, "$"), df2)]
        drop <- grepl("latest", df2) | grepl("00-", df2)
        df2 <- df2[!drop]
        df2 <- paste0(url, df2)
        result <- lapply(df2, function(x) {
            if (verbose) message(basename(x))
            tryCatch({
                h = suppressWarnings(GET(x, config = config(nobody = TRUE,
                    filetime = TRUE)))
                nams <- names(headers(h))
                if ("last-modified" %in% nams)
                    headers(h)[c("last-modified", "content-length")]
                else c(`last-modified` = NA, `content-length` = NA)
            }, error = function(err) {
                warning(basename(x), ": ", conditionMessage(err))
                list(`last-modified` = character(), `content-length` = character())
            })
        })
        size <- as.numeric(sapply(result, "[[", "content-length"))
        date <- strptime(sapply(result, "[[", "last-modified"),
            "%a, %d %b %Y %H:%M:%S", tz = "GMT")
        data.frame(fileurl = url, date, size, genome = tag,
            stringsAsFactors = FALSE)
    })(dots[[1L]][[8L]], filename = dots[[2L]][[1L]], tag = dots[[3L]][[8L]])
10: mapply(FUN = f, ..., SIMPLIFY = FALSE)
9: Map(.ftpFileInfo, urls, filename = "gtf.gz", tag = basename(urls))
8: do.call(rbind, Map(.ftpFileInfo, urls, filename = "gtf.gz",
       tag = basename(urls)))
7: .ensemblGtfSourceUrls(.ensemblBaseUrl, justRunUnitTest)
6: makeAnnotationHubMetadataFunction(currentMetadata,
       justRunUnitTest = justRunUnitTest, ...)
5: .generalNewResources(importPreparer, currentMetadata,
       makeAnnotationHubMetadataFunction, justRunUnitTest, ...)
4: .local(importPreparer, currentMetadata, ...)
3: newResources(preparerInstance, listOfExistingResources,
       justRunUnitTest = justRunUnitTest)
2: newResources(preparerInstance, listOfExistingResources,
       justRunUnitTest = justRunUnitTest)
1: updateResources(AnnotationHubRoot = getWd(), BiocVersion = biocVersion(),
       preparerClasses = "EnsemblGtfToEnsDbPreparer", insert = FALSE,
       metadataOnly = TRUE)



On 10 Apr 2015, at 13:09, Martin Morgan <mtmor...@fredhutch.org> wrote:

traceback()



--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

-- Thomas Maurel Bioinformatician - Ensembl Production Team European
Bioinformatics Institute (EMBL-EBI) European Molecular Biology Laboratory
Wellcome Trust Genome Campus Hinxton Cambridge CB10 1SD United Kingdom



_______________________________________________ Bioc-devel@r-project.org
mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel




