On 04/13/2015 02:48 AM, Thomas Maurel wrote:
Dear Martin,

I have investigated with our Web team and we believe that the command attempts to open a number of concurrent sessions in order to download all of the files. If that is the case then the problem is that our FTP server is configured to limit the number of concurrent sessions per user in order to prevent people using scripts to monopolise the server resources (and in some cases accidentally DoS attack the server).
Hi Thomas -- thank you for trouble-shooting this.

The code used getURL(url, ...) without specifying a curl= argument. This causes a new CURLHandle to be constructed for each call to getURL(). These are closed when the garbage collector is run, but that is apparently too infrequent, and expensive to run explicitly.
I updated the code to include the argument curl=httr::handle_find(url)$handle, which re-uses httr's pool of URL-specific handles and hence limits the number of simultaneously open connections. This seems to have been effective.
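To illustrate the handle re-use described above, here is a minimal sketch. It is not the code that went into AnnotationHubData: the helper name list_ftp_dirs and its curl and fetch parameters are hypothetical, and it uses RCurl's own getCurlHandle() instead of httr's handle pool. The idea is only that one handle is created up front and passed to every getURL() call, so requests go out one at a time over a single handle.

```r
# Hypothetical helper (not from the thread): list several FTP directories
# while sharing ONE curl handle across all requests.
# 'fetch' defaults to RCurl::getURL; it is a parameter only so that the
# function can be exercised without network access.
list_ftp_dirs <- function(urls,
                          curl = RCurl::getCurlHandle(),
                          fetch = RCurl::getURL) {
    force(curl)  # create the handle once, before any request is made
    listings <- lapply(urls, function(url) {
        # Same handle for every call, so no new CURLHandle per request
        txt <- fetch(url, curl = curl, dirlistonly = TRUE)
        strsplit(txt, "\n", fixed = TRUE)[[1]]
    })
    names(listings) <- urls
    listings
}
```

Assuming RCurl is installed and the server is reachable, a call such as list_ftp_dirs("ftp://ftp.ensembl.org/pub/release-78/gtf/ailuropoda_melanoleuca/") would return a named list with one character vector of file names per URL.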
Thanks again, Martin
Hope this helps,
Regards,
Thomas

On 10 Apr 2015, at 13:40, Thomas Maurel <mau...@ebi.ac.uk> wrote:

Hi Martin,

On 10 Apr 2015, at 13:23, Martin Morgan <mtmor...@fredhutch.org> wrote:

On 04/10/2015 04:34 AM, Rainer Johannes wrote:

hi Martin,

but if that's true, then I will never have a way to test whether the recipe actually works, right?

I guess I don't really know what I'm talking about, and that insert=FALSE is intended to not actually do the insertion so that the (immediate) problem is not with AnnotationHubData.

From the traceback below it seems like the error occurs in calls like the following

library(RCurl)
getURL("ftp://ftp.ensembl.org/pub/release-78/gtf/ailuropoda_melanoleuca/", dirlistonly=TRUE)

This seems to sometimes work and sometimes not

urls[1]
[1] "ftp://ftp.ensembl.org/pub/release-78/gtf/ailuropoda_melanoleuca/"
getURL(urls[1], dirlistonly=TRUE)
[1] "Ailuropoda_melanoleuca.ailMel1.78.gtf.gz\nCHECKSUMS\nREADME\n"
getURL(urls[1], dirlistonly=TRUE)
[1] "Ailuropoda_melanoleuca.ailMel1.78.gtf.gz\nCHECKSUMS\nREADME\n"
getURL(urls[1], dirlistonly=TRUE)
Error in function (type, msg, asError = TRUE) : Access denied: 530

You are right, I've noticed the same thing. I will investigate and see if there is something wrong with our FTP site machine.
Regards,
Thomas

that's the full traceback:

updateResources(AnnotationHubRoot=getWd(), BiocVersion=biocVersion(), preparerClasses="EnsemblGtfToEnsDbPreparer", insert=FALSE, metadataOnly=TRUE)
INFO [2015-04-10 13:32:18] Preparer Class: EnsemblGtfToEnsDbPreparer
Ailuropoda_melanoleuca.ailMel1.78.gtf.gz
Anas_platyrhynchos.BGI_duck_1.0.78.gtf.gz
Anolis_carolinensis.AnoCar2.0.78.gtf.gz
Astyanax_mexicanus.AstMex102.78.gtf.gz
Bos_taurus.UMD3.1.78.gtf.gz
Caenorhabditis_elegans.WBcel235.78.gtf.gz
Callithrix_jacchus.C_jacchus3.2.1.78.gtf.gz
Error in function (type, msg, asError = TRUE) : Access denied: 530

traceback()
17: fun(structure(list(message = msg, call = sys.call()), class = c(typeName, "GenericCurlError", "error", "condition")))
16: function (type, msg, asError = TRUE)
    {
        if (!is.character(type)) {
            i = match(type, CURLcodeValues)
            typeName = if (is.na(i)) character() else names(CURLcodeValues)[i]
        }
        typeName = gsub("^CURLE_", "", typeName)
        fun = (if (asError) stop else warning)
        fun(structure(list(message = msg, call = sys.call()), class = c(typeName, "GenericCurlError", "error", "condition")))
    }(67L, "Access denied: 530", TRUE)
15: .Call("R_curl_easy_perform", curl, .opts, isProtected, .encoding, PACKAGE = "RCurl")
14: curlPerform(curl = curl, .opts = opts, .encoding = .encoding)
13: getURL(url, dirlistonly = TRUE)
12: strsplit(getURL(url, dirlistonly = TRUE), "\n")
11: (function (url, filename, tag, verbose = TRUE)
    {
        df2 <- strsplit(getURL(url, dirlistonly = TRUE), "\n")[[1]]
        df2 <- df2[grep(paste0(filename, "$"), df2)]
        drop <- grepl("latest", df2) | grepl("00-", df2)
        df2 <- df2[!drop]
        df2 <- paste0(url, df2)
        result <- lapply(df2, function(x) {
            if (verbose) message(basename(x))
            tryCatch({
                h = suppressWarnings(GET(x, config = config(nobody = TRUE, filetime = TRUE)))
                nams <- names(headers(h))
                if ("last-modified" %in% nams) headers(h)[c("last-modified", "content-length")]
                else c(`last-modified` = NA, `content-length` = NA)
            }, error = function(err) {
                warning(basename(x), ": ", conditionMessage(err))
                list(`last-modified` = character(), `content-length` = character())
            })
        })
        size <- as.numeric(sapply(result, "[[", "content-length"))
        date <- strptime(sapply(result, "[[", "last-modified"), "%a, %d %b %Y %H:%M:%S", tz = "GMT")
        data.frame(fileurl = url, date, size, genome = tag, stringsAsFactors = FALSE)
    })(dots[[1L]][[8L]], filename = dots[[2L]][[1L]], tag = dots[[3L]][[8L]])
10: mapply(FUN = f, ..., SIMPLIFY = FALSE)
9: Map(.ftpFileInfo, urls, filename = "gtf.gz", tag = basename(urls))
8: do.call(rbind, Map(.ftpFileInfo, urls, filename = "gtf.gz", tag = basename(urls)))
7: .ensemblGtfSourceUrls(.ensemblBaseUrl, justRunUnitTest)
6: makeAnnotationHubMetadataFunction(currentMetadata, justRunUnitTest = justRunUnitTest, ...)
5: .generalNewResources(importPreparer, currentMetadata, makeAnnotationHubMetadataFunction, justRunUnitTest, ...)
4: .local(importPreparer, currentMetadata, ...)
3: newResources(preparerInstance, listOfExistingResources, justRunUnitTest = justRunUnitTest)
2: newResources(preparerInstance, listOfExistingResources, justRunUnitTest = justRunUnitTest)
1: updateResources(AnnotationHubRoot = getWd(), BiocVersion = biocVersion(), preparerClasses = "EnsemblGtfToEnsDbPreparer", insert = FALSE, metadataOnly = TRUE)

On 10 Apr 2015, at 13:09, Martin Morgan <mtmor...@fredhutch.org> wrote:

traceback()

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

--
Thomas Maurel
Bioinformatician - Ensembl Production Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

[[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel