Hi,
On 10/26/2010 02:01 AM, Prof Brian Ripley wrote:
The example works for me (eventually: the site was very slow to respond)
--- nanoftp reads the response in 1024 byte chunks and makes sense of it.
We do provide debugging facilities via, say,
options(internet.info=0, warn=1, warning.length=4000)
which may help you debug this. Simply fiddling with the buffer size
doesn't help understanding and might well break something else.
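
A minimal sketch of how those options could be applied for a single call and then restored. The wrapper name with.net.debug is hypothetical (it is not part of R or of the nanoftp module), and the example call is left commented out because it needs network access:

```r
## Hypothetical wrapper (not part of R): evaluate 'expr' with verbose
## connection diagnostics, then restore the previous option values.
with.net.debug <- function(expr)
{
    old <- options(internet.info = 0, warn = 1, warning.length = 4000)
    on.exit(options(old))
    expr   # lazy evaluation: 'expr' is run here, with the options in effect
}

## Example call (needs network access, so it is left commented out):
## with.net.debug(download.file(
##     "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SOFT/GDS/GDS10.soft.gz",
##     destfile = tempfile()))
```

Restoring through on.exit() means the verbose settings do not leak into the rest of the session, even if the download fails with an error.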
I'll have a second look at this and will try to provide more
information. Thanks for your answer.
The code is essentially unchanged since 2006 when inter alia the buffer
size was doubled to 1024 (because libxml2 2.6.6 did), and AFAICS is
essentially unchanged in the current snapshots of libxml2.
I can surmise that the nanoftp C code might break if the actual control
code spanned 1024-byte chunks, but it needs someone with the problem to
debug in more detail.
This seems to be actually the case. I've added some printf statements
to the nanoftp module and what happens to the 220 code is the same as
here:
host <- "ftp.ncbi.nih.gov"
get.ftp.hello <- function(host)
{
    port <- 21L
    sk <- make.socket(host, port)
    on.exit(close.socket(sk))
    output <- character(0)
    i <- 1L
    repeat {
        ss <- read.socket(sk, maxlen = 1024)
        if (ss == "") break
        cat(i, ": ", ss, "\n", sep="")
        i <- i + 1L
        output <- append(output, ss)
    }
    output
}
> hello <- get.ftp.hello(host)
1: 220-
2: Warning Notice!
This is a U.S. Gov
3: ernment computer system, which may be accessed and used
only for authorized Government business by authorized personnel.
Unauthorized access or use of this computer system may subject
violators to
criminal, civil, and/or administrative action.
All information on this computer system may be intercepted, recorded,
read,
copied, and disclosed by and to authorized personnel for official
purposes,
including criminal investigations. Such information includes sensitive
data
encrypted to comply with confidentiality and privacy requirements. Access
or use of this computer system by any person, whether authorized or
unauthorized, constitutes consent to these terms. There is no right of
privacy in this system.
---
Welcome to the NCBI ftp server! The official anonymous access URL is
ftp://ftp.ncbi.nih.gov
Public data may be downloaded by logging in as "anonymous" using your
E-mail address as a password.
Please see ftp://ftp.ncbi.nih.gov/README.ftp for hints on large file
transfers
22
4: 0 FTP Server ready.
5: 421 Login Timeout (60 seconds): closing control connection.
> nchar(hello)
[1] 6 40 1024 21 61
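
The first four chunk sizes add up to 6 + 40 + 1024 + 21 = 1091, the HELLO length quoted in the original report, and the final "220 " is split between chunks 3 and 4. A minimal sketch of why that matters, using simplified, made-up chunks rather than the real server output: scanning each chunk separately never sees a line starting with "220 ", while joining the chunks first does. The helper find.final.reply is hypothetical and is not how nanoftp.c is written; it only assumes the RFC 959 convention that the last line of a reply is a three-digit code followed by a space.

```r
## Hypothetical helper (not part of R or nanoftp): find the final line of an
## FTP reply in data that arrived in arbitrary chunks.
find.final.reply <- function(chunks)
{
    ## Join the chunks first, so a code split across two reads is restored.
    txt <- paste(chunks, collapse = "")
    lines <- strsplit(txt, "\r?\n")[[1L]]
    ## Assumed convention (RFC 959): the last line of a reply starts with
    ## three digits followed by a space; "220-" marks a continuation.
    final <- grep("^[0-9]{3} ", lines, value = TRUE)
    if (length(final)) final[1L] else NA_character_
}

## Simplified chunks, with the code split as in the output above.
chunks <- c("220-\r\n", "Warning Notice!\r\n22", "0 FTP Server ready.\r\n")

## Per-chunk scan: is there any chunk with a line starting with "220 "?
per.chunk <- any(grepl("(^|\n)220 ", chunks))

## Scan after joining the chunks.
joined <- find.final.reply(chunks)
```

Here per.chunk is FALSE and joined is "220 FTP Server ready.", which is consistent with the surmise that a reader working one buffer at a time could miss a control code spanning two reads.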
But there seems to be something else going on, because, as I mentioned
earlier, sometimes (once every 20 attempts or so) I get a segfault when
trying to open this url with url().
Cheers,
H.
On Thu, 21 Oct 2010, Hervé Pagès wrote:
Hi,
Trying to access files on the ftp server at ftp.ncbi.nih.gov
will either give a time out or sometimes even a segfault on Linux.
The 2 following methods give the same results:
f <- url("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SOFT/GDS/GDS10.soft.gz",
         open="r")
download.file("ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SOFT/GDS/GDS10.soft.gz",
              destfile=tempfile())
I've tried one or the other method with all release versions >= 2.8
and with current R devel and they always fail to connect to this
FTP server.
What's particular about this FTP server is that it sends a long HELLO
message before it finally sends the 220 control code.
Using the Unix ftp client:
Well, one of many such.
-----------------------------------------------------------------------------
hpa...@latitude:~$ ftp ftp.ncbi.nih.gov
Connected to ftp.ncbi.nih.gov.
220-
Warning Notice!
This is a U.S. Government computer system, which may be accessed and used
only for authorized Government business by authorized personnel.
Unauthorized access or use of this computer system may subject
violators to
criminal, civil, and/or administrative action.
All information on this computer system may be intercepted, recorded,
read,
copied, and disclosed by and to authorized personnel for official
purposes,
including criminal investigations. Such information includes sensitive
data
encrypted to comply with confidentiality and privacy requirements. Access
or use of this computer system by any person, whether authorized or
unauthorized, constitutes consent to these terms. There is no right of
privacy in this system.
---
Welcome to the NCBI ftp server! The official anonymous access URL is
ftp://ftp.ncbi.nih.gov
Public data may be downloaded by logging in as "anonymous" using your
E-mail address as a password.
Please see ftp://ftp.ncbi.nih.gov/README.ftp for hints on large file
transfers
220 FTP Server ready.
-----------------------------------------------------------------------------
This seems to cause problems for the nanoftp module
(src/modules/internet/nanoftp.c) used by url() and download.file()
as it doesn't seem to be able to catch the 220 control code.
I'm not familiar with the nanoftp module, or with socket programming in
general, or with RFC 959 (FTP protocol), so I'm not really in a position
to say what's going wrong exactly in the module but it seems that
increasing the value of FTP_BUF_SIZE (size of the buffer for data
received from the control connection) fixes the problem.
Currently this is:
#define FTP_BUF_SIZE 1024
but, interestingly, *any* value > 1024 seems to fix the problem (even
though the long HELLO message above is 1091 bytes).
Any idea what's going on?
Thanks,
H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel