On Sat, Apr 29, 2017 at 3:24 PM, lee <l...@yagibdah.de> wrote:
> Mick <michaelkintz...@gmail.com> writes:
>
> > On Tuesday 25 Apr 2017 16:45:37 Alan McKinnon wrote:
> >> On 25/04/2017 16:29, lee wrote:
> >> > Hi,
> >> >
> >> > since the usage of FTP seems to be declining, what is a replacement
> >> > which is at least as good as FTP?
> >> >
> >> > I'm aware that there's webdav, but that's very awkward to use and
> >> > missing features.
> >>
> >> Why not stick with ftp?
> >> Or, put another way, why do you feel you need to use something else?
> >>
> >> There's always dropbox
> >
> > Invariably all web hosting ISPs offer ftp(s) for file upload/download.
> > If you pay a bit more you should be able to get ssh/scp/sftp too.
> > Indeed, many ISPs throw in scp/sftp access as part of their basic
> > package.
> >
> > Webdav(s) offers the same basic upload/download functionality, so I am
> > not sure what you find awkward about it, although I'd rather use lftp
> > instead of cadaver any day. ;-)
> >
> > As Alan mentioned, with JavaScript'ed web pages these days there are
> > many webapp'ed ISP offerings like Dropbox and friends.
> >
> > What is the use case you have in mind?
>
> Transferring large amounts of data, and automation of the processing of
> at least some of it, without involving a 3rd party.
>
> "Large amounts" can be "small", like 100MB --- or over 50k files in
> 12GB, or even more. The mirror feature of lftp is extremely useful for
> such things.
>
> I wouldn't ever want to have to mess around with web pages to figure
> out how to do this. Ftp is plain and simple. So you see why I'm
> explicitly asking for a replacement which is at least as good as ftp.
>
> --
> "Didn't work" is an error.

Half-petabyte datasets aren't really something I'd personally *ever* trust
ftp with in the first place. That said, it depends entirely on the network
you're working with. Are you pushing this data in and out of the network
your machines live in, or are you working primarily internally? If
internal, what are the network-side capabilities you have? Since you're
likely already using something on the order of Ceph or Gluster to back the
datasets where they sit, just working with it all across the network from
that storage would be my first instinct.
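(Side note, since you mentioned lftp's mirror feature: you wouldn't have to
give that up even if ftp itself goes away, because lftp speaks sftp and
ftps as well as plain ftp. A rough sketch of what I mean, with the host,
user, and paths made up:

    # push a local tree to the remote end, transferring only what changed;
    # --reverse uploads instead of downloads, --parallel moves 4 files at once
    lftp -u user \
         -e "mirror --reverse --parallel=4 /local/dataset /remote/dataset; quit" \
         sftp://files.example.com

Point the same invocation at an ftp:// or sftp:// URL and your mirror
scripts keep working regardless of the transport underneath.)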
How often does it need to be moved in and out of your facility, and is
there no way to break the processing up into chunks smaller than a 0.6PB
mass of files? Distribute the smaller pieces out with rsync, scp, or the
like, operate on them, and pull the results back in, rather than trying to
shift the entire set around. There's a reason Amazon will send a physical
truck to a site to import large datasets into Glacier... ;)
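A rough sketch of that per-chunk shuffle, with the host and paths made up:

    # push one shard out to a worker node; -a preserves perms/times,
    # -H keeps hard links, --partial lets an interrupted copy resume
    rsync -aH --partial /data/shards/shard-042/ worker01:/scratch/shard-042/

    # ...process it there, then pull just the results back
    rsync -aH --partial worker01:/scratch/shard-042/results/ /data/results/shard-042/

Run a handful of those in parallel across workers and you never have to
move the full 0.6PB anywhere in one piece.

-- 
Poison [BLX]
Joshua M. Murphy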