Hi Cameron,

First, sorry for my very late reply; I was overloaded at work last week :(
Anyway, I will reply inline this time ;)
On Wed, Feb 8, 2012 at 11:59 AM, Cameron Simpson <c...@zip.com.au> wrote:

> [ Please reply inline; it makes the discussion read like a conversation,
>   with context. - Cameron
> ]
>
> On 08Feb2012 08:57, Sherif Shehab Aldin <silentqu...@gmail.com> wrote:
> | Thanks a lot for your help. I just forgot to state that the FTP server is
> | not under my control: I can't control how the file grows or how the
> | records are added. I can only log in to it and copy the whole file.
>
> Oh. That's a pity.
>
> | The reason why I am parsing the file and trying to get the diffs between
> | the new file and the old one, and copy them to new_file.time_stamp, is
> | that I need to cut down the file size so that when server (C) grabs the
> | file, it grabs only new data, and also to cut down the network bandwidth.
>
> Can a simple byte count help here? Copy the whole file with FTP. From
> the new copy, extract the bytes from the last byte count offset onward.
> Then parse the smaller file, extracting whole records for use by (C).
> That way you can just keep the unparsed tail (partial record I imagine)
> around for the next fetch.
>
> Looking at RFC959 (the FTP protocol):
>
>   http://www.w3.org/Protocols/rfc959/4_FileTransfer.html
>
> it looks like you can do a partial file fetch, also, by issuing a REST
> (restart) command to set a file offset and then issuing a RETR (retrieve)
> command to get the rest of the file. These all need to be in binary mode
> of course.
>
> So in principle you could track the byte offset of what you have fetched
> with FTP so far, and fetch only what is new.

I am actually fetching the file over FTP from a bash script using lftp. It
seemed like a simple task for Python at the beginning, and then I noticed
the many problems. I checked lftp but could not work out how to continue
downloading a file. Do I have to use an FTP library, maybe in Python, to
use that feature?
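For reference, the REST-then-RETR idea quoted above can be expressed with Python's standard ftplib, whose retrbinary() method takes a `rest` argument and switches the connection to binary mode itself. What follows is only a minimal sketch: it assumes the server honours REST and that records are newline-terminated, and the helper names (fetch_new_bytes, split_complete_records, fetch_from_server) are made up for illustration.

```python
from ftplib import FTP


def fetch_new_bytes(ftp, remote_name, offset):
    """Return the bytes of remote_name from byte `offset` onward.

    `ftp` is a logged-in ftplib.FTP, or any object with a compatible
    retrbinary() method (handy for testing without a server).
    """
    chunks = []
    # rest=offset makes ftplib send "REST <offset>" before the RETR,
    # so the transfer starts at that byte instead of at the beginning.
    ftp.retrbinary("RETR " + remote_name, chunks.append, rest=offset)
    return b"".join(chunks)


def split_complete_records(data, sep=b"\n"):
    """Split data into (complete_records, trailing_partial_record).

    Assumption: every complete record ends with `sep`.
    """
    head, found, tail = data.rpartition(sep)
    if not found:
        # No separator at all: everything is still a partial record.
        return b"", data
    return head + sep, tail


def fetch_from_server(host, user, password, remote_name, offset):
    """Connect, log in, and fetch only the bytes after `offset`.

    host, user, password and remote_name are placeholders supplied
    by the caller; nothing here is specific to a real server.
    """
    with FTP(host) as ftp:
        ftp.login(user, password)
        return fetch_new_bytes(ftp, remote_name, offset)
```

One way to use it is to advance the stored offset only by the length of the complete records, so that a trailing partial record is simply fetched again on the next run and no leftover bytes need to be kept between runs.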

> | One of my problems was that, after mounting server (B) diffs_dir into
> | server (A) through NFS, I used to create filename.lock first into server
> | (B) local file, then start copying filename to server (B), then remove
> | filename.lock, so that when the daemon running on server (C) parses the
> | files in the local_diffs dir, it ignores the files that are still being
> | copied.
> |
> | After searching more yesterday, I found that a local mv is atomic, so
> | instead of creating the lock files, I will copy the new diffs to a tmp
> | dir and, after the copy is over, mv them to the actual diffs dir. That
> | will avoid reading them while they are still being copied.
>
> Yes, this sounds good. Provided the mv is on the same filesystem.
>
> For example: "mv /tmp/foo /home/username/foo" is actually a copy and not
> a rename because /tmp is normally a different filesystem from /home.

Yes, they are on the same filesystem; I am making sure of that ;)

> | Sorry if the above is a bit confusing; the system is a bit complex.
>
> Complex systems often need fiddly solutions.
>
> | Also, there is one more factor that confuses me: I am so bad at testing,
> | and I am trying to actually start implementing unit tests for my code.
> | What I find hard is how to test code like the part that does the copy,
> | mv and so on, and also the code that fetches data from the web.
>
> Ha. I used to be very bad at testing, now I am improving and am merely
> weak.
>
> One approach to testing is to make a mock up of the other half of the
> system, and test against the mockup.
>
> For example, you have code to FTP new data and then feed it to (C). You
> don't control the server side of the FTP. So you might make a small mock
> up program that writes valid (but fictitious) data records progressively
> to a local data file (write record, flush, pause briefly, etc). If you
> can FTP to your own test machine you could then treat _that_ growing
> file as the remote server's data file.
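The mock-up writer described above could look roughly like the sketch below. The record format is invented purely for illustration (the real format is not given in this thread), and make_record, write_mock_records and read_from_offset are hypothetical names. The last helper reads a local file from a byte offset, so a test can see only what was appended since the previous read.

```python
import time


def make_record(n):
    """Build one fictitious newline-terminated record (format is made up)."""
    return "{:06d},sample-value-{}\n".format(n, n).encode("ascii")


def write_mock_records(path, count, pause=0.0, start=0):
    """Append `count` records to `path`: write record, flush, pause briefly."""
    with open(path, "ab") as f:
        for n in range(start, start + count):
            f.write(make_record(n))
            f.flush()          # make the record visible to readers right away
            time.sleep(pause)  # imitate a data file that grows slowly


def read_from_offset(path, offset):
    """Return the bytes of a local file from byte `offset` onward."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read()
```

Running write_mock_records() in one process with a small pause (say 0.5 seconds) while another process polls the file gives a growing data file to test against, without touching the real FTP server.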

> Then you could copy it progressively using a byte count to keep track of
> the bits you have seen to skip them, and the the
>
> If you can't FTP to your test system, you could abstract out the "fetch
> part of this file by FTP" into its own function. Write an equivalent
> function that fetches part of a local file just by opening it.
>
> Then you could use the local file version in a test that doesn't
> actually do the FTP, but could exercise the rest of it.
>
> It is also useful to make simple tests of small pieces of the code.
> So make the code to get part of the data a simple function, and write
> tests to execute it in a few ways (no new data, part of a record,
> several records etc).

You are right. My problem is that I don't care about testing until my code
grows badly, and then I notice what I got myself into :) But your
suggestion is cool. I will try to implement it once I get back to that
project. I ran into some problems with another project, so I had to go
fix them first, and then the customer wanted some changes ;-)

> There are many people better than I to teach testing.

I really appreciate your help. I am trying to learn from the mailing list,
and I have noticed many interesting posts on the list already. I wish I
could read python-list the same way, but unfortunately the mail digest
they send is quite annoying :(

Many thanks to you, and I will keep you posted if I get other ideas. :)

> Cheers,
> --
> Cameron Simpson <c...@zip.com.au> DoD#743
> http://www.cskk.ezoshosting.com/cs/
>
> Testing can show the presence of bugs, but not their absence. - Dijkstra
-- http://mail.python.org/mailman/listinfo/python-list