On 04/21/2015 03:56 AM, subhabrata.bane...@gmail.com wrote:


Yes, they do not. They are opened one by one.
I have a big chunk of data that I am getting by crawling etc.
As I run the code, it fetches data from various sites.
The contents from each site are stored in separate files.
For example, if I open the site "http://www.theguardian.com/international", the result may be stored
in the file "file1.txt", and the contents of the site "http://edition.cnn.com/" may be stored in
the file "file2.txt".

But the contents of each site change every day. So every day, as you open these
sites and store the results, you should store them in different text files. I want
these text files to be created on their own, since you do not know in advance
how many you will need to fetch the data.

I am trying to use the datetime module, since datetime.datetime.now() changes
every time; I am trying to use it as the name of the file. But you may suggest
something better.


To get the text version of today's date, use something like:

import datetime
import itertools
SUFFIX = datetime.datetime.now().strftime("%Y%m%d")   # e.g. "20150421"

To write a filename generator that produces names sequentially (untested):

def filenames(suffix=SUFFIX):
    for integer in itertools.count(1):
        yield "{0:04d}".format(integer) + "-" + suffix


for filename in filenames():
    f = open(filename, "w")
    # ... do some work here which writes to the file, and "break"
    # ... if we don't need any more files
    f.close()

Note that this is literally open-ended. If you don't put some logic in that loop which will break, it will write files until your OS stops you, whether because the disk is full, the directory is too large, or whatever.
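
For instance, a minimal guard might look like this (just a sketch; MAX_FILES is a made-up cap, and it reuses the filenames() generator above):

MAX_FILES = 500   # hypothetical safety cap; pick whatever limit fits

for count, filename in enumerate(filenames(), start=1):
    if count > MAX_FILES:
        break   # the guard that keeps the loop from running forever
    with open(filename, "w") as f:
        f.write("placeholder")   # stands in for the real crawl output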

I suggest you test the loop with a print() before using it to actually create the files.
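
For example, something like this (untested) shows the first few generated names without touching the disk:

import itertools

# Peek at the first three names; nothing is written to disk.
for name in itertools.islice(filenames(), 3):
    print(name)   # e.g. 0001-20150421, 0002-20150421, 0003-20150421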

In the format above I used 4 digits, on the assumption that that is usually enough. If you need more than that on the occasional day, nothing will break, but the names won't sort nicely when you view them.


If this were my problem, I'd also use a generator for the web page names. If you write that generator, then you could do something like (untested):

for filename, url in zip(filenames(), urlnames()):
    f = open(filename, "w")
    # ... process the url, writing to file f
    f.close()

This loop will end automatically when you run out of urlnames, since zip() stops at the shorter of its two arguments.
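
If it helps, urlnames() could be as simple as this sketch (the two URLs are just the examples from your message; substitute your real list):

def urlnames():
    # Yield each site to crawl, one at a time.
    sites = [
        "http://www.theguardian.com/international",
        "http://edition.cnn.com/",
    ]
    for url in sites:
        yield url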


I also think you should consider making the date the name of the directory you use, rather than putting many days' files in a single directory. This mainly affects the way you concatenate the parts together: you'd use os.path.join() rather than "+" to combine them.
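
A sketch of that variant (again untested, and assuming Python 3; it reuses SUFFIX from above as the per-day directory name):

import itertools
import os

def filenames(suffix=SUFFIX):
    os.makedirs(suffix, exist_ok=True)   # one directory per day
    for integer in itertools.count(1):
        # e.g. 20150421/0001 rather than 0001-20150421
        yield os.path.join(suffix, "{0:04d}".format(integer))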


--
DaveA