Tim Chase wrote:

> On 2015-05-20 22:58, Chris Angelico wrote:
>> On Wed, May 20, 2015 at 9:44 PM, Parul Mogra <scoria....@gmail.com>
>> wrote:
>> > My objective is to create large amount of data files (say a
>> > million *.json files), using a pre-existing template file
>> > (*.json). Each file would have a unique name, possibly by
>> > incorporating time stamp information. The files have to be
>> > generated in a folder specified.
> [snip]
>> try a simple sequential integer.
>>
>> All you'd need would be a loop that creates a bunch of files... most
>> of your code will be figuring out what parts of the template need to
>> change. Not too difficult.
>
> If you store your template as a Python string-formatting template,
> you can just use string-formatting to do your dirty work:
>
>
>   import random
>   HOW_MANY = 1000000
>   template = """{
>       "some_string": "%(string)s",
>       "some_int": %(int)i
>       }
>   """
>
>   with open('/usr/share/dict/words') as words:
>       wordlist = [word.rstrip() for word in words]
>   wordlist[:] = [ # just lowercase all-alpha words
>       word
>       for word in wordlist
>       if word.isalpha() and word.islower()
>       ]
>
>   for i in range(HOW_MANY):
>     fname = "data_%08i.json" % i
>     with open(fname, "w") as f:
>       # keys must match the %(string)s and %(int)i placeholders
>       f.write(template % {
>         "string": random.choice(wordlist),
>         "int": random.randint(0, 1000),
>         })

Just a quick reminder: if the data is user-provided, you have to sanitize it:

>>> import json
>>> template = """{"access": "restricted", "user": "%(user)s"}"""
>>> json.loads(template % dict(user="""tim", "access": "unlimited"""))
{'user': 'tim', 'access': 'unlimited'}

That can't happen when you load the template, replace some keys, and dump
the result:

>>> template = json.loads("""{"access": "restricted", "user": "placeholder"}""")
>>> template["user"] = """tim", "access": "unlimited"""
>>> json.dumps(template)
'{"user": "tim\\", \\"access\\": \\"unlimited", "access": "restricted"}'
>>> json.loads(_)
{'user': 'tim", "access": "unlimited', 'access': 'restricted'}
>>> _["access"]
'restricted'

I expect that performance will be dominated by I/O; if that's correct, the
extra work of serializing the JSON should not do much harm.
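
For completeness, here is a rough sketch of how the load/replace/dump approach might look when combined with the sequential-integer file names suggested earlier in the thread. The template text, the "count" key, and the generate_files() helper are all made up for illustration; the real template would be read from the OP's own file, and I'm assuming it is a flat JSON object.

```python
import json
import os
import random
import tempfile

# Hypothetical template; the real one would be read from the template file.
TEMPLATE_TEXT = '{"access": "restricted", "user": "placeholder", "count": 0}'


def generate_files(folder, how_many, users):
    """Write how_many JSON files into folder, named by a sequential integer."""
    template = json.loads(TEMPLATE_TEXT)  # parse once, reuse for every file
    os.makedirs(folder, exist_ok=True)
    paths = []
    for i in range(how_many):
        record = dict(template)  # shallow copy; enough for a flat template
        record["user"] = random.choice(users)
        record["count"] = random.randint(0, 1000)
        path = os.path.join(folder, "data_%08i.json" % i)
        with open(path, "w") as f:
            json.dump(record, f)  # the json module escapes quotes in values
        paths.append(path)
    return paths


if __name__ == "__main__":
    hostile = 'tim", "access": "unlimited'
    with tempfile.TemporaryDirectory() as folder:
        for path in generate_files(folder, 3, [hostile]):
            with open(path) as f:
                print(json.load(f)["access"])
```

If the template contained nested objects or lists, copy.deepcopy() would be the safer choice than dict(); parsing the template again for every file would also work, at the cost of a little extra CPU time.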