I concur with Evan re: backend instances. I also suggest that you make copious use of tasks.

In your step #3, rather than inserting each row into Cloud SQL directly, I would drop an individual task onto a push queue for each row insert, and have a separate handler that fires once per task and inserts that one row. Your "main" process, the one iterating through your parsed data, will run a _lot_ faster if it doesn't have to wait on the database for each row insert. If your experience is like ours, you'll find that one process parsing a text file and dropping tasks can keep 5-10 instances busy doing the actual database inserts.

Depending on the size of your input files, you might also want to consider splitting up the processing of each input file, or using map-reduce if your files are truly "big".
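To make the task-per-row idea concrete, here is a minimal local sketch of the same shape. It is not App Engine code: an in-process `queue.Queue` stands in for the push queue, threads stand in for the instances handling tasks, and an in-memory SQLite database stands in for Cloud SQL. The `readings` table, the `name,value` line format, and the function names are all made up for illustration.

```python
import queue
import sqlite3
import threading

_STOP = object()  # sentinel that tells a worker to exit


def parse_rows(lines):
    """Hypothetical parser: each input line is 'name,value'."""
    for line in lines:
        name, value = line.strip().split(",")
        yield name, int(value)


def insert_worker(tasks, conn, db_lock):
    """Stand-in for the push-queue handler: one task in, one row inserted."""
    while True:
        row = tasks.get()
        if row is _STOP:
            return
        # The lock is needed only because this sketch shares a single
        # SQLite connection; it is an artifact of the stand-in database.
        with db_lock, conn:
            conn.execute(
                "INSERT INTO readings (name, value) VALUES (?, ?)", row
            )


def load(lines, conn, num_workers=5):
    """Parse lines and fan the rows out to num_workers insert workers."""
    tasks = queue.Queue()
    db_lock = threading.Lock()
    workers = [
        threading.Thread(target=insert_worker, args=(tasks, conn, db_lock))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()

    enqueued = 0
    for row in parse_rows(lines):  # the "main" loop never waits on the DB
        tasks.put(row)
        enqueued += 1

    for _ in workers:  # one stop signal per worker, after all real tasks
        tasks.put(_STOP)
    for w in workers:
        w.join()
    return enqueued


def demo():
    """Run the sketch on 100 generated lines; return (enqueued, stored)."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE readings (name TEXT, value INTEGER)")
    conn.commit()
    lines = ["sensor%d,%d" % (i, i * 10) for i in range(100)]
    enqueued = load(lines, conn)
    stored = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
    return enqueued, stored
```

On App Engine the `tasks.put(row)` call would become "add a task to a push queue" and `insert_worker` would become the request handler the queue invokes, so the number of concurrent inserts is governed by the queue's rate settings and instance scaling rather than by `num_workers`.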
On Sunday, September 18, 2016 at 9:37:19 AM UTC-5, Niklas Andersson wrote:
>
> I have a bunch of files at Cloud Storage that I need to parse and insert
> into Cloud SQL once per day.
> The operation takes about an hour on a n1-standard-2 instance, and then
> ~10mins to insert the data into Cloud SQL.
>
> My idea of design to solve this problem with GAE is:
>
>    1. Define a scheduled task in GAE, it does a HTTP request to a URL
>    2. Create a URL handler that can respond to the HTTP req mentioned
>    above in GAE, it’s main task would be to boot a GCE instance
>    3. The GCE instance would be based on a template where it does a few
>    things automatically at boot:
>       1. Download the file from Cloud Storage
>       2. Parses it
>       3. Inserts the data into Cloud SQL
>       4. Shuts down
>
>
> Would this be the best design to solve the problem?
>
