On Fri, Dec 2, 2011 at 6:22 AM, Cal Leeming [Simplicity Media Ltd] <cal.leem...@simplicitymedialtd.co.uk> wrote:
> Faster in what sense? Prototyping/development time, or run time?
Well, I can count the lines in each file in a few seconds, so I think the SQL
stuff is slowing everything down (using PostgreSQL through psycopg2).

> If it's only a few MB, I see little reason to go as far as writing it in
> C, unless you are performing the same import tens of thousands of times and
> the overhead in Python adds up so much that you get problems.
>
> But, quite frankly, you'll max out MySQL INSERT performance before you max
> out Python's performance lol - as long as you don't use the ORM for
> inserts :)

When you say "as long as you don't use the ORM for inserts", do you mean
don't do:

currentDataset.comment = "blah"
currentDataset.name = "abc12"
currentDataset.relatedObject = otherCurrentObject.id
etc., etc.?

Are you saying I should still do all that in Python, but using raw SQL
instead of the fancy object-like way? Like this:
https://docs.djangoproject.com/en/dev/topics/db/sql/#executing-custom-sql-directly

> Cal
>
> On Fri, Dec 2, 2011 at 5:21 AM, Nathan McCorkle <nmz...@gmail.com> wrote:
>>
>> Would interfacing with SQL via C or C++ be faster for parsing and loading
>> data in bulk? I have files that are only a few MB worth of text, but they
>> can take hours to load due to the amount of parsing I do, and the number
>> of database entries each item in a file creates.
>>
>> On Mon, Nov 28, 2011 at 3:28 AM, Anler Hernandez Peral
>> <anle...@gmail.com> wrote:
>>> Hi, this is probably not your case, but in case it is, here is my story:
>>> creating a script to import CSV files is the best solution as long as
>>> there are only a few of them, but in my case the problem was that I
>>> needed to import nearly 40 VERY BIG CSV files, each one mapping a
>>> database table, and I needed to do it quickly. I thought the best way
>>> was to use MySQL's "LOAD DATA LOCAL INFILE" functionality, since it
>>> works very fast and I could create a single function to import all the
>>> files.
>>> The problem was that my CSV files were pretty big, and my database
>>> server was eating big amounts of memory and crashing my site, so I
>>> ended up slicing each file into smaller chunks.
>>> Again, this is a very specific need, but in case you find yourself in
>>> such a situation, here's my base code, which you can extend ;)
>>>
>>> https://gist.github.com/1dc28cd496d52ad67b29
>>> --
>>> anler
>>>
>>> On Sun, Nov 27, 2011 at 7:56 PM, Andre Terra <andrete...@gmail.com>
>>> wrote:
>>>>
>>>> This should be run asynchronously (e.g. with celery) when importing
>>>> large files.
>>>> If you have a lot of categories/subcategories, you will need to bulk
>>>> insert them instead of looping through the data and just using
>>>> get_or_create. A single, long transaction will definitely bring great
>>>> improvements in speed.
>>>> One tool is DSE, which I've mentioned before.
>>>> Good luck!
>>>>
>>>> Cheers,
>>>> AT
>>>>
>>>> On Sat, Nov 26, 2011 at 8:44 PM, Petr Přikryl <prik...@atlas.cz>
>>>> wrote:
>>>>>
>>>>>> import csv
>>>>>> data = csv.reader(open('/path/to/csv', 'r'), delimiter=';')
>>>>>> for row in data:
>>>>>>     category, _ = Category.objects.get_or_create(name=row[0])
>>>>>>     sub_category, _ = SubCategory.objects.get_or_create(
>>>>>>         name=row[1], defaults={'parent_category': category})
>>>>>>     product, _ = Product.objects.get_or_create(
>>>>>>         name=row[2], defaults={'sub_category': sub_category})
>>>>>
>>>>> There are a few potential problems with the csv code as used here.
>>>>>
>>>>> Firstly, the file should be opened in binary mode. On Unix-based
>>>>> systems binary mode is technically the same as text mode, but you
>>>>> may observe problems when you move the code to another environment
>>>>> (Windows).
>>>>>
>>>>> Secondly, the opened file should always be closed -- especially
>>>>> when building an application (web) that may run for a long time.
>>>>> You can do it like this:
>>>>>
>>>>> ...
>>>>> f = open('/path/to/csv', 'rb')
>>>>> data = csv.reader(f, delimiter=';')
>>>>> for ...
>>>>>     ...
>>>>> f.close()
>>>>>
>>>>> Or you can use the newer Python "with" construct.
>>>>>
>>>>> P.
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Django users" group.
>>>>> To post to this group, send email to django-users@googlegroups.com.
>>>>> To unsubscribe from this group, send email to
>>>>> django-users+unsubscr...@googlegroups.com.
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/django-users?hl=en.
--
Nathan McCorkle
Rochester Institute of Technology
College of Science, Biotechnology/Bioinformatics
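For what it's worth, Cal's "don't use the ORM for inserts" advice and Andre's single-transaction point boil down to batching parameterized INSERTs through the DB-API cursor. A minimal sketch, using sqlite3 as a stand-in backend (psycopg2 and MySQLdb expose the same executemany call, though with %s placeholders instead of ?; the dataset table and its columns are made up for illustration):

```python
import sqlite3

def bulk_insert(conn, rows):
    # One transaction for the whole batch: "with conn" commits on
    # success and rolls back on error. executemany avoids the
    # per-object overhead of calling ORM .save() in a loop.
    with conn:
        conn.executemany(
            "INSERT INTO dataset (name, comment) VALUES (?, ?)", rows)

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE dataset (name TEXT, comment TEXT)")
bulk_insert(conn, [('abc12', 'blah'), ('abc13', 'blah2')])
```

Django's own cursor (django.db.connection) works the same way; that is what the custom-SQL docs linked upthread describe.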