On Fri, Dec 2, 2011 at 6:22 AM, Cal Leeming [Simplicity Media Ltd]
<cal.leem...@simplicitymedialtd.co.uk> wrote:
> Faster in what sense? Prototyping/development time, or run time?

Well, I can count the lines in each file in a few seconds, so I think
the SQL side is what's slowing everything down (I'm using Postgres
through psycopg2).

>
> If it's only a few MB, I see little reason to go as far as writing it in
> C, unless you are performing the same import tens of thousands of times and
> the overhead in Python adds up so much that you get problems.
>
> But, quite frankly, you'll max out MySQL INSERT performance before you max
> out Python's performance lol - as long as you don't use the ORM for inserts
> :)

When you say 'as long as you don't use the ORM for inserts', do you
mean don't do:
currentDataset.comment="blah"
currentDataset.name="abc12"
currentDataset.relatedObject=otherCurrentObject.id

etc., etc.?

Are you saying I should be doing all that in Python, but using raw SQL
instead of the fancy Python object-like way? Like this:
https://docs.djangoproject.com/en/dev/topics/db/sql/#executing-custom-sql-directly
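
For illustration, here is a minimal sketch of the raw-SQL route: all rows
go through one executemany call inside a single transaction, instead of one
ORM save per object. It uses the standard-library sqlite3 module as a
stand-in so it runs without a server; the "dataset" table and its columns
are made up for the example. With psycopg2 or Django's connection.cursor()
the placeholder would be %s rather than ?.

```python
import sqlite3


def bulk_insert(conn, rows):
    """Insert all (name, comment) rows with one executemany call."""
    # Using the connection as a context manager wraps the inserts in a
    # single transaction: commit on success, rollback on an exception.
    with conn:
        conn.executemany(
            "INSERT INTO dataset (name, comment) VALUES (?, ?)", rows
        )


# Hypothetical table, only here so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT, comment TEXT)"
)

rows = [("abc%d" % i, "blah") for i in range(1000)]
bulk_insert(conn, rows)

count = conn.execute("SELECT COUNT(*) FROM dataset").fetchone()[0]
```

How much this gains over the ORM depends on how many rows each item
produces, so it is worth timing both on a real file.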

>
> Cal
>
>
> On Fri, Dec 2, 2011 at 5:21 AM, Nathan McCorkle <nmz...@gmail.com> wrote:
>>
>> Would interfacing with SQL via C or C++ be faster for parsing and loading
>> data in bulk? I have files that are only a few MB worth of text, but they
>> can take hours to load due to the amount of parsing I do and the
>> number of database entries each item in a file creates.
>>
>> On Mon, Nov 28, 2011 at 3:28 AM, Anler Hernandez Peral
>> <anle...@gmail.com> wrote:
>> > Hi, this is probably not your case, but in case it is, here is my story:
>> > Writing a script to import CSV files is the best solution as long as
>> > there are only a few of them, but in my case I needed to import nearly
>> > 40 VERY BIG CSV files, each one mapping to a database table, and I
>> > needed to do it quickly. I thought the best way was to use MySQL's
>> > "load data in local..." functionality, since it works very fast and I
>> > could write a single function to import all the files. The problem was
>> > that my CSV files were pretty big, and my database server was eating
>> > large amounts of memory and crashing my site, so I ended up slicing
>> > each file into smaller chunks.
>> > Again, this is a very specific need, but in case you find yourself in
>> > such a situation, here's my base code, which you can extend ;)
>> >
>> > https://gist.github.com/1dc28cd496d52ad67b29
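
The gist is not reproduced here. As a rough sketch of the slicing idea only
(not anler's code), the rows can be read in fixed-size chunks so that each
chunk is loaded on its own. The helper name iter_chunks, the chunk size and
the sample data are all hypothetical.

```python
import csv
import io
from itertools import islice


def iter_chunks(reader, chunk_size):
    """Yield lists of at most chunk_size rows from a csv reader."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# In-memory sample standing in for one of the big CSV files.
sample = io.StringIO("a;1\nb;2\nc;3\nd;4\ne;5\n")
chunks = list(iter_chunks(csv.reader(sample, delimiter=";"), 2))
```

Each chunk can then be written to its own temporary file or inserted in its
own transaction, which keeps the amount of data the server handles at once
bounded.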
>> > --
>> > anler
>> >
>> >
>> > On Sun, Nov 27, 2011 at 7:56 PM, Andre Terra <andrete...@gmail.com>
>> > wrote:
>> >>
>> >> This should be run asynchronously (e.g. with celery) when importing
>> >> large files.
>> >> If you have a lot of categories/subcategories, you will need to bulk
>> >> insert them instead of looping through the data and just using
>> >> get_or_create. Wrapping the whole import in a single, long transaction
>> >> will also speed things up considerably.
>> >> One tool for this is DSE, which I've mentioned before.
>> >> Good luck!
>> >>
>> >> Cheers,
>> >> AT
>> >>
>> >> On Sat, Nov 26, 2011 at 8:44 PM, Petr Přikryl <prik...@atlas.cz> wrote:
>> >>>
>> >>> >>> import csv
>> >>> >>> data = csv.reader(open('/path/to/csv', 'r'), delimiter=';')
>> >>> >>> for row in data:
>> >>> >>>     # get_or_create returns an (object, created) tuple
>> >>> >>>     category, _ = Category.objects.get_or_create(name=row[0])
>> >>> >>>     sub_category, _ = SubCategory.objects.get_or_create(
>> >>> >>>         name=row[1], defaults={'parent_category': category})
>> >>> >>>     product, _ = Product.objects.get_or_create(
>> >>> >>>         name=row[2], defaults={'sub_category': sub_category})
>> >>>
>> >>> There are a few potential problems with the csv module as used here.
>> >>>
>> >>> Firstly, the file should be opened in binary mode.  On Unix-based
>> >>> systems, binary mode is effectively the same as text mode.
>> >>> However, you may one day run into problems when you move
>> >>> the code to another environment (Windows).
>> >>>
>> >>> Secondly, the opened file should always be closed -- especially
>> >>> when building a (web) application that may run for a long time.
>> >>> You can do it like this:
>> >>>
>> >>> ...
>> >>> f = open('/path/to/csv', 'rb')
>> >>> data = csv.reader(f, delimiter=';')
>> >>> for ...
>> >>> ...
>> >>> f.close()
>> >>>
>> >>> Or you can use the new Python construct "with".
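
A minimal sketch of the "with" variant, which closes the file even if an
exception is raised inside the block. It is written for Python 3, where the
csv documentation asks for text mode with newline='' instead of 'rb'. The
helper name read_rows, the temporary file and its contents are hypothetical
and only there to make the example runnable.

```python
import csv
import os
import tempfile


def read_rows(path, delimiter=";"):
    """Return all rows of a delimited file; the file is closed on exit."""
    # newline="" lets the csv module handle line endings itself (Python 3).
    with open(path, newline="") as f:
        return list(csv.reader(f, delimiter=delimiter))


# Write a small sample file so the sketch can be run as-is.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    f.write("Tools;Hammers;Claw hammer\r\nTools;Saws;Hand saw\r\n")

rows = read_rows(path)
os.remove(path)
```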
>> >>>
>> >>> P.
>> >>>
>> >>> --
>> >>> You received this message because you are subscribed to the Google
>> >>> Groups
>> >>> "Django users" group.
>> >>> To post to this group, send email to django-users@googlegroups.com.
>> >>> To unsubscribe from this group, send email to
>> >>> django-users+unsubscr...@googlegroups.com.
>> >>> For more options, visit this group at
>> >>> http://groups.google.com/group/django-users?hl=en.
>> >>>
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Nathan McCorkle
>> Rochester Institute of Technology
>> College of Science, Biotechnology/Bioinformatics
>>
>>
>



-- 
Nathan McCorkle
Rochester Institute of Technology
College of Science, Biotechnology/Bioinformatics

