Django Models Joining and Normalising

2015-10-14 Thread Yunti
I'm still relatively new to Django and am now working on a project that has 
much more complex models than I've used before.  

My question is when it is best to split a large model into several smaller 
models, what impact joining the smaller models back together has, and how 
this relates to keeping the data normalised.  

I'm trying to model magazine subscriptions, which are available from 
different suppliers - the costs vary slightly per region (due to delivery 
costs) and the available subscriptions vary slightly dependent on payment 
type (direct debit, card, cheque etc...). 

Each subscription has a lot of different fields, so I'm not clear on how 
best to represent this in Django: one large model, or split into smaller 
models.

Firstly, each supplier will have multiple magazines. To keep the data 
normalised, should the suppliers be kept in a table/model separate from 
the subscriptions table? How will this impact performance when the data has 
to be joined back together to query a list of subscriptions? 
(Similarly, should payment types, of which there are only 4, and regions be 
pulled out into separate tables?)

There will be a form field for each of: 
supplier,
payment type,
region.
Should a separate model be made for these 3 fields to make building the 
form easier, and how should it tie into the above tables (if they should be 
separated)? Should I make, e.g., an Input model class with ForeignKeys to 
each of the separate tables? 
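
A rough sketch of what the normalised layout and the join look like, using the standard-library sqlite3 module rather than Django so that it runs on its own. The table names, column names and sample rows are assumptions made up for illustration. In Django the three lookup tables would usually be separate models referenced from the subscription model by ForeignKey fields, and select_related() can fetch them in the same query.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE payment_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE subscription (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        price TEXT NOT NULL,
        supplier_id INTEGER NOT NULL REFERENCES supplier(id),
        region_id INTEGER NOT NULL REFERENCES region(id),
        payment_type_id INTEGER NOT NULL REFERENCES payment_type(id)
    );
""")
conn.execute("INSERT INTO supplier (id, name) VALUES (1, 'Acme Press')")
conn.execute("INSERT INTO region (id, name) VALUES (1, 'North')")
conn.execute("INSERT INTO payment_type (id, name) VALUES (1, 'direct debit')")
conn.execute(
    "INSERT INTO subscription "
    "(title, price, supplier_id, region_id, payment_type_id) "
    "VALUES ('Gardening Monthly', '24.99', 1, 1, 1)"
)

# One query joins the small lookup tables back onto the subscription rows.
rows = conn.execute("""
    SELECT s.title, s.price, sup.name, r.name, p.name
    FROM subscription AS s
    JOIN supplier AS sup ON sup.id = s.supplier_id
    JOIN region AS r ON r.id = s.region_id
    JOIN payment_type AS p ON p.id = s.payment_type_id
""").fetchall()
print(rows)
```

The joins here go through integer primary keys, which is the usual case for foreign keys. Whether that cost matters for a given data set is something to measure rather than assume.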

Thanks.  


-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/e233ed53-d0cc-4540-a93b-45471887be4c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Validating data for use directly in a model (not via forms)

2015-10-21 Thread Yunti
I have a Django project where I want to save web-scraped data to the 
database via the Django ORM, for use in the Django app. 

The data is currently JSON, converted to a Python dict via json.loads.

I set up my model with the narrowest, most constrained field types possible 
(e.g. DecimalField with decimal_places=2 and max_digits=4 for prices).

I naively tried to save the values from the relevant keys in the JSON 
directly to the relevant model fields; however, this raised errors due to 
the data format.

It looks like data entered via a form is converted to the relevant python 
object in the form validation - e.g. a date string '24 May 2015' is 
converted to a datetime object and the date format validated.  

None of this appears to happen when saving directly to a model (it would be 
good to have my understanding confirmed), so saving '24 May 2015' 
directly to a DateField in a model produces a format error. 

What validation (if any) does Django do when saving to a database directly 
via the ORM? Does it just rely on the type checking in the database (so 
for SQLite there would be none, but for Postgres there would be)?
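
As a rough illustration of the conversion step that form validation normally performs, the sketch below turns scraped strings into Python objects using only the standard library. The keys, sample values and helper names (to_date, to_price) are hypothetical. The cleaned values could then be assigned to model fields, with full_clean() called explicitly before save(), since save() does not run validation itself.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical scraped record; the keys and values are made up for illustration.
raw = {"start_date": "24 May 2015", "price": "12.50"}


def to_date(value, fmt="%d %B %Y"):
    """Parse a string such as '24 May 2015' into a datetime.date."""
    return datetime.strptime(value, fmt).date()


def to_price(value):
    """Convert a string to a Decimal rounded to two decimal places."""
    try:
        return Decimal(value).quantize(Decimal("0.01"))
    except InvalidOperation:
        raise ValueError("not a valid price: {!r}".format(value))


cleaned = {
    "start_date": to_date(raw["start_date"]),
    "price": to_price(raw["price"]),
}
```

Note that the month name parsing in strptime depends on the current locale, so this assumes English month names.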

Thanks.




Re: Validating data for use directly in a model (not via forms)

2015-10-21 Thread Yunti
Thanks, yes, I did look at model validation in the docs, but it looked to be 
tied to use via forms.  I hadn't realised that it's not called 
automatically when an instance is created.  So to validate when 
using the get_or_create() method, would I manually call full_clean() after 
get_or_create()? 

I think you may be right regarding just setting up forms for the models.  I 
don't need a form but do need the validation.  

Thanks for your help. 

On Wednesday, 21 October 2015 19:09:01 UTC+1, Simon Charette wrote:
>
> Hi Yunti,
>
> Did you read about model level validation 
> <https://docs.djangoproject.com/en/1.8/ref/models/instances/#validating-objects>?
>  
> Calling model_instance.full_clean() triggers validation but it's not 
> implicitly called when you save an instance.
>
> For your date case you'll have to include a layer that feeds Django models 
> with datetime.date objects from your string representation. That's what 
> Django forms do under the hood using the DATE_INPUT_FORMATS setting 
> <https://docs.djangoproject.com/en/1.8/ref/settings/#date-input-formats>.
>
> I would suggest you define model forms for your models and use them to 
> perform conversion and validation of your scraped data.
>
> Simon
>



update_or_create() always creates (or recreates)

2015-11-05 Thread Yunti
I have tried to use the update_or_create() method, assuming that it would 
either create a new entry in the db if it found none, or update an existing 
one if it found one that differed from the defaults passed in, or 
not update at all if there was no difference.  However, it just seemed to 
recreate entries each time even if there were no changes.

I think the issue was that I wanted to:
1)  get an entry if all fields were the same,
2) or create a new entry if it didn't find an existing entry with the 
unique_id
3) or if there was an entry with the same unique_id, update that entry with 
remaining fields. 

The update_or_create() method doesn't seem to work as I had hoped when 
called as shown below: it always seems to do an update if it finds a 
match on the given kwargs. 

Alternatively, I would have to pass in all the fields as keyword args to 
check that nothing had changed, but then that would miss option 3): finding 
an existing entry that has the same unique_id but different values in the 
remaining fields.






supplier, created = Supplier.objects.update_or_create(
    unique_id=product_detail['supplierId'],
    defaults={
        'name': product_detail['supplierName'],
        'entity_name_1': entity_name_1,
        'entity_name_2': entity_name_1,
        'rating': product_detail['supplierRating'],
    })





from django.db import models


class Supplier(models.Model):
    unique_id = models.IntegerField(unique=True)
    name = models.CharField(max_length=255, unique=True)
    entity_name_1 = models.CharField(max_length=255, blank=True)
    entity_name_2 = models.CharField(max_length=255, blank=True)
    rating = models.CharField(max_length=255)

    last_updated = models.DateTimeField(auto_now=True)

    def __str__(self):
        return self.name


Not being convinced that update_or_create() would give me what I needed I made 
the below function:


def create_or_update_if_diff(defaults, model):
    try:
        instance = model.objects.get(**defaults)
        # if no exception, the product doesn't need to be updated
    except model.DoesNotExist:
        # the product needs to be created or updated
        try:
            model.objects.get(unique_id=defaults['unique_id'])
        except model.DoesNotExist:
            # needs to be created
            instance = model.objects.create(**defaults)
            # model(**defaults).save()
            sys.stdout.write('New {} created: {}\n'.format(model,
                                                           instance.name))
            return instance, True
        else:
            # needs to be updated
            instance = model.objects.update(**defaults)
            sys.stdout.write('{}:'
                             ' {} updated \n'.format(model,
                                                     instance.unique_id))
            return instance, True
    return instance, False


However, I can't get it quite right.  I get a KeyError on update, possibly 
because the defaults passed in now include unique_id. Should the unique_id be 
separated out and both passed into the function to fix this?  (And should I 
have created a function to achieve this, or would update_or_create() have been 
able to do it?)



supplier_defaults = {
    'unique_id': product_detail['supplierId'],
    'name': product_detail['supplierName'],
    'entity_name_1': entity_name_1,
    'entity_name_2': entity_name_2,
    'rating': product_detail['supplierRating'],
}
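
One way to separate the unique_id from the remaining fields is sketched below in plain Python. The helper name split_lookup and the sample values are hypothetical and not part of Django. Separately, as far as I know, calling update() on a manager applies to every row in the table and returns the number of rows matched rather than an instance, which may be part of why the function above fails.

```python
def split_lookup(data, lookup_fields=("unique_id",)):
    """Return (lookup, defaults) without modifying the input dict."""
    lookup = {k: v for k, v in data.items() if k in lookup_fields}
    defaults = {k: v for k, v in data.items() if k not in lookup_fields}
    return lookup, defaults


# Hypothetical values standing in for the scraped product_detail data.
supplier_data = {"unique_id": 42, "name": "Acme Press", "rating": "4"}
lookup, defaults = split_lookup(supplier_data)

# The two parts could then be passed on separately, for example:
#   Supplier.objects.update_or_create(defaults=defaults, **lookup)
```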






Re: update_or_create() always creates (or recreates)

2015-11-06 Thread Yunti
Carsten,

Thanks for your reply.

> A note about the last statement: If a Supplier object has the same 
> unique_id, and all other fields (in `defaults`) are the same as well, 
> logically there is no difference between updating and not updating – the 
> result is the same. 

The entry in the database is the same, apart from the last_updated field, as 
long as the row is not rewritten.  This means I can check for new data 
often and be alerted when there is an actual update (i.e. a change to the 
data).  If it rewrites the data every time it checks, then I have no idea 
when the data was actually updated.

> Have you checked? How? 
> In your create_or_update_if_diff() you seem to try to re-invent 
> update_or_create(), but have you actually examined the results of the 
>
>  supplier, created = Supplier.objects.update_or_create(...) 
>
> call? 

I checked by seeing that the last_updated field in the database was updated 
every time.  (I suppose the issue could be with how that field gets reset 
the next time it's run; I didn't eliminate that possibility.)

Yes, I was worried that I might be recreating (a poor version of) 
update_or_create(), but it didn't seem to have an option not to 
write to the database if there was no change to the data.   
Can it do this? And how would I verify when an item has been updated or 
created (or neither)? Could I output to the console? 

If it can, how do I call it so that it checks against all fields (unique_id 
and defaults), updates using the defaults if it finds a difference, and 
creates if it doesn't find the unique_id?

I'm still not sure if this is possible or how to call the function, 
particularly how to pass in the remaining defaults to check against: 
**kwargs = defaults isn't right, but I'm not sure what it should be.

supplier, created = Supplier.objects.update_or_create(
    unique_id=product_detail['supplierId'],
    **kwargs=defaults,
    defaults={
        'name': product_detail['supplierName'],
        'entity_name_1': entity_name_1,
        'entity_name_2': entity_name_1,
        'rating': product_detail['supplierRating'],
    })
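
For what it's worth, `**kwargs=defaults` is not valid Python syntax. The Django docs give the signature as update_or_create(defaults=None, **kwargs), so a dict is either unpacked into the lookup with `**` or passed whole as `defaults`. The sketch below uses a hypothetical stand-in function (show_binding, not a Django API) with the same parameter shape to show how the arguments get bound.

```python
def show_binding(defaults=None, **kwargs):
    """Stand-in with the same parameter shape as update_or_create()."""
    return {"lookup": kwargs, "defaults": defaults or {}}


# Hypothetical sample values.
lookup = {"unique_id": 42}
values = {"name": "Acme Press", "rating": "4"}

# Valid: unpack the lookup dict, pass the other values as `defaults`.
bound = show_binding(defaults=values, **lookup)

# Unpacking both dicts turns every field into a lookup condition instead.
bound_all = show_binding(**dict(lookup, **values))
```

In the second form nothing is left in `defaults`, which corresponds to matching against all fields: a row with the same unique_id but a changed name would simply not be found.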



On Thursday, 5 November 2015 20:05:39 UTC, Carsten Fuchs wrote:
>
> Hi Yunti, 
>
> On 05.11.2015 at 18:19, Yunti wrote: 
> > I have tried to use the update_or_create() method assuming that it would 
> > either, create a new entry in the db if it found none or update an 
> > existing one if it found one and had differences to the defaults passed 
> > in  - or wouldn't update if there was no difference. 
>
> A note about the last statement: If a Supplier object has the same 
> unique_id, and all other fields (in `defaults`) are the same as well, 
> logically there is no difference between updating and not updating – the 
> result is the same. 
>
> > However it just seemed to recreate entries each time even if there were 
> > no changes. 
>
> Have you checked? How? 
> In your create_or_update_if_diff() you seem to try to re-invent 
> update_or_create(), but have you actually examined the results of the 
>
>  supplier, created = Supplier.objects.update_or_create(...) 
>
> call? 
>
> > I think the issue was that I wanted to: 
> > 1)  get an entry if all fields were the same, 
>
> update_or_create() updates an object with the given kwargs; the match is 
> not made against *all* fields (i.e. for the match the fields in `defaults` 
> are not accounted for). 
>
> > 2) or create a new entry if it didn't find an existing entry with the 
> > unique_id 
> > 3) or if there was an entry with the same unique_id, update that entry 
> > with remaining fields. 
>
> update_or_create() should achieve this. It's hard to tell more without 
> additional information, but 
>
> https://docs.djangoproject.com/en/1.8/ref/models/querysets/#update-or-create 
>
> explains the function well, including how it works. If you work through 
> this in small steps, check examples and their (intermediate) results, you 
> should be able to find what the original problem was. 
>
> Best regards, 
> Carsten 
>
>



Re: update_or_create() always creates (or recreates)

2015-11-06 Thread Yunti
Jani,

Thanks for your reply - you explained it much more concisely than I did. :)

Good to have it confirmed that update_or_create() doesn't quite do what I 
needed - I was confused as to whether it would or not.

Thanks for taking the time to do that function, that looks ideal. I'll test 
it out.


On Friday, 6 November 2015 12:52:11 UTC, Jani Tiainen wrote:
>
> Your problem lies in the way Django actually carries out create or update.
>
> As the name suggests, create or update does either one. But that's not 
> what you want: you want a conditional update.
>
> Only update if certain fields have been changed. This can be done in a 
> few ways.
>
> So you want to do 
> "update_only_if_at_least_one_of_default_fields_changed_or_create"
>
> The operation is simple: if the object is not found, create a new one 
> using the defaults. If it is found, pull its values, compare them against 
> the default values, and if at least one differs, do an update. Otherwise 
> don't do anything.
>
> So basically the code would look something like this:
>
> def update_if_changed_or_create(**kwargs):
>     defaults = kwargs.pop('defaults', None) or {}
>
>     qs = MyModel.objects.filter(**kwargs)
>
>     if not qs:
>         # Not found: create it from the lookup kwargs plus the defaults.
>         params = dict(kwargs)
>         params.update(defaults)
>         obj = MyModel.objects.create(**params)
>         return obj, True  # Created object
>     elif len(qs) == 1:
>         obj = qs[0]
>         changed = False
>         for k, v in defaults.items():
>             if getattr(obj, k) != v:
>                 changed = True
>                 setattr(obj, k, v)
>         if changed:
>             obj.save()
>             return obj, False  # Updated object
>     else:
>         # Multiple objects matched the lookup.
>         raise MyModel.MultipleObjectsReturned()
>
>     return obj, None  # No change.
>
>

Re: update_or_create() always creates (or recreates)

2015-11-06 Thread Yunti
Hi Dan,

Thanks for the suggestion, it's a web scraper (run as a django management 
command) which then saves the data to the database via the Django ORM. 
 Given it's a scraper rather than a form (or view) is the above suggested 
function an ok way to proceed or would you suggest something else is more 
appropriate/best practice?



On Friday, 6 November 2015 14:40:59 UTC, Dan Tagg wrote:
>
> Hi Yunti,
>
>
> You could go up a level in the structure of your application and apply the 
> logic there, where there is more support.
>
> Are you using Django forms? The ModelForm class pretty much does what you 
> want, it examines form data, validating it against its type and any 
> validation rules you have set in the form or your model, compares it to the 
> instance's data in the database and only saves if there has been some kind 
> of change. 
>
> Dan
>

Re: update_or_create() always creates (or recreates)

2015-11-06 Thread Yunti
Thanks, you've definitely given me some stuff to think about.  I'm doing 
XHR requests that return JSON for the scraping (but later I will probably 
have normal pages too, so I will definitely look at your ETag suggestion; 
I'm not familiar with it, so will look into it). 

Given it's XHR and JSON I presume ETag isn't relevant, so I think your idea 
of setting a flag is a good one.  So for each row in each table (e.g. a 
Supplier) that I re-scrape, I would get the instance from the database based 
on the unique_id, compare each attribute to the re-scraped JSON, and set 
the flag and update the instance if there is a difference.  
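
The comparison step described above might look roughly like this. Record is a plain stand-in for a model instance and apply_changes is a hypothetical helper, neither of which is a Django API. It also assumes the scraped values have already been converted to the same Python types as the stored ones, otherwise every row would look changed.

```python
class Record(object):
    """Plain stand-in for a model instance; not a Django class."""

    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)


def apply_changes(obj, scraped):
    """Copy differing values onto obj and return the names that changed."""
    changed = []
    for name, value in scraped.items():
        if getattr(obj, name) != value:
            setattr(obj, name, value)
            changed.append(name)
    return changed


supplier = Record(unique_id=42, name="Acme Press", rating="4")
changed = apply_changes(supplier, {"name": "Acme Press", "rating": "5"})

# Only write to the database when something actually differed, e.g.
#   if changed:
#       supplier.save()
```

Returning the list of changed names rather than a bare flag also makes it easy to log what changed on each run.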

The data will only change a fraction of a percent of the time (most 
of the time it is constant), and there will be about 70k rows with 50-100 
fields.  The DB is Postgres (on Heroku for now). 
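
The hashing idea from Dan's message quoted below could probably be applied to a JSON payload as well as to an HTML page. A minimal sketch, assuming the payload is a dict and that a hash from the previous scrape is stored somewhere; payload_hash and the sample payloads are hypothetical.

```python
import hashlib
import json


def payload_hash(payload):
    """Return a stable SHA-256 hex digest for a JSON-serialisable payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical payloads standing in for scrapes of the same supplier.
first = {"supplierId": 42, "supplierName": "Acme Press", "supplierRating": "4"}
same = {"supplierRating": "4", "supplierName": "Acme Press", "supplierId": 42}
altered = dict(first, supplierRating="5")

stored_hash = payload_hash(first)
needs_processing = payload_hash(altered) != stored_hash
```

Sorting the keys means that a payload whose keys merely arrive in a different order hashes to the same value, so only a real change in the data triggers further processing.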

On Friday, 6 November 2015 17:12:05 UTC, Dan Tagg wrote:
>
> If you are web scraping you really need your code to be as efficient as 
> possible and to do as little as possible. Firstly, make sure you are using 
> everything the servers of the websites you are scraping are giving you to 
> decide whether to bother downloading the page. For example, check the ETag 
> and only bother to scrape if it is different from the last time you scraped 
> the data. If you don't trust the server's ETag, you can hash the page when 
> you download it and check that against your stored hash to see 
> whether it changed and whether it's worth processing. 
>
> Your approach of trying a 'get' with all the properties set and picking up 
> the exception has costs. Assuming your tables have enough rows that 
> scanning the entire table for every "get" won't be efficient, you will need 
> to have every column you are using in your "get" indexed in the database. 
> This obviously has a storage cost as well as an additional insert/update 
> cost and a larger cost to run the query than a simple select against a 
> single key. Whether that is more efficient than getting the result and 
> comparing the fields in python I don't know. I imagine it will be dependent 
> on what your RDBMS is and how it is hosted as well as how many rows and 
> columns will be in your database table.
>
> You could initialise a flag to False and as you process your scraped data 
> you could compare it to the attributes of your instance and set the flag to 
> True if they have changed and then not bother saving if you get to the end 
> of processing your scraped data and the modified flag has not been set to 
> True.
>
> Dan 
>