On 08/20/2014 04:28 PM, Ivan Kharlamov wrote:
> On 08/20/2014 03:52 PM, Ivan Kharlamov wrote:
>> On 08/20/2014 12:46 PM, Marc Tamlyn wrote:
>>> I'd say ArrayField is a straight up data field at the moment. It stores
>>> 0-1 lists of data. It's no different to CommaSeparatedIntegerField
>>> (seriously, why does that exist...)
>>>
>>> *If* PG gets the relevant update that will allow `integer[] references`
>>> (i.e. ArrayField(ForeignKey)) then this would be different, and would be
>>> more like a m2m field.
>>>
>>> There is an argument that it's 0-N anyway, but in the implementation
>>> both within Django and in the database I don't think the distinction is
>>> useful at this point, from an ORM point of view in any case. From a forms
>>> point of view it's quite different.
>>>
>>> On 20 August 2014 09:19, Russell Keith-Magee wrote:
>>>
>>> On Mon, Aug 18, 2014 at 6:03 PM, Anssi Kääriäinen wrote:
>>>
>>> On Monday, August 18, 2014 7:45:17 AM UTC+3, Russell Keith-Magee
>>> wrote:
>>>
>>> I understand what you're driving at here, and I've had
>>> similar thoughts over the course of the SoC. The catch is
>>> that this makes the API for get_fields() fairly complicated.
>>>
>>> If every field fits into one specific type, then
>>> get_fields() just requires a single boolean flag (do I
>>> include fields of type X) for each field type. We can also
>>> easily add new field types by adding new booleans to the API.
>>>
>>> However, if a field fits into multiple categories, then it's
>>> impossible (or, at least, exceedingly complicated) to make a
>>> single call to get_fields() that will specify all your field
>>> requirements. "Get me all non-virtual data fields" requires
>>> "virtual=False, data=True, m2m=False", but "Get all virtual
>>> data fields that represent m2ms" requires "virtual=True,
>>> data=False, m2m=True".
>>> You can't pass in both sets of
>>> arguments at the same time, so you either have to make
>>> multiple calls to get_fields(), or you have to invent some
>>> sort of query syntax for get_fields() that allows union
>>> queries.
>>>
>>> Plus, at the end of the day, get_fields() is abstracted
>>> behind highly cached and optimised properties for key
>>> lookups. These properties are effectively a cached call to
>>> get_fields() with a specific set of arguments - so even if
>>> get_fields() doesn't expose a "one category per field"
>>> requirement, the API will require, at some level, names that
>>> have clear (and preferably non-overlapping) membership.
>>>
>>> If fields are in multiple categories then users will want to do
>>> the full range of set operations on the categories. Encoding that
>>> into the API doesn't sound promising.
>>>
>>> I don't think users actually want to get fields based on
>>> the suggested categorization. I feel we get an easier to
>>> use and more flexible API if we have higher level
>>> categories and allow fields to match multiple
>>> categories. As a practical example, if I want all
>>> relation fields, that is going to be hard using the
>>> suggested API. Getting all relation fields is a more
>>> realistic use case than getting related virtual objects.
>>>
>>> Quite probably true. As a point of interest, the current (as
>>> in, 1.6) API actually doesn't differentiate between category
>>> (a) "pure data" and category (b) "relating data (i.e., FK)"
>>> fields - if you ask for "data fields" you get pure data
>>> *and* foreign keys. So, at least as far as Django's own
>>> usage is concerned, you're correct in saying that the taxonomy
>>> I've described isn't fully required.
>>>
>>> Daniel's survey of internal usage reveals that there are
>>> three use cases for getting a list of fields in Django's
>>> internal API:
>>>
>>> * Get all data and m2m fields (i.e., categories a, b, and
>>>   d).
>>>   This is effectively "all fields on *this* model"
>>>
>>> * Get all data, m2m, related objects, related m2m, and
>>>   virtual fields (i.e., categories a, b, d, f, g, h, i -
>>>   excluding c and e because Django doesn't currently have any
>>>   fields of this type). This is "all fields on this model, or
>>>   related to this model"
>>>
>>> * Get all m2m fields (i.e., category d)
>>>
>>> So - at the very least, we need names to describe those
>>> three groups. My intention with describing a richer taxonomy
>>> is to try and give names to other groupings of interest.
>>>
>>> If we want all fields to match a single, and only a
>>> single, category, then we need to redefine the categories
>>> to make sure ForeignKeys as virtual fields are possible,
>>> and that more esoteric custom join based fields fit
>>> into the categorization.
>>>
>>> Agreed - that's why I threw this out there for discussion :-)
>>>
>>> Properties like "data", "virtual", "external", "related",
>>> "relating" - these are high level concepts describing the
>>> way a field manifests. However, that doesn't mean we need to
>>> expose these properties as part of the formal API.
>>>
>>> Part of the underlying problem here -- let's say we roll out
>>> Django 1.7 with some version of this API, and in 1.8,
>>> foreign key fields change to become virtual. That
>>> effectively becomes backwards incompatible for queries that
>>> are sensitive to a "virtual" flag; but it doesn't change the
>>> underlying need to identify that a field is a foreign key.
>>> We need to capture the latter use case, but not necessarily
>>> the former.
>>>
>>> Could we go with a minimal API for get_fields()? Instead of
>>> having categorization on the get_fields() API, we could provide
>>> field flags for the categories. With field flags it is
>>> straightforward to filter the return list of get_fields().
>>> As an example, fetching those fields which are relations but which
>>> aren't virtual: [f for f in get_fields() if f.relational and not f.virtual].
>>> If this path is taken, then I am not sure how
>>> minimal the get_fields() API should be. We likely need flags for
>>> at least whether the field is defined on the local, a parent or some
>>> remote model.
>>>
>>> As for changing ForeignKey to virtual field plus concrete field
>>> representation - I just realized this will be backwards
>>> incompatible no matter what we do regarding categorization. An
>>> all-fields including get_fields() call will return separate
>>> author (virtual) and author_id (concrete) fields after the
>>> split. I am not sure what we can do about this. It would be very
>>> unfortunate if we can't refactor the way ForeignKeys work due to
>>> the meta API. Any ideas how we can avoid the backwards
>>> compatibility trap?
>>>
>>> I think Daniel and I might have come up with a way to meet both
>>> these requirements - a minimalist API for get_fields, with at least
>>> some protection against the known incoming backwards compatibility
>>> issue.
>>>
>>> The summary so far: it appears that a complex taxonomy isn't
>>> especially helpful - firstly, because any complex taxonomy is going
>>> to have edge cases that are hard to categorize, but also because a
>>> complex taxonomy leads to a much more complex internal API that is
>>> going to be prone to backwards compatibility problems.
>>>
>>> So - instead of worrying about 'virtual' and other properties like
>>> that, let's look at why the _meta API is fundamentally used - to get
>>> a list of fields that need to be handled in data processing. This
>>> primarily means forms, but other forms of serialisation are also
>>> included.
>>> In these use cases, there are always going to be per-field
>>> differences (even a CharField and an IntegerField require *slightly*
>>> different handling), so we won't focus on internal representations,
>>> storage mechanisms, or anything like that. Instead, let's focus on
>>> cardinality - a field represents some sort of data that has a
>>> cardinality with the object on which it is stored. If something has
>>> cardinality 1, you can display a single field. If it's cardinality
>>> N, you need to display a list, or some sort of inline.
>>>
>>> This results in 3 categories that are mutually exclusive:
>>>
>>> a) "Data fields": Fields of cardinality 0-1:
>>>
>>> * A CharField stores 0 or 1 strings (0 is the case of a nullable
>>>   field).
>>>
>>> * An IntegerField stores 0 or 1 integers.
>>>
>>> * A FileField stores 0 or 1 file paths.
>>>
>>> * An ImageField stores 0 or 1 file paths - although in being
>>>   modified, it might modify some other fields.
>>>
>>> * A ForeignKey stores 0 or 1 references to another object.
>>>
>>> * A GenericForeignKey stores 0 or 1 references to another object.
>>>
>>> * A notional "DocumentField" on a NoSQL store references 0 or 1
>>>   external documents.
>>>
>>> b) "ManyToMany Fields": Fields that are locally defined that
>>> represent a cardinality 0-N relationship with another object:
>>>
>>> * Many to Many fields store 0-N references to a second model.
>>>
>>> c) "Related Objects": Fields that represent a cardinality 0-N
>>> relationship with this object, but aren't locally defined:
>>>
>>> * The 'related' side of a ForeignKey
>>>
>>> * The 'related' side of a ManyToMany
>>>
>>> * A GenericRelation representing the reverse side of a
>>>   GenericForeignKey
>>>
>>> These three types are mutually exclusive - you either have
>>> cardinality 1 *or* cardinality N, not both; and you're either
>>> locally defined on this object or you're not.
>>> I can't think of an
>>> example of "cardinality 1 data that isn't defined on this object",
>>> but it would fit into this taxonomy if it were needed; I also can't
>>> think of a field definition that would span models.
>>>
>>> In addition to this basic classification, a field can be marked as
>>> "hidden". The immediate use for this is to hide the related_name='+'
>>> case of a FK or M2M. Looking forward, it would be used to mask
>>> fields that exist, but aren't intended to be user visible - for
>>> example, in the potential future case where a ForeignKey is split in
>>> two, or a Composite Key, there would be a "hidden" integer field (or
>>> fields) storing the actual data, and a virtual (but non-hidden)
>>> field that is the public API for manipulating the relationship. This
>>> would also be backwards compatible, because the "visible" field list
>>> hasn't changed.
>>>
>>> Fields are also tracked according to their parentage; this is used
>>> by tools interacting with inheritance relationships to know which
>>> fields are actually on this model, and which are inherited from a
>>> base class.
>>>
>>> This yields the following formal API for _meta:
>>>
>>> * get_fields(data, many_to_many, related, include_hidden,
>>>   include_parents)
>>>
>>> * @property data_fields (=> get_fields(data=True,
>>>   many_to_many=False, related=False, include_hidden=False,
>>>   include_parents=True))
>>>
>>> * @property many_to_many_fields (=> get_fields(data=False,
>>>   many_to_many=True, related=False, include_hidden=False,
>>>   include_parents=True))
>>>
>>> * @property related_objects (=> get_fields(data=False,
>>>   many_to_many=False, related=True, include_hidden=False,
>>>   include_parents=True))
>>>
>>> Does this sound any more sane as an API?
>>>
>>> My one lingering question is whether the "many_to_many"
>>> name/category is too explicit.
>>> I can conceive how an ArrayField
>>> could be considered a data field (it stores 0-1 arrays of data), or
>>> a "many_to_many" field (because it stores 0-N instances of some
>>> data). This all hinges on whether the definition for that field
>>> category is that it is a relationship with another *model*, or if
>>> it's just cardinality N data. It's trivial to call it a Data field
>>> and just leave it at that, but I'm wondering if there might be
>>> benefit in broadening the definition of "many_to_many".
>>>
>>> Russ %-)
>>
>> When I look at this situation from the point of view of forms, there are
>>
>> 1. Fields of cardinality 0-1
>> 2. Fields of cardinality 0-N
>>
>> and
>>
>> a. Fields that do not represent a reference to another model (object)
>> b. Fields that represent a reference to another model (object)
>>
>> 1. and 2. are mutually exclusive; a. and b. are also mutually exclusive.
>>
>> IMO, this way the future Django form would not need to care whether the
>> field is m2m or ArrayField(ForeignKey) or ListField(EmbeddedModelField)
>> because all of them would be 2.&b.
>>
>> One may also want to add two mutually-exclusive subcategories to b:
>>
>> b1. Relationship is locally defined
>> b2. Relationship is not locally defined.
>
> To add more examples to my proposition:
>
> 1) CharField(), IntegerField(), FileField(), ImageField()
>
> are all members of both: a. and 1.
>
> 2) ArrayField(), DictionaryField()
>
> are all members of both: a. and 2.
>
> 3) ForeignKey(), GenericForeignKey(), EmbeddedModelField(),
> GenericRelation()
>
> are all members of both: b. and 1.
>
> 4) ManyToManyField(), ArrayField(ForeignKey), ListField(EmbeddedModelField)
>
> are all members of both: b. and 2.
>
> As Collin Anderson wrote about "virtual" fields on 08/18/2014 07:12 PM:
>
>> Also, I think we should avoid discriminating between "virtual" and
>> non-virtual (as with local vs parent). Why should it matter how a field
>> is stored in the database?
>> I think the distinction will make it harder
>> to use non-relational databases.
>
> One may want to expand his statement and say that the form, ideally,
> should not care whether the field relationship is locally defined or not.
>
> Which is not to say that b1 and b2 subcategories are not useful at all,
> but they should not be needed in form representations.
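
To make the two axes from my earlier message concrete (1./2.: how many values a field can have; a./b.: whether it references another model), here is a rough sketch of how they could be expressed as plain per-field flags and filtered with a comprehension, along the lines Anssi suggested. Everything in it is hypothetical: `FieldInfo`, `multi_valued`, `relational` and `select` are names I made up for illustration, and none of this is part of Django's `_meta` API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldInfo:
    """Hypothetical stand-in for a model field; not a Django class."""
    name: str
    multi_valued: bool  # False: 0-1 values (1.); True: 0-N values (2.)
    relational: bool    # False: no reference to another model (a.); True: (b.)


# One example from each of the four groups listed in the quoted message.
FIELDS = [
    FieldInfo("CharField", multi_valued=False, relational=False),      # a. and 1.
    FieldInfo("ArrayField", multi_valued=True, relational=False),      # a. and 2.
    FieldInfo("ForeignKey", multi_valued=False, relational=True),      # b. and 1.
    FieldInfo("ManyToManyField", multi_valued=True, relational=True),  # b. and 2.
]


def select(fields, *, multi_valued=None, relational=None):
    """Filter on either axis; None means "don't care" for that axis."""
    return [
        f for f in fields
        if (multi_valued is None or f.multi_valued == multi_valued)
        and (relational is None or f.relational == relational)
    ]


# A form deciding between a single widget and a list/inline only needs axis 1./2.
list_like = [f.name for f in select(FIELDS, multi_valued=True)]

# "All relation fields", regardless of how many values they hold.
relations = [f.name for f in select(FIELDS, relational=True)]
```

Because the two flags are independent, a caller can ask about one axis without knowing anything about the other, which is the property I would like forms to be able to rely on.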
Excuse me for posting multiple emails at a time, but I'd like to make a
correction:

It just occurred to me that I misused the term 'cardinality'. The best way
to correct myself is to replace this:

1. Fields of cardinality 0-1
2. Fields of cardinality 0-N

with this:

1. Fields that can have 0-1 values.
2. Fields that can have 0-N values.

Thanks for the brilliant work and best regards,
Ivan

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/53F497B1.1010303%40gmail.com.
For more options, visit https://groups.google.com/d/optout.
