Some example queries are:

  give me all attributes for entity1 where entity1.attribute1 > 1000 and entity1.attribute15 = "someValue"
  give me all attributes for entity1 where entity1.parentId = 1
  give me all attributes for entity1 & parent_entity where entity1.attribute2 = "this"

Nothing too complicated.
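For illustration only, if each entity row kept its user-defined attributes in a single jsonb column (hypothetical table and column names, not a schema we have settled on), the first two might come out roughly as:

  -- hypothetical layout: entity1(id bigint, parent_id bigint, attrs jsonb)
  SELECT attrs
  FROM entity1
  WHERE (attrs->>'attribute1')::numeric > 1000
    AND attrs->>'attribute15' = 'someValue';

  SELECT attrs
  FROM entity1
  WHERE parent_id = 1;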
Our application-level checks consist mostly of validating the uploaded data against user-defined rules. They are fairly basic.

Scale-wise... nothing huge. In terms of a more traditional schema, where each entity was a separate table, the largest table might contain 1,000,000 rows, with the vast majority of them under 10,000.

Another issue is that there might be 30 attributes for a given entity, but the distribution of data might look like: 8 of 30 are 100% populated (every row has a value), 4 of 30 are 80% populated, 15 of 30 are 50% populated, and the rest are <25% populated.
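With that kind of sparseness, the two layouts I keep weighing would look roughly like this (a minimal sketch with made-up table and column names, not a decided design). EAV stores one row per populated attribute, so missing attributes simply have no rows; jsonb stores one document per entity row containing only the keys that are present:

  -- EAV: one row per (entity row, populated attribute)
  CREATE TABLE entity1_eav (
      row_id    bigint NOT NULL,
      attribute text   NOT NULL,
      value     text,
      PRIMARY KEY (row_id, attribute)
  );

  -- jsonb: one row per entity row; sparse attributes are just absent keys
  CREATE TABLE entity1_doc (
      row_id bigint PRIMARY KEY,
      attrs  jsonb NOT NULL
  );
  CREATE INDEX ON entity1_doc USING gin (attrs);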
On Fri, Apr 14, 2017 at 11:23 AM, Vincent Elschot <vi...@xs4all.nl> wrote:
>
> On 14/04/2017 at 19:03, Rj Ewing wrote:
>
> We do know where we want to end up. We've had the application running for a while using a triple store db. We're looking to move away from the triple store due to performance issues. Our core concept is that each project can define a set of entities and their relations. Each entity has a set of attributes. We need to be able to efficiently run ad-hoc queries across multiple entities in a project, filtering via the attribute values.
>
> What kind of queries are you talking about?
>
>
> I think our business needs probably eliminate the possibility of data integrity at the db level. We currently do application-level data validation.
>
>
> Application-level checks can only be done if you exclusively lock the database from before you start the check until the moment you no longer need the certainty.
> That's usually a no-go because it makes your data single-user for the duration of the check.
>
>
> Regarding EAV, is there a good way to do this? Most everything I read online says that EAV is a terrible idea and performance is lackluster. However there are 6NF advocates who say that, done right, it is highly performant. Are there any articles on the correct way to implement EAV?
>
> Performance always depends entirely on what you are doing with it, and on what scale. This is something that you should experiment with.
>
>
> would jsonb or eav likely provide better query performance?
>
> Can't tell without more information about what you actually do with the data.
>
> But dynamic schemas can be just fine, as long as your application is 100% in control over what can and cannot be done, and as long as you use separate databases per customer/project/whatever. You will probably want to scale up at some point and move customers to different servers, so you might as well take that into account before you start.
>
>
> On Wed, Apr 12, 2017 at 7:43 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
>
>> On Tue, Apr 11, 2017 at 12:46 PM, Rj Ewing <ewing...@gmail.com> wrote:
>> > I'm looking for thoughts on the best way to handle dynamic schemas.
>> >
>> > The application I am developing revolves around user-defined entities. Each entity is a tabular dataset with user-defined columns and data types. Entities can also be related to each other through parent-child relationships. Some entities will be 100% user driven, while others (such as an entity representing a photo) will be partially user driven (all photo entities will have common fields + custom user additions).
>> >
>> > I was hoping to get opinions on whether postgresql would be a suitable backend.
>> >
>> > A couple of options I have thought of are:
>> >
>> > 1. Each entity is represented as a table in psql. The schema would be dynamically updated (with limits) when an entity mapping is updated. I believe that this would provide the best data constraints and allow the best data normalization. A concern I have is that there could be an enormous number of tables generated and the performance impacts this might have in the future. I could then run elasticsearch as a denormalized cache for efficient querying and full-text search.
>> >
>> > 2. Use a nosql database. This provides the "dynamic" schema aspect. A concern here is the lack of relation support, thus leading to a more denormalized data structure and the potential for the data to become corrupted.
>> >
>> > Any opinions on the use of psql for this case, or other options, would be greatly appreciated!
>>
>> Postgres can function as a nosql database -- you can use jsonb, for example, to archive data in such a way that the data model can be changed without making schema adjustments. Another way to do it is the EAV pattern as noted. These might be good strategies if you're not sure where you want to end up.
>>
>> It really comes down to this: how formal do you want your data model to be? Adding formality leads to performance optimizations, exposes your data to the fantastic SQL language, and allows rigorous assumptions to be made from external dependencies and trusted. Formality also brings a degree of inflexibility, since your data has to be forced into predetermined structures.
>>
>> merlin
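For what it's worth, the first example query at the top, run against the hypothetical EAV layout sketched earlier, would need roughly one join (or conditional aggregate) per filtered attribute, which is where the performance worry comes from. A sketch only, not our actual schema:

  -- return every attribute row for entity1 rows matching both filters
  SELECT e.row_id, e.attribute, e.value
  FROM entity1_eav e
  JOIN entity1_eav a1
    ON a1.row_id = e.row_id
   AND a1.attribute = 'attribute1'
   AND a1.value::numeric > 1000
  JOIN entity1_eav a15
    ON a15.row_id = e.row_id
   AND a15.attribute = 'attribute15'
   AND a15.value = 'someValue';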