Re: [hibernate-dev] [OGM] storing the column names in the entity keys for K/V stores

2014-11-26 Thread Gunnar Morling
2014-11-25 14:30 GMT+01:00 Emmanuel Bernard :

> Hi,
>
> With OGM-452 behind us which brings one cache per “table”, we now have
> another decision in front of us.
>
> Should we use a synthetic key for the cache key (say a
> PersistentEntityKey class containing the array of column names and the
> array of column values)?
> Or should we use the natural object key?
>
> == Natural entity key
>
> In the latter, things get complicated quickly, let me explain:
>
> === Simple case
>
> For simple cases, the id is a simple property and the fit is very
> natural
>
> [source]
> --
> @Entity
> class User {
> @Id String name;
> ...
> }
>
> //corresponds to
> cache.put(name, mapRepresentingUser);
> --
>
> === Embedded id
>
> If the identifier is an embedded id, you have several choices that all have
> drawbacks.
>
> 1. use the embedded id class as key `cache.put( new Name("Emmanuel",
> "Bernard"), mapRepresentingUser );`
> 2. use an array of property values `cache.put( new Object[] {"Emmanuel",
> "Bernard"}, mapRepresentingUser );`
>

Will that work at all? Does ISPN really work with value equality for
array-typed keys?

In a normal hash map you wouldn't get the value back as new Object[] {
"Emmanuel", "Bernard"}.equals( new Object[] {"Emmanuel", "Bernard"} ) is
false. So you would have to put the key into a wrapper whose equals method
uses Arrays.equals() internally.
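
For illustration, a minimal sketch of such a wrapper. The class name `ArrayKey` is made up for this example and is not an OGM or Infinispan type; it only shows the equals/hashCode delegation to `java.util.Arrays`.

```java
import java.util.Arrays;

// Hypothetical wrapper: gives an Object[] value-based equals/hashCode
// so that it can be used as a key in a hash-based map.
final class ArrayKey {
    private final Object[] values;

    ArrayKey(Object... values) {
        // Defensive copy so later changes to the caller's array cannot alter the key.
        this.values = values.clone();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ArrayKey
                && Arrays.equals(values, ((ArrayKey) other).values);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(values);
    }
}
```

With a plain `HashMap`, a lookup using a fresh `new Object[] {"Emmanuel", "Bernard"}` misses, while a lookup using a fresh `new ArrayKey("Emmanuel", "Bernard")` finds the entry.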


> 3. use a Map corresponding to the array `cache.put( new
> HashMap( {{ "firstname" -> "Emmanuel", "lastname" -> "Bernard"
> }} ), mapRepresentingUser );`
> 4. use a synthetic key `cache.put( new PersistentEntityKey( new String[]
> {"firstname", "lastname" }, new String[] { "Emmanuel", "Bernard" } ),
> mapRepresentingUser);`
>
> In 1, the problem is that we lose the proper data type abstraction
> between the object model and the data stored. `Name` is a user class.
>
> In 2, I think the model is somewhat acceptable but a bit arbitrary.
>
> In 3, I suspect the map is pretty horrific to serialize - that could be
> solved by an externalizer. But more importantly the order of the id
> columns is lost - even though it might be recoverable with
> EntityKeyMetadata?
>
> In 4, we expose the person querying the grid to our OGM specific type.
>

The current implementation puts a PersistentEntityKey designed as you
describe into the cache, but the externalizer only writes the column name
and value arrays. This should be readable without knowing the PEK type,
right? Of course you need to know the structure of the persisted key in
order to read it back.

Now Davide's idea was to only write the column value array, as the column
names are not really needed (assuming that one cache never contains entries
from several tables). This seems sensible to me unless I'm missing some
special case. The persisted form would be basically the one from 2., only
that there is a wrapper used at the API level.
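
A rough sketch of what "values only" could look like, using plain `java.io` streams rather than the actual Infinispan externalizer contract. The class `ValuesOnlyKey`, its byte layout and the restriction to String values are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the key wrapper used at the API level.
final class ValuesOnlyKey {
    final String[] columnNames;   // known from the mapping, never written
    final String[] columnValues;  // the only part that is persisted

    ValuesOnlyKey(String[] columnNames, String[] columnValues) {
        this.columnNames = columnNames;
        this.columnValues = columnValues;
    }

    // Writes the number of values followed by the values themselves.
    byte[] toBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(columnValues.length);
            for (String value : columnValues) {
                out.writeUTF(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the key; the column names come from the mapping of the
    // table the cache belongs to, not from the stream.
    static ValuesOnlyKey fromBytes(byte[] data, String[] columnNamesFromMapping) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String[] values = new String[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readUTF();
            }
            return new ValuesOnlyKey(columnNamesFromMapping, values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch relies on the one-cache-per-table assumption: the reader has to know which table a cache belongs to in order to supply the column names.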


> Aside from this, it is essentially like 4.
>
> === Entity key approach
>
> I really like the idea of the simple case being mapped directly, it makes
> for *the* natural mapping one would have chosen. But as I explained, it
> does not scale.
> In the composite id case, I don't really know what to choose between 2, 3
> and 4.
>
> So, should we go for the simple case if we can? Or favor consistency
> between the simple and complex case?

> And which of the complex cases do we favor?
>

My preference would be 4, with the proposed change of only writing the
column values. For the "simple case" we could then either store an array
of size 1 or just the single value itself, wrapping it into an array when
reading it back. I guess that'd require an instanceof call during read
back. Not sure whether that's good or bad, probably I'd just always store
the array.
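
To make the instanceof variant concrete, here is a small sketch. It assumes the stored key has already been deserialized into an object; the helper name `StoredKeyNormalizer` is invented for the example.

```java
// Hypothetical helper: accepts either a bare id value or an Object[] of
// column values and always hands back an array of column values.
final class StoredKeyNormalizer {
    static Object[] toColumnValues(Object stored) {
        if (stored instanceof Object[]) {
            // Composite id: already stored as an array.
            return (Object[]) stored;
        }
        // Simple id: wrap the single value on read back.
        return new Object[] {stored};
    }
}
```

Always storing the array makes this branch unnecessary, at the price of a less natural key for the simple case.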


>
> == Association
>
> In the case of associations, it becomes a bit trickier because the
> "simple case" where the association key is made of a single column is
> quite uncommon. Association keys are one of these combinations:
>
> * the fk to the owning entity + the index or key of the List or Map
> * the fk to the owning entity + the fk to the target entity (Set)
> * the fk to the owning entity + the list of columns of the simple or
>   embedded type (Set)
> * the fk to the owning entity + the surrogate id of the Bag
> * all columns in case of a non id backed bag
>
> All that to say that we are most of the time in the complex case of
> EntityKey with one of the 4 choices.
>
> Any thoughts and preferences?
>
> Emmanuel
> ___
> hibernate-dev mailing list
> hibernate-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev

Re: [hibernate-dev] [OGM] storing the column names in the entity keys for K/V stores

2014-11-26 Thread Sanne Grinovero
It looks like you're aiming at a "pure" mapping into primitives for
the datagrid.

So it looks very beautiful and tempting to go for a model such as
 > cache.put( "identifier name", ...)
but it seems quite dangerous to me for the same reason that you store
(conceptually):
  {"firstname", "lastname" }, { "Emmanuel", "Bernard" }
rather than storing:
  { "Emmanuel", "Bernard" }

Obviously the second one looks more natural in the storage, but you're
not really sure what these tokens were supposed to represent in case
someone decides to refactor the model.
I understand that it's now quite safe to remove the "tablename" in the
per-cache-table model, as entries would still be isolated: that was
the goal, but also it matches exactly the model proven by the RDBMs
model.
But there are implications in terms of flexibility and schema
evolution if we remove the "column names" and generally speaking it's
our only way of validating what an entry was supposed to model.

Speaking of which: just as we don't normally store the "tablename" in a column
of a table in an RDBMS, we don't really store its column names either.
So an alternative solution which more closely matches the proven RDBMS
model would be to store the schema representation of the table in the
Cache:

personsCache.put( SchemaGenerationId{1}, { ORDERED_ARRAY_STRATEGY,
"firstname", "lastname" } );

then you would need to store entries linking them to a specific
Schema, such as { "Emmanuel", "Bernard", SchemaGenerationId{1} }.

such a SchemaGenerationId would be a cheap singleton (one per
"table"), and could be stored as efficiently as two integers (one for
the Marshaller id and one int for the schema generation id).

ORDERED_ARRAY_STRATEGY could be an Enum, and give you some flexibility
among your proposals.  With the current model I'd stick to the Map, as
it is the only option that is safe enough, but with a schema definition like
the above description I'd definitely want to use the ordered sequence
(array?) as it's far more efficient at all levels.
A benefit is that I suspect that you could then transactionally evolve
the schema, and it wouldn't be too hard for us to provide a tool to
perform an "online schema migration".
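
A sketch of how this could fit together, with a plain `Map` standing in for the cache. All type names here (`KeyStrategy`, `SchemaGenerationId`, `SchemaDescriptor`, `SchemaBoundKey`, `SchemaLookup`) are hypothetical and only illustrate the idea; nothing like this exists in OGM or Infinispan as far as this thread goes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; none of these are OGM or Infinispan types.
enum KeyStrategy { ORDERED_ARRAY }

// Key under which the schema description of a "table" is stored.
final class SchemaGenerationId {
    final int generation;

    SchemaGenerationId(int generation) { this.generation = generation; }

    @Override public boolean equals(Object o) {
        return o instanceof SchemaGenerationId
                && ((SchemaGenerationId) o).generation == generation;
    }

    @Override public int hashCode() { return generation; }
}

// Value stored under a SchemaGenerationId: strategy plus ordered column names.
final class SchemaDescriptor {
    final KeyStrategy strategy;
    final List<String> columnNames;

    SchemaDescriptor(KeyStrategy strategy, String... columnNames) {
        this.strategy = strategy;
        this.columnNames = Arrays.asList(columnNames);
    }
}

// Entry key: ordered column values plus a reference to the schema generation.
final class SchemaBoundKey {
    final List<String> values;
    final SchemaGenerationId schema;

    SchemaBoundKey(SchemaGenerationId schema, String... values) {
        this.schema = schema;
        this.values = Arrays.asList(values);
    }

    @Override public boolean equals(Object o) {
        return o instanceof SchemaBoundKey
                && ((SchemaBoundKey) o).values.equals(values)
                && ((SchemaBoundKey) o).schema.equals(schema);
    }

    @Override public int hashCode() { return 31 * values.hashCode() + schema.hashCode(); }
}

final class SchemaLookup {
    // Pairs each key value with its column name by reading the schema
    // descriptor stored in the same cache (a plain Map stands in for it).
    static Map<String, String> describe(Map<Object, Object> cache, SchemaBoundKey key) {
        SchemaDescriptor schema = (SchemaDescriptor) cache.get(key.schema);
        Map<String, String> columns = new LinkedHashMap<>();
        for (int i = 0; i < schema.columnNames.size(); i++) {
            columns.put(schema.columnNames.get(i), key.values.get(i));
        }
        return columns;
    }
}
```

An inspection tool could call something like `describe` to render an entry with its column names while hiding the schema reference itself.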

You make a great point about making it easier to run native queries.
Is that a new goal we have? It seems we have to define the goals we
want, as the proper data abstraction goal seems to clash with it.
I'd rather make a custom Query walker which understands how we store
things in Infinispan, and keep the safety of our more verbose and less
efficient storage model. For example an inspection tool connected to
the grid could choose to not show the "SchemaId" tokens, but use them
to be able to render the entry in some human understandable way, like
by adding the column names on a table.

Some more notes:
 For HashMap there is a specialized Marshaller already. HashMaps are
horrific to instantiate at runtime though, in terms of memory, and
also not as efficient as arrays in terms of CPU of course.
 We didn't mention javax.persistence.IdClass but I assume the same applies.

Sanne


Re: [hibernate-dev] [OGM] storing the column names in the entity keys for K/V stores

2014-11-26 Thread Gunnar Morling
2014-11-26 12:42 GMT+01:00 Sanne Grinovero :

> It looks like you're aiming at a "pure" mapping into primitives for
> the datagrid.
>
> So it looks very beautiful and tempting to go for a model such as
>  > cache.put( "identifier name", ...)
> but it seems quite dangerous to me for the same reason that you store
> (conceptually):
>   {"firstname", "lastname" }, { "Emmanuel", "Bernard" }
> rather than storing:
>   { "Emmanuel", "Bernard" }
>
> Obviously the second one looks more natural in the storage, but you're
> not really sure what these tokens were supposed to represent in case
> someone decides to refactor the model.
> I understand that it's now quite safe to remove the "tablename" in the
> per-cache-table model, as entries would still be isolated: that was
> the goal, but also it matches exactly the model proven by the RDBMs
> model.
> But there are implications in terms of flexibility and schema
> evolution if we remove the "column names" and generally speaking it's
> our only way of validating what an entry was supposed to model.
>

Yes, evolution is a very strong argument indeed for sticking to the current
approach. Without the column names (or some other form of descriptor as
suggested below) we will not be able to recognize the version of a given
key so we cannot apply any "migrations" to it, either upon loading or via
some sort of batch run.


>
> Speaking of, like we don't normally store the "tablename" in a column
> of a table in an RDBMs, we don't really store its column names either.
> So an alternative solution which more closely matches the proven RDBMs
> model would be to store the schema representation of the table in the
> Cache:
>
> personsCache.put( SchemaGenerationId{1}, { ORDERED_ARRAY_STRATEGY,
> "firstname", "lastname") );
>
> then you would need to store entries linking them to a specific
> Schema, such as { "Emmanuel", "Bernard", SchemaGenerationId{1} }.
>
> such a SchemaGenerationId would be a cheap singleton (one per
> "table"), and could be stored as efficiently as two integers (one for
> the Marshaller id and one int for the schema generation id).
>
> ORDERED_ARRAY_STRATEGY could be an Enum, and give you some flexibility
> among your proposals.  With the current model I'd stick to the Map as
> they are the only one safe enough, but with a schema definition like
> the above description I'd definitely want to use the ordered sequence
> (array?) as it's far more efficient at all levels.
> A benefit is that I suspect that you could then transactionally evolve
> the schema, and it wouldn't be too hard for us to provide a tool to
> perform an "online schema migration".
>

That's an interesting idea. Or having a separate KeyDescriptor cache which
holds an entry for each key type? Mixing the key definition and records
using it within one cache seems a bit odd to me.

> You make a great point about making it easier to run native queries.
> Is that a new goal we have? It seems we have to define the goals we
> want, as the proper data abstraction goal seems to clash with it.
> I'd rather make a custom Query walker which understand how we store
> things in Infinispan, and keep the safety of our more verbose and less
> efficient storage model. For example an inspection tool connected to
> the grid could choose to not show the "SchemaId" tokens, but use them
> to be able to render the entry in some human understandable way, like
> by adding the column names on a table.
>
> Some more notes:
>  For HashMap there is a specialized Marshaller already. HashMaps are
> horrific to instantiate at runtime though, in terms of memory, and
> also not as efficient as arrays in terms of CPU of course.
>  We didn't mention javax.persistence.IdClass but I assume the same applies.
>
> Sanne
>

Re: [hibernate-dev] [OGM] storing the column names in the entity keys for K/V stores

2014-11-26 Thread Sanne Grinovero
On 26 November 2014 at 10:19, Gunnar Morling  wrote:
> 2014-11-25 14:30 GMT+01:00 Emmanuel Bernard :
>
>> Hi,
>>
>> With OGM-452 behind us which brings one cache per “table”, we now have
>> another decision in front of us.
>>
>> Should we use a synthetic key for the cache key (say a
>> PersistentEntityKey class containing the array of column names and the
>> array of column values)?
>> Or should we use the natural object key?
>>
>> == Natural entity key
>>
>> In the latter, things gets complicated quickly, let me explain:
>>
>> === Simple case
>>
>> For simple cases, the id is a simple property and the fit is very
>> natural
>>
>> [source]
>> --
>> @Entity
>> class User {
>> @Id String name;
>> ...
>> }
>>
>> //corresponds to
>> cache.put(name, mapRepresentingUser);
>> --
>>
>> === Embedded id
>>
>> If the identifier is an embedded id, you have several choices that all have
>> drawbacks.
>>
>> 1. use the embedded id class as key `cache.put( new Name("Emmanuel",
>> "Bernard"), mapRepresentingUser );`
>> 2. use an array of property values `cache.put( new Object[] {"Emmanuel",
>> "Bernard"}, mapRepresentingUser );`
>>
>
> Will that work at all? Does ISPN really work with value equality for
> array-typed keys?
>
> In a normal hash map you wouldn't get the value back as new Object[] {
> "Emmanuel", "Bernard"}.equals( new Object[] {"Emmanuel", "Bernard"} ) is
> false. So you would have to put the key into a wrapper whose equals method
> uses Arrays.equals() internally.

Good catch. But yes, you could get Infinispan to work like that as you
can override the Equality function, and I think it even has one for
arrays out of the box, although it was meant for byte[] which is of
course a strong use case.
So it can be done, but that doesn't make it a better idea of course.


>> 3. use a Map corresponding to the array `cache.put( new
>> HashMap( {{ "firstname" -> "Emmanuel", "lastname"->"Bernard"
>> } ), mapRepresentingUser );
>> 4. use an synthetic key `cache.put( new PersistentEntityKey( new String[]
>> {"firstname", "lastname" }, new String[] { "Emmanuel", "Bernard" } ),
>> mapRepresentingUser);`
>>
>> In 1, the problem is that we lose the proper data type abstraction
>> between the object model and the data stored. `Name` is a user class.
>>
>> In 2, I think the model is somewhat acceptable but a bit arbitrary.
>>
>> In 3, I suspect the map is pretty horrific to serialize - that could be
>> solved by a externalizer. But more importantly the order of the id
>> columns is lost - even though it might be recoverable with
>> EntityKeyMetadata?
>>
>> In 4, we expose the person querying the grid to our OGM specific type.
>>
>
> The current implementation puts a PersistentEntityKey designed as you
> describe into the cache, but the externalizer only writes the column name
> and value arrays. This should be readable without knowing the PEK type,
> right? Of course you need to know the structure of the persisted key in
> order to read it back.

Right. By controlling the Externalizer code, we can use OGM specific
types but different tools could use a different Externalizer to
interpret the data in a different way.
So you won't need the PersistentEntityKey class definition to explore
the cache content, but still you need some utility which knows how to
decode the byte stream.
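
As an illustration of such a utility, here is a sketch of a decoder that turns a persisted key into plain column name/value pairs without needing the key class. The byte layout (count, names, values) is assumed for the example and is not the actual OGM wire format; `KeyStreamCodec` is a made-up name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class KeyStreamCodec {
    // Assumed layout (for illustration only): column count, then all
    // column names, then all column values, each as a UTF string.
    static byte[] encode(String[] names, String[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(names.length);
            for (String name : names) out.writeUTF(name);
            for (String value : values) out.writeUTF(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes into an insertion-ordered map, so the column order survives.
    static Map<String, String> decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            String[] names = new String[count];
            for (int i = 0; i < count; i++) names[i] = in.readUTF();
            Map<String, String> columns = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) columns.put(names[i], in.readUTF());
            return columns;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```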

Remember there is no such thing as a JSON-based console for
Infinispan, so even storing things as a simple tuple of Strings
doesn't necessarily make it easy to look into.

> Now Davide's idea was to only write the column value array, as the column
> names are not really needed (assuming that one cache never contains entries
> from several tables). This seems sensible to me unless I'm missing some
> special case. The persisted form would be basically the one from 2., only
> that there is a wrapper used at the API level.

I agree that should work, but raises the same questions from my parallel email.

Sanne


Re: [hibernate-dev] [OGM] storing the column names in the entity keys for K/V stores

2014-11-26 Thread Emmanuel Bernard

> On 26 Nov 2014, at 12:42, Sanne Grinovero  wrote:
> 
> ORDERED_ARRAY_STRATEGY could be an Enum, and give you some flexibility
> among your proposals.  With the current model I'd stick to the Map as
> they are the only one safe enough,

I don’t know what you mean by ordered array strategy and enum.
The rest of the sentence seems to imply that you prefer option 3 even for the 
single id column case. Correct?

> You make a great point about making it easier to run native queries.
> Is that a new goal we have? It seems we have to define the goals we
> want, as the proper data abstraction goal seems to clash with it.
> I'd rather make a custom Query walker which understand how we store
> things in Infinispan, and keep the safety of our more verbose and less
> efficient storage model. For example an inspection tool connected to
> the grid could choose to not show the "SchemaId" tokens, but use them
> to be able to render the entry in some human understandable way, like
> by adding the column names on a table.

These are not really new goals.
Goal #1: make the mapping as natural as possible, as if you were not using the 
ORM but rather using the tool directly.
Goal #2: expose the native query capabilities as a first class citizen and on 
equal footing as the JP-QL support to benefit from the specificities of your 
NoSQL backend.
Non Goal: Hibernate OGM is not in the database business - at least until we 
explore the polyglot persistence topic :)

Now Goal #1 is a bit blurred by the fact that you could consider the use of a 
data grid as a secondary and mostly temporary store of a data set persisted 
elsewhere. In which case the JPA API prevails as the entry point and how you 
store the data in the grid is not important. But that’s one use case.

> We didn't mention javax.persistence.IdClass but I assume the same applies.

Yes, an IdClass with a specific type is essentially like an embeddable id. And 
an implicit IdClass (I forgot what they are called) means that the entity class 
itself is also the id class.

Re: [hibernate-dev] [OGM] storing the column names in the entity keys for K/V stores

2014-11-26 Thread Emmanuel Bernard

> 2. use an array of property values `cache.put( new Object[] {"Emmanuel", 
> "Bernard"}, mapRepresentingUser );`
> 
> Will that work at all? Does ISPN really work with value equality for 
> array-typed keys?
> 
> In a normal hash map you wouldn't get the value back as new Object[] 
> {"Emmanuel", "Bernard"}.equals( new Object[] {"Emmanuel", "Bernard"} ) is 
> false. So you would have to put the key into a wrapper whose equals method 
> uses Arrays.equals() internally.

OK

>  
> 3. use a Map corresponding to the array `cache.put( new 
> HashMap( {{ "firstname" -> "Emmanuel", "lastname"->"Bernard" } 
> ), mapRepresentingUser );
> 4. use an synthetic key `cache.put( new PersistentEntityKey( new String[] 
> {"firstname", "lastname" }, new String[] { "Emmanuel", "Bernard" } ), 
> mapRepresentingUser);`
> 
> In 1, the problem is that we lose the proper data type abstraction
> between the object model and the data stored. `Name` is a user class.
> 
> In 2, I think the model is somewhat acceptable but a bit arbitrary.
> 
> In 3, I suspect the map is pretty horrific to serialize - that could be
> solved by a externalizer. But more importantly the order of the id
> columns is lost - even though it might be recoverable with
> EntityKeyMetadata?
> 
> In 4, we expose the person querying the grid to our OGM specific type.
> 
> The current implementation puts a PersistentEntityKey designed as you 
> describe into the cache, but the externalizer only writes the column name and 
> value arrays. This should be readable without knowing the PEK type, right? Of 
> course you need to know the structure of the persisted key in order to read 
> it back.

What I am not sure about is whether the same externalizer id can be plugged 
into different unmarshallers depending on the node / classpath you live in.
But take the M/R framework of Infinispan today: at present it would require 
putting the PersistentEntityKey class on the classpath of all nodes.
> 
> 
> Now Davide's idea was to only write the column value array, as the column 
> names are not really needed (assuming that one cache never contains entries 
> from several tables). This seems sensible to me unless I'm missing some 
> special case. The persisted form would be basically the one from 2., only 
> that there is a wrapper used at the API level.

As Sanne mentioned, a case for keeping the column names would be if the id 
structure changes somehow. One would need a way to distinguish the old from the 
new representation. There are many ways to approach that:
- keep the structure in each key
- do the schema reference Sanne mentions
- rename the existing cache TABLE_OLD and migrate the data on the fly from 
TABLE_OLD to TABLE_NEW
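
To illustrate the third option, a sketch of an on-the-fly migration with plain Maps standing in for the two caches. The helper `OnTheFlyMigration` and the key-conversion function are hypothetical, and the move is not atomic here; a real grid would need a transaction or similar around it.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper; plain Maps stand in for the TABLE_OLD and TABLE_NEW caches.
final class OnTheFlyMigration<OldKey, NewKey, V> {
    private final Map<OldKey, V> oldCache;
    private final Map<NewKey, V> newCache;
    private final Function<NewKey, OldKey> toOldKey;

    OnTheFlyMigration(Map<OldKey, V> oldCache, Map<NewKey, V> newCache,
            Function<NewKey, OldKey> toOldKey) {
        this.oldCache = oldCache;
        this.newCache = newCache;
        this.toOldKey = toOldKey;
    }

    // Reads from the new cache first; on a miss, moves the entry over
    // from the old cache under its new key representation.
    V get(NewKey key) {
        V value = newCache.get(key);
        if (value == null) {
            value = oldCache.remove(toOldKey.apply(key));
            if (value != null) {
                newCache.put(key, value);
            }
        }
        return value;
    }
}
```

Because old and new representations live in separate caches, nothing in the key itself has to say which version it is.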

Most NoSQL stores propose and encourage a particular approach; in the data grid 
space it’s do whatever you want. The question is which options we want to 
support for the grid. For example, Sanne’s schema reference feels very intrusive 
to me. It’s something that should be handled by the NoSQL system if it were 
meant to offer such an approach.

>  
> Aside from this, it is essentially like 4.
> 
> === Entity key approach
> 
> I really like the idea of the simple case be mapped directly, it makes
> for *the* natural mapping one would have chosen. But as I explained, it
> does not scale.
> In the composite id case, I don't really know what to chose between 2, 3
> and 4.
> 
> So, should we go for the simple case if we can? Or favor consistency
> between the simple and complex case?
> And which of the complex case do we favor?
> 
> My preference would be 4, with the proposed change of only writing the column 
> values. For the "simple case" we'd then could either store an array of size 1 
> or just the single value itself, wrapping it into an array when reading it 
> back. I guess that'd require an instanceof call during read back. Not sure 
> whether that's good or bad, probably I'd just always store the array.

I don’t think you can quite do that. To be able to do an instanceof, you need 
an instance. And to build that instance from bytes, you need a type.


Re: [hibernate-dev] [OGM] storing the column names in the entity keys for K/V stores

2014-11-26 Thread Emmanuel Bernard

> On 26 Nov 2014, at 15:21, Gunnar Morling  wrote:
> 
> 2014-11-26 12:42 GMT+01:00 Sanne Grinovero :
> It looks like you're aiming at a "pure" mapping into primitives for
> the datagrid.
> 
> So it looks very beautiful and tempting to go for a model such as
>  > cache.put( "identifier name", ...)
> but it seems quite dangerous to me for the same reason that you store
> (conceptually):
>   {"firstname", "lastname" }, { "Emmanuel", "Bernard" }
> rather than storing:
>   { "Emmanuel", "Bernard" }
> 
> Obviously the second one looks more natural in the storage, but you're
> not really sure what these tokens were supposed to represent in case
> someone decides to refactor the model.
> I understand that it's now quite safe to remove the "tablename" in the
> per-cache-table model, as entries would still be isolated: that was
> the goal, but also it matches exactly the model proven by the RDBMs
> model.
> But there are implications in terms of flexibility and schema
> evolution if we remove the "column names" and generally speaking it's
> our only way of validating what an entry was supposed to model.
> 
> Yes, evolution is a very strong argument indeed for sticking to the current 
> approach. Without the column names (or some other form of descriptor as 
> suggested below) we will not be able to recognize the version of a given key 
> so we cannot apply any "migrations" to it, either upon loading or via some 
> sort of batch run.

Let me challenge that a bit, even if I understand that there is a potential 
problem. Type and id are the invariant part of the data you put in a datastore.
So the data migration / morphing happens on the *value* much more than on 
the key itself.
You would be able to apply migrations in that case.

>  
> 
> Speaking of, like we don't normally store the "tablename" in a column
> of a table in an RDBMs, we don't really store its column names either.
> So an alternative solution which more closely matches the proven RDBMs
> model would be to store the schema representation of the table in the
> Cache:
> 
> personsCache.put( SchemaGenerationId{1}, { ORDERED_ARRAY_STRATEGY,
> "firstname", "lastname") );
> 
> then you would need to store entries linking them to a specific
> Schema, such as { "Emmanuel", "Bernard", SchemaGenerationId{1} }.
> 
> such a SchemaGenerationId would be a cheap singleton (one per
> "table"), and could be stored as efficiently as two integers (one for
> the Marshaller id and one int for the schema generation id).
> 
> ORDERED_ARRAY_STRATEGY could be an Enum, and give you some flexibility
> among your proposals.  With the current model I'd stick to the Map as
> they are the only one safe enough, but with a schema definition like
> the above description I'd definitely want to use the ordered sequence
> (array?) as it's far more efficient at all levels.
> A benefit is that I suspect that you could then transactionally evolve
> the schema, and it wouldn't be too hard for us to provide a tool to
> perform an "online schema migration".
> 
> That's an interesting idea. Or having a separate KeyDescriptor cache which 
> holds an entry for each key type? Mixing the key definition and records using 
> it within one cache seems a bit odd to me.

It is interesting. But are we in the database business?
If we are interested in this approach, maybe we should create a side project 
that offers a schema atop the most common K/V stores?


[hibernate-dev] org.hibernate.persister.spi.PersisterFactory and 5.0

2014-11-26 Thread Steve Ebersole
Part of the goals for ORM 5.0 is moving from Configuration to the
ServiceRegistry+Metadata for building a SessionFactory.

One of the points I ran into that will have to change
is org.hibernate.persister.spi.PersisterFactory.  The problem is that
PersisterFactory accepts a Configuration as part of building
CollectionPersisters.  The need for Configuration in the standard
CollectionPersister impls is amazingly trivial; we literally use it to
locate the associated entity's PersistentClass to grab the class's dom4j
node name, and this is right after we have just resolved the corresponding
EntityPersister.  The point being that the standard CollectionPersisters
really don't need access to the Configuration.

I am pretty sure OGM provides a custom PersisterFactory, or is it just
the PersisterClassResolver that OGM provides?  Also, I would assume OGM is
providing custom CollectionPersister impls.  This change would affect both
usages.

I wanted y'all to be aware of this upcoming change.  But I also wanted to
start a discussion about what the signature(s) should become.  Currently we
pass:

* Configuration
* Collection (the parsed mapping info)
* CollectionRegionAccessStrategy
* SessionFactoryImplementor


I suggest we pass:

* Collection
* CollectionRegionAccessStrategy
* SessionFactoryImplementor
* Mapping

(I changed order to align with the order for building EntityPersisters)

Mapping is org.hibernate.engine.spi.Mapping which is part of
Configuration.  I decided to (at least temporarily) port this contract
forward to ease migration.  Metadata implements it.
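
To make the suggested parameter list concrete, here is a sketch with empty placeholder interfaces standing in for the real ORM types. The method name, parameter names and the absence of a throws clause are assumptions for the example, not the final signature.

```java
// Sketch only: the nested interfaces are empty placeholders, not the real
// org.hibernate types, so that the example compiles on its own.
final class PersisterFactorySketch {
    interface Collection {}                      // the parsed mapping info
    interface CollectionRegionAccessStrategy {}
    interface SessionFactoryImplementor {}
    interface Mapping {}                         // stands in for org.hibernate.engine.spi.Mapping
    interface CollectionPersister {}

    // Suggested parameter order: Configuration is dropped, Mapping is added last.
    interface PersisterFactory {
        CollectionPersister createCollectionPersister(
                Collection collectionMetadata,
                CollectionRegionAccessStrategy cacheAccessStrategy,
                SessionFactoryImplementor factory,
                Mapping mapping);
    }
}
```

A custom factory would then only need to adapt to the new parameter list; anything it previously read from Configuration would have to come from Mapping or the SessionFactoryImplementor.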

There is a similar discussion to be had wrt Integrators.  I will follow up
with an email specific to them later.