Hi Gunnar (& Emmanuel),

Thanks again for the info. Chugging on slowly when I get the time.

The sequences are an interesting problem: C* does not supply built-in functionality to create sequences, and the standard approach of creating a sequence table would seem to hobble the fast writes that C* users know and love. Alternatives to using a C* table to generate sequences then bring us face to face with the problem of generating IDs on multiple nodes (I assume here that C* is being used in a distributed environment). We use a home-grown implementation of Twitter Snowflake for this purpose.

Cheers,

John

On Tue, Sep 9, 2014 at 1:06 PM, Gunnar Morling <gun...@hibernate.org> wrote:

> Hi,
>
> 2014-09-09 12:55 GMT+02:00 John Worrell <jlesi...@gmail.com>:
>
>> Hi Gunnar,
>>
>> Wrt the <class> tags - partly it is an issue with Eclipse JPA which
>> complains if the <class> tags are absent, but I think it *may* actually not
>> make any difference to the examples - the real issue lies with the code not
>> picking up the sequences to generate properly, and as you point out that
>> may now be fixed in the latest master.
>>
>
> To provide some more details, it's a dialect-specific implementation of
> the SchemaDefiner contract which is in charge of the schema initialization.
> The specific implementation type is to be returned from
> DatastoreProvider#getSchemaDefinerType(). The SchemaDefiner is invoked by
> the engine after session factory initialization (eventually it will only be
> invoked if required so by the "hbm2ddl.auto" setting).
>
> That contract is still experimental at this time, we need to flesh it out
> in more detail, also based on the feedback what's needed for Cassandra (as
> it is the first store with a fixed schema).
>
> Does Cassandra have any counterpart to physical sequences as e.g. in
> Oracle? If not (and it can not be emulated in a meaningful way as we do for
> Neo4j), GridDialect#supportsSequences() would have to return false, and the
> table-based strategy needs to be implemented.
>
> I'll look at a rebase.
>>
>> Thanks,
>>
>> John
>>
>
> Hth,
>
> --Gunnar
>
>
>> On Tue, Sep 9, 2014 at 10:36 AM, Gunnar Morling <gun...@hibernate.org>
>> wrote:
>>
>>> Hi,
>>>
>>> 2014-09-09 11:08 GMT+02:00 John Worrell <jlesi...@gmail.com>:
>>>
>>>> Hi Gunnar,
>>>>
>>>> Many thanks for the reply - I'll yank down the master... assume it is
>>>> merged back to the Jon Halliday fork otherwise I'll need to mess about a
>>>> bit.
>>>>
>>>
>>> Not sure when Jon's branch was updated for the last time.
>>>
>>> Probably you need to rebase (we prefer to work with rebases rather than
>>> merge commits) your local branch onto the latest master from upstream.
>>> There have been some changes around GridDialect in the last time, mainly
>>> about query execution and id generation. Nothing dramatic, though.
>>>
>>>
>>>> Also had some issues with getting connected to C*, understandable, but
>>>> also wrt adding <class> tags for the Dog / Breed classes in the
>>>> persistence.xml file. not sure whether that is intended to be needed.
>>>>
>>>
>>> You mean the classes from the "Getting Started" example, right? The
>>> <class> tags should not be required, the example runs as is e.g. on
>>> Infinispan. What happens if you don't add those?
>>>
>>> Cheers,
>>>>
>>>> John
>>>>
>>>
>>> --Gunnar
>>>
>>> On Tue, Sep 9, 2014 at 9:59 AM, Gunnar Morling <gun...@hibernate.org>
>>>> wrote:
>>>>
>>>>> Hi John,
>>>>>
>>>>> 2014-09-09 10:33 GMT+02:00 John Worrell <jlesi...@gmail.com>:
>>>>>
>>>>>> Hi Emmanuel & Gunnar,
>>>>>>
>>>>>> Many thanks for your detailed responses - and nice to chat with
>>>>>> Gunnar a week or so back. Again I have to apologise for radio silence -
>>>>>> my day job suddenly ate all my waking functional time - so progress has
>>>>>> been very slow.
>>>>>>
>>>>>
>>>>> No worries, we are very glad about your help.
>>>>>
>>>>> I'm getting deeper into the code now, and starting a POC... which is
>>>>>> leading me to some more detailed questions.
>>>>>> Basically, what I am doing is
>>>>>> to run the examples and to look at things that seem to be missing,
>>>>>> and to understand the data that is being passed around in the various
>>>>>> options classes, so I can make a more informed implementation
>>>>>>
>>>>>
>>>>> Sounds very reasonable. I also can recommend to take a look at the
>>>>> MongoDB dialect and the persistent representations it creates in the
>>>>> datastore as it can comfortably be browsed e.g. using the mongo command
>>>>> line client. That's how I got to understand many things of the interaction
>>>>> between engine and dialects.
>>>>>
>>>>> If you have any ideas where the dialect SPI documentation can be
>>>>> improved to facilitate an easier understanding of how pieces work
>>>>> together, let me know.
>>>>>
>>>>> The key question in my mind at the moment is that of the relationship
>>>>>> between the base Hibernate Dialect class and the GridDialect interface
>>>>>
>>>>>
>>>>> OGM has its own pseudo implementation of ORM's Dialect contract,
>>>>> OgmDialect, but this should hardly ever play a role during OGM
>>>>> development. OGM's main contract towards dialects is GridDialect.
>>>>>
>>>>> The reason for exposing GridDialect on the pseudo OgmDialect is that
>>>>> it is our backdoor to make GridDialect available to
>>>>> PersistentNoSqlIdentifierGenerator implementations. Atm. there is no way
>>>>> to inject the GridDialect in a more straight-forward way due to some
>>>>> limitations in the way we integrate with the ORM engine.
>>>>>
>>>>>
>>>>>> - I look at the OgmTableGenerator which is attempting to access a CF /
>>>>>> table that is not yet created - I figured I understand what was happening
>>>>>> here, and make appropriate extensions / fixes first. So, currently
>>>>>> fighting my way through generating the sequence tables, and wondering why
>>>>>> OgmSequenceGenerator wraps OgmTableGenerator.
>>>>>>
>>>>>
>>>>> Just to be sure, are you looking at the latest master? There have been
>>>>> some changes around these generator classes recently, they are in a much
>>>>> cleaner state than they used to be.
>>>>>
>>>>> The reason for the wrapping is that when using the SEQUENCE strategy
>>>>> in cases where the store actually does not natively support sequences, we
>>>>> fall back to TABLE. Currently we only support a "native" SEQUENCE strategy
>>>>> for Neo4j which allows to map sequences as nodes in a reasonable manner,
>>>>> whereas all the other dialects use the table fallback.
>>>>> GridDialect#supportsSequences() is evaluated to find out whether the
>>>>> delegation needs to be done or not.
>>>>>
>>>>> You also could take a look at Neo4jSequenceGenerator which creates the
>>>>> sequence nodes in the datastore based on the registered
>>>>> PersistentNoSqlIdentifierGenerators. Note that this checks via instanceof
>>>>> for OgmSequenceGenerator/OgmTableGenerator atm. As we don't want to expose
>>>>> these types on the dialect SPI, I'm looking into ways for allowing the
>>>>> distinction of the two in a more abstract way, mainly based on
>>>>> IdSourceKeyMetadata.
>>>>>
>>>>> Hope that helps, I'll be very happy to answer any follow-up questions.
>>>>> Thanks again for your help with the Cassandra dialect, I'm looking forward
>>>>> to this dialect very much!
>>>>>
>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> John
>>>>>>
>>>>>
>>>>> --Gunnar
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 22, 2014 at 5:25 PM, Emmanuel Bernard <emman...@hibernate.org>
>>>>>> wrote:
>>>>>>
>>>>>> > On Thu 2014-08-07 9:10, John Worrell wrote:
>>>>>> > > Hi Emmanuel et al.,
>>>>>> > >
>>>>>> > > My apologies for the long radio silence. I've taken a look at the
>>>>>> > > code-base on Jon Halliday's repo, and have set up a nick on
>>>>>> > > freenode - #jlesinge.
>>>>>> >
>>>>>> > No worries I was on holidays.
>>>>>> > And your email was one of the few lucky ones that I had to delay as it
>>>>>> > required thinking ;)
>>>>>> >
>>>>>> > >
>>>>>> > > On the time-series question I was wondering how you envisaged the
>>>>>> > > data stored: I tend to think of a single row under a primary key
>>>>>> > > with an object-instance per column. Now what we have typically done
>>>>>> > > (generally the data has been immutable) is to store the data
>>>>>> > > serialized as a blob (JSON or XML), but I understand you do not
>>>>>> > > favour this approach. With this sort of model I imagine the
>>>>>> > > collection is then all the objects stored in the row, and the
>>>>>> > > challenge is to page through the objects in the row.
>>>>>> >
>>>>>> > Actually it is one of the valid strategies.
>>>>>> > If I understand you well, you want to create:
>>>>>> >
>>>>>> > - one row per time series generating object (say a thermometer)
>>>>>> > - the column names of that row would be a timestamp of time at bay
>>>>>> > - the value would be a JSON structure containing the data at bay for
>>>>>> >   that specific time.
>>>>>> >
>>>>>> > That is one of the valid approaches. But I think we need to support
>>>>>> > several:
>>>>>> >
>>>>>> > - simple column if the data is literally a single element
>>>>>> >   (temperature)
>>>>>> > - JSON structure for more complex data per time event
>>>>>> > - key pointing to the detailed data somewhere else in the cluster
>>>>>> >
>>>>>> > The last would be done in two phases, you load all the keys you are
>>>>>> > interested in matching your time range and then do a multiget of
>>>>>> > sorts to load the data.
>>>>>> >
>>>>>> > It seems DataStax tends to recommend 1 or 2 (denormalization FTW).
>>>>>> >
>>>>>> > I don't know but there is also the notion of super column which is a
>>>>>> > grouping of columns that might also address our composite problem
>>>>>> > assuming they can be used for dynamic column families.
>>>>>> >
>>>>>> > http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
>>>>>> >
>>>>>> > http://planetcassandra.org/blog/post/getting-started-with-time-series-data-modeling/
>>>>>> > http://www.datastax.com/docs/1.0/ddl/column_family
>>>>>> >
>>>>>> > >
>>>>>> > > An approach we have often taken is to create multiple copies of
>>>>>> > > data in different (obviously works well only for immutable objects)
>>>>>> > > or better to
>>>>>> >
>>>>>> > Yes, that is a feature that I would like OGM to automate for the user.
>>>>>> > It declaratively defines the denormalization approaches he wants and
>>>>>> > the engine does the persistence.
>>>>>> > Next the query engine uses that knowledge to find the best path (or
>>>>>> > only possible path in the case of Cassandra :) )
>>>>>> >
>>>>>> > > create a table of keys to a main table where in either approach the
>>>>>> > > row-keys are effectively a foreign-key and there is a column per
>>>>>> > > object associated through the foreign-key. Another approach though
>>>>>> > > might be to use a column with type list (or set, or map) to contain
>>>>>> > > keys to the associated objects - this would be a little like the
>>>>>> > > extensions Oracle have for mapping 1-* associations, though with the
>>>>>> > > caveat that a column of collection type may only contain 64k
>>>>>> > > elements. I wondered if some thought had been given to this strategy
>>>>>> > > (which I must admit I have not yet used myself).
>>>>>> >
>>>>>> > I am not aware of that approach.
>>>>>> >
>>>>>> > >
>>>>>> > > It seems very likely that different mapping strategies should be
>>>>>> > > specifiable, but then I have still to understand how these might
>>>>>> > > fit with Teiid.
>>>>>> >
>>>>>> > Forget Teiid for now. We will likely start with the HQL->Walker and do
>>>>>> > our own proto query engine before layering Teiid.
>>>>>> >
>>>>>> > >
>>>>>> > > Can I ask about assumptions: is it fair to assume that for
>>>>>> > > Cassandra, OGM will target only CQL 3 (which means Cassandra 2 or
>>>>>> > > maybe 1.2)? This would certainly make life simpler.
>>>>>> >
>>>>>> > Yes that's fine.
>>>>>> >
>>>>>> > >
>>>>>> > > An issue I don't see addressed is the choice of consistency-level
>>>>>> > > (read or write) and I wondered if there was a plan for this?
>>>>>> > > Assumptions can be made on a per table basis, but, certainly for ad
>>>>>> > > hoc queries, it is important I think to have the flexibility to
>>>>>> > > specify on a per-query basis.
>>>>>> >
>>>>>> > That's planned. We have an option system that allows for entity /
>>>>>> > property overriding of a global setting. While not implemented, we
>>>>>> > will also have the ability to override setting per session / query.
>>>>>> > That was the plan all along.
>>>>>> >
>>>>>> > >
>>>>>> > > Those are my thoughts so far... I'll see about doing a POC of some
>>>>>> > > of what I have described above
>>>>>> >
>>>>>> > Thanks :)
>>>>>> >
>>>>>> > >
>>>>>> > > Cheers,
>>>>>> > >
>>>>>> > > John
>>>>>> > >
>>>>>> > >
>>>>>> > > On Mon, Jul 21, 2014 at 4:48 PM, John Worrell <jlesi...@gmail.com>
>>>>>> > > wrote:
>>>>>> > >
>>>>>> > > > Hi Emmanuel,
>>>>>> > > >
>>>>>> > > > I'll take a look at what is there, and I'll get up and running on
>>>>>> > > > IRC.
>>>>>> > > >
>>>>>> > > > I'll particularly look at the time-series issue - non-trivial I
>>>>>> > > > think.
>>>>>> > > >
>>>>>> > > > Cheers,
>>>>>> > > >
>>>>>> > > > John
>>>>>> > > >
>>>>>> > > >
>>>>>> > > > On Mon, Jul 21, 2014 at 1:06 PM, Emmanuel Bernard
>>>>>> > > > <emman...@hibernate.org> wrote:
>>>>>> > > >
>>>>>> > > >> Hi John,
>>>>>> > > >>
>>>>>> > > >> I thought I had replied to you on Friday but apparently the email
>>>>>> > > >> never went through :/
>>>>>> > > >>
>>>>>> > > >> That is good news :)
>>>>>> > > >> Jonathan worked on a Cassandra prototype but had to drop due to
>>>>>> > > >> other duties. He pushed everything at
>>>>>> > > >> https://github.com/jhalliday/hibernate-ogm/tree/jonathan_cassandra
>>>>>> > > >>
>>>>>> > > >> Have a look at what he has done and come ask any question to
>>>>>> > > >> Gunnar, Davide or me. There are a bunch of moving pieces. We are
>>>>>> > > >> mostly on freenode’s #hibernate-dev ( you need a freenode login
>>>>>> > > >> http://freenode.net/faq.shtml#nicksetup ). If you are allergic to
>>>>>> > > >> IRC, let me know and we will find alternatives.
>>>>>> > > >>
>>>>>> > > >> The most interesting challenge will be to see how we can map time
>>>>>> > > >> series into a collection and make sure we let the user decide how
>>>>>> > > >> much he wants to load.
>>>>>> > > >>
>>>>>> > > >> Emmanuel
>>>>>> > > >>
>>>>>> > > >> On 16 Jul 2014, at 13:17, John Worrell <jlesi...@gmail.com> wrote:
>>>>>> > > >>
>>>>>> > > >> > Hi,
>>>>>> > > >> >
>>>>>> > > >> > I'm interested in contributing to the Cassandra module of
>>>>>> > > >> > Hibernate-OGM - what would be the best way to go about this?
>>>>>> > > >> >
>>>>>> > > >> > Thanks,
>>>>>> > > >> >
>>>>>> > > >> > John
>>>>>> > > >> > _______________________________________________
>>>>>> > > >> > hibernate-dev mailing list
>>>>>> > > >> > hibernate-dev@lists.jboss.org
>>>>>> > > >> > https://lists.jboss.org/mailman/listinfo/hibernate-dev
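
For reference, a minimal sketch of the Snowflake-style id generation mentioned at the top of this mail. It is only an illustration and not the home-grown implementation referred to there: the class name SnowflakeIdSketch, the epoch value and the 41/10/12 bit split (timestamp / node id / per-millisecond sequence, the layout usually described for Twitter Snowflake) are all assumptions, and how each node obtains its node id is left open.

```java
// Hypothetical sketch of a Snowflake-style id generator; names and constants
// are illustrative assumptions, not taken from any existing code base.
final class SnowflakeIdSketch {

    // Assumption: an arbitrary fixed point in the past, in epoch milliseconds.
    private static final long EPOCH_MILLIS = 1400000000000L;
    private static final int NODE_BITS = 10;      // up to 1024 nodes
    private static final int SEQUENCE_BITS = 12;  // up to 4096 ids per ms per node
    private static final long MAX_NODE = (1L << NODE_BITS) - 1;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long nodeId;
    private long lastMillis = -1L;
    private long sequence = 0L;

    SnowflakeIdSketch(long nodeId) {
        if (nodeId < 0 || nodeId > MAX_NODE) {
            throw new IllegalArgumentException("nodeId must be in [0, " + MAX_NODE + "]");
        }
        this.nodeId = nodeId;
    }

    // Returns an id that is unique per node and increasing on this node.
    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            // A real implementation needs a policy here; the sketch just refuses.
            throw new IllegalStateException("Clock moved backwards");
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the next one.
                while (now <= lastMillis) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return ((now - EPOCH_MILLIS) << (NODE_BITS + SEQUENCE_BITS))
                | (nodeId << SEQUENCE_BITS)
                | sequence;
    }

    // Extracts the node id that was packed into an id by nextId().
    static long nodeOf(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_NODE;
    }
}
```

Each node would construct one generator with its own node id, so ids can be produced locally without a round trip to a C* sequence table. The price is that ids are only roughly time-ordered across nodes and the scheme depends on clocks not running backwards.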