No problem. The association is lazy, but I will look into Hibernate.initialize.
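
For reference, here is a minimal sketch of the kind of failure described in the quoted report, assuming the entity's equals() compares runtime classes with getClass(). That is an assumption; the actual IndexedLabel code is not shown in this thread. No Hibernate classes are involved: StrictLabel, TolerantLabel and the hand-written *Proxy subclasses are hypothetical stand-ins for an entity and its runtime-generated lazy proxy.

```java
import java.util.Objects;

/** Stdlib-only illustration; every class name here is hypothetical. */
final class ProxyEqualsDemo {

    /** Entity whose equals() insists on identical runtime classes. */
    static class StrictLabel {
        private final String name;

        StrictLabel(String name) { this.name = name; }

        String getName() { return name; }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            // A proxy subclass has a different runtime class, so this check rejects it.
            if (other == null || getClass() != other.getClass()) return false;
            return Objects.equals(name, ((StrictLabel) other).getName());
        }

        @Override
        public int hashCode() { return Objects.hashCode(name); }
    }

    /** Stand-in for a lazy proxy: a subclass of the entity. */
    static class StrictLabelProxy extends StrictLabel {
        StrictLabelProxy(String name) { super(name); }
    }

    /** Same entity, but equals() uses instanceof and goes through the getter. */
    static class TolerantLabel {
        private final String name;

        TolerantLabel(String name) { this.name = name; }

        String getName() { return name; }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof TolerantLabel)) return false;
            return Objects.equals(getName(), ((TolerantLabel) other).getName());
        }

        @Override
        public int hashCode() { return Objects.hashCode(getName()); }
    }

    /** Stand-in for a lazy proxy of TolerantLabel. */
    static class TolerantLabelProxy extends TolerantLabel {
        TolerantLabelProxy(String name) { super(name); }
    }

    public static void main(String[] args) {
        // Class-based comparison fails against the subclass.
        System.out.println(new StrictLabel("java").equals(new StrictLabelProxy("java")));
        // instanceof-based comparison accepts it.
        System.out.println(new TolerantLabel("java").equals(new TolerantLabelProxy("java")));
    }
}
```

If the test compares against the list returned by the query, an instanceof-style equals() like the second one would not care whether the elements are proxies. Whether that is the right fix here, as opposed to initializing the objects during mass indexing, is still open.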
On Tue, Mar 12, 2013 at 8:01 PM, Emmanuel Bernard <emman...@hibernate.org> wrote:
> I have not forgotten, I'm just in the middle of a Bean Validation crisis that delayed my look into this issue.
> Could it be, BTW, that the mass indexer does not ask for these objects to be loaded using Hibernate.initialize? It could also be a bug in OGM, but not necessarily. In particular, is the association lazy or eager?
>
> Emmanuel
>
> On Mon 2013-03-11 11:00, Davide D'Alto wrote:
>> I have created a branch for OGM-228 (OGM MassIndexer) that includes OGM-151 (Metamodel) and OGM-273 (load entities from tuple):
>> https://github.com/DavideD/hibernate-ogm/tree/OGM-228
>>
>> A test I've added fails though (AssociationMassIndexerTest):
>> https://github.com/DavideD/hibernate-ogm/blob/74549a4d264af30fa88960c30e2a872da6afd596/hibernate-ogm-core/src/test/java/org/hibernate/ogm/test/massindex/AssociationMassIndexerTest.java
>>
>> The test uses two entities, IndexedNews and IndexedLabel, with a one-to-many relationship from news to label.
>> The mass indexing works fine, but when I retrieve the list of indexed labels with the query "FROM IndexedLabel", the result contains a list of proxies and the equals fails because the class of the objects in the list is not IndexedLabel.
>>
>> If I first get the list of news and then call news.getLabels() on each of them, everything works fine.
>>
>> Any thoughts?
>>
>> Thanks
>>
>> On Thu, Mar 7, 2013 at 10:15 AM, Emmanuel Bernard <emman...@hibernate.org> wrote:
>> > I have no more coin for this one so I have dumped what I have so far
>> > https://github.com/hibernate/hibernate-ogm/pull/175
>> >
>> > Emmanuel
>> >
>> > On Wed 2013-03-06 19:18, Emmanuel Bernard wrote:
>> >> I've successfully implemented OGM-151 for EntityKey, which is the one we need to move OGM-273 forward for now.
>> >> I am trying to implement it for AssociationKey but caching here is significantly harder as data is cross-referenced across associations.
>> >>
>> >> Sanne, when you worked on the profiling of OGM, do you remember AssociationKey putting pressure on build time or memory? Because caching them per persister means some rather complex race conditions and more memory used permanently (as opposed to on demand).
>> >>
>> >> So I'm wondering if that's worth it. As an intermediary step, I could introduce AssociationKeyMetadata but build it on-demand - that one is easier to achieve.
>> >>
>> >> Emmanuel
>> >>
>> >> On Wed 2013-03-06 15:32, Davide D'Alto wrote:
>> >> > it's ok for me
>> >> >
>> >> > Davide
>> >> >
>> >> > On Wed, Mar 6, 2013 at 3:28 PM, Emmanuel Bernard <emman...@hibernate.org> wrote:
>> >> > > I'm planning on working on OGM-151. Fine with everyone?
>> >> > > That will likely be my last before I move back to BVAL and close the final issues there.
>> >> > >
>> >> > > Emmanuel
>> >> > >
>> >> > > On Tue 2013-03-05 19:04, Sanne Grinovero wrote:
>> >> > >> Nice!
>> >> > >> n+1 is something Hibernate Search has to deal with too, that's why I was interested in the fetch profiles and graph loading in JPA 2.1
>> >> > >>
>> >> > >> On 5 March 2013 17:44, Emmanuel Bernard <emman...@hibernate.org> wrote:
>> >> > >> > I have implemented a solution that gives an entity based on a tuple.
>> >> > >> > https://hibernate.onjira.com/browse/OGM-273#comment-50082
>> >> > >> >
>> >> > >> > Note that it does not currently work for MongoDB, but that's waiting for the dedicated GridDialect method as well as OGM-151.
>> >> > >> > Also note that I have no idea how that will work for associations. I suspect some nasty n+1 is happening at best.
>> >> > >> > Worst case, an exception :)
>> >> > >> >
>> >> > >> > Emmanuel
>> >> > >> >
>> >> > >> > On Tue 2013-03-05 10:30, Emmanuel Bernard wrote:
>> >> > >> >> We might hope for a stable enough contract on Hibernate Search and hope that we won't break serializability between micro or minor versions. That will need to be taken into account in the test suite and design.
>> >> > >> >> On the OGM side though, we are not at that level of maturity and we will force a homogeneous Hibernate OGM version across the whole cluster. The grid will have to go down for upgrades or enforce that no map reduce job using OGM is used while the version rollout is in progress.
>> >> > >> >>
>> >> > >> >> Emmanuel
>> >> > >> >>
>> >> > >> >> On Mon 2013-03-04 18:30, Sanne Grinovero wrote:
>> >> > >> >> > Found an example, this is all the code it needs to have a MassIndexer working on top of Infinispan's Map/Reduce:
>> >> > >> >> >
>> >> > >> >> > https://github.com/infinispan/infinispan/blob/master/query/src/main/java/org/infinispan/query/impl/massindex/IndexingMapper.java#L40
>> >> > >> >> >
>> >> > >> >> > Note its initialize method, which injects needed components; the implementation is serialized across nodes.
>> >> > >> >> >
>> >> > >> >> > Sanne
>> >> > >> >> >
>> >> > >> >> > On 4 March 2013 18:26, Sanne Grinovero <sa...@hibernate.org> wrote:
>> >> > >> >> > > We finished this discussion on IRC, in case someone else was interested:
>> >> > >> >> > >
>> >> > >> >> > > <sanne> hum I forgot the first step..
>> >> > >> >> > > transformation from entry into entity
>> >> > >> >> > > <sanne> updated
>> >> > >> >> > > <sanne> emmanuel, the "hydrate" step is what DavideD is bashing his head against, but let's assume he finds a workaround and we focus on the pattern as first step?
>> >> > >> >> > > <emmanuel> https://gist.github.com/emmanuelbernard/5084039
>> >> > >> >> > > <emmanuel> sanne: ^ that's how I would do it if I had an Iterator from the tuple
>> >> > >> >> > > <emmanuel> assuming pushToExecutor pushes to whatever concurrent work mechanism you planned to use on consumes
>> >> > >> >> > > <emmanuel> Plus I am not following exactly how you plan consumes(Entry) to be executed concurrently
>> >> > >> >> > > <emmanuel> is that the GridDialect responsibility?
>> >> > >> >> > > <emmanuel> That looks like a lot of work on the dialect's side
>> >> > >> >> > > <sanne> emmanuel, imagine the backend is Infinispan and has some large amount of data per node, plus that each node has its own backend IndexManager (like an ideal sharding)
>> >> > >> >> > > <emmanuel> ie pool mgt and cap + queuing
>> >> > >> >> > > <sanne> then with your approach the iterator needs to fetch data from all remote nodes, and then enqueue in a local blocking queue which is returning the data to the original owners
>> >> > >> >> > > <sanne> but if you skip that step, you can just forward the stateless consumer to each node and have it run on data locality
>> >> > >> >> > > <emmanuel> I was thinking that if you had the Lucene index locally on each node you would have a different impl of the MassIndexer anyways
>> >> > >> >> > > <emmanuel> that would simply send a command to each local node
>> >> > >> >> > > <sanne> To answer your question: that would be an optional GridDialect responsibility. I would endorse a trivial first draft doing a single-threaded loop.
>> >> > >> >> > > <emmanuel> and have GridDialect.getDataFor() return local data
>> >> > >> >> > > <sanne> The "consumes" implementation can be implemented with a simple iterator - as in your design - so I don't think it pushes much complexity to the GridDialect implementor?
>> >> > >> >> > > <sanne> The benefit of the consumer is that *optionally* it can be mapped on the Map phase, and that's trivial if your backend supports Map/Reduce
>> >> > >> >> > > <emmanuel> sanne: I don't follow that, sorry
>> >> > >> >> > > <emmanuel> how does that make it mappable to the Map phase?
>> >> > >> >> > > <sanne> "public void consume(Entry e) " is a degenerate (simplified) form of map.
>> >> > >> >> > > <sanne> mm infinispan IDE crashes at the right moment.
>> >> > >> >> > > <emmanuel> I thought Map was about *filtering*
>> >> > >> >> > > <emmanuel> not processing
>> >> > >> >> > > <sanne> you can decide to accept 100% of values (without filtering), but actually you might want to filter on the specified tables only.
>> >> > >> >> > > <sanne> also, the return type doesn't have to match the input type: hence you define a transformation function, which is inherently applied in parallel on all matching entries.
>> >> > >> >> > > <emmanuel> sanne: but then you require the OGM code to be everywhere (ie on each node of the target NoSQL)
>> >> > >> >> > > <emmanuel> to be able to do tuple -> entity
>> >> > >> >> > > <emmanuel> that's not realistic
>> >> > >> >> > > <emmanuel> assuming your transform phase is about tuple -> entity and some HSearch ops
>> >> > >> >> > > <sanne> yes right
>> >> > >> >> > > <sanne> but isn't it worth it? it's optional and much more efficient, as you avoid transferring any data.
>> >> > >> >> > > <sanne> btw we often assume all nodes in the grid are equally configured, so having same apps & libraries deployed.
>> >> > >> >> > > <emmanuel> sanne: let me try and summarize what I understand
>> >> > >> >> > > <emmanuel> it's more efficient if you store the Lucene index locally with the data, and if the grid is written in Java or at least can run code in Java including libraries and if you distribute the OGM configuration across the whole grid
>> >> > >> >> > > <emmanuel> Otherwise, it does not make any difference
>> >> > >> >> > > <emmanuel> Also the GridDialect implementation needs to know if you are doing this trick to only return local data
>> >> > >> >> > > <sanne> no there are other drawbacks which get defeated, but minor so I didn't mention them
>> >> > >> >> > > <emmanuel> am I right?
>> >> > >> >> > > <sanne> mainly, you skip the need for the contention point as there is no push to a shared blocking queue
>> >> > >> >> > > <sanne> no the GridDialect doesn't need to know.
>> >> > >> >> > > <emmanuel> sanne: sure if you can process the code on each node you avoid the shared blocking queue, at least until you reach the IndexManager
>> >> > >> >> > > <sanne> you'll just forward a simple (standard) M/R task, and it will need to execute it as always.
>> >> > >> >> > > <sanne> the IndexManager is parallel ;)
>> >> > >> >> > > <emmanuel> sanne: parallel on a single node
>> >> > >> >> > > <sanne> yes, but no contention points other than the internal structure of the IW
>> >> > >> >> > > <emmanuel> I mean updating the index for a given table is better done on a single node
>> >> > >> >> > > <sanne> IndexWriter
>> >> > >> >> > > <emmanuel> sorry I meant IndexWriter
>> >> > >> >> > > <emmanuel> ah but you mention perfect sharding
>> >> > >> >> > > <emmanuel> you need cosmological alignment for this shit to happen
>> >> > >> >> > > <sanne> not if we plan for it :)
>> >> > >> >> > > <sanne> you might remember the changes to Segments in the ISPN code, to accommodate index storage consistent with the data locality
>> >> > >> >> > > <sanne> that's expected in 6.0
>> >> > >> >> > > <emmanuel> So gridDialect.getData(Consumer consumer, String... tables) is wrong
>> >> > >> >> > > <emmanuel> it's more gridDialect.getData(ConsumerImpl.class, String...
>> >> > >> >> > > tables)
>> >> > >> >> > > <emmanuel> as you need to send the Consumer impl
>> >> > >> >> > > <emmanuel> not simply use it
>> >> > >> >> > > <sanne> hu, it needs a reference to the current SearchFactory at the very least
>> >> > >> >> > > <emmanuel> sanne: but you're telling me you send the M/R task
>> >> > >> >> > > <emmanuel> so you need to send the M/R code as well
>> >> > >> >> > > <sanne> yes but here we enter Infinispan-specific implementation
>> >> > >> >> > > <sanne> I would register the needed components in Infinispan and use the ServiceRegistry to look them up remotely
>> >> > >> >> > > <sanne> not to mention Infinispan could accommodate a custom command for it
>> >> > >> >> > > <emmanuel> What I am saying is that you don't pass the Consumer *instance* to the grid dialect but rather the impl, no?
>> >> > >> >> > > <sanne> the impl class definition?
>> >> > >> >> > > <emmanuel> sanne: you tell me. How do I send M/R code today?
>> >> > >> >> > > <emmanuel> certainly not an impl instance
>> >> > >> >> > > <sanne> yes you do
>> >> > >> >> > > <sanne> JBMar will take care of it, including state.
>> >> > >> >> > > <sanne> but in this case that would be wrong of course as I don't want to serialize the whole SearchFactory so I'd use injection and lookup, but that's a detail of Infinispan.
>> >> > >> >> > > <sanne> But this shouldn't be MassIndexer specific right? it's good to expose a general "execute on all" method, and I think accepting instances would make life easier for most - even though we might need to document some limitations.
>> >> > >> >> > > <emmanuel> alright, I guess I'll have to live with a visitor pattern for a feature that has 5% chance of happening :)
>> >> > >> >> > > <sanne> I'm going to punch Davide
>> >> > >> >> > > <sanne> as he's yelling "it's not a visitor" but doesn't have the guts to write it down :)
>> >> > >> >> > > <emmanuel> sanne: DavideD would have nothing to do with it, that requires a lot of config and Infinispan machinery I'm not sure is here today
>> >> > >> >> > > <DavideD> :)
>> >> > >> >> > > <emmanuel> ah
>> >> > >> >> > > <emmanuel> I don't care what it's called, it's one of those patterns that make the code harder to follow
>> >> > >> >> > > <DavideD> I was actually trying to remember the name of the pattern
>> >> > >> >> > > <sanne> ok now we agree :)
>> >> > >> >> > > <emmanuel> Obfuscator pattern family
>> >> > >> >> > > <sanne> very popular among consultants, I don't understand why you complain :P
>> >> > >> >> > > <sanne> Anyway, let's wrap up and broaden the horizon:
>> >> > >> >> > > <emmanuel> ok so we are left with finding how to load an entity from a tuple
>> >> > >> >> > > <sanne> you don't think it's useful as a general purpose method?
>> >> > >> >> > > <emmanuel> sanne: will be for queries
>> >> > >> >> > > <emmanuel> It's just that it's non obvious
>> >> > >> >> > > <sanne> Exactly. Also I think lambda methods are getting widely better known.
>> >> > >> >> > > <emmanuel> syntactically yes
>> >> > >> >> > > <emmanuel> VM wise, perf improvements will come later
>> >> > >> >> > > <sanne> what I mean is that by defining the SPI this way, I don't expect it to be more complex for the GridDialect implementors, while we can reuse it for a wider scope of needs.
>> >> > >> >> > >
>> >> > >> >> > > --Sanne
>> >> > >> >> > >
>> >> > >> >> > > On 4 March 2013 17:02, Emmanuel Bernard <emman...@hibernate.org> wrote:
>> >> > >> >> > >>
>> >> > >> >> > >> On 4 March 2013, at 17:39, Sanne Grinovero <sa...@hibernate.org> wrote:
>> >> > >> >> > >>
>> >> > >> >> > >>> On 4 March 2013 16:20, Emmanuel Bernard <emman...@hibernate.org> wrote:
>> >> > >> >> > >>>> I already gave what I knew on how to load an entity from a tuple (which isn't much) but we can try and dig together. Something I thought about is that ORM probably has a mechanism to load an entity from a resultset via the query parser. And that probably also looks like the second half of OgmLoader.load. We could look at this part and see if we can make an OGM version of it. We never had the need before as we never had query support (the way SQL does it).
>> >> > >> >> > >>>
>> >> > >> >> > >>> I would also need to study the ORM code, but to add a high level observation, the methods currently defined by the GridDialect are focusing on loading from well known key instances; there is nothing that lets us scan/inspect for all values.
>> >> > >> >> > >>>
>> >> > >> >> > >>> In other words: even if we wanted to load keys first, we don't have definitions of functions from raw->primary key instances either.
>> >> > >> >> > >>
>> >> > >> >> > >> I understand that. I'm not denying the need for the method.
>> >> > >> >> > >>
>> >> > >> >> > >>>
>> >> > >> >> > >>>> On the visitor vs Iterator approach, I still don't see how implementing an Iterator on a map / reduce backend would be harder than the visitor but maybe I'm missing something.
>> >> > >> >> > >>>>
>> >> > >> >> > >>>> class IteratorAsStream {
>> >> > >> >> > >>>>     final Query someMapReduceQuery = ...;
>> >> > >> >> > >>>>
>> >> > >> >> > >>>>     public Object next() {
>> >> > >> >> > >>>>         if (!someMapReduceQuery.started()) {
>> >> > >> >> > >>>>             // execute and collect results in parallel
>> >> > >> >> > >>>>             someMapReduceQuery.execute();
>> >> > >> >> > >>>>         }
>> >> > >> >> > >>>>         // block until the next result is available
>> >> > >> >> > >>>>         Object result = someMapReduceQuery.getNextOrBlock();
>> >> > >> >> > >>>>         return result;
>> >> > >> >> > >>>>     }
>> >> > >> >> > >>>> }
>> >> > >> >> > >>>
>> >> > >> >> > >>> That could work to *load* all entities in parallel, but I'd like to process the entities in parallel as well.
>> >> > >> >> > >>> And I'd rather not force the GridDialect implementors to write some Hibernate Search specific code, so to break out we need some form of "Execute X on each": a closure or a lambda.
>> >> > >> >> > >>>
>> >> > >> >> > >>
>> >> > >> >> > >> I can't see how the visitor model helps in your processing of entities in parallel. To me both approaches are strictly equivalent. Care to show some pseudo-code?
_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev
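
The IRC log above settles on handing the dialect a consumer ("execute X on each") instead of pulling tuples through an iterator, with a single-threaded loop as a trivial first draft. A minimal sketch of that shape follows, under loud assumptions: ScanningDialect, forEachTuple, Tuple and InMemoryDialect are hypothetical names invented for illustration and are not part of the Hibernate OGM GridDialect contract.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Stdlib-only illustration; none of these types exist in Hibernate OGM. */
final class TupleScanDemo {

    /** Hypothetical tuple: the table it belongs to plus its column values. */
    static final class Tuple {
        final String table;
        final Map<String, Object> columns;

        Tuple(String table, Map<String, Object> columns) {
            this.table = table;
            this.columns = columns;
        }
    }

    /** Hypothetical SPI: run the consumer on every tuple of the given tables. */
    interface ScanningDialect {
        void forEachTuple(Consumer<Tuple> consumer, String... tables);
    }

    /** Trivial first draft: a single-threaded loop over local data. */
    static final class InMemoryDialect implements ScanningDialect {
        private final List<Tuple> store = new ArrayList<>();

        void put(String table, Map<String, Object> columns) {
            store.add(new Tuple(table, columns));
        }

        @Override
        public void forEachTuple(Consumer<Tuple> consumer, String... tables) {
            Set<String> wanted = new HashSet<>(Arrays.asList(tables));
            for (Tuple tuple : store) {
                // Filter on the requested tables, then hand the tuple to the caller's code.
                if (wanted.contains(tuple.table)) {
                    consumer.accept(tuple);
                }
            }
        }
    }

    /** Stand-in for the indexing step: records "table:id" for each visited tuple. */
    static List<String> collectIds(ScanningDialect dialect, String... tables) {
        List<String> visited = new ArrayList<>();
        dialect.forEachTuple(t -> visited.add(t.table + ":" + t.columns.get("id")), tables);
        return visited;
    }

    public static void main(String[] args) {
        InMemoryDialect dialect = new InMemoryDialect();
        dialect.put("IndexedNews", Collections.<String, Object>singletonMap("id", 1));
        dialect.put("IndexedLabel", Collections.<String, Object>singletonMap("id", 2));
        dialect.put("Unindexed", Collections.<String, Object>singletonMap("id", 3));
        System.out.println(collectIds(dialect, "IndexedNews", "IndexedLabel"));
    }
}
```

The caller never sees how tuples are fetched, so a backend with Map/Reduce support could, in principle, ship the same consumer to each node instead of looping locally. That distributed variant is not sketched here, since it depends on serialization and component lookup details that the log leaves to Infinispan.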