Re: [hibernate-dev] Hibernate Developer IRC meeting - 5/03

2012-05-04 Thread Emmanuel Bernard
Performance-wise I don't know; you are probably right. But memory-wise, it could 
be quite different.
Even ignoring the overhead of the object + pointer in memory, the alignment of 
booleans or other small fields would make a significant impact.

Of course, if we are talking about 20 values, we should not bother. But 
persisters and the like store more than 20 values, and we have more than one 
persister / loader. It might be inconsequential in the end, but it might be 
worth testing.
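To make the two layouts concrete, here is a minimal Java sketch of the same per-property metadata stored as parallel arrays and as an array of small holder objects. The class and field names are hypothetical and are not Hibernate's actual persister internals.

```java
/** Hypothetical illustration: the same per-property metadata in two layouts. */
public class MetadataLayouts {

    /** Layout 1: parallel arrays, one array per attribute, indexed by property position. */
    static final class ParallelArrays {
        final String[] names;
        final boolean[] nullable;
        final boolean[] lazy;

        ParallelArrays(String[] names, boolean[] nullable, boolean[] lazy) {
            // Every array must have the same length, or the indices stop lining up.
            if (names.length != nullable.length || names.length != lazy.length) {
                throw new IllegalArgumentException("parallel arrays must have equal length");
            }
            this.names = names;
            this.nullable = nullable;
            this.lazy = lazy;
        }

        boolean isNullable(int i) {
            return nullable[i];
        }
    }

    /** Layout 2: one object per property; each instance carries its own object header. */
    static final class PropertyMetadata {
        final String name;
        final boolean nullable;
        final boolean lazy;

        PropertyMetadata(String name, boolean nullable, boolean lazy) {
            this.name = name;
            this.nullable = nullable;
            this.lazy = lazy;
        }
    }

    /** Holds one reference per property, each pointing to a separate PropertyMetadata. */
    static final class ObjectArray {
        final PropertyMetadata[] properties;

        ObjectArray(PropertyMetadata[] properties) {
            this.properties = properties;
        }

        boolean isNullable(int i) {
            return properties[i].nullable;
        }
    }

    public static void main(String[] args) {
        ParallelArrays flat = new ParallelArrays(
                new String[] {"id", "name"}, new boolean[] {false, true}, new boolean[] {false, false});
        ObjectArray objects = new ObjectArray(new PropertyMetadata[] {
                new PropertyMetadata("id", false, false),
                new PropertyMetadata("name", true, false)});
        System.out.println(flat.isNullable(1) + " " + objects.isNullable(1)); // prints "true true"
    }
}
```

In the parallel layout a `boolean` typically occupies one byte inside a `boolean[]`; in the object layout each entry additionally pays for an object header, a reference from the array and padding up to the JVM's alignment. How much that adds up to across all persisters is what would need testing.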

On a related note, it's up for debate whether putting data in a hash map 
for faster lookup later is worth it in all cases:

- it takes much more space than raw arrays
- an array scan might be as fast as, or faster than, a hash lookup for a small 
enough array. As we have seen in Infinispan and OGM, computing a hash is not a 
cheap operation.

Again, this requires testing, but I am guilty as charged of using collections in 
AnnotationBinder for some computations that would be better written as an 
array + array scan.
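A sketch of the two lookup strategies, assuming a property is looked up by name (the helper names are hypothetical). Note that `java.lang.String` caches its hash code, so the hashing cost mentioned above mostly applies to keys whose `hashCode()` is recomputed on each call.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: two ways to find the index of a property by name. */
public class PropertyLookup {

    /** Linear scan: no extra structure and good locality; cost grows with the array length. */
    static int indexByScan(String[] names, String wanted) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(wanted)) {
                return i;
            }
        }
        return -1;
    }

    /** Builds a name -> index map once; every entry costs a map node plus a table slot. */
    static Map<String, Integer> buildIndex(String[] names) {
        Map<String, Integer> index = new HashMap<>(names.length * 2);
        for (int i = 0; i < names.length; i++) {
            index.put(names[i], i);
        }
        return index;
    }

    /** Hash lookup: calls wanted.hashCode(), then equals() on the matching bucket. */
    static int indexByHash(Map<String, Integer> index, String wanted) {
        Integer i = index.get(wanted);
        return i == null ? -1 : i;
    }

    public static void main(String[] args) {
        String[] names = {"id", "name", "version"};
        Map<String, Integer> index = buildIndex(names);
        System.out.println(indexByScan(names, "version") + " " + indexByHash(index, "version")); // prints "2 2"
    }
}
```

Which strategy wins for a given array size is an empirical question, which is exactly why it needs testing.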


On 3 May 2012, at 19:32, Steve Ebersole wrote:

> I seriously doubt the performance cost of 20 'parallel arrays' versus 1 array 
> of Objects holding those 20 values is anything but negligible at best.
> 
> 
> On Thu 03 May 2012 11:04:30 AM CDT, Emmanuel Bernard wrote:
>> Sorry I could not attend the meeting live, but I reviewed the logs and had 
>> some remarks, so here are the logs and my remarks :)
>> 
>> ## Meeting logs
>> 
>> Minutes: http://transcripts.jboss.org/meeting/irc.freenode.org/hibernate-dev/2012/hibernate-dev.2012-05-03-14.01.html
>> Minutes (text): http://transcripts.jboss.org/meeting/irc.freenode.org/hibernate-dev/2012/hibernate-dev.2012-05-03-14.01.txt
>> Log: http://transcripts.jboss.org/meeting/irc.freenode.org/hibernate-dev/2012/hibernate-dev.2012-05-03-14.01.log.html
>> 
>> ## About parallel arrays vs arrays of objects
>> 
>> Even today, there is a cost in using Objects or HashMaps as data 
>> placeholders. This might not be relevant for us, especially since we are not 
>> using them for "live" data, but since memory footprint has always been a 
>> concern for Hibernate, it's worth mentioning.
>> 
>> This presentation is quite interesting and shows the extra memory cost of 
>> such structures compared to arrays: 
>> http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf
>> 
>> ## PessimisticLockException versus LockAcquisitionException
>> 
>> I am not certain of that, and Git history got cut short with the migration of 
>> entitymanager into core. But I do recall some lock exceptions 
>> that were not caught, and I had to use PessimisticLockException. It could 
>> have been a change triggered by the TCK. Again, I might be wrong; it was a 
>> long time ago. Plus, ORM's core has evolved since.
>> 
>> Emmanuel
>> ___
>> hibernate-dev mailing list
>> hibernate-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
> 
> --
> st...@hibernate.org
> http://hibernate.org




Re: [hibernate-dev] Hibernate Developer IRC meeting - 5/03

2012-05-04 Thread Sanne Grinovero
Tricky subject.
I'm confident that there are many cases in which we should have used
arrays rather than maps, especially for temporary objects which aren't
short-lived enough (a HashMap living in the scope of a single method
is going to be cheap). We should have objects allocated either for
very long (like forever, in the scope of the SessionFactory) or for very
short periods; medium-lived objects tend to be the most expensive ones
for a generational garbage collector.

As for how we keep metadata, I think performance would be
dominated not so much by the object being slightly bigger as
by prefetching and by what is going to be available in the cache lines
you have just filled: cache is much faster than main memory, so
being clever about how you lay out your data structure could
speed you up by up to a couple of orders of magnitude.

Using primitives and array matrices makes the data smaller, hence more
likely to fit in the cache; but an array of objects in which each
object groups together the fields needed at the same time is likely
to be faster... though I'm making assumptions about how this structure
is going to be read most frequently.

For example, when declaring a matrix as [ ][ ], performance will be
very different depending on whether you read it by columns or by rows (I
forget which one is better now), but if the common use case follows
the slower path, it's usually a good idea to transpose the matrix.
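In Java an `int[][]` is an array of separately allocated row arrays, so reading row by row (the last index varying in the inner loop) walks each row sequentially and is normally the cache-friendly order, while the other order jumps to a different row array on every access. The sketch below (hypothetical names, rectangular matrices assumed) shows both traversal orders and a transpose helper; both sums are equal, only the memory access pattern differs.

```java
/** Hypothetical sketch: row-wise vs column-wise traversal of a Java int[][]. */
public class MatrixTraversal {

    /** Inner loop varies the column index: reads each row array sequentially. */
    static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int r = 0; r < m.length; r++) {
            int[] row = m[r];
            for (int c = 0; c < row.length; c++) {
                sum += row[c];
            }
        }
        return sum;
    }

    /** Inner loop varies the row index: touches a different row array on every read. */
    static long sumColumnMajor(int[][] m) {
        long sum = 0;
        int cols = m.length == 0 ? 0 : m[0].length;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < m.length; r++) {
                sum += m[r][c];
            }
        }
        return sum;
    }

    /** Transposes a rectangular matrix so the common access path becomes row-wise. */
    static int[][] transpose(int[][] m) {
        int rows = m.length;
        int cols = rows == 0 ? 0 : m[0].length;
        int[][] t = new int[cols][rows];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                t[c][r] = m[r][c];
            }
        }
        return t;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(sumRowMajor(m) + " " + sumColumnMajor(m)); // prints "21 21"
        System.out.println(transpose(m)[2][1]); // prints "6"
    }
}
```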

I'd love it if we could enter this space, or, even if we're not suited
for it, at least be considered "lite":
http://stackoverflow.com/questions/10374735/lucene-and-ormlite-on-android

Sanne


Re: [hibernate-dev] Hibernate Developer IRC meeting - 5/03

2012-05-04 Thread Hardy Ferentschik
Even at the risk of pouring oil on the fire, I think a simpler data structure 
wins over parallel arrays in most cases. The latter are much harder to use, and 
it is easier to make mistakes with them, which leads to more bugs and higher 
maintenance costs. 
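A hypothetical illustration of the maintenance risk described above: with parallel arrays, every structural change (here, removing a property) has to be applied to each array in lockstep, and forgetting one silently misaligns the data; with one object per entry, the fields travel together. The names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: removing one property from parallel arrays vs a list of objects. */
public class ParallelArrayRemoval {

    static final class Property {
        final String name;
        final boolean nullable;

        Property(String name, boolean nullable) {
            this.name = name;
            this.nullable = nullable;
        }
    }

    /** Returns a copy of {@code a} without the element at index {@code i}. */
    static String[] without(String[] a, int i) {
        String[] out = new String[a.length - 1];
        System.arraycopy(a, 0, out, 0, i);
        System.arraycopy(a, i + 1, out, i, a.length - i - 1);
        return out;
    }

    /** Same operation for boolean[]; each parallel array needs its own call. */
    static boolean[] without(boolean[] a, int i) {
        boolean[] out = new boolean[a.length - 1];
        System.arraycopy(a, 0, out, 0, i);
        System.arraycopy(a, i + 1, out, i, a.length - i - 1);
        return out;
    }

    public static void main(String[] args) {
        String[] names = {"id", "name", "version"};
        boolean[] nullable = {false, true, false};

        // Parallel arrays: every array must be shrunk at the same index.
        String[] names2 = without(names, 1);
        boolean[] nullable2 = without(nullable, 1);
        // Skipping the second call would leave "version" paired with nullable[1] == true.
        System.out.println(names2[1] + " nullable=" + nullable2[1]); // prints "version nullable=false"

        // Object layout: a single removal keeps each name with its own flag.
        List<Property> props = new ArrayList<>(Arrays.asList(
                new Property("id", false), new Property("name", true), new Property("version", false)));
        props.remove(1);
        System.out.println(props.get(1).name + " nullable=" + props.get(1).nullable); // prints "version nullable=false"
    }
}
```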

As Sanne says, performance questions are tricky. So many things happen to 
the code these days before it gets executed on the bare metal that it is 
hard to know what performance impact a certain change has. In the end you 
just have to measure. 

Personally, I think we should primarily strive for a better and easier-to-use 
API. Opportunities to optimize then often arise naturally. 

And now, my dear disciples, let me close with:
 "The First Rule of Program Optimization: Don't do it. The Second Rule of 
Program Optimization (for experts only!): Don't do it yet." — Michael A. Jackson

:-)

--Hardy



Re: [hibernate-dev] Jenkins Jobs

2012-05-04 Thread Steve Ebersole
Touché ;)

OK, I'll just chalk it up to that then.

Overall, do you think that is an issue with Jenkins? Or with the JBoss setup?
On May 3, 2012 11:15 PM, "Strong Liu"  wrote:

>
> On May 4, 2012, at 3:45 AM, Steve Ebersole wrote:
>
> But I killed it and #243 started *immediately*
>
> Well, you already know, it sucks.
>
> On Thu 03 May 2012 02:39:13 PM CDT, Strong Liu wrote:
>
> I think it was just waiting for the next available executor; basically,
> there were too many jobs in the queue waiting to be run.
>
> On May 4, 2012, at 3:27 AM, Steve Ebersole wrote:
>
> Strong, I had to kill hibernate-core-master-matrix job #242. It had
> been hanging for the last 8+ hours without ever even having started any
> of the individual matrix builds.
>
> Is there any way to tell why it was hanging?
>
> --
> st...@hibernate.org
> http://hibernate.org
>
> -
> Best Regards,
>
> Strong Liu
> http://about.me/stliu/bio
>
> --
> st...@hibernate.org
> http://hibernate.org
>
> -
> Best Regards,
>
> Strong Liu
> http://about.me/stliu/bio


Re: [hibernate-dev] Hibernate Developer IRC meeting - 5/03

2012-05-04 Thread Emmanuel Bernard
Yes, but we can't blindly go for the nicer approach: we are a library and hence 
used by many. 
Look at the bottleneck we found in Hibernate ORM due to the change in scale 
Hibernate OGM involved. Look at the perf issues we found in OGM itself just 
because I used the builder pattern for some objects used in critical paths. 
Same for the hashCode that was not cached in some critical objects.

I'm all fine with the nice approach if it's followed by a perf test before 
being pushed :)
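Caching a hash code, as mentioned above, can be done the way `java.lang.String` does it: compute it lazily, store it in a non-final field, and treat 0 as "not yet computed". The key class below is a hypothetical example, not an actual Hibernate or OGM class, and the technique is only safe if the key and its components are immutable.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical immutable key that computes its hash code once and caches it. */
public final class CachedHashKey {
    private final String entityName;
    private final Object[] idValues; // components are assumed to be immutable too
    private int hash; // 0 means "not computed yet", as in java.lang.String

    public CachedHashKey(String entityName, Object[] idValues) {
        this.entityName = Objects.requireNonNull(entityName);
        this.idValues = idValues.clone(); // defensive copy keeps the key immutable
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = 31 * entityName.hashCode() + Arrays.deepHashCode(idValues);
            hash = h; // benign race: every thread computes the same value
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CachedHashKey)) {
            return false;
        }
        CachedHashKey other = (CachedHashKey) o;
        // Comparing cached hashes first is a cheap way to reject most non-equal keys.
        return hashCode() == other.hashCode()
                && entityName.equals(other.entityName)
                && Arrays.deepEquals(idValues, other.idValues);
    }

    public static void main(String[] args) {
        CachedHashKey a = new CachedHashKey("Customer", new Object[] {42L});
        CachedHashKey b = new CachedHashKey("Customer", new Object[] {42L});
        System.out.println(a.equals(b) + " " + (a.hashCode() == b.hashCode())); // prints "true true"
    }
}
```

As with `String` before Java 13, a key whose computed hash happens to be 0 is simply recomputed on every call, which is harmless apart from the lost caching.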



Re: [hibernate-dev] Hibernate Developer IRC meeting - 5/03

2012-05-04 Thread Sanne Grinovero
+1!
Agreed: to be a good library in this area it's critical to be user-friendly
(good APIs, good docs and good quality), but to be the best out there
that's not enough: users demand top efficiency and are extremely annoyed
when they find a performance issue in our code, to the point that they
will advertise the problems loudly and discourage the use of any ORM.

It shouldn't be hard to argue that using Hibernate won't slow you
down significantly... yet currently, when this subject comes up in
public, at bar meetings or talks, it's like opening Pandora's box. Plus,
OGM would be nonsense if we weren't sure we can improve on this. It's
not bad at all currently, but since you're having fun rewriting a
critical area, it makes sense to design things from the ground up
with these "new age" requirements in mind.

I don't think top design and excellent performance are necessarily
conflicting goals... make something to be very proud of!
;-)

Sanne


Re: [hibernate-dev] Hibernate Developer IRC meeting - 5/03

2012-05-04 Thread Hardy Ferentschik

On May 4, 2012, at 1:51 PM, Steve Ebersole wrote:

> As for perf issues OGM uncovered in ORM... First, none of those involved 
> anything close to the discussion here; they dealt mostly with small things 
> (like hashCode caching) that became big because of the scale. Second, 
> that's actually exactly how you should discover and treat perf issues. 

Right, especially the second sentence.




Re: [hibernate-dev] Hibernate Developer IRC meeting - 5/03

2012-05-04 Thread Steve Ebersole
Apparently this did not go through to the list the first time, sorry...

Completely agree.

I focused on perf in my last comment, but I don't think memory is all that 
different. The declaration of all that state already has to be 
accounted for in its current "flattened" parallel-array representation. 
The trade-offs here are:
1) X array declarations versus 1
2) the overhead of the class definition; again, its actual state (field) 
memory footprint is already accounted for, so we really are just talking 
about a small amount of memory here.

Certainly I think it's a great idea to actually calculate and 
compare the memory differences here. I am pretty confident the difference is 
negligible. But either way, Hardy's point about the higher likelihood of 
bugs is the biggest concern. In my experience, a lack of cohesive 
encapsulation is just a recipe for hard-to-find problems creeping into 
the code.
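As a rough, assumption-laden starting point for actually calculating the difference, the hypothetical estimator below assumes a 64-bit HotSpot JVM with compressed oops: 12-byte object headers, 16-byte array headers, 4-byte references and 8-byte alignment, with fields packed without internal padding. Those figures change with the JVM and its flags; a tool such as OpenJDK's JOL (Java Object Layout) would report real layouts.

```java
/** Hypothetical, assumption-laden estimate: parallel arrays vs an array of holder objects. */
public class LayoutEstimate {
    // Assumed sizes for a 64-bit HotSpot JVM with compressed oops; other JVMs/flags differ.
    static final int OBJECT_HEADER = 12;
    static final int ARRAY_HEADER = 16;
    static final int REFERENCE = 4;
    static final int ALIGNMENT = 8;

    /** Rounds a size up to the next multiple of ALIGNMENT. */
    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** One primitive array per attribute; elementSizes[j] is the size in bytes of attribute j. */
    static long parallelArrays(int n, int[] elementSizes) {
        long total = 0;
        for (int size : elementSizes) {
            total += align(ARRAY_HEADER + (long) n * size);
        }
        return total;
    }

    /** One reference array plus n holder objects, each carrying all attributes. */
    static long objectArray(int n, int[] elementSizes) {
        int fields = 0;
        for (int size : elementSizes) {
            fields += size;
        }
        return align(ARRAY_HEADER + (long) n * REFERENCE) + n * align(OBJECT_HEADER + fields);
    }

    public static void main(String[] args) {
        int[] sizes = {1, 1, 4}; // e.g. two booleans and one int per property (hypothetical)
        int n = 100;
        System.out.println(parallelArrays(n, sizes) + " vs " + objectArray(n, sizes)); // prints "656 vs 2816"
    }
}
```

Under these assumptions the gap comes entirely from the per-entry header, reference and padding; whether it is negligible depends on how many entries and persisters a SessionFactory actually holds, which is what a real measurement would settle.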



Re: [hibernate-dev] Hibernate Developer IRC meeting - 5/03

2012-05-04 Thread Steve Ebersole
Not this one :(

This library argument is just not relevant to how persisters store 
internal state, which is really all I am talking about. And in terms of 
the SPI, if we opt to expose this change, it would happen in a major 
release; so again, not relevant.

As for perf issues OGM uncovered in ORM... First, none of those involved 
anything close to the discussion here; they dealt mostly with small 
things (like hashCode caching) that became big because of the scale. 
Second, that's actually exactly how you should discover and treat perf 
issues.


On 05/04/2012 06:41 AM, Emmanuel Bernard wrote:
> Yes, but we can't blindly go for the nicer approach: we are a library and 
> hence used by many.
> Look at the bottleneck we found on Hibernate ORM due to the change of 
> magnitude Hibernate OGM involved. Look at the perf issues we found in OGM 
> itself just because I used the builder pattern for some objects using in 
> critical paths. Same for hashCode that was not cached in some critical 
> objects.
>
> I'm all fine with the nice approach if it's followed by a perf test before 
> being pushed :)
>
> On 4 mai 2012, at 12:39, Hardy Ferentschik wrote:
>
>> Even taking the risk of pouring oil onto the fire, I think a simpler data 
>> structure wins in most cases over
>> the parallel arrays. It is much harder to use the latter and easier to make 
>> mistakes which leads to more
>> bugs and higher maintenance costs.
>>
>> As Sanne is saying performance questions are tricky. So many thing are 
>> happening with the code our days
>> before they are getting executed on the bare metal that it is hard to know 
>> what performance impacts a certain
>> change has. In the end you just have to measure.
>>
>> Personally I think we should primarily strive for a better and easier to use 
>> API. Oppertunities to optimizes arise
>> then often naturally.
>>
>> And now my dear disciples let me close with:
>> "The First Rule of Program Optimization: Don't do it. The Second Rule of 
>> Program Optimization (for experts only!): Don't do it yet." — Michael A. 
>> Jackson
>>
>> :-)
>>
>> --Hardy
>>
>> On May 4, 2012, at 11:58 AM, Sanne Grinovero wrote:
>>
>>> tricky subject.
>>> I'm confident that there are many cases in which we should have used
>>> arrays rather than maps, especially for temporary objects which aren't
>>> short-lived enough (a HashMap living in the scope of a single method
>>> is going to be cheap). We should have objects allocated either for
>>> very long (like forever, in the scope of the SessionFactory) or for
>>> very short.
>>>
>>> In the case of how we keep metadata, I think performance would be
>>> dominated not so much by the fact that it's a slightly bigger object but
>>> by prefetching and by what is going to be available in the cache lines
>>> you have just filled in: obviously cache is way faster than memory, so
>>> being clever in the sequence you lay out your data structure could
>>> speed you up by a couple of orders of magnitude.
>>>
>>> Using primitives and array matrices makes the data smaller, hence more
>>> likely to fit in the cache; but using an array of objects in which
>>> each object groups the needed fields together is likely going to be
>>> faster, although I'm making assumptions about how this structure is
>>> going to be read most frequently.
>>>
>>> For example, when declaring a matrix as an [ ][ ], performance will be
>>> very different depending on whether you read by columns or by rows (I
>>> forgot which one is better now), but if the common use case takes the
>>> slower path it's usually a good idea to invert the matrix.
>>>
>>> I'd love it if we could enter this space, or even if it's not suited
>>> for it, at least be considered "lite":
>>> http://stackoverflow.com/questions/10374735/lucene-and-ormlite-on-android
>>>
>>> Sanne
>>>
>>> On 4 May 2012 10:07, Emmanuel Bernard  wrote:
>>>> Performance I don't know, you are probably right. But memory-wise, that 
>>>> could be way different.
>>>> Even ignoring the overhead of the object + pointer in memory, the 
>>>> alignment of boolean or other small objects would make a significant 
>>>> impact.
>>>>
>>>> Of course if we are talking about 20 values, we should not bother. But 
>>>> persisters and the like store more than 20 values and we have more than 
>>>> one persister / loader. It might be inconsequential in the end but that 
>>>> might be worth testing.
>>>>
>>>> On a related note, it's up for debate whether or not putting data in a hash 
>>>> map for faster lookup later is worth it in all cases:
>>>>
>>>> - it takes much more space than raw arrays
>>>> - array scan might be as fast or faster for a small enough array. As we 
>>>> have seen in Infinispan and OGM, computing a hash is not a cheap operation.
>>>>
>>>> Again this requires testing, but I am guilty as charged of using collections 
>>>> in AnnotationBinder when doing some computations that would be better off 
>>>> written as an array + array scan.
>>>>
>>>>
>>>> On 3 mai 2
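The hashCode caching Emmanuel mentions for objects on critical paths is a common pattern for immutable keys that get hashed over and over. A minimal sketch, assuming a hypothetical immutable key class `CachedHashKey` (not an actual OGM or Hibernate type): the hash is computed once in the constructor, which is only safe because none of the fields can change afterwards.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical immutable key (not an actual OGM/Hibernate class).
// Its hash is computed once in the constructor instead of on every lookup.
final class CachedHashKey {
    private final String table;
    private final Object[] columnValues;
    private final int hash;

    CachedHashKey(String table, Object[] columnValues) {
        this.table = Objects.requireNonNull(table, "table");
        // Shallow defensive copy: the key stays stable only as long as the
        // column values themselves are immutable (Long, String, ...).
        this.columnValues = columnValues.clone();
        this.hash = 31 * table.hashCode() + Arrays.hashCode(this.columnValues);
    }

    @Override
    public int hashCode() {
        return hash; // cached: no recomputation per HashMap/HashSet lookup
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CachedHashKey)) {
            return false;
        }
        CachedHashKey other = (CachedHashKey) o;
        // Comparing the cached hashes first rejects most mismatches cheaply.
        return hash == other.hash
                && table.equals(other.table)
                && Arrays.equals(columnValues, other.columnValues);
    }
}
```

The trade-off is one extra `int` field per key in exchange for not rehashing the whole array of values on every lookup; whether that pays off on a given path is, as the thread says, something to measure.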
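Emmanuel's point about an array scan versus a hash map lookup for small collections can be illustrated with a hypothetical lookup table; `SmallLookup` and its methods are invented here and are not taken from AnnotationBinder or any persister. It holds the same name-to-index mapping both ways: two parallel arrays scanned linearly, and a `HashMap`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: resolve a property name to its column index.
// Keys must be non-null.
final class SmallLookup {
    private final String[] names;   // keys, scanned linearly
    private final int[] indexes;    // values, at the same position as their key
    private final Map<String, Integer> map = new HashMap<>();

    SmallLookup(String[] names, int[] indexes) {
        if (names.length != indexes.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        this.names = names.clone();
        this.indexes = indexes.clone();
        for (int i = 0; i < names.length; i++) {
            map.put(names[i], indexes[i]);
        }
    }

    // Linear scan: no hashing, no boxing; cost grows with the array length.
    int byScan(String name) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(name)) {
                return indexes[i];
            }
        }
        return -1;
    }

    // Hash lookup: needs name.hashCode(), a bucket lookup and an Integer unboxing;
    // each map entry is also a separate node object on the heap.
    int byHash(String name) {
        Integer index = map.get(name);
        return index == null ? -1 : index;
    }
}
```

Note that `java.lang.String` caches its own hash code, so the hashing cost Emmanuel describes matters most for custom key types whose `hashCode()` is expensive and not cached. Where the crossover between the two lookups lies is not something this sketch can tell; it would need a benchmark.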

Re: [hibernate-dev] Hibernate Developer IRC meeting - 5/03

2012-05-04 Thread Eric Dalquist
Next time I grab a heap dump from one of our prod boxes, I can poke 
around in the org.hibernate classes a bit and let you know what YourKit 
says, if you guys want :)

-Eric

On 5/4/12 7:14 AM, Steve Ebersole wrote:
> Apparently this did not go through to the list the first time, sorry...
>
> Completely agree.
>
> I focused on perf in my last comment, but I don't think memory is all that
> much different.  The declaration of all that state already has to be
> accounted for in its current "flattened" parallel-array representation.
> The trade-offs here are:
> 1) X number of array declarations versus 1
> 2) overhead of the class definition; again, its actual state-field memory
> footprint is already accounted for, so we are really just talking about a
> small amount of memory here.
>
> Certainly I think it's a great idea to try to actually calculate and
> compare the memory diffs here. I am pretty confident the difference is
> negligible.  But either way, Hardy's point about a higher likelihood of
> bugs is the biggest concern.  In my experience, a lack of cohesive
> encapsulation is just a recipe for situations where hard-to-find
> problems creep into the code.
>
>
Re: [hibernate-dev] Jenkins Jobs

2012-05-04 Thread Strong Liu

On May 4, 2012, at 7:33 PM, Steve Ebersole wrote:

> Touche ;)
> 
> OK, I'll just chalk it up to that then.
> 
> Overall do you think that is an issue with Jenkins? Or with the JBoss setup?
> 
> 

Not sure, but we may be able to improve this.

The slaves used by Jenkins can be labeled, and all hibernate jobs are tied to
run only on slaves with the "hibernate" label.
So we could label more slaves, but I doubt it would help a lot, since there
are just too many jobs running on Jenkins, and the hardware …
See the quoted mail from QA (I asked them about this before):

{quote}
Also there is a current rush for executors and everyone is complaining, as you 
see in the forwarded mail below, so not sure if that too will improve the situation 
:-)
{quote}


> On May 3, 2012 11:15 PM, "Strong Liu"  wrote:
> 
> On May 4, 2012, at 3:45 AM, Steve Ebersole wrote:
> 
>> But I killed it and #243 started *immediately*
> 
> well, you already know, it sucks
> 
>> 
>> On Thu 03 May 2012 02:39:13 PM CDT, Strong Liu wrote:
>>> I think it was just waiting for the next available executor; basically
>>> there were too many jobs in the queue waiting to be run
>>> 
>>> On May 4, 2012, at 3:27 AM, Steve Ebersole wrote:
>>> 
>>>> Strong, I had to kill hibernate-core-master-matrix job #242. It had
>>>> been hanging for the last 8+ hours without ever even having started any
>>>> of the individual matrix builds.
>>>>
>>>> Is there any way to tell why it was hanging?
>>>>
>>>> --
>>>> st...@hibernate.org 
>>>> http://hibernate.org
>>>> ___
>>>> hibernate-dev mailing list
>>>> hibernate-dev@lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>> 
>>> -
>>> Best Regards,
>>> 
>>> Strong Liu http://hibernate.org/
>>> http://about.me/stliu/bio
>>> 
>> 
>> --
>> st...@hibernate.org
>> http://hibernate.org
> 
> -
> Best Regards,
> 
> Strong Liu 
> http://about.me/stliu/bio
> 

-
Best Regards,

Strong Liu 
http://about.me/stliu/bio



Re: [hibernate-dev] Where are the batched fetch statements generated?

2012-05-04 Thread Clemens Eisserer
Hi,

Because of the lack of feedback, I decided to start hacking blindly ;)

Generating the Loaders on demand broke for an unknown reason, so what
I do now is pad the batch-load request with non-existent primary
keys up to the next larger batch size, and distribute the batch sizes a
bit more uniformly.
It works great so far: for my use cases I almost always get only one
query, so I get almost the same number of round trips as with
SUBSELECT fetching, only with much less complex queries.

If there is some interest, I would be happy to share the patches
(although they're quite hackish and not ready for general consumption).

Thanks, Clemens
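Clemens's padding trick could look roughly like the following. His patches are not shown in the thread, so this is a hypothetical sketch rather than his code or Hibernate's loader API: `BatchPadding`, its methods and the sentinel key are invented here. The assumption is that loaders (and their SQL) exist only for a fixed, ascending set of batch sizes, so a request for, say, 5 ids with sizes {1, 4, 8, 16} is padded up to 8 with a key that cannot match any row, and a single statement still covers it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: pads id lists to one of a fixed set of batch sizes.
// ascendingSizes must be non-empty and sorted in ascending order.
final class BatchPadding {

    // Smallest configured size >= count, or the largest size if none is big enough.
    static int chooseBatchSize(int count, int[] ascendingSizes) {
        for (int size : ascendingSizes) {
            if (size >= count) {
                return size;
            }
        }
        return ascendingSizes[ascendingSizes.length - 1];
    }

    // Splits ids into chunks of at most the largest size, then pads each chunk
    // with a key assumed never to exist in the table.
    static <K> List<List<K>> padToBatches(List<K> ids, int[] ascendingSizes, K nonExistentKey) {
        int max = ascendingSizes[ascendingSizes.length - 1];
        List<List<K>> batches = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += max) {
            List<K> batch = new ArrayList<>(ids.subList(from, Math.min(from + max, ids.size())));
            int target = chooseBatchSize(batch.size(), ascendingSizes);
            while (batch.size() < target) {
                batch.add(nonExistentKey);
            }
            batches.add(batch);
        }
        return batches;
    }
}
```

The padded keys cost a few extra bind parameters but let one prepared statement per batch size be reused; how the sizes themselves are spread ("a bit more uniformly") is left to the caller here.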