Another thing to note if this is a long request to a database housed in an ESX 
vm. We had similar problems with long requests timing out between two systems, 
with one hosted by esx 4.x. Such long requests were caught by some low level 
interface muxing issue and my whole EOF stack was frozen when the underlying db 
connection was lost mid-transaction. I resolved it by moving this application 
off of a vm.



On May 31, 2010, at 5:33 PM, Pascal Robert <[email protected]> wrote:

> Ok, will try with ERXWOLongResponsePage since it look like it's locking and 
> unlocking all ECs in the thread.
> 
>> There's a bunch of stuff wrong here. First, the only actually locked thread 
>> is:
>> 
>> - 
>> com.webobjects.eocontrol.EOObjectStoreCoordinator.addCooperatingObjectStore(com.webobjects.eocontrol.EOCooperatingObjectStore)
>>  @bci=5, line=130 (Interpreted frame)
>> - 
>> com.webobjects.eoaccess.EODatabaseChannel.setCurrentEditingContext(com.webobjects.eocontrol.EOEditingContext)
>>  @bci=34, line=166 (Interpreted frame)
>> - 
>> com.webobjects.eoaccess.EODatabaseChannel._selectWithFetchSpecificationEditingContext(com.webobjects.eocontrol.EOFetchSpecification,
>>  com.webobjects.eocontrol.EOEditingContext) @bci=158, line=788 (Interpreted 
>> frame)
>> - 
>> com.webobjects.eoaccess.EODatabaseChannel.selectObjectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification,
>>  com.webobjects.eocontrol.EOEditingContext) @bci=64, line=215 (Interpreted 
>> frame)
>> - 
>> com.webobjects.eoaccess.EODatabaseContext._objectsWithFetchSpecificationEditingContext(com.webobjects.eocontrol.EOFetchSpecification,
>>  com.webobjects.eocontrol.EOEditingContext) @bci=219, line=3205 (Interpreted 
>> frame)
>> - 
>> com.webobjects.eoaccess.EODatabaseContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification,
>>  com.webobjects.eocontrol.EOEditingContext) @bci=34, line=3346 (Interpreted 
>> frame)
>> - 
>> com.webobjects.eocontrol.EOObjectStoreCoordinator.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification,
>>  com.webobjects.eocontrol.EOEditingContext) @bci=97, line=539 (Interpreted 
>> frame)
>> - 
>> com.webobjects.eocontrol.EOEditingContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification,
>>  com.webobjects.eocontrol.EOEditingContext) @bci=79, line=4114 (Interpreted 
>> frame)
>> - 
>> er.extensions.eof.ERXEC.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification,
>>  com.webobjects.eocontrol.EOEditingContext) @bci=72, line=1211 (Interpreted 
>> frame)
>> - 
>> com.webobjects.eocontrol.EOEditingContext.objectsWithFetchSpecification(com.webobjects.eocontrol.EOFetchSpecification)
>>  @bci=3, line=4500 (Interpreted frame)
>> - 
>> com.acaiq.fondation.acaiqCore._Licence.fetchLicences(com.webobjects.eocontrol.EOEditingContext,
>>  com.webobjects.eocontrol.EOQualifier, com.webobjects.foundation.NSArray) 
>> @bci=19, line=1062 (Interpreted frame)
>> - 
>> com.acaiq.fondation.acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier,
>>  com.webobjects.foundation.NSArray, boolean) @bci=77, line=8920 (Interpreted 
>> frame)
>> - 
>> com.acaiq.fondation.acaiqCore._Membre.licences(com.webobjects.eocontrol.EOQualifier,
>>  boolean) @bci=4, line=8893 (Interpreted frame)
>> - 
>> com.acaiq.fondation.acaiqCore.Membre.licencesParEtats(com.acaiq.fondation.acaiqCore.EtatMembre[])
>>  @bci=100, line=980 (Interpreted frame)
>> - com.acaiq.fondation.acaiqCore.Membre.licencesValides() @bci=11, line=996 
>> (Interpreted frame)
>> - com.acaiq.fondation.acaiqCore.Membre.estCourtier() @bci=5, line=1035 
>> (Interpreted frame)
>> - sun.reflect.GeneratedMethodAccessor87.invoke(java.lang.Object, 
>> java.lang.Object[]) @bci=40 (Interpreted frame)
>> 
>> Which reminds me of an unlocked EC/OSC. Second:
>> 
>>> java.lang.IllegalArgumentException: Attribute noCommandeOracle can't 
>>> receive a null parameter :
>>>     at 
>>> com.acaiq.fondation.depot.lbaArticle._CommandesEcom.setNoCommandeOracle(_CommandesEcom.java:419)
>> 
>> This is a *template* that throws on null?? You sure that's such a bright 
>> idea? Isn't this what validation is for? And third:
>> 
>>>     at 
>>> com.acaiq.depot.component.TransactionAchat.performAction(TransactionAchat.java:63)
>>>     at 
>>> com.webobjects.woextensions.WOLongResponsePage.run(WOLongResponsePage.java:119)
>> 
>> As you're throwing from inside a normal 
>> com.webobjects.woextensions.WOLongResponsePage, I seriously hope you're 
>> doing your part of try{} finally{} and EC unlocking.
>> 
>> 
>> Cheers, Anjo
>> 
>> 
>> 
>> Am 31.05.2010 um 20:02 schrieb Pascal Robert:
>> 
>>> One of our apps have deadlocked 5 times over 3 days, strangely enough it 
>>> started when we moved our Oracle Database 10gR2 DB to our VMWare ESX 4.0 
>>> cluster. e didn't re-install Oracle, I simply did a P2V (Physical to VM) 
>>> conversion, so it's the exact same version of Oracle DB as before.
>>> 
>>> What's happenning is that we store some information on our Oracle database, 
>>> save it, and we built a copy of some of the data to a new EO (different 
>>> entity) in a SQL Server 2005 db so the accounting system take care of 
>>> billing.
>>> 
>>> The exception that cause the deadlock (or at least the last thing written 
>>> to the log before the deadlock) :
>>> 
>>> java.lang.IllegalArgumentException: Attribute noCommandeOracle can't 
>>> receive a null parameter :
>>>     at 
>>> com.acaiq.fondation.depot.lbaArticle._CommandesEcom.setNoCommandeOracle(_CommandesEcom.java:419)
>>>     at com.acaiq.fondation.depot.Caissier.copiePourLBA(Caissier.java:267)
>>>     at com.acaiq.fondation.depot.Caissier.paye(Caissier.java:137)
>>>     at 
>>> com.acaiq.depot.component.TransactionAchat.performAction(TransactionAchat.java:63)
>>>     at 
>>> com.webobjects.woextensions.WOLongResponsePage.run(WOLongResponsePage.java:119)
>>>     at java.lang.Thread.run(Thread.java:613)
>>> 
>>> And it happens here :
>>> 
>>>     commandeEcom.setNoCommandeOracle(((Integer) _commande.clefsPrimaire()));
>>> 
>>> _commande.clefsPrimaire() is a method that simply do :
>>> 
>>>     return ERXEOControlUtilities.primaryKeyObjectForObject(this);
>>> 
>>> So clefsPrimaire() returns null, even if the data was stored in the Oracle 
>>> DB (and it's really there) and a primary key was generated (EOF did it, 
>>> it's not a "human generated" PK).
>>> 
>>> The whole block :
>>> 
>>> try {
>>>     _commande.setEstPaye(true);
>>>     _commande.editingContext().saveChanges();  // This is when we save the 
>>> EO in Oracle
>>> } catch (Exception e) {
>>>     NSLog.err.appendln(e.getMessage());
>>>     
>>> remboursementPayflow((Transaction)pfd.valueForKey("transaction"),_commande, 
>>> (String) pfd.valueForKey("PNREF"));
>>>     pfd = new NSDictionary<Object, String>(CODE_RESUTAT, "REMBOURSEMENT");
>>> } finally {
>>>     try {
>>>             copiePourLBA(_commande);  // This is the method where we copy 
>>> some data to SQL Server
>>>     } catch (Exception e) {
>>>             NSLog.err.appendln(e.getMessage());
>>>     }
>>>     _commande.editingContext().saveChanges();
>>> }
>>> 
>>> This problem didn't happen in the past (but we also had less requests 
>>> coming him) and it doesn't always happen :-/ jstack give me tons of this :
>>> 
>>> Thread t...@71683: (state = BLOCKED)
>>> - java.net.PlainSocketImpl.accept(java.net.SocketImpl) @bci=0, line=382 
>>> (Interpreted frame)
>>> - java.net.ServerSocket.implAccept(java.net.Socket) @bci=50, line=450 
>>> (Interpreted frame)
>>> - java.net.ServerSocket.accept() @bci=48, line=421 (Interpreted frame)
>>> - com.webobjects.appserver._private.WOWorkerThread.run() @bci=26, line=238 
>>> (Interpreted frame)
>>> - java.lang.Thread.run() @bci=11, line=613 (Interpreted frame)
>>> 
>>> The only non-blocked thread is :
>>> 
>>> Thread t...@78083: (state = IN_NATIVE)
>>> - java.net.PlainSocketImpl.socketAccept(java.net.SocketImpl) @bci=0 
>>> (Interpreted frame)
>>> - java.net.PlainSocketImpl.accept(java.net.SocketImpl) @bci=7, line=384 
>>> (Interpreted frame)
>>> - java.net.ServerSocket.implAccept(java.net.Socket) @bci=50, line=450 
>>> (Interpreted frame)
>>> - java.net.ServerSocket.accept() @bci=48, line=421 (Interpreted frame)
>>> - com.webobjects.appserver._private.WOWorkerThread.run() @bci=26, line=238 
>>> (Interpreted frame)
>>> - java.lang.Thread.run() @bci=11, line=613 (Interpreted frame)
>>> 
>>> Beside going back to our Oracle physical server, I have no idea of why I'm 
>>> getting this. Java 1.5 on OS X 10.4.11 Server, WO 5.3.3. W
>>> 
>>> <pid23991.txt>
>>> 
>>> 
>>> ----
>>> Pascal Robert
>>> [email protected]
>>> 
>>> AIM: MacTICanada
>>> Twitter : MacTICanada
>>> LinkedIn : http://www.linkedin.com/in/macti
>>> WO Community profile : http://wocommunity.org/page/member?name=probert
>>> 
>>> _______________________________________________
>>> Do not post admin requests to the list. They will be ignored.
>>> Webobjects-dev mailing list      ([email protected])
>>> Help/Unsubscribe/Update your Subscription:
>>> http://lists.apple.com/mailman/options/webobjects-dev/anjo%40krank.net
>>> 
>>> This email sent to [email protected]
>> 
> 
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Webobjects-dev mailing list      ([email protected])
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/webobjects-dev/jmlittle%40gmail.com
> 
> This email sent to [email protected]
 _______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      ([email protected])
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com

This email sent to [email protected]

Reply via email to