I'd like to react to a couple of your points, bearing in mind that I'm not claiming to be "right", just putting it out there for discussion.
i wasnt claiming you were "wrong" :) as you said you wanted a discussion.
>> Is the Hibernate L2 cache a distributed cache?
>in hibernate it is a pluggable implementation. by default it uses ehcache
>which as of 1.2 has clustering support afaik. but i hope it doesnt replicate
>entities over the cluster and just replicates the evict calls instead.
The problem with a non-replicated cache is that it doesn't work well for your use case -- serializing state across members of a cluster. What will happen is that your state gets unserialized on some machine, then a so-called "seppuku" eviction occurs cluster-wide, and the fresh results get reloaded *from the database*, onto that single machine where your session is now live, while all the other nodes drop the relevant stuff from cache.
in round-robin clustering that might be somewhat true. but the eviction only occurs once the cached entity has been modified. so the impact of that is proportional to your write frequency.
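a minimal, self-contained sketch of that evict-on-modify behaviour (class and method names are made up for illustration - this is not ehcache's or hibernate's actual API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an evict-on-write second-level cache: reads are served from the
// cache until an update invalidates (evicts) the entry, so the database is
// only re-hit in proportion to the write frequency.
class EntityCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Map<K, V> database;      // stands in for the real DB
    private int dbLoads = 0;               // counts trips to the "database"

    EntityCache(Map<K, V> database) { this.database = database; }

    V get(K id) {
        V cached = entries.get(id);
        if (cached != null) return cached; // cache hit, no DB load
        dbLoads++;
        V fresh = database.get(id);        // cache miss, load from DB
        if (fresh != null) entries.put(id, fresh);
        return fresh;
    }

    void update(K id, V value) {
        database.put(id, value);           // write through to the DB
        entries.remove(id);                // evict so the next read reloads
    }

    int dbLoads() { return dbLoads; }
}
```

with a low write frequency the second read of an entity never touches the db at all.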
in sticky session with a backup buddy, which i think is a much better approach, this is not an issue because you work on the same node until failover, so you should have a fairly high hit rate.
also the point is not to replicate entities that you have cached in your session. you only store the id, and later retrieve it from cache.
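this is essentially what wicket's LoadableDetachableModel does. a self-contained sketch of the idea - the loader function here is an invented stand-in for a by-id lookup like session.get(Entity.class, id):

```java
import java.io.Serializable;
import java.util.function.Function;

// Sketch of the "serialize only the id" pattern: the entity lives in a
// transient field, so only the Long id ends up in the replicated session.
// After detach (or deserialization on another node) the entity is re-fetched
// by id, which in practice can be answered by the second-level cache.
class DetachableModel<T> implements Serializable {
    private final Long id;
    private transient T entity;                  // never serialized
    private final transient Function<Long, T> loader;

    DetachableModel(Long id, Function<Long, T> loader) {
        this.id = id;
        this.loader = loader;
    }

    T getObject() {
        if (entity == null) {
            entity = loader.apply(id);           // lazy reload by id
        }
        return entity;
    }

    void detach() {
        entity = null;                           // drop before replication
    }
}
```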
I certainly believe you are seeing benefit from your cache, but I suspect that most of the benefit is not in session replication or failover. You definitely will see benefits if you are doing N+1 loading, but why do N+1 loading?
this is a more general use case than n+1 loading, namely retrieval of objects via their database identity. this kind of lookup is very common/frequent in the webapp world.
take a simple example of a shopping cart backed by a cookie, where the cart is shown in a left-nav frame of your app - so visible on every request. you want product names/stock quantities to always be up to date. your cookie stores the list of product ids and their quantities, then on every request when you render the shopping cart you have to load each one of those objects from the db to get access to their properties. you can either do a [select where id in()] or you can do a load by id for each and bet on them being in the 2nd level cache.
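to make the two options concrete, here is a tiny sketch of the sql each strategy issues (table and column names are made up):

```java
import java.util.List;
import java.util.stream.Collectors;

// The two ways to materialize the cart items: one bulk IN() statement that
// always hits the database, or one load-by-id per item where each lookup is
// a candidate for a second-level cache hit (a warm cache issues no SQL).
class CartQueries {
    // Strategy 1: a single IN() query -- always goes to the database.
    static String selectIn(List<Long> ids) {
        return "select * from product where id in ("
                + ids.stream().map(String::valueOf).collect(Collectors.joining(", "))
                + ")";
    }

    // Strategy 2: a by-primary-key lookup, issued only on a cache miss.
    static String selectById(Long id) {
        return "select * from product where id = " + id;
    }
}
```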
there are also a lot of examples where there is no coherent list of objects, so a [select where id in()] is not possible, there you have no other choice but to do a lookup by id. keeping in mind there doesnt have to be many queries per request to cause high db load for busy apps.
>the cache doesnt live in httpsession so that is ok. httpsession is the
>important part because it is expensive to replicate for clustering so you
>want to keep it as small as possible. of course expensive is a relative
>term.
I see this perspective. It's always a "time vs. space" tradeoff. You want to incur the cost of hitting the database, in order to minimize the amount of state you have to squirt between machines. However, I'd like to point out that unless you are using a distributed, hard cache, this architecture is just going to batter the database and turn it into a bottleneck, for some applications,
yep, that is the case in most web apps, db is almost always your bottleneck exactly for the reason you mentioned. and the point of the 2nd level cache is to make it much less so by taking a lot of load off it. you can also rewrite some queries to use n loading instead of a [select in()] and bet on your cache.
and it might not be a good idea to assume the system has a distributed hard cache.
why not? the cache is part of your application's architecture. it is a service your application provides not something outside like the database or the application server. sometimes you even optimize your apps for this cache.
I think Shades might just be optimized enough that serializing the DatabaseSession probably only approximately doubles the space that you would need to serialize the POJOs alone. It seems to me that if your POJOs have a dozen or so primitive fields, then simply detaching the serializable POJO, and hanging onto it for attachment, yields a better performing system than hitting the database once for each POJO being displayed on the page, when you attach. I think the determining factor is the size of the POJOs themselves. My rule of thumb would be, "if the POJOs have a few dozen primitive fields, just serialize them out along with the session."
i dont know about that. take the phonebook's list page. you think serializing the fields of all those 10 contacts is better than serializing 10 longs and betting on a cache hit for all of them? i really dont think so. not only is your session smaller/faster to replicate, but you are working with current data. if you serialized the properties of the pojo, on the next request you would display stale data if it changed between the two requests. in fact, how do you know when to refresh the data? it wont happen until you do the next select list query, which might be a while.
>hope that example explains it. notice i didnt make cache transient to really
>drive the point home. so when serialized the model only keeps the long id,
>the other information is redundant and would be a waste to serialize and
>replicate because
>(a) it can be recreated
yes, but it is very expensive unless you use distributed cache,
well yes, isnt that the point of having the cache? :)
and it shouldnt be _that_ expensive because as you said the db has a query cache, and pk is a clustered index. so most of the expense is incurred by the wire transfer
in which case, they are not really "recreated", and then you still have the volatility issue
i dont think the volatility issue is still there because the cache is up to date and the db reload only occurs _if_ the entity has been updated.
Is (a) a hard-and-fast rule? I tend to think it depends on how big a footprint the POJO has.
the footprint is not the only issue. sure smaller objects are cheaper to transfer over wire. but as you yourself said the db is a bottleneck because nodes flood it with requests, 2nd level cache can eliminate a lot of requests entirely.
And, data volatility depends on the application. Volatility can be addressed through optimistic concurrency, which Shades uses.
that takes care of the data being out-of-date for an update, but not for it being out of date for redisplay. in webapps large portions of the page often go unchanged between requests, which means all the db data in those portions is just redisplayed - and requeried from the db - on every request.
Also, I'm going to add "read only" queries to Shades, in which case there will be close to zero space-overhead associated with each POJO beyond the size of the POJO itself, so long as you don't plan on updating the data.
cool
Considering the phonebook example, the query that puts the 10 items on the page is a read-only query. When you go to the detail page, there is only one record at a time that can be edited. So solve volatility, and state-size, in one fell swoop, by using read-only style queries for the "listing" pages, and regular queries, with optimistic concurrency, for the "edit" pages.
but what if the contact is updated by someone else between you accessing the list page and clicking the edit link? how do you know that has happened? you wouldnt. only when you try to save would you get an optimistic lock error, but if you reloaded that contact when displaying the edit page as well, you wouldnt have this problem.
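a self-contained sketch of the versioning scheme behind that optimistic lock error (names are illustrative, not Shades' actual API):

```java
// Optimistic concurrency sketch: each entity carries a version number, and a
// save only succeeds if the version still matches what the editor originally
// read. A mismatch means someone else saved in between read and write.
class VersionedContact {
    String name;
    int version;

    VersionedContact(String name, int version) {
        this.name = name;
        this.version = version;
    }

    // Returns true if the update was applied; false signals an optimistic
    // lock failure, i.e. a stale edit.
    boolean save(String newName, int expectedVersion) {
        if (version != expectedVersion) {
            return false;        // stale edit detected, reject the save
        }
        name = newName;
        version++;               // bump so other concurrent editors fail
        return true;
    }
}
```

note the failure is only detected at save time, which is exactly the complaint above: reloading the contact when rendering the edit page would surface the fresh data earlier.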
This is a great discussion --- I think I will go add read only queries to Shades! :-)
agreed
-igor
_______________________________________________ Wicket-user mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/wicket-user
