Does JDO support the features in EclipseLink in this article? http://java-persistence-performance.blogspot.com/2010/08/batch-fetching-optimizing-object-graph.html
On Fri, Nov 30, 2012 at 11:36 AM, Benyi Wang <bewang.t...@gmail.com> wrote: > We have some tables with 15K ~ 20K partitions. If I run a query scanning a > lot of partitions, Hive could use more than 10 minutes to commit the mapred > job. > > The problem is caused by ObjectStore.getPartitionsByNames when Hive > semantic analyzer tries to prune partitions. This method sends a lot of > queries to our MySQL database to retrieve ALL information about partitions. > Because MPartition and MStroageDescriptor are converted into Partition and > StorageDescriptor, every field will be accessed during conversion, in other > words, even the fields has nothing to do with partition pruning, such as > BucketCols. In our case, 10 queries for each partition will be sent to the > database and each query may take 40ms. > > This is known ORM 1+N problem. But it is really bad user experience. > > Actually we assembly Partition objects manually, it would only need about > 10 queries for a group of partitions (default size is 300). In our > environment, it only needs 40 seconds for 30K partitions: 30K / 300 * 10 * > 40. > > I tried to this way: > > 1. Fetch MPartition with fetch group and fetch_size_greedy, so one query > can get MPartition's primary fields and MStorageDescriptor cached. > 2. Get all descriptors into a list "msds", run another query to get > MStorageDescriptor with filter like "msds.contains(this)", all cached > descriptors will be refreshed in one query instead of n queries. > > This works well for 1-1 relations, but not on 1-N relation like > MPartition.values. I didn't find a way to populate those fields in just one > query. > > Because JDO mapping doesn't work well in the conversion (MPartition -> > Partition), I'm wondering if it is worth doing like this: > > 1. Query each table in SQL directly PARTITIONS, SDS, etcs. > 2. Assembly Partition objects > > This is a hack and the code will be really bad. But I didn't find JDO > support "FETCH JOIN" or "Batch fetch". > > Any thoughts? >