On Wed, 28 Oct 2009, Mindaugas Kavaliauskas wrote:

> Hi,
> in the beginning I was going to write a private email to Przemek and
> ask about a CDX detail, but a deeper test gave me the answer.
> The problem for the customers was that sometimes a browse does not
> show records (in a scoped alias), but the records certainly exist in
> the database. We found that ORDKEYCOUNT() returns zero even though the
> scope is not empty. The problem with the ADS RDD is that it does not
> check the return values of ADS API functions in many places, for
> example in the processing of DBOI_KEYCOUNT.

Yes, and as usual I suggest checking error codes and reporting any
failure as an RT error, to catch problems as soon as possible and to
inform the user about them.

> Here AdsGetRecordCount() returns error 7022.
> Advantage Error Guide:
> ----------------------
> 7022 Maximum index levels
> Problem: The current index is unbalanced. This problem is most
> likely to occur when using very long index keys and new keys are
> being added to the index. This error could also occur if the index
> is corrupt.

It may happen in the CDX format when the index key size is greater than
(512 - 12) / 3 - 8 = 158 bytes. In such a case the maximum number of
keys in an interior node is 2. It means that algorithms which do not
actively balance keys between pages during updates may store only one
key per page. Because in the CDX format keys in interior nodes are not
real keys but only dummy key values, such a key value has to be repeated
on each following level. In theory some algorithms may end up with an
infinite path from the root to a leaf page. A well designed algorithm
should not have such problems. The CDX format has bindings between pages
on the same level, so it is quite easy to implement inter-page key
balancing, and I implemented it in Harbour. As a result, in Harbour, out
of every three pages on the same level only one may have 1 key and the
two others will have 2 keys, so the path from the root node to a key in
a leaf is guaranteed to be finite.

> Solution: Reindex the file. The reindexed file will be built using
> the minimum number of levels.
> This will speed up all future operations using the index. If this
> does not resolve the problem, reindex with a larger page size (see
> Index Page Size, or see the API AdsReindex61 in the Advantage Client
> Engine Help or the method AdsIndexPageSize in the Advantage TDataSet
> Descendant Help). With a larger index page size, more keys will be
> stored in each page and will thus reduce the number of levels
> required. For specific equations showing the correlation between key
> and page size, see Index Key Size and Page Size Relationships.
> ----------------------

If you increase the page size then more keys can be stored in a single
node, so even more primitive algorithms should not have problems with a
degenerated tree, though usually they will create indexes noticeably
bigger than optimal right after reindexing. If you compare the size of
CDX indexes updated record by record by Harbour and by other CDX
drivers, you should find that the indexes created by Harbour are much
smaller. Very often the size is comparable to indexes created with the
[RE]INDEX command. It happens because Harbour aggressively balances keys
in index pages during updates, at some additional cost in write
operations.

> Unfortunately the CDX page size is fixed at 512 and cannot be
> changed. I have a database table from our customers. It has only
> 50342 records and the index key length is 133 bytes. I have a feeling
> these are too small limits for a real life database. The documented
> maximum key length is 240 bytes.

For such a key length you can store 3 index keys in each index node. If
you can exploit this problem using such a key, it means that the CDX
driver you are using does not have even basic key balancing between two
neighbour pages, because in that case it would keep a minimum of 2 keys
per interior node, so the maximum path from the root node to a leaf for
2^32 records is 32.
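The arithmetic above can be sketched in a few lines of Python. This is a
rough model, not the real CDX code: the 512-byte page, the 12-byte page
header and the 8 bytes of per-key overhead are assumptions taken from the
(512 - 12) / 3 - 8 formula quoted earlier.

```python
# Rough model of CDX interior-node capacity and worst-case tree depth.
PAGE_SIZE = 512     # fixed CDX page size
PAGE_HEADER = 12    # per-page header size (assumed from the formula above)
KEY_OVERHEAD = 8    # per-key pointer overhead (assumed from the formula above)

def keys_per_interior_node(key_len):
    """Maximum number of keys one interior node can hold."""
    return (PAGE_SIZE - PAGE_HEADER) // (key_len + KEY_OVERHEAD)

def max_path_length(records, min_keys_per_node):
    """Longest root-to-leaf path when every interior node keeps at least
    min_keys_per_node keys (i.e. at least that many children)."""
    depth, reachable = 0, 1
    while reachable < records:
        reachable *= min_keys_per_node
        depth += 1
    return depth

print(keys_per_interior_node(158))   # 3 - last "safe" key length
print(keys_per_interior_node(159))   # 2 - degenerate trees become possible
print(keys_per_interior_node(240))   # 2 - documented maximum key length
print(max_path_length(2**32, 2))     # 32 - the bound mentioned above
```

With a guaranteed minimum of 2 keys per interior node, even the worst
case path for 2^32 records stays at 32 levels, which is why balanced
drivers are safe even with 240-byte keys.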
> I had an idea that the index tree level count might be limited by the
> CDX structure, but after I wrote a self contained test and compared
> ADSCDX and DBFCDX results, I understood it is a limit of the ADS
> implementation. Both my real life key length of 133 and the maximum
> documented key length of 240 were tested (see test source below):
> RDD: ADSCDX KeyLen: 240 LASTREC(): 6058 ORDKEYCOUNT(): 0
> RDD: ADSCDX KeyLen: 133 LASTREC(): 250052 ORDKEYCOUNT(): 0
> So, using the maximum documented key length (240 bytes) ADS can
> manage tables having only(!) 6058 records. The results are the same
> for both the 8.1 and 9.1 versions of ADS.

So it's not a good CDX implementation, and probably the same code is in
all ADS clients.

> (512 - some_header) / (240 + pointer_size) = 2 keys per page
> log(6058) / log(2+1) ~= 7.927 ~= 8
> (512 - some_header) / (133 + pointer_size) = 3 keys per page
> log(250052) / log(3+1) ~= 8.966 ~= 9
> So, the maximum index tree level in ADS is 8, or 9, or a little more
> if the index is unbalanced. Perhaps ADS uses a fixed length array to
> store pages in index navigation operations. I have not tested yet
> whether DBSKIP() or DBSEEK() has the same limits, or whether it is a
> problem of ORDKEYCOUNT() only.

Such a test does not give you a precise answer about the maximum tree
depth, because after updating record by record the index tree is not
optimal. Depending on the key values and their order during updating,
the difference is bigger or smaller. If you want to check the maximum
tree depth, you should create the table first and then use the INDEX ON
command to create an optimal index. Anyhow, if ADS had its maximum depth
set to 32, you would not be able to exploit the problem using DBFs,
which may have up to 2^32 records. But your test shows how efficient the
page balancing during index updates is, and here some trivial algorithms
which do not balance keys between pages, keeping some minimal number of
keys, can be exploited.

> I was unable to find an index level limit in Harbour's DBFCDX driver
> using the test.
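The level estimates in the quoted calculation can be reproduced with the
post's own approximation: a tree with k keys per page is treated as
having fan-out k + 1, so an optimally filled tree of depth d holds about
(k + 1)^d keys.

```python
import math

# Depth needed for n keys with roughly k keys per page, using the
# approximation quoted above: d ~= log(n) / log(k + 1).
def estimated_levels(records, keys_per_page):
    return math.log(records) / math.log(keys_per_page + 1)

print(round(estimated_levels(6058, 2), 3))    # 7.927 -> about 8 levels
print(round(estimated_levels(250052, 3), 3))  # 8.966 -> about 9 levels
```

Both record counts from the test land just under a whole number of
levels, which is consistent with ADS failing once the tree needs one
more level than some fixed internal limit of 8 or 9.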
Because it's not possible to exploit this problem in the Harbour CDX
implementation.

If you are using ADS then you can try to use ADI indexes. Looking at
.ADI files with a binary editor, they seem to be a slightly modified CDX
format. They allow a different page size, set in the header, and instead
of using page offsets in bytes they use page numbers, which increases
the maximum index size from 2^32 to 2^32 * page_size. I use exactly the
same trick in the Harbour NTX and NSX RDDs when the user enables large
file support, which increases the maximum size of these indexes to
2^32 * 2^10 = 2^42 bytes = 4TB.

best regards,
Przemek

_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour