On Wed, 28 Oct 2009, Mindaugas Kavaliauskas wrote:

Hi,

> in the beginning I was going to write a private email to Przemek and
> ask about a CDX detail, but a deeper test gave me the answer.
> The problem for the customers was that sometimes a browse does not
> show records (in a scoped alias), but the records certainly exist in
> the database.
> We found that ORDKEYCOUNT() returns zero even though the scope is not
> empty. The problem with the ADS RDD is that in many places it does not
> check the return values of ADS API functions, for example when
> processing DBOI_KEYCOUNT.

Yes, and as usual I suggest checking error codes and reporting any of
them as RT errors, to catch problems as soon as possible and inform the
user about them.

> Here AdsGetRecordCount() returns error 7022.
> Advantage Error Guide:
> ----------------------
> 7022 Maximum index levels
> Problem: The current index is unbalanced. This problem is most
> likely to occur when using very long index keys and new keys are
> being added to the index. This error could also occur if the index
> is corrupt.

It may happen in the CDX format when the index key size is greater
than ( 512 - 12 ) / 3 - 8 = 158. In such a case the maximum number of
keys in an interior node is 2, which means that algorithms which do
not actively balance keys between pages during updates may store only
one key per page. Because in the CDX format keys in interior nodes
are not real keys but only dummy key values, such a key value has to
be repeated on each next level, so in theory some algorithms may end
up with an infinite path from the root to a leaf page.
A well designed algorithm should not have such problems. The CDX
format has bindings between pages on the same level, so it is quite
easy to implement inter-page key balancing, and I implemented it in
Harbour. As a result, in Harbour, for every three pages on the same
level at most one may have 1 key while the two others have 2 keys, so
the path from the root node to a key in a leaf is guaranteed to be
finite.
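The arithmetic above can be checked with a short sketch (assuming, as
described, a 512-byte page with a 12-byte header and 8 bytes of
record-number/page-pointer overhead per interior key):

```python
# CDX interior-page arithmetic as described above: 512-byte pages,
# a 12-byte page header, and 8 bytes of overhead (record number plus
# child page pointer) per key entry.
PAGE_SIZE = 512
PAGE_HEADER = 12
ENTRY_OVERHEAD = 8

def interior_keys_per_page(key_len):
    """Maximum number of keys that fit in one interior node."""
    return (PAGE_SIZE - PAGE_HEADER) // (key_len + ENTRY_OVERHEAD)

# Largest key length that still allows 3 keys per interior page:
threshold = (PAGE_SIZE - PAGE_HEADER) // 3 - ENTRY_OVERHEAD
print(threshold)                    # 158
print(interior_keys_per_page(158))  # 3
print(interior_keys_per_page(159))  # 2
```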

> Solution: Reindex the file. The reindexed file will be built using
> the minimum number of levels. This will speed up all future
> operations using the index. If this does not resolve the problem,
> reindex with a larger page size (see Index Page Size, or see the API
> AdsReindex61 in the Advantage Client Engine Help or the method
> AdsIndexPageSize in the Advantage TDataSet Descendant Help). With a
> larger index page size, more keys will be stored in each page and
> will thus reduce the number of levels required. For specific
> equations showing the correlation between key and page size, see
> Index Key Size and Page Size Relationships.
> ----------------------

If you increase the page size then more keys can be stored in a single
node, so even more primitive algorithms should not have problems with
a degenerated tree, though usually they will create indexes noticeably
bigger than optimal just after reindexing. If you compare the size of
CDX indexes updated record by record by Harbour and by other CDX
drivers, you should find that the indexes created by Harbour are much
smaller. Very often the size is comparable to indexes created with the
[RE]INDEX command. This happens because Harbour aggressively balances
keys between index pages during updates, at some additional cost in
write operations.

> Unfortunately the CDX page size is fixed at 512 and cannot be
> changed. I have a database table from our customers. It has only
> 50342 records and the index key length is 133 bytes. I have a feeling
> these limits are too small for a real-life database. The documented
> maximum key length is 240 bytes.

For such a key length you can store 3 index keys in any index node.
If you can exploit this problem with such a key, it means that the
CDX driver you are using does not have even basic key balancing
between two neighbouring pages, because in that case it would keep
at least 2 keys per interior node and the maximum path from the root
node to a leaf for 2^32 records would be at most 32.
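A minimal sketch of that bound, using integer arithmetic so the powers
are exact (a node with k keys has fanout k + 1; with 2 keys guaranteed
per interior node the path is in fact even shorter than the 32-level
bound):

```python
def max_depth(records, min_keys_per_page):
    # With at least k keys guaranteed per interior page, the fanout is
    # at least k + 1, so a tree of depth d reaches (k + 1)**d keys.
    fanout = min_keys_per_page + 1
    depth, reach = 0, 1
    while reach < records:
        reach *= fanout
        depth += 1
    return depth

RECORDS_MAX = 2 ** 32  # maximum number of records in a DBF

print(max_depth(RECORDS_MAX, 1))  # 32: degenerate, 1 key per page
print(max_depth(RECORDS_MAX, 2))  # 21: 2 keys per page guaranteed
```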

> I had an idea that the index tree level count might be limited by the
> CDX structure, but after I wrote a self-contained test and compared
> the ADSCDX and DBFCDX results, I understood it is a limit of the ADS
> implementation. Both my real-life key length of 133 and the maximum
> documented key length of 240 were tested (see test source below):
>   RDD: ADSCDX KeyLen:    240 LASTREC():   6058 ORDKEYCOUNT():     0
>   RDD: ADSCDX KeyLen:    133 LASTREC(): 250052 ORDKEYCOUNT():     0
> So, using the maximum documented key length (240 bytes), ADS can
> manage tables having only(!) 6058 records. The results are the same
> for both the 8.1 and 9.1 versions of ADS.

So it's not a good CDX implementation, and probably the same code is
used in all ADS clients.

> (512 - some_header) / (240 + pointer_size) = 2 keys per page
> log(6058)/log(2+1) ~= 7.927 ~= 8
> (512 - some_header) / (133 + pointer_size) = 3 keys per page
> log(250052)/log(3+1) ~= 8.966 ~= 9
> So, the maximum index tree level in ADS is 8, or 9, or a little more
> if the index is unbalanced. Perhaps ADS uses a fixed-length array to
> store pages in index navigation operations. I have not tested yet
> whether DBSKIP() or DBSEEK() has the same limits, or whether it is a
> problem of ORDKEYCOUNT() only.

Such a test does not give you a precise answer about the maximum tree
depth, because after updating record by record the index tree is not
optimal. Depending on the key values and their order during updating,
the difference is bigger or smaller. If you want to check the maximum
tree depth then you should create the table first and then use the
INDEX ON command to create an optimal index. Anyhow, if ADS has the
maximum depth set to 32 then you will not be able to exploit the
problem using DBFs, which may have up to 2^32 records.
But your test does show how efficient page balancing during index
updates is, and that trivial algorithms which do not balance keys
between pages to keep some minimal number of keys per page can be
exploited here.
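The level estimates quoted above can be reproduced directly (a sketch;
the 8- or 9-level ceiling in ADS is inferred from the test, not
documented):

```python
import math

# With k keys per page the fanout is k + 1, so the number of levels
# needed for N keys is roughly log(N) / log(k + 1), rounded up.
for records, keys_per_page in ((6058, 2), (250052, 3)):
    levels = math.ceil(math.log(records) / math.log(keys_per_page + 1))
    print(records, keys_per_page, levels)
# -> 6058 2 8
# -> 250052 3 9
```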

> I was unable to find an index level limit in Harbour's DBFCDX driver
> using the test.

Because it's not possible to exploit this problem in the Harbour CDX
implementation.
If you are using ADS then you can try ADI indexes. Looking at .ADI
files with a binary editor, they seem to be a slightly modified CDX
format. They allow a different page size, set in the header, and
instead of addressing pages by byte offset they address them by page
number, which increases the maximum index size from 2^32 bytes to
2^32 * page_size. I use exactly the same trick in the Harbour NTX and
NSX RDDs when the user enables large file support, which increases the
maximum size of these indexes to 2^32 * 2^10 = 2^42 bytes = 4TB.
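The size arithmetic can be illustrated like this (a sketch; the 2^10
page size is the NTX/NSX large-file case mentioned above):

```python
# With 32-bit byte offsets an index file is limited to 2**32 bytes.
# Addressing pages by number instead multiplies that limit by the
# page size: here 2**10 = 1024-byte pages, as in Harbour's NTX/NSX
# large-file mode.
PAGE_SIZE = 2 ** 10

byte_offset_limit = 2 ** 32              # 4 GiB
page_number_limit = 2 ** 32 * PAGE_SIZE  # 2**42 bytes

print(page_number_limit == 2 ** 42)  # True
print(page_number_limit // 2 ** 40)  # 4 (TB)
```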

best regards,
Przemek
_______________________________________________
Harbour mailing list
Harbour@harbour-project.org
http://lists.harbour-project.org/mailman/listinfo/harbour
