ZoneMap is only one factor. Compaction and import also need attention. During compaction there may be too many versions, and each of them will occupy a lot of memory.

Thanks,
Zhao Chun

Lee Happen <happen...@hotmail.com> wrote on Tue, Jul 20, 2021 at 7:20 PM:

> We should support limiting the zonemap index.
>
> In Doris's current scenario, excessively long zonemap indexes do not
> contribute significantly to query performance, but they do consume a lot
> of additional memory and storage. So it is wise to limit the zonemap
> length for large object types and long strings.
>
> For example, PostgreSQL's BRIN index by default records one summary per
> range of 128 pages. The current Doris default page size is 64K, so a
> reasonable zonemap index length would be about 512 bytes.
>
> Thanks,
> Li HaoPeng
>
> ________________________________
> From: Zhao Chun <zh...@apache.org>
> Sent: July 20, 2021 2:42
> To: dev@doris.apache.org <dev@doris.apache.org>
> Subject: Re: [Proposal] Support large variable-length string type
>
> We have done some POC work on this solution internally.
>
> With the current code, it makes memory usage larger and OOM more likely,
> and the size of a batch can exceed the maximum value of int32 during RPC.
>
> So to use this scheme, several places need careful design.
>
> Thanks,
> Zhao Chun
>
> 寒江雪 <yangz...@gmail.com> wrote on Tue, Jul 20, 2021 at 10:31 AM:
>
> > Hi All
> > I want to submit a proposal to support larger string types.
> >
> > Background
> >
> > There are currently two string types: CHAR and VARCHAR. CHAR stores
> > fixed-length strings and VARCHAR stores variable-length strings. The
> > maximum length of VARCHAR is 65533. This meets most needs, but it is
> > not enough for scenarios that store larger strings in Doris, so we
> > need to add a new data type, String. String corresponds to BLOB or
> > TEXT storage in MySQL. The maximum length is 4 GB, but we still don't
> > recommend storing strings larger than 64K in Doris.
> >
> > Other system implementation
> >
> > - MySQL: MySQL uses BLOB or TEXT as the storage type for very long
> >   strings. MySQL can perform string operations on these types, but
> >   performance is not guaranteed. In actual storage, the data is kept
> >   in overflow pages, and depending on the version and storage engine,
> >   the first n characters are stored in the data page for indexing.
> > - Parquet/ORC: in both formats, large strings are stored directly in
> >   the data area with no special processing; only dictionary encoding
> >   is applied.
> >
> > Design
> >
> > - Add the String type, which represents a string of any length. To be
> >   compatible with MySQL, the maximum length is set to 4G-4, and 4
> >   bytes are used to store the length of the string.
> > - Data storage is similar to the VARCHAR type; the length prefix is
> >   changed to 4 bytes.
> > - Indexes are not currently supported; zonemap indexes will be enabled
> >   after the zonemap length limit is ready.
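
As a rough illustration of two points from the quoted design (the 4-byte length prefix and a length-limited zonemap), here is a minimal Python sketch. It is not Doris code: the function names, the little-endian byte order, and the round-up rule for the truncated maximum are assumptions made for this example. The 512-byte limit is the figure suggested earlier in the thread, and the maximum length follows the "4G-4" in the proposal.

```python
import struct
from typing import Iterable, Optional, Tuple

MAX_STRING_LEN = 4 * 1024**3 - 4  # "4G-4" from the proposal
ZONEMAP_LIMIT = 512               # bytes; figure suggested in this thread


def encode_string(value: bytes) -> bytes:
    """Prefix the value with its length as a 4-byte unsigned integer."""
    if len(value) > MAX_STRING_LEN:
        raise ValueError("string exceeds maximum length")
    # Little-endian is an assumption; the thread does not specify byte order.
    return struct.pack("<I", len(value)) + value


def decode_string(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Return (value, next_offset) for the string starting at offset."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("buffer is shorter than the declared length")
    return buf[start:end], end


def truncate_min(value: bytes, limit: int) -> bytes:
    """A prefix never sorts after the full value, so it is a valid lower bound."""
    return value[:limit]


def truncate_max(value: bytes, limit: int) -> Optional[bytes]:
    """Shorten the maximum while keeping it >= the original value.

    Returns None when no short upper bound exists (prefix is all 0xFF),
    meaning the zonemap must treat the upper side as unbounded.
    """
    if len(value) <= limit:
        return value
    prefix = bytearray(value[:limit])
    # Drop trailing 0xFF bytes, then bump the last remaining byte by one.
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        return None
    prefix[-1] += 1
    return bytes(prefix)


def build_zone_map(
    values: Iterable[bytes], limit: int = ZONEMAP_LIMIT
) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Compute a (min, max) pair whose entries are at most `limit` bytes."""
    items = list(values)
    if not items:
        return None, None
    return truncate_min(min(items), limit), truncate_max(max(items), limit)
```

The maximum is rounded up rather than simply cut, because a plain prefix of the largest value would sort below it, and a filter comparing against that prefix could wrongly skip a page that contains matching rows.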