Thank you Andrew, Bryan and Duo for your responses.

> My main thought is that a migration like this should use bulk loading,

> But also, I think, that data transfer should be in bulk

We are working on moving to bulk loading.
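For context, the flow we are building is roughly the sketch below. This
is only an outline under my own assumptions (a recent HBase 2.x client;
the table name, output path, and job wiring are illustrative, and the
mapper/input setup from our internal copy tool is elided):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
    import org.apache.hadoop.hbase.tool.BulkLoadHFiles;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TenantBulkLoad {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName tn = TableName.valueOf("tenant_data");        // illustrative
        Path hfiles = new Path("/tmp/tenant-migration-hfiles"); // illustrative
        try (Connection conn = ConnectionFactory.createConnection(conf);
            Table table = conn.getTable(tn);
            RegionLocator locator = conn.getRegionLocator(tn)) {
          Job job = Job.getInstance(conf, "tenant-migration");
          // ... mapper and input setup from our copy tool elided ...
          // HFileOutputFormat2 partitions its output by the target
          // table's *current* region boundaries, which is why the
          // target's region layout matters so much for this migration.
          HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
          FileOutputFormat.setOutputPath(job, hfiles);
          if (job.waitForCompletion(true)) {
            // Moves the HFiles into the regions directly, bypassing the
            // write path (no WAL, no memstore).
            BulkLoadHFiles.create(conf).bulkLoad(tn, hfiles);
          }
        }
      }
    }

Note that because configureIncrementalLoad partitions by the existing
region boundaries, a tenant whose keyspace lands in a single target
region still funnels through a single region, which is why the
splitting problem below remains even with bulk loading.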
> With Admin.splitRegion, you can specify a split point. You can use
> that to iteratively add a bunch of regions wherever you need them in
> the keyspace. Yes, it's 2 at a time, but it should still be quick
> enough in the grand scheme of a large migration.

Trying to do some back-of-the-envelope calculations: in a production
environment, it took around 4 minutes to split a recently split region
which had 4 store files with a total of 5 GB of data. Assume we are
migrating 5000 tenants at a time, and that, as is typical for us,
around 10% of the tenants (500 tenants) have data spread across more
than 1000 regions. We have around 10 huge tables where we store the
tenants' data for different use cases. All of the above numbers are on
the *conservative* side.

To create a split structure of 1000 regions, we need 10 iterations of
splits (2^10 = 1024), and that already assumes the regions within each
iteration are split in parallel. Each split takes around 4 minutes, so
creating 1000 regions for just 1 tenant and 1 table takes around 40
minutes, and for 10 tables it takes around 400 minutes per tenant. For
500 tenants, this adds up to around *140 days*.

To reduce this time further, we could also create the split structure
for each tenant and each table in parallel, but that would put a lot of
pressure on the cluster, require a lot of operational overhead, and
still leave the whole process taking days, if not months. Since we are
moving our infrastructure to the public cloud, we anticipate a
migration of this size happening once every month.

> Adding a splitRegion method that takes byte[][] for multiple split
> points would be a nice UX improvement, but not strictly necessary.

IMHO, for all the reasons stated above, I believe this is necessary.
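For concreteness, the 10-iteration structure this estimate assumes
looks roughly like the sketch below. This is a sketch only, under my
own naming: the completion wait is a placeholder, and real code would
poll the region count rather than sleep.

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;

    public class IterativePreSplit {
      /**
       * Doubles the region count each round using 2-way splits. For a
       * full tree, splitPoints is sorted and has length 2^rounds - 1
       * (1023 boundaries copied from the source cluster for rounds = 10).
       */
      static void preSplit(Admin admin, TableName tn, byte[][] splitPoints,
          int rounds) throws Exception {
        for (int round = 1; round <= rounds; round++) {
          int stride = 1 << (rounds - round);
          // Round k splits at every 2^(rounds-k)-th boundary, so each
          // region created so far is split once and the count doubles.
          for (int i = stride - 1; i < splitPoints.length; i += 2 * stride) {
            admin.split(tn, splitPoints[i]); // async 2-way split request
          }
          // Placeholder: wait for this round's splits to finish
          // (~4 minutes per round in our environment).
          Thread.sleep(4L * 60 * 1000);
        }
      }
    }

Even with full parallelism inside a round, the rounds themselves are
serial: 10 rounds x 4 min = 40 min per table, 400 min per tenant across
10 tables, and roughly 200,000 minutes (~140 days) for 500 tenants done
back to back. A splitRegion overload taking byte[][] would collapse
those 10 serial rounds into a single operation per table.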
On Mon, Jan 29, 2024 at 6:25 AM 张铎(Duo Zhang) <[email protected]> wrote:

> As it is called 'pre' split, it means that it can only happen when
> there is no data in table.
>
> If there are already data in the table, you can not always create
> 'empty' regions, as you do not know whether there are already data in
> the given range...
>
> And technically, if you want to split a HFile into more than 2 parts,
> you need to design new algorithm as now in HBase we only support top
> reference and bottom reference...
>
> Thanks.
>
> Bryan Beaudreault <[email protected]> wrote on Sat, Jan 27, 2024,
> 02:16:
> >
> > My main thought is that a migration like this should use bulk
> > loading, which should be relatively easy given you already use MR
> > (HFileOutputFormat2). It doesn't solve the region-splitting problem.
> > With Admin.splitRegion, you can specify a split point. You can use
> > that to iteratively add a bunch of regions wherever you need them in
> > the keyspace. Yes, it's 2 at a time, but it should still be quick
> > enough in the grand scheme of a large migration. Adding a
> > splitRegion method that takes byte[][] for multiple split points
> > would be a nice UX improvement, but not strictly necessary.
> >
> > On Fri, Jan 26, 2024 at 12:10 PM Rushabh Shah
> > <[email protected]> wrote:
> >
> > > Hi Everyone,
> > > At my workplace, we use HBase + Phoenix to run our customer
> > > workloads. Most of our phoenix tables are multi-tenant and we
> > > store the tenantID as the leading part of the rowkey. Each tenant
> > > belongs to only 1 hbase cluster.
> > >
> > > Due to capacity planning, hardware refresh cycles and most
> > > recently move to public cloud initiatives, we have to migrate a
> > > tenant from one hbase cluster (source cluster) to another hbase
> > > cluster (target cluster). Normally we migrate a lot of tenants (in
> > > 10s of thousands) at a time and hence we have to copy a huge
> > > amount of data (in TBs) from multiple source clusters to a single
> > > target cluster. We have our internal tool which uses MapReduce
> > > framework to copy the data. Since all of these tenants don't have
> > > any presence on the target cluster (Note that the table is NOT
> > > empty since we have data for other tenants in the target cluster),
> > > they start with one region and due to an organic split process,
> > > the data gets distributed among different regions and different
> > > regionservers. But the organic splitting process takes a lot of
> > > time and due to the distributed nature of the MR framework, it
> > > causes hotspotting issues on the target cluster which often lasts
> > > for days. This causes availability issues where the CPU is
> > > saturated and/or disk saturation on the regionservers ingesting
> > > the data. Also this causes a lot of replication related alerts
> > > (Age of last ship, LogQueue size) which goes on for days.
> > >
> > > In order to handle the huge influx of data, we should ideally
> > > pre-split the table on the target based on the split structure
> > > present on the source cluster. If we pre-split and create empty
> > > regions with right region boundaries it will help to distribute
> > > the load to different regions and region servers and will prevent
> > > hotspotting.
> > >
> > > Problems with the above approach:
> > > 1. Currently we allow pre splitting only while creating a new
> > > table. But in our production env, we already have the table
> > > created for other tenants. So we would like to pre-split an
> > > existing table for new tenants.
> > > 2. Currently we split a given region into just 2 daughter regions.
> > > But if we have the split points information from the source
> > > cluster and if the data for the to-be-migrated tenant is split
> > > across 100 regions on the source side, we would ideally like to
> > > create 100 empty regions on the target cluster.
> > >
> > > Trying to get early feedback from the community. Do you all think
> > > this is a good idea? Open to other suggestions also.
> > >
> > >
> > > Thank you,
> > > Rushabh.
> > >
