Thanks for the info, Anil. I first tried an MR job that did Puts, based on the examples at [1], but this was much too slow, as you said. Switching to writing HFiles directly via HFileOutputFormat solved the issue.
Also, I wanted to post an issue I ran into, in case anyone hits it in the future. For a table re-write, doing a reduce can be bad, because the MR framework will try to sort the whole table, potentially multiple TB. You can avoid this by calling job.setNumReduceTasks(0). However, if you use HFileOutputFormat.configureIncrementalLoad(), that call will also set up the reducer, which may be a bit surprising (at least it was to me). So the order matters:

// This will have a (potentially long) reduce phase. Bad for large tables.
job.setNumReduceTasks(0);
HFileOutputFormat.configureIncrementalLoad(job, hTable); // Overrides # of reduce tasks

Instead, this works better for large tables:

// This will skip the reduce phase
HFileOutputFormat.configureIncrementalLoad(job, hTable);
job.setNumReduceTasks(0);

Follow this with a major compaction, which will do the sorting for locality.

[1] http://hbase.apache.org/0.94/book/mapreduce.example.html

On Tue, Feb 20, 2018 at 6:44 AM, anil gupta <anilgupt...@gmail.com> wrote:

> Hi Marcell,
>
> Since key is changing you will need to rewrite the entire table. I think
> generating HFiles (rather than doing puts) will be the most efficient here.
> IIRC, you will need to use HFileOutputFormat in your MR job.
> For locality, I don't think you should worry that much because major
> compaction usually takes care of it. If you want very high locality from
> beginning then you can run a major compaction on new table after your
> initial load.
>
> HTH,
> Anil Gupta
>
> On Mon, Feb 19, 2018 at 11:46 PM, Marcell Ortutay <mortu...@23andme.com>
> wrote:
>
> > I have a large HBase table (~10 TB) that has an existing key structure.
> > Based on some recent analysis, the key structure is causing performance
> > problems for our current query load. I would like to re-write the table
> > with a new key structure that performs substantially better.
> >
> > What is the best way to go about re-writing this table? Since the key
> > structure will change, it will affect locality, so all the data will have
> > to move to a new location. If anyone can point to examples of code that
> > does something like this, that would be very helpful.
> >
> > Thanks,
> > Marcell
> >
>
>
> --
> Thanks & Regards,
> Anil Gupta
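
In case it helps anyone reading this thread later, here is a toy model of the ordering issue described at the top of this message. It is only a sketch and does not use Hadoop or HBase: ToyJob and the configureIncrementalLoad() method below are hypothetical stand-ins, and the "one reducer per region" behaviour is my assumption about what the real call does. The only point is that whichever call runs last decides the number of reduce tasks.

```java
import java.util.Arrays;
import java.util.List;

// Toy stand-ins only: these are NOT the Hadoop/HBase classes, just a model of
// the "last call wins" behaviour described in this thread.
final class ReduceOrderDemo {

    /** Hypothetical stand-in for a MapReduce job's reducer setting. */
    static final class ToyJob {
        private int numReduceTasks = 1;

        void setNumReduceTasks(int n) { numReduceTasks = n; }
        int getNumReduceTasks() { return numReduceTasks; }
    }

    /**
     * Hypothetical stand-in for HFileOutputFormat.configureIncrementalLoad().
     * Assumption: it sets one reducer per region, overwriting any earlier value.
     */
    public static void configureIncrementalLoad(ToyJob job, List<String> regionStartKeys) {
        job.setNumReduceTasks(regionStartKeys.size());
    }

    /** setNumReduceTasks(0) first: the later configure call overrides it. */
    public static int reducersWhenSetBefore(List<String> regionStartKeys) {
        ToyJob job = new ToyJob();
        job.setNumReduceTasks(0);
        configureIncrementalLoad(job, regionStartKeys);
        return job.getNumReduceTasks();
    }

    /** setNumReduceTasks(0) last: the job stays map-only. */
    public static int reducersWhenSetAfter(List<String> regionStartKeys) {
        ToyJob job = new ToyJob();
        configureIncrementalLoad(job, regionStartKeys);
        job.setNumReduceTasks(0);
        return job.getNumReduceTasks();
    }

    public static void main(String[] args) {
        // Four made-up region start keys, purely for illustration.
        List<String> regions = Arrays.asList("", "g", "n", "t");
        System.out.println("set before configure: " + reducersWhenSetBefore(regions)); // 4
        System.out.println("set after configure:  " + reducersWhenSetAfter(regions));  // 0
    }
}
```

With a real job, the equivalent check would be to print job.getNumReduceTasks() just before submitting it.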