I created a JIRA for this. I will make the changes in select kernels and
report back with benchmark results.
https://issues.apache.org/jira/browse/ARROW-13170
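
For anyone who wants to experiment outside the Arrow tree, here is a minimal
standalone sketch of the two loop shapes quoted below. It is not the code in
vector_selection.cc: GetBit, SetBit and SetBitTo are simplified stand-ins for
the BitUtil helpers (assuming LSB-first bit order), and FilterBranching,
FilterBranchFree and the int32_t value type are names and choices made up for
illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-ins for BitUtil helpers (assumption: LSB-first bit order).
inline bool GetBit(const uint8_t* bits, int64_t i) {
  return (bits[i / 8] >> (i % 8)) & 1;
}
inline void SetBit(uint8_t* bits, int64_t i) {
  bits[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
}
inline void SetBitTo(uint8_t* bits, int64_t i, bool value) {
  const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
  // 0x00 when value is false, 0xFF when true; avoids a conditional.
  const uint8_t fill = static_cast<uint8_t>(-static_cast<int>(value));
  bits[i / 8] = static_cast<uint8_t>((bits[i / 8] & ~mask) | (fill & mask));
}

// Branching variant: writes an output slot only when the filter bit is set.
// Returns the number of selected values.
int64_t FilterBranching(const int32_t* values, const uint8_t* filter, int64_t n,
                        int32_t* out, uint8_t* out_valid) {
  assert(n >= 0);
  int64_t out_pos = 0;
  for (int64_t i = 0; i < n; ++i) {
    if (GetBit(filter, i)) {
      SetBit(out_valid, out_pos);
      out[out_pos++] = values[i];
    }
  }
  return out_pos;
}

// Branch-free variant: always writes the current slot, but only advances the
// output position when the filter bit is set. Unselected values are simply
// overwritten by the next iteration.
int64_t FilterBranchFree(const int32_t* values, const uint8_t* filter, int64_t n,
                         int32_t* out, uint8_t* out_valid) {
  assert(n >= 0);
  int64_t out_pos = 0;
  for (int64_t i = 0; i < n; ++i) {
    const bool advance = GetBit(filter, i);
    SetBitTo(out_valid, out_pos, advance);
    out[out_pos] = values[i];
    out_pos += static_cast<int64_t>(advance);
  }
  return out_pos;
}
```

Both variants need an output buffer of n values and (n + 7) / 8 validity
bytes. The branch-free one may leave a stale value just past the returned
count, so callers should only read the first out_pos slots.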


On Thu, Jun 24, 2021 at 12:27 AM Yibo Cai <yibo....@arm.com> wrote:

> I did a quick test. With random bitmaps and my trivial test code, the
> branch-free code is 3.5x faster than the branching version.
> https://quick-bench.com/q/UD22IIdMgKO9HU1PsPezj05Kkro
>
> On 6/23/21 11:21 PM, Wes McKinney wrote:
> > One project I was interested in getting to but haven't had the time
> > was introducing branch-free code into vector_selection.cc and reducing
> > the use of if-statements to try to improve performance.
> >
> > One way to do this is to take code that looks like this:
> >
> > if (BitUtil::GetBit(filter_data_, filter_offset_ + in_position)) {
> >    BitUtil::SetBit(out_is_valid_, out_offset_ + out_position_);
> >    out_data_[out_position_++] = values_data_[in_position];
> > }
> > ++in_position;
> >
> > and change it to a branch-free version
> >
> > bool advance = BitUtil::GetBit(filter_data_, filter_offset_ + in_position);
> > BitUtil::SetBitTo(out_is_valid_, out_offset_ + out_position_, advance);
> > out_data_[out_position_] = values_data_[in_position];
> > out_position_ += advance; // may need static_cast<int> here
> > ++in_position;
> >
> > Since more people are working on kernels and computing now, I thought
> > this might be an interesting project for someone to explore and see
> > what improvements are possible (and what the differences between e.g.
> > x86 and ARM architecture are like when it comes to reducing
> > branching). Another thing to look at might be batch-at-a-time
> > bitpacking in the output bitmap versus bit-at-a-time.
> >
>
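
On the batch-at-a-time bitpacking idea, one possible reading is sketched
below: build each output byte in a register from eight flags and store it
once, instead of doing a read-modify-write of the bitmap for every bit. This
is only an illustration under assumptions. PackBitAtATime and PackBatch are
hypothetical names, the input is assumed to be one 0/1 flag per byte, and the
bitmap is assumed to be LSB-first and zero-initialized.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bit-at-a-time: one read-modify-write of the output byte per input flag.
void PackBitAtATime(const uint8_t* flags, int64_t n, uint8_t* bitmap) {
  assert(n >= 0);
  for (int64_t i = 0; i < n; ++i) {
    const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
    const uint8_t bit = static_cast<uint8_t>((flags[i] & 1u) << (i % 8));
    bitmap[i / 8] = static_cast<uint8_t>((bitmap[i / 8] & ~mask) | bit);
  }
}

// Batch-at-a-time: assemble 8 flags into one byte, then store it once.
void PackBatch(const uint8_t* flags, int64_t n, uint8_t* bitmap) {
  assert(n >= 0);
  int64_t i = 0;
  for (; i + 8 <= n; i += 8) {
    uint8_t byte = 0;
    for (int b = 0; b < 8; ++b) {
      byte |= static_cast<uint8_t>((flags[i + b] & 1u) << b);
    }
    bitmap[i / 8] = byte;
  }
  if (i < n) {
    // Tail of fewer than 8 flags; the unused high bits are written as zero.
    uint8_t byte = 0;
    for (int b = 0; i + b < n; ++b) {
      byte |= static_cast<uint8_t>((flags[i + b] & 1u) << b);
    }
    bitmap[i / 8] = byte;
  }
}
```

Note that the batch version overwrites the whole final byte, while the
bit-at-a-time version leaves bits past n untouched, so the two only agree on
a zero-initialized bitmap. Whether the batch form is actually faster would
need to be benchmarked on both x86 and ARM.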


-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>
