Hi Wes,

Thanks for your reply.

I agree, the kernel implementation is far from optimal. However, I was
not sure how to initialize a hash table for an Arrow data type (the STL
wouldn't help, I believe, since its containers need a fixed type). I
vaguely figured that I needed the Arrow-native hash, but wanted to focus
on that after getting something running, so that I could learn the ropes.

I will follow your guidelines and work on the hash-based implementation.

Atri

On Mon, 24 Sep 2018, 00:13 Wes McKinney, <wesmck...@gmail.com> wrote:

> hi Atri,
>
> You're pushing one buffer for each element in the left array:
>
> https://github.com/atris/arrow/commit/1dcce9b2f34818760df29fdf8767fc1619257ea9#diff-4459cb59122bbce0553230b6638f6d5eR100
>
> (gdb) p out->buffers.size()
> $23 = 3
>
> The first buffer in out->buffers is the validity bitmap, which is
> being set to all zeros, which indicates to other code that the values
> are all null
>
> Unfortunately, this is not the preferred approach to implementing a
> "match" function. It needs to use a hash table like Unique and
> DictionaryEncode -- otherwise you have an O(n * m) algorithm instead
> of O(n). You can see a commented out API placeholder in kernels/hash.h
>
> Hope this helps
> Wes
> On Sun, Sep 23, 2018 at 2:13 PM Atri Sharma <atri.j...@gmail.com> wrote:
> >
> > Hi All,
> >
> > While adding a new test, I am facing an issue where a Datum of Array
> > type returned by a function in compute layer does not match the
> > expected value. I manually checked the buffers of the returned Datum's
> > Array's contained ArrayData, and they look to be the correct values,
> > but on printing this ArrayData, all values are shown as null.
> >
> > Could someone please tell me what I am missing here? Is there another
> > way to do the comparison?
> >
> > The WIP code is at:
> >
> > https://github.com/atris/arrow/commit/1dcce9b2f34818760df29fdf8767fc1619257ea9#diff-0c513f55830e5334d28c08b1a07c6215R1441
> >
> > Regards,
> >
> > Atri
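
For reference, the hash-based shape of a "match" discussed in this thread can be sketched with plain STL containers by templating on the value type. This is only an illustration under stated assumptions: `MatchSketch` is a hypothetical name, it works on `std::vector` rather than Arrow arrays, it ignores nulls and the validity bitmap, and it is not the Arrow hash kernel API or the placeholder in kernels/hash.h. It builds a table from the right-hand values once and then probes it for each left-hand value, so the expected cost is O(n + m) rather than O(n * m).

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not an Arrow API: for each value in `left`, return the
// index of its first occurrence in `right`, or -1 if it does not occur.
template <typename T>
std::vector<int32_t> MatchSketch(const std::vector<T>& left,
                                 const std::vector<T>& right) {
  // Build phase, O(m): emplace does not overwrite an existing key, so the
  // first index seen for each distinct value is the one that is kept.
  std::unordered_map<T, int32_t> first_index;
  first_index.reserve(right.size());
  for (int32_t i = 0; i < static_cast<int32_t>(right.size()); ++i) {
    first_index.emplace(right[i], i);
  }

  // Probe phase, O(n) expected: one hash lookup per left-hand value.
  std::vector<int32_t> out;
  out.reserve(left.size());
  for (const T& value : left) {
    auto it = first_index.find(value);
    out.push_back(it == first_index.end() ? -1 : it->second);
  }
  return out;
}
```

Templating on `T` is one way around the "fixed type" concern for primitive types: the function would be instantiated once per C type (for example `int64_t` or `double`). Variable-length types such as strings would need a different key representation, which this sketch does not attempt.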