FWIW, the C++ compute library now uses
https://github.com/JuliaStrings/utf8proc, so assuming it does everything
you need, using it in Gandiva too could save you some trouble; CMake is
already set up to use it.
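Below is a rough sketch of how an upper()/lower() kernel could be built around a per-code-point mapping such as utf8proc's utf8proc_toupper / utf8proc_tolower: decode each UTF-8 code point, map it, and re-encode it. The helper names (DecodeUtf8, EncodeUtf8, MapCase, ToyUpper) are hypothetical. ToyUpper is only a stand-in for utf8proc_toupper so the sketch compiles without the library. Invalid UTF-8 yields std::nullopt here; how Gandiva should actually report that error is left open. Note that the output length can differ from the input (U+0131, dotless i, is two bytes in UTF-8 but its uppercase "I" is one), so the output buffer cannot simply mirror the input size.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Decode one UTF-8 code point starting at s[i] (caller ensures i < s.size()).
// On success, stores it in *cp, advances i, and returns true. Rejects
// truncated sequences, bad continuation bytes, overlong forms, surrogates,
// and values above U+10FFFF.
bool DecodeUtf8(std::string_view s, std::size_t& i, std::int32_t* cp) {
  const auto b0 = static_cast<unsigned char>(s[i]);
  int len;
  std::int32_t value;
  if (b0 < 0x80) { len = 1; value = b0; }
  else if ((b0 & 0xE0) == 0xC0) { len = 2; value = b0 & 0x1F; }
  else if ((b0 & 0xF0) == 0xE0) { len = 3; value = b0 & 0x0F; }
  else if ((b0 & 0xF8) == 0xF0) { len = 4; value = b0 & 0x07; }
  else return false;
  if (i + len > s.size()) return false;
  for (int k = 1; k < len; ++k) {
    const auto b = static_cast<unsigned char>(s[i + k]);
    if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
    value = (value << 6) | (b & 0x3F);
  }
  // Smallest code point that legitimately needs `len` bytes.
  static const std::int32_t kMin[] = {0, 0, 0x80, 0x800, 0x10000};
  if (value < kMin[len] || value > 0x10FFFF ||
      (value >= 0xD800 && value <= 0xDFFF)) {
    return false;
  }
  *cp = value;
  i += len;
  return true;
}

// Append the UTF-8 encoding of a valid code point to `out`.
void EncodeUtf8(std::int32_t cp, std::string& out) {
  if (cp < 0x80) {
    out.push_back(static_cast<char>(cp));
  } else if (cp < 0x800) {
    out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else if (cp < 0x10000) {
    out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else {
    out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}

// Apply a per-code-point case mapping to a UTF-8 string.
// Returns std::nullopt if the input is not valid UTF-8.
template <typename MapFn>
std::optional<std::string> MapCase(std::string_view in, MapFn map) {
  std::string out;
  out.reserve(in.size());  // a hint only: mapped length may differ
  std::size_t i = 0;
  while (i < in.size()) {
    std::int32_t cp;
    if (!DecodeUtf8(in, i, &cp)) return std::nullopt;
    EncodeUtf8(map(cp), out);
  }
  return out;
}

// Hypothetical stand-in for utf8proc_toupper: covers only ASCII, part of
// Latin-1, and U+0131, just so this sketch builds without utf8proc.
std::int32_t ToyUpper(std::int32_t cp) {
  if (cp >= 'a' && cp <= 'z') return cp - 0x20;
  if (cp >= 0xE0 && cp <= 0xFE && cp != 0xF7) return cp - 0x20;
  if (cp == 0x131) return 'I';  // dotless i: 2 UTF-8 bytes -> 1 byte
  return cp;
}
```

With utf8proc linked, the intended call would be something like MapCase(s, utf8proc_toupper). utf8proc also provides utf8proc_iterate and utf8proc_encode_char, which could replace the hand-written decoder and encoder. A one-code-point-in, one-code-point-out mapping like this does not cover expansions such as German "ß" to "SS".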

Neal

On Tue, Dec 22, 2020 at 3:41 PM Sagnik Chakraborty <sagn...@dremio.com>
wrote:

> We are looking to implement upper() / lower() for non-ASCII characters.
> The current Gandiva implementation handles upper() / lower() only for
> standard ASCII characters.
>
> For the implementation in Gandiva, I went through a few articles and
> answers on Stack Overflow, and the top answer to this question <
> https://stackoverflow.com/questions/36897781/how-to-uppercase-lowercase-utf-8-characters-in-c>
> suggests that there is no standard way to do Unicode case conversion in
> C/C++ and that an external library such as ICU <
> https://unicode-org.github.io/icu-docs/#/icu4c/> is needed for reliable
> Unicode case conversion.
>
> So I wanted to ask: when adding an external library to Gandiva, what
> issues do we need to take care of so that we neither break existing code
> nor sacrifice performance? Is there an existing library we can use to
> solve this problem? Any suggestions would be welcome.
>
> Regards,
> Sagnik
