Today, String and Binary types are represented in memory as list<byte> [1], and we use logical types to distinguish a list of bytes from a string [2].
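
To make the comparison with the flat layout discussed below concrete, here is a rough Python sketch of the two layouts. It is only an illustration under stated assumptions: the names (ByteArray, ListOfBytes, FlatString, build_flat) are hypothetical and do not come from the spec or any implementation, and the null bitmaps are modeled as plain lists of booleans rather than packed bits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ByteArray:
    """Nested child array of list<byte>: every byte has its own validity flag."""
    validity: List[bool]  # simplification: one bool per byte, not a packed bitmap
    data: bytes


@dataclass
class ListOfBytes:
    """list<byte> layout: validity + offsets + nested child array."""
    validity: List[bool]
    offsets: List[int]  # length is number of slots + 1
    values: ByteArray

    def get(self, i: int) -> Optional[List[Optional[int]]]:
        if not self.validity[i]:
            return None
        start, end = self.offsets[i], self.offsets[i + 1]
        # Individual bytes can be null because the child array has validity.
        return [self.values.data[j] if self.values.validity[j] else None
                for j in range(start, end)]


@dataclass
class FlatString:
    """Flat layout: validity + offsets + raw UTF-8 bytes, no nested array."""
    validity: List[bool]
    offsets: List[int]
    data: bytes

    def get(self, i: int) -> Optional[str]:
        if not self.validity[i]:
            return None
        return self.data[self.offsets[i]:self.offsets[i + 1]].decode("utf-8")


def build_flat(strings: List[Optional[str]]) -> FlatString:
    """Hypothetical helper: pack Python strings into the flat layout."""
    validity, offsets, chunks = [], [0], []
    for s in strings:
        encoded = b"" if s is None else s.encode("utf-8")
        validity.append(s is not None)
        chunks.append(encoded)
        offsets.append(offsets[-1] + len(encoded))  # null slots add zero bytes
    return FlatString(validity, offsets, b"".join(chunks))
```

In the nested layout a slot can contain a null byte in the middle of its values; the flat layout has nowhere to record that, so a slot is either a whole string or null.
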

The question of whether this is sufficient, or whether we should make a first-class string/binary type, has come up tangentially on a few threads, and we should try to come to a conclusion on whether we want to add it to the spec.

I think the current proposal is that the String type would consist of a null-bitmap buffer, an offset buffer, and a buffer containing bytes (for strings, the bytes would be UTF-8 encoded). The main difference from the list representation is that individual bytes cannot be marked as null, because there isn't a nested Array.

To quote Jacques for the pros of this approach:

> My main argument is that the most basic types most people need come in this order from my experience:
>
> Int
> String
> Float
> Decimal
> Binary
>
> Note that I'm not focused on width here, just generally "what people use". So I think a string comes second in priority and ease of use/approachability necessitate this as a first class concept. This is beyond the fact that it has specialized rules that are separate from a List<Byte>.

The main argument against doing this is that it adds types that need to be implemented and can lead to some redundant code. For instance, in the current C++ implementation we are able to have a String Array that extends a List type and reuse the already-defined equality methods [3].

What do people think?

Thanks,
Micah

[1] https://github.com/apache/arrow/blob/master/format/Layout.md
[2] https://github.com/apache/arrow/blob/master/format/Message.fbs
[3] https://github.com/apache/arrow/blob/master/cpp/src/arrow/types/string.h#L68