Hi,

over the last few weeks I've been going through the Wiki, trying to find things that were undocumented or outdated (and updating them).
This is a non-exhaustive list of things I found: Avro support, TIMESTAMP, BINARY, union types, a lot of UDFs, Indexes, HBase support, Table links, CLI options, ... Many of these are very nice features that could be very useful to end users. I've done my best to document what I understand myself, but some of these things are too much for me to understand. For some features there are either JIRAs or design documents available, but I've found that the implementation often differs significantly from what the design says, so I had to resort to the patches, which are hard to read (at least for me).

Wouldn't it make sense to have a general policy that allows new and changed features only if they are documented? How else are end users supposed to find out about all these great things? How are you bringing new users up to speed with Hive and all its features in your companies?

In the meantime I'll continue to monitor commits and document what I can, but I have some specific questions that maybe someone can help with:

* What is the status of indexes? What works, and when and how can they be used? The design doc[1] seems out of date, but I'm not sure.
* How do union types really work? The JIRA[2] mentions tags that can be named, but the tests in the patch don't seem to use them. Are they optional or not needed at all?
* Is the design document for BINARY[3] types still accurate?

I'm sure more will pop up, and I appreciate any help. Also, I'm not a native English speaker and no Hive expert, so please feel free to correct whatever I'm writing in the Wiki.

Cheers,
Lars

[1] <https://cwiki.apache.org/confluence/display/Hive/IndexDev>
[2] <https://issues.apache.org/jira/browse/HIVE-537>
[3] <https://cwiki.apache.org/Hive/binary-datatype-proposal.html>