> On Nov. 23, 2014, 10:59 p.m., Mohit Sabharwal wrote:
> > data/files/parquet_types.txt, lines 1-3
> > <https://reviews.apache.org/r/28147/diff/3/?file=772138#file772138line1>
> >
> >     I think this is a bit confusing, since the 0b prefix gives the impression
> >     that data is read in binary format, whereas it is actually getting read as
> >     a string.
> >
> >     I think we can either write (preferably non-ascii) binary data instead
> >     (for example, see: data/files/string.txt) OR alternatively, we could write
> >     it legibly in hex, like 68656c6c6f ("hello") and convert it to binary using
> >     unhex() in the INSERT OVERWRITE query. What do you think?

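To make the hex round trip concrete, here is a minimal sketch in Python rather than HiveQL. The helpers `unhex` and `to_hex` are hypothetical stand-ins that I am assuming behave like Hive's unhex() and hex() on binary values; this is not code from the patch.

```python
def unhex(hex_text: str) -> bytes:
    """Decode a hex string into raw bytes (assumed analogue of Hive's unhex())."""
    return bytes.fromhex(hex_text)


def to_hex(raw: bytes) -> str:
    """Encode raw bytes as upper-case hex (assumed analogue of Hive's hex())."""
    return raw.hex().upper()


# The reviewer's legible example: the hex text decodes to the ASCII bytes of "hello".
hello = unhex("68656c6c6f")

# A non-ASCII value survives the round trip unchanged, which is why a query
# can print the binary column through hex() without losing information.
non_ascii = unhex("B4F3CAFDBEDD")
round_tripped = to_hex(non_ascii)

print(hello)
print(round_tripped)
```

The point of the sketch is only that the data file can stay readable as hex text while the table column holds real binary bytes.
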
I encoded some Chinese (non-ASCII) words and used the hex function to convert them into a string such as B4F3CAFDBEDD.


> On Nov. 23, 2014, 10:59 p.m., Mohit Sabharwal wrote:
> > ql/src/test/queries/clientpositive/parquet_types.q, line 48
> > <https://reviews.apache.org/r/28147/diff/3/?file=772143#file772143line48>
> >
> >     No need to unhex here...
> >
> >     Can just be:
> >
> >     SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), cbinary FROM
> >     parquet_types
> >
> >     Or you can pass it through hex() if original data has unprintable
> >     characters:
> >
> >     SELECT cchar, LENGTH(cchar), cvarchar, LENGTH(cvarchar), hex(cbinary)
> >     FROM parquet_types

I think the statement

    SELECT cint, ctinyint, csmallint, cfloat, cdouble, cstring1, t, cchar, cvarchar, hex(cbinary), m1, l1, st1 FROM parquet_types;

already covers the binary case, so there is no longer any need to check cbinary again.


- cheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28147/#review62744
-----------------------------------------------------------


On Nov. 21, 2014, 8:53 a.m., cheng xu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28147/
> -----------------------------------------------------------
>
> (Updated Nov. 21, 2014, 8:53 a.m.)
>
>
> Review request for hive.
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> This patch includes:
> 1. binary support for ParquetHiveSerDe
> 2. related test cases, both unit and ql tests
>
>
> Diffs
> -----
>
>   data/files/parquet_types.txt d342062
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java 472de8f
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java d5aae3b
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java 4effe73
>   ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java 8ac7864
>   ql/src/test/queries/clientpositive/parquet_types.q 22585c3
>   ql/src/test/results/clientpositive/parquet_types.q.out 275897c
>
> Diff: https://reviews.apache.org/r/28147/diff/
>
>
> Testing
> -------
>
> related UT and QL tests passed
>
>
> Thanks,
>
> cheng xu
>
>