I'm looking for help or workaround ideas for a Hive bug. I know this is the Phoenix mailing list, but the issue has to do with getting data from Hive into Phoenix, and I'm hoping someone here might have some ideas.
Basically: in order to use the CsvBulkExport tool, I take my source data table (stored as compressed ORC files) and create a textfile-based copy of it: create external table ... stored as textfile ... (rough sketch of this step at the end of this mail). What I've found recently is that if the data is sufficiently large, some of it gets corrupted in the copy. A query against the copied table shows several null values where the same query against the source table returns good data. I opened a case with Hortonworks for this, but that doesn't really get me anywhere these days.

Has anyone else encountered this issue? Any ideas for a workaround? Is there anyone out there importing billions of rows into HBase on a regular/daily basis? We can't be the only ones...
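
For reference, the copy step looks roughly like this. Table and column names are placeholders (not our actual schema), and the delimiter and location are just examples:

    -- textfile copy of the ORC source, so the data can be exported as delimited text
    CREATE EXTERNAL TABLE events_text (
        id          BIGINT,
        event_time  STRING,
        amount      DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/tmp/events_text';

    -- copy the rows out of the compressed ORC source table
    INSERT OVERWRITE TABLE events_text
    SELECT id, event_time, amount
    FROM events_orc;

The nulls show up when querying events_text after the copy; the same query against events_orc still returns good data.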