I haven't been able to find an explicit reference, so I'm hoping someone can clarify this for me:
Do storage handler reads/writes get executed in parallel? That is, in an INSERT...SELECT... from a storage handler, will multiple storage handler instances be created to read from the data source (using partitioning or some other scheme)? Likewise, will an INSERT into a storage handler be executed using multiple streams?

FYI: I need to stream data into and out of Hive from/to parallel-efficient data sources, and would prefer to avoid landing everything in HDFS first, especially if the ultimate Hive file format is ORC. That is, I want to avoid multiple file copies, especially when moving terabytes between data sources and sinks. The storage handler mechanism seems a very elegant solution *if* it supports true parallel stream operations.

TIA,
Dean
