I haven't been able to find an explicit reference, so I'm hoping someone can
clarify this for me:

Are storage handler reads/writes executed in parallel, i.e., in an
INSERT...SELECT... from a storage handler, will multiple storage handler
instances be created to read from the data source (using partitioning or
some other scheme)?

Likewise, will an INSERT into a storage handler be executed using multiple
streams?

FYI: I need to stream data into/out of Hive from/to parallel-efficient data
sources, and would prefer to avoid landing everything in HDFS first,
especially if the ultimate Hive file format is ORC, i.e., to avoid multiple
file copies, especially when moving terabytes between data sources and sinks.
The storage handler mechanism seems a very elegant solution *if* it supports
true parallel stream operations.

TIA,
Dean
