Hive HCatalog Streaming: why must the Hive table be bucketed?

Igor Kuzmenko Fri, 08 Apr 2016 02:36:07 -0700

Hello, I've got a few questions about Hive HCatalog streaming
<https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest>.
This feature has the following requirement:
"*The Hive table must be bucketed
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables>,
but not sorted. So something like “clustered by (colName) into 10 buckets”
must be specified during table creation. The number of buckets is ideally
the same as the number of streaming writers*."
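
For concreteness, I understand the requirement to mean a table definition
roughly like the one below. This is only a sketch: the table name and columns
are made up for illustration, and the ORC format and transactional property
reflect the other requirements listed on the same wiki page as I read them.

```sql
-- Hypothetical table; names and column types are placeholders.
CREATE TABLE web_events (
  event_id INT,
  payload  STRING
)
-- Bucketed but not sorted, as the quoted requirement states.
CLUSTERED BY (event_id) INTO 10 BUCKETS
-- Assumed from the streaming page's other requirements.
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
```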


1) Why is bucketing a required condition for streaming?
2) How many buckets should I create when the number of streaming writers
changes over time (for example, from 1 to 10)?
