Hi,

We have data in an ORC-format table. We filter certain records from it and then
populate an Avro-format Hive table using an "INSERT INTO" statement.

Our use case is to produce small Avro data files in a Hive table, so that each
file can be passed on to consumers as a Kafka message.
Can we restrict the size of the files written to an Avro-backed Hive table while
the INSERT INTO command executes?

One solution we considered was CLUSTERED BY, but since the number of records
(and their total size) is not known beforehand, it is difficult to choose the
number of buckets.
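To illustrate why a fixed bucket count is awkward here: the count has to be
derived from the size of the filtered output, which is only known after the
filter has run. A minimal Python sketch, using a hypothetical helper name and
hypothetical sizes, and assuming rows hash evenly across buckets (skewed keys
would break that assumption):

```python
import math


def buckets_for_target_size(total_bytes: int, target_file_bytes: int) -> int:
    """Return a bucket count so that each bucket file is roughly at most
    target_file_bytes. Hypothetical helper; assumes rows are spread evenly
    across buckets, which CLUSTERED BY hashing does not guarantee."""
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Always return at least one bucket, even for an empty result.
    return max(1, math.ceil(total_bytes / target_file_bytes))


# Hypothetical numbers: ~2.5 GB of filtered output, ~1 MB per Kafka message.
print(buckets_for_target_size(2_500_000_000, 1_000_000))
```

Because `total_bytes` changes with every run of the filter, the number of
buckets declared in the table DDL would have to change with it.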

Is there anything else we can try to restrict the file size?
