[ https://issues.apache.org/jira/browse/HIVE-19205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438030#comment-16438030 ]
Prasanth Jayachandran commented on HIVE-19205:
----------------------------------------------

[~vgarg] Can this be included in the 3.0.0 release?

> Hive streaming ingest improvements (v2)
> ---------------------------------------
>
>                 Key: HIVE-19205
>                 URL: https://issues.apache.org/jira/browse/HIVE-19205
>             Project: Hive
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Major
>
> This is an umbrella JIRA to track Hive streaming ingest improvements. At a
> high level, the improvements are:
> - Support for dynamic partitioning
> - API changes (simple streaming connection builder)
> - Hide the transaction batches from clients (a client can tune the
>   transaction batch size but doesn't have to know about it)
> - Support auto rollover to the next transaction batch (clients don't have to
>   worry about closing a transaction batch and opening a new one)
> - All record writers will be strict, meaning the schema of the record has to
>   match the table schema. This avoids the repeated
>   serialization/deserialization needed to reorder columns when there is a
>   schema mismatch
> - Automatic distribution for non-bucketed tables so that the compactor can
>   have more parallelism
> - Create delta files with all ORC overhead disabled (no index, no
>   compression, no dictionary). The compactor will recreate the ORC files with
>   index, compression, and dictionary encoding.
> - Automatic memory management via auto-flushing (this yields smaller stripes
>   for delta files but is more scalable, and clients don't have to worry about
>   distributing the data across writers)
> - Support for more writers (Avro specifically. ORC passthrough format?)
> - Support for accepting an input stream instead of a record byte[]
> - Removing the HCatalog dependency (the old streaming API will stay in the
>   hcatalog package for backward compatibility; the new streaming API will be
>   in its own Hive module)
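
To make the "hide the transaction batches" and "auto rollover" items concrete, here is a minimal, self-contained sketch of the idea. It is not the Hive API: RolloverSketch, TxnBatch, and Connection are hypothetical names invented for illustration, and records are only kept in memory. The point is that the caller sees begin/write/commit, while the connection opens a new batch on its own whenever the current one is used up.

```java
import java.util.ArrayList;
import java.util.List;

public class RolloverSketch {

    /** Hypothetical stand-in for a batch of pre-allocated transactions. */
    static final class TxnBatch {
        private final int size;
        private int used = 0;

        TxnBatch(int size) { this.size = size; }

        boolean hasRemaining() { return used < size; }

        void beginNext() { used++; }
    }

    /** Hypothetical connection that keeps batches out of the caller's view. */
    static final class Connection {
        private final int batchSize;            // tunable, but callers never touch a batch
        private TxnBatch current;
        private int batchesOpened = 0;
        private boolean inTxn = false;
        private final List<String> pending = new ArrayList<>();
        private final List<String> committed = new ArrayList<>();

        Connection(int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.batchSize = batchSize;
        }

        void beginTransaction() {
            if (inTxn) {
                throw new IllegalStateException("transaction already open");
            }
            // Auto rollover: open a fresh batch when none exists or it is exhausted.
            if (current == null || !current.hasRemaining()) {
                current = new TxnBatch(batchSize);
                batchesOpened++;
            }
            current.beginNext();
            inTxn = true;
        }

        void write(String record) {
            if (!inTxn) {
                throw new IllegalStateException("no open transaction");
            }
            pending.add(record);
        }

        void commitTransaction() {
            if (!inTxn) {
                throw new IllegalStateException("no open transaction");
            }
            committed.addAll(pending);
            pending.clear();
            inTxn = false;
        }

        int batchesOpened() { return batchesOpened; }

        List<String> committed() { return committed; }
    }

    public static void main(String[] args) {
        Connection conn = new Connection(2);    // two transactions per batch
        for (int i = 0; i < 5; i++) {
            conn.beginTransaction();
            conn.write("row-" + i);
            conn.commitTransaction();
        }
        // Five transactions at two per batch need three batches.
        System.out.println(conn.batchesOpened() + " batches, "
                + conn.committed().size() + " rows");   // prints "3 batches, 5 rows"
    }
}
```

In this sketch the batch size is a constructor argument only; a real client would presumably set it through the connection builder mentioned above, but that wiring is left out here because the final API shape is not stated in this issue.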