[ https://issues.apache.org/jira/browse/HIVE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790782#comment-13790782 ]

Sushanth Sowmyan commented on HIVE-5504:
----------------------------------------

Looking through the code, this is not so much an HCat issue as it is an ORC 
issue. OrcOutputFormat only applies the compression (and other table-property) 
overrides when it is invoked through getHiveRecordWriter, not through 
getRecordWriter, so support for these overrides would need to be added to ORC 
to make this work from outside Hive.

{code}
public RecordWriter<NullWritable, OrcSerdeRow>
    getRecordWriter(FileSystem fileSystem, JobConf conf, String name,
                    Progressable reporter) throws IOException {
  return new OrcRecordWriter(new Path(name), OrcFile.writerOptions(conf));
}
{code}

versus:

{code}
public FileSinkOperator.RecordWriter
    getHiveRecordWriter(JobConf conf,
                        Path path,
                        Class<? extends Writable> valueClass,
                        boolean isCompressed,
                        Properties tableProperties,
                        Progressable reporter) throws IOException {
  OrcFile.WriterOptions options = OrcFile.writerOptions(conf);
  if (tableProperties.containsKey(OrcFile.STRIPE_SIZE)) {
    options.stripeSize(
        Long.parseLong(tableProperties.getProperty(OrcFile.STRIPE_SIZE)));
  }

  if (tableProperties.containsKey(OrcFile.COMPRESSION)) {
    options.compress(
        CompressionKind.valueOf(tableProperties.getProperty(OrcFile.COMPRESSION)));
  }

  if (tableProperties.containsKey(OrcFile.COMPRESSION_BLOCK_SIZE)) {
    options.bufferSize(
        Integer.parseInt(tableProperties.getProperty(OrcFile.COMPRESSION_BLOCK_SIZE)));
  }

  if (tableProperties.containsKey(OrcFile.ROW_INDEX_STRIDE)) {
    options.rowIndexStride(
        Integer.parseInt(tableProperties.getProperty(OrcFile.ROW_INDEX_STRIDE)));
  }

  if (tableProperties.containsKey(OrcFile.ENABLE_INDEXES)) {
    if ("false".equals(tableProperties.getProperty(OrcFile.ENABLE_INDEXES))) {
      options.rowIndexStride(0);
    }
  }

  if (tableProperties.containsKey(OrcFile.BLOCK_PADDING)) {
    options.blockPadding(
        Boolean.parseBoolean(tableProperties.getProperty(OrcFile.BLOCK_PADDING)));
  }

  return new OrcRecordWriter(path, options);
}
{code}

The fix could be fairly simple (though not trivial): change 
OrcFile.writerOptions to also pick up these overrides from the conf.
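
For illustration, a minimal sketch of what such a change might look like, 
assuming writerOptions keeps its current Configuration parameter and that the 
caller (e.g. HCatalog) has copied the orc.* table properties into the job 
conf. This is only a sketch of the idea, not a proposed patch:

{code}
// Hypothetical sketch: read the same orc.* keys that getHiveRecordWriter
// reads from tableProperties, but from the Configuration, so that
// OrcRecordWriter instances created via getRecordWriter see them too.
public static WriterOptions writerOptions(Configuration conf) {
  WriterOptions options = new WriterOptions(conf);
  String compress = conf.get(COMPRESSION);           // "orc.compress"
  if (compress != null) {
    options.compress(CompressionKind.valueOf(compress));
  }
  String stripeSize = conf.get(STRIPE_SIZE);         // "orc.stripe.size"
  if (stripeSize != null) {
    options.stripeSize(Long.parseLong(stripeSize));
  }
  String bufferSize = conf.get(COMPRESSION_BLOCK_SIZE);
  if (bufferSize != null) {
    options.bufferSize(Integer.parseInt(bufferSize));
  }
  return options;
}
{code}

With something like that in place, the getRecordWriter path would honor 
orc.compress as well, provided HCatOutputFormat propagates the table 
properties into the conf before the writer is created.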

> HCatOutputFormat does not honor orc.compress tblproperty
> --------------------------------------------------------
>
>                 Key: HIVE-5504
>                 URL: https://issues.apache.org/jira/browse/HIVE-5504
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.11.0, 0.12.0
>            Reporter: Venkat Ranganathan
>
> When we import data into an HCatalog table created with the following 
> storage description:
> .. stored as orc tblproperties ("orc.compress"="SNAPPY")
> the resultant ORC file still uses the default ZLIB compression.
> It looks like HCatOutputFormat is ignoring the tblproperties specified. 
> show tblproperties shows that the table indeed has the properties properly 
> saved.
> An insert/select into the table, by contrast, produces an ORC file that 
> honors the table property.


