Review Request: Adding two interfaces for schema-aware codecs and their invocations from appropriate places in rcfile
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3051/ --- Review request for hive. Summary --- Introduces interfaces for schema-aware codecs. Actual implementations not part of this patch. One specific implementation will be added by HIVE-2604. This addresses bug HIVE-2600. https://issues.apache.org/jira/browse/HIVE-2600 Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/SchemaAwareCompressionInputStream.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/SchemaAwareCompressionOutputStream.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java b15fdb8 Diff: https://reviews.apache.org/r/3051/diff Testing --- Thanks, Krishna
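The interfaces themselves live only in the attached diff, but as a hedged sketch of the shape such an interface might take (the class name matches the file in the diff; the setColumnIndex hook is my assumption, not necessarily the committed API):

import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;

// Sketch only: a "schema-aware" output stream is a CompressionOutputStream
// that is told which column's bytes it is about to receive, so the codec
// can choose a per-column compression strategy.
public abstract class SchemaAwareCompressionOutputStream extends CompressionOutputStream {
  protected SchemaAwareCompressionOutputStream(OutputStream out) {
    super(out);
  }

  // Hypothetical hook: RCFile would call this before writing each column's
  // buffer so the codec can switch strategies per column.
  public abstract void setColumnIndex(int columnIndex) throws IOException;
}

A matching SchemaAwareCompressionInputStream would presumably carry the same hook on the read path.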
Review Request: Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3075/ --- Review request for hive. Summary --- Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies - gaps - supports only certain complex types - stats This addresses bug HIVE-2604. https://issues.apache.org/jira/browse/HIVE-2604 Diffs - contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/compressors/DummyIntegerCompressor.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/dsalg/Tuple.java PRE-CREATION contrib/src/test/queries/clientpositive/ubercompressor.q PRE-CREATION contrib/src/test/results/clientpositive/ubercompressor.q.out PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorUtils.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorColumnConfig.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorConfig.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorSerde.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionOutputStream.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/InputReader.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/OutputWriter.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/TypeSpecificCompressor.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionCodec.java PRE-CREATION Diff: https://reviews.apache.org/r/3075/diff Testing --- test added Thanks, Krishna
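The diff lists a TypeSpecificCompressor; as a rough, hedged sketch of the per-column idea (method names here are illustrative, not necessarily the contrib API):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Illustrative only: a compressor that understands a column's type can use
// a type-aware scheme (e.g. delta/varint encoding for integers) instead of
// a generic byte-stream codec, which is the point of per-column strategies.
public interface TypeSpecificCompressor<T> {
  // Encode one column value.
  void compress(T value, DataOutput out) throws IOException;

  // Decode one column value written by compress().
  T decompress(DataInput in) throws IOException;
}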
Re: Review Request: Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3075/ --- (Updated 2011-12-17 10:41:45.367761) Review request for hive and Yongqiang He. Changes --- Closed the two gaps - support for arbitrary types, and stats Summary --- Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies - gaps - supports only certain complex types - stats This addresses bug HIVE-2604. https://issues.apache.org/jira/browse/HIVE-2604 Diffs (updated) - contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/InputReader.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/OutputWriter.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/TypeSpecificCompressor.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionCodec.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionOutputStream.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorColumnConfig.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorConfig.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorSerde.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorSerdeField.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorUtils.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/compressors/DummyIntegerCompressor.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/dsalg/Tuple.java PRE-CREATION contrib/src/test/queries/clientpositive/ubercompressor.q PRE-CREATION contrib/src/test/results/clientpositive/ubercompressor.q.out PRE-CREATION Diff: https://reviews.apache.org/r/3075/diff Testing --- test added Thanks, Krishna
Review Request: Patch #4 For Hive 1918 - Export / Import
/exim_17_part_managed.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_18_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_19_part_external_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_20_part_managed_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_21_export_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_22_import_exist_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_23_import_part_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_24_import_nonexist_authsuccess.q.out PRE-CREATION Diff: https://reviews.apache.org/r/430/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Patch For Hive 1918 - Export / Import
/results/clientpositive/exim_17_part_managed.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_18_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_19_part_external_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_20_part_managed_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_21_export_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_22_import_exist_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_23_import_part_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_24_import_nonexist_authsuccess.q.out PRE-CREATION Diff: https://reviews.apache.org/r/430/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Patch For Hive 1918 - Export / Import
/results/clientpositive/exim_17_part_managed.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_18_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_19_part_external_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_20_part_managed_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_21_export_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_22_import_exist_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_23_import_part_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_24_import_nonexist_authsuccess.q.out PRE-CREATION Diff: https://reviews.apache.org/r/430/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Patch For Hive 1918 - Export / Import
/exim_15_external_part.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_16_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_17_part_managed.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_18_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_19_part_external_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_20_part_managed_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_21_export_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_22_import_exist_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_23_import_part_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_24_import_nonexist_authsuccess.q.out PRE-CREATION Diff: https://reviews.apache.org/r/430/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Patch For Hive 1918 - Export / Import
/exim_16_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_17_part_managed.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_18_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_19_part_external_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_20_part_managed_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_21_export_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_22_import_exist_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_23_import_part_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_24_import_nonexist_authsuccess.q.out PRE-CREATION Diff: https://reviews.apache.org/r/430/diff Testing --- Tests added Thanks, Krishna
Review Request: Patch for HIVE-2003: Load analysis should add table/partition to the outputs
4fb79dd ql/src/test/results/clientpositive/smb_mapjoin_7.q.out 19dcd1c ql/src/test/results/clientpositive/smb_mapjoin_8.q.out 5e545a4 ql/src/test/results/clientpositive/stats11.q.out d5c03c2 ql/src/test/results/clientpositive/stats3.q.out fbfa58c ql/src/test/results/clientpositive/udaf_context_ngrams.q.out 577cc71 ql/src/test/results/clientpositive/udaf_corr.q.out 58f143e ql/src/test/results/clientpositive/udaf_covar_pop.q.out 634875c ql/src/test/results/clientpositive/udaf_covar_samp.q.out 7fd2527 ql/src/test/results/clientpositive/udaf_ngrams.q.out 2255a8e ql/src/test/results/clientpositive/udf_field.q.out f3914d2 ql/src/test/results/clientpositive/udf_length.q.out 410a0a7 ql/src/test/results/clientpositive/udf_reverse.q.out e21e2f2 ql/src/test/results/clientpositive/uniquejoin.q.out a026db3 ql/src/test/results/compiler/plan/cast1.q.xml a7bb943 ql/src/test/results/compiler/plan/groupby1.q.xml 92cb203 ql/src/test/results/compiler/plan/groupby2.q.xml 0d93feb ql/src/test/results/compiler/plan/groupby3.q.xml a267968 ql/src/test/results/compiler/plan/groupby4.q.xml c33d459 ql/src/test/results/compiler/plan/groupby5.q.xml 0f18322 ql/src/test/results/compiler/plan/groupby6.q.xml 251dc11 ql/src/test/results/compiler/plan/input1.q.xml 1e085be ql/src/test/results/compiler/plan/input2.q.xml 509c2ef ql/src/test/results/compiler/plan/input20.q.xml 80365fe ql/src/test/results/compiler/plan/input3.q.xml 240bf4f ql/src/test/results/compiler/plan/input4.q.xml e149c30 ql/src/test/results/compiler/plan/input5.q.xml 7e8b3b6 ql/src/test/results/compiler/plan/input6.q.xml b1ac912 ql/src/test/results/compiler/plan/input7.q.xml a7ef270 ql/src/test/results/compiler/plan/input8.q.xml e793db1 ql/src/test/results/compiler/plan/input9.q.xml 53b6ab1 ql/src/test/results/compiler/plan/input_part1.q.xml e598c31 ql/src/test/results/compiler/plan/input_testsequencefile.q.xml 098e81a ql/src/test/results/compiler/plan/input_testxpath.q.xml 687c3f2 ql/src/test/results/compiler/plan/input_testxpath2.q.xml d1c715a ql/src/test/results/compiler/plan/join1.q.xml 535aea4 ql/src/test/results/compiler/plan/join2.q.xml c558556 ql/src/test/results/compiler/plan/join3.q.xml deb278e ql/src/test/results/compiler/plan/join4.q.xml 7227624 ql/src/test/results/compiler/plan/join5.q.xml 08a456c ql/src/test/results/compiler/plan/join6.q.xml 1f49fe2 ql/src/test/results/compiler/plan/join7.q.xml 19815fd ql/src/test/results/compiler/plan/join8.q.xml c13ca3a ql/src/test/results/compiler/plan/sample1.q.xml a53f4e6 ql/src/test/results/compiler/plan/sample2.q.xml 10775d5 ql/src/test/results/compiler/plan/sample3.q.xml 38d0d98 ql/src/test/results/compiler/plan/sample4.q.xml 8d67192 ql/src/test/results/compiler/plan/sample5.q.xml 939b852 ql/src/test/results/compiler/plan/sample6.q.xml e9f9b57 ql/src/test/results/compiler/plan/sample7.q.xml 6e3e01a ql/src/test/results/compiler/plan/subq.q.xml 1fda353 ql/src/test/results/compiler/plan/udf1.q.xml 6931b8a ql/src/test/results/compiler/plan/udf4.q.xml 2e167aa ql/src/test/results/compiler/plan/udf6.q.xml 286884a ql/src/test/results/compiler/plan/udf_case.q.xml 5b73066 ql/src/test/results/compiler/plan/udf_when.q.xml 40dfca6 ql/src/test/results/compiler/plan/union.q.xml 4bc7d89 ql/src/test/templates/TestParse.vm cf860ac ql/src/test/templates/TestParseNegative.vm 48a0031 Diff: https://reviews.apache.org/r/518/diff Testing --- Tests added. Authentication failures are now possible now that outputs are set properly. Thanks, Krishna
Review Request: Fixes for (a) removing redundant synchronized (b) calculating and writing the correct record length and (c) making the layout and the key/value classes actually sequencefile compliant
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/529/ --- Review request for hive and Yongqiang He. Summary --- Patch for HIVE-2065 This addresses bug HIVE-2065. https://issues.apache.org/jira/browse/HIVE-2065 Diffs - build-common.xml 9f21a69 data/files/test_v6_compressed.rc PRE-CREATION data/files/test_v6_uncompressed.rc PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java eb5305b ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 20d1f4e ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java f7eacdc ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java bb1e3c9 ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 8bb6f3a ql/src/test/results/clientpositive/alter_merge.q.out 25f36c0 ql/src/test/results/clientpositive/alter_merge_stats.q.out 243f7cc ql/src/test/results/clientpositive/partition_wise_fileformat.q.out cee2e72 ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out 067ab43 ql/src/test/results/clientpositive/sample10.q.out 50406c3 Diff: https://reviews.apache.org/r/529/diff Testing --- Tests added, existing tests updated Thanks, Krishna
Re: Review Request: Fixes for (a) removing redundant synchronized (b) calculating and writing the correct record length and (c) making the layout and the key/value classes actually sequencefile compliant
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/529/ --- (Updated 2011-04-06 17:13:30.910168) Review request for hive and Yongqiang He. Changes --- Updated patch where sequence file compliance is not addressed but the other two issues are. Summary --- Patch for HIVE-2065 This addresses bug HIVE-2065. https://issues.apache.org/jira/browse/HIVE-2065 Diffs (updated) - build-common.xml 9f21a69 data/files/test_v6dot0_compressed.rc PRE-CREATION data/files/test_v6dot0_uncompressed.rc PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java eb5305b ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 20d1f4e ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java f7eacdc ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java bb1e3c9 ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 8bb6f3a ql/src/test/results/clientpositive/alter_merge.q.out 25f36c0 ql/src/test/results/clientpositive/alter_merge_stats.q.out 243f7cc ql/src/test/results/clientpositive/partition_wise_fileformat.q.out cee2e72 ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out 067ab43 ql/src/test/results/clientpositive/sample10.q.out 50406c3 Diff: https://reviews.apache.org/r/529/diff Testing --- Tests added, existing tests updated Thanks, Krishna
Review Request: Add LazyBinaryColumnarSerDe
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/806/ --- Review request for hive and Yongqiang He. Summary --- Add LazyBinaryColumnarSerDe This addresses bug HIVE-956. https://issues.apache.org/jira/browse/HIVE-956 Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java b062460 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java e927547 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java 2e2896c serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 1440472 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java ea20b34 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 5e6bb0a serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ColumnarStructObjectInspector.java 66f4f8d serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java 90561a1 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java PRE-CREATION Diff: https://reviews.apache.org/r/806/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Add LazyBinaryColumnarSerDe
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/806/ --- (Updated 2011-06-02 12:00:23.653491) Review request for hive and Yongqiang He. Changes --- Uses a special marker for empty strings, thereby incurring no additional cost for normal (non-null, non-empty) strings. Summary --- Add LazyBinaryColumnarSerDe This addresses bug HIVE-956. https://issues.apache.org/jira/browse/HIVE-956 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java b062460 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java e927547 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java 2e2896c serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 1440472 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java ea20b34 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 5e6bb0a serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ColumnarStructObjectInspector.java 66f4f8d serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java 90561a1 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java PRE-CREATION Diff: https://reviews.apache.org/r/806/diff Testing --- Tests added Thanks, Krishna
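To make the empty-string marker concrete, here is a hedged sketch of the encoding idea (the marker byte value and class are mine; the real LazyBinaryColumnarSerDe may use a different representation):

import java.nio.charset.StandardCharsets;

// Sketch of the idea: nulls serialize to zero bytes, empty strings to a
// single reserved marker byte, and normal strings to their raw UTF-8 bytes,
// so non-null, non-empty strings pay no extra cost. Assumes the marker byte
// (a control character here) never appears as real single-byte column data.
final class StringFieldEncoding {
  private static final byte EMPTY_MARKER = 0x01; // assumed reserved value

  static byte[] encode(String s) {
    if (s == null) {
      return new byte[0];                        // null: zero-length field
    }
    if (s.isEmpty()) {
      return new byte[] { EMPTY_MARKER };        // empty: one marker byte
    }
    return s.getBytes(StandardCharsets.UTF_8);   // normal: raw bytes
  }

  static String decode(byte[] field) {
    if (field.length == 0) {
      return null;
    }
    if (field.length == 1 && field[0] == EMPTY_MARKER) {
      return "";
    }
    return new String(field, StandardCharsets.UTF_8);
  }
}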
Re: Review Request: Add LazyBinaryColumnarSerDe
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/806/ --- (Updated 2011-06-08 16:04:08.811137) Review request for hive and Yongqiang He. Changes --- Updating review comments re toString() Summary --- Add LazyBinaryColumnarSerDe This addresses bug HIVE-956. https://issues.apache.org/jira/browse/HIVE-956 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java e79021d serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java e927547 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java 2e2896c serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 1440472 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java ea20b34 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 4285ab3 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ColumnarStructObjectInspector.java 66f4f8d serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java 90561a1 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java PRE-CREATION Diff: https://reviews.apache.org/r/806/diff Testing --- Tests added Thanks, Krishna
Review Request: Patch for Hive-2209, extending ObjectInspectorUtils.compare with some map comparison implementations
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/879/ --- Review request for hive and Yongqiang He. Summary --- Patch for HIVE-2209 Diffs - serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualcomparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualcomparer.java PRE-CREATION Diff: https://reviews.apache.org/r/879/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Patch for Hive-2209, extending ObjectInspectorUtils.compare with some map comparison implementations
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/879/ --- (Updated 2011-06-17 07:52:38.058921) Review request for hive and Yongqiang He. Changes --- Added a complete compare implementation too, with sorting of the keys Summary --- Patch for HIVE-2209 Diffs (updated) - serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/FullMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualcomparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestFullMapEqualcomparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualcomparer.java PRE-CREATION Diff: https://reviews.apache.org/r/879/diff Testing --- Tests added Thanks, Krishna
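For readers skimming the class names, a stripped-down sketch of the two equality strategies, without Hive's ObjectInspector plumbing (FullMapEqualComparer additionally sorts the keys to get a total-order compare, which is omitted here):

import java.util.Map;
import java.util.Objects;

// Simplified sketch of the two map-equality strategies in the patch.
// The "simple" path probes one map with the other's keys and relies on
// consistent hashing; the "cross" path scans all entry pairs and works
// even when hash-based lookup across representations is unreliable.
final class MapComparers {
  static <K, V> boolean simpleEqual(Map<K, V> m1, Map<K, V> m2) {
    if (m1.size() != m2.size()) {
      return false;
    }
    for (Map.Entry<K, V> e : m1.entrySet()) {
      if (!m2.containsKey(e.getKey())
          || !Objects.equals(e.getValue(), m2.get(e.getKey()))) {
        return false;
      }
    }
    return true;
  }

  // O(n^2): compare every entry of m1 against every entry of m2.
  static <K, V> boolean crossEqual(Map<K, V> m1, Map<K, V> m2) {
    if (m1.size() != m2.size()) {
      return false;
    }
    outer:
    for (Map.Entry<K, V> e1 : m1.entrySet()) {
      for (Map.Entry<K, V> e2 : m2.entrySet()) {
        if (Objects.equals(e1.getKey(), e2.getKey())
            && Objects.equals(e1.getValue(), e2.getValue())) {
          continue outer;
        }
      }
      return false;
    }
    return true;
  }
}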
Re: Review Request: Patch for Hive-2209, extending ObjectInspectorUtils.compare with some map comparison implementations
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/879/ --- (Updated 2011-06-20 12:54:09.245202) Review request for hive and Yongqiang He. Changes --- Fixed a lowercase/uppercase typo in the test classes Summary --- Patch for HIVE-2209 Diffs (updated) - serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/FullMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestFullMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualComparer.java PRE-CREATION Diff: https://reviews.apache.org/r/879/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Add LazyBinaryColumnarSerDe
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/806/ --- (Updated 2011-06-20 12:56:38.943799) Review request for hive and Yongqiang He. Changes --- After separating out mapcomparer changes to its own patch Summary --- Add LazyBinaryColumnarSerDe This addresses bug HIVE-956. https://issues.apache.org/jira/browse/HIVE-956 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java e79021d serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java e927547 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java 2e2896c serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 1440472 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java ea20b34 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 4285ab3 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ColumnarStructObjectInspector.java 66f4f8d serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java 90561a1 serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java PRE-CREATION Diff: https://reviews.apache.org/r/806/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Patch for Hive-2209, extending ObjectInspectorUtils.compare with some map comparison implementations
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/879/ --- (Updated 2011-07-20 02:25:36.169590) Review request for hive and Yongqiang He. Summary --- Patch for HIVE-2209 This addresses bug HIVE-2209. https://issues.apache.org/jira/browse/HIVE-2209 Diffs - serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/FullMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestFullMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualComparer.java PRE-CREATION Diff: https://reviews.apache.org/r/879/diff Testing --- Tests added Thanks, Krishna
HIVE-4053 | Review request
Hi, I've implemented the 'Refined Soundex' algorithm using a GenericUDF and would like to share it for review by experts, as I'm a newbie. Change details: A new Java class is created: GenericUDFRefinedSoundex.java. An entry is added to FunctionRegistry.java: registerGenericUDF("soundex_ref", GenericUDFRefinedSoundex.class); Both files are attached to the email. I'm planning to implement other phonetic algorithms and submit all of them as a single patch. I understand there are many other steps I need to finish before a patch is ready, but for now, if you could review the attached code and provide feedback, that would be great. Here are the details of the Refined Soundex algorithm: The first letter is stored. Subsequent letters are replaced by numbers as defined below: * B, P => 1 * F, V => 2 * C, K, S => 3 * G, J => 4 * Q, X, Z => 5 * D, T => 6 * L => 7 * M, N => 8 * R => 9 * Other letters => 0 Consecutive letters belonging to the same group are replaced by a single digit. Example: > SELECT soundex_ref('Carren') FROM src LIMIT 1; > C30908 Thanks, Krishna
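For reference, here is a minimal, self-contained sketch of the encoding described above; the class is mine and not the attached GenericUDFRefinedSoundex, and the Hive UDF plumbing (initialize/evaluate) is omitted:

// Refined Soundex, exactly per the grouping table in the mail above:
// B,P=1  F,V=2  C,K,S=3  G,J=4  Q,X,Z=5  D,T=6  L=7  M,N=8  R=9  others=0
final class RefinedSoundex {
  private static final char[] CODES =
      //A    B    C    D    E    F    G    H    I    J    K    L    M
      {'0', '1', '3', '6', '0', '2', '4', '0', '0', '4', '3', '7', '8',
      //N    O    P    Q    R    S    T    U    V    W    X    Y    Z
       '8', '0', '1', '5', '9', '3', '6', '0', '2', '0', '5', '0', '5'};

  static String encode(String input) {
    if (input == null) {
      return null;
    }
    String s = input.toUpperCase().replaceAll("[^A-Z]", "");
    if (s.isEmpty()) {
      return "";
    }
    StringBuilder out = new StringBuilder();
    out.append(s.charAt(0));              // first letter is kept as-is
    char last = '\0';
    for (int i = 0; i < s.length(); i++) {
      char code = CODES[s.charAt(i) - 'A'];
      if (code != last) {                 // collapse same-group runs
        out.append(code);
      }
      last = code;
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(encode("Carren")); // prints C30908, matching the example
  }
}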
dev-ow...@hive.apache.org.
Can Hive handle unstructured data, or does it handle only structured data? Please confirm. Thanks, Mohan
Re: dev-ow...@hive.apache.org.
Thanks, Alan, for the answer. So, can I conclude that Hive handles unstructured data? On Thu, Dec 4, 2014 at 10:06 PM, Alan Gates wrote: > Define unstructured. Hive can handle data such as Avro or JSON, which I > would call self-structured. I believe the SerDes for these types can even > set the schema for the table or partition you are reading based on the data > in the file. > > Alan. > > Mohan Krishna > December 3, 2014 at 17:01 > Can Hive handle unstructured data, or does it handle only structured data? > Please confirm > > Thanks > Mohan
Re: dev-ow...@hive.apache.org.
Thank you, Bill. Now it is clear to me. Thanks. On Fri, Dec 5, 2014 at 12:54 AM, Bill Busch wrote: > Mohan, > > It will handle it, but it is probably (depending on your use case) not > optimal. Hive's sweet spot is structured data. > > Bill
Re: [ANNOUNCE] New Hive PMC Member - Prasad Mujumdar
Congrats Prasad On Wed, Dec 10, 2014 at 3:47 AM, Carl Steinbach wrote: > I am pleased to announce that Prasad Mujumdar has been elected to the Hive > Project Management Committee. Please join me in congratulating Prasad! > > Thanks. > > - Carl >
What more Hive can do when compared to PIG
Hello all, Can somebody help me in getting the answer to the question below? It's regarding Pig vs. Hive. We know that Pig is for large data set analysis and Hive is good at data summarization and ad hoc queries. But I want to know of a use case that Hive can handle and that cannot be achieved with Pig. I mean to say: what more can a Hive query achieve when the same is not possible with a Pig Latin script? If possible, I'd like to know the vice versa case as well. Thanks Mohan 469-274-5677
Some queries re locking
Hello, While looking into some of the tangential issues encountered while doing the export/import related work, I have some questions: 1. Should "CREATE TABLE" take a shared lock on the database? I think so from the discussions, but I do not think it happens now. 2. Similarly, "LOAD" should take an exclusive lock on the table/partition by adding the table/partition to the outputs. 3. While trying a fix for the above, I ran into another issue. IIUC, the Test[Negative]CliDriver templates start the zkcluster via the QTestUtil ctor, but this is immediately shut down via the cleanup->teardown call, so most of the create/loads in createSources run without a ZooKeeper server, and any attempt to lock errors out. Is this intentional? Cheers Krishna
RCFile - some queries
Hello, I was looking into the RCFile format, especially when used with compression; a picture of the file layout as I understand it in this case is attached. Some queries/potential issues: 1. RCFile makes a claim of being sequence-file compatible, but the recordLength is not the actual on-disk length of the record. As shown in the picture, it is the uncompressed key length plus the compressed value length. Similarly, the next field, key length, is not the on-disk length of the compressed key. 2. Record length is also used for seeking on the input stream; see Reader.seekToNextKeyBuffer(). Since record length is overstated for compressed records, this can result in incorrect positioning. 3. Thread-safety: Is the RCFile.Reader class meant to be thread-safe? Some public methods are marked synchronized, which gives that appearance, but I think there are a few thread-safety issues: 3.1 Other public methods, such as Reader.nextBlock(), are not synchronized but operate on the same data structures. 3.2 Callbacks such as LazyDecompressionCallbackImpl.decompress operate on the value buffer currentValue, which can be simultaneously modified by the public methods on the Reader. Cheers, Krishna
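To make the record-length concern concrete, here is a tiny runnable illustration; all numbers are made up, and the field roles follow the description above rather than RCFile.java itself:

// Contrasts the two readings of "recordLength" debated in this thread.
public final class RecordLengthDemo {
  public static void main(String[] args) {
    int uncompressedKeyLen = 400;   // hypothetical
    int compressedKeyLen   = 120;   // hypothetical
    int compressedValueLen = 5_000; // hypothetical

    // Reading (a), per the mail above: recordLength mixes the two spaces.
    int recordLenAsReported = uncompressedKeyLen + compressedValueLen;

    // Reading (b), per the reply below: recordLength is pure on-disk length.
    int recordLenOnDisk = compressedKeyLen + compressedValueLen;

    // If (a) is what is written but (b) is what a seek needs, skipping by
    // recordLength overshoots by the key's compression savings:
    System.out.println("overshoot = "
        + (recordLenAsReported - recordLenOnDisk)); // 280 in this example
  }
}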
Re: RCFile - some queries
Hi Yongqiang He, I have created a bug, https://issues.apache.org/jira/browse/HIVE-2065, to carry on the discussion, and have attached the picture there too: https://issues.apache.org/jira/secure/attachment/12474055/Slide1.png (looks like attachments are stripped from posts here?). Please comment there. Cheers, Krishna On 3/18/11 11:47 PM, "yongqiang he" wrote: >> but the recordLength is not the actual on-disk length of the record. It is the actual on-disk length: it is the compressed key length plus the compressed value length. >> Similarly, the next field - key length - is not the on-disk length of the compressed key. There are two key lengths; one is the compressed key length, the other is the uncompressed key length. For 2, it won't be a problem; record length is the compressed length. >> Thread-Safety. It is not thread-safe; applications should handle that themselves. It was initially designed for Hive. Thread safety was there at first, and then removed because Hive does not need it, and 'synchronized' may add extra overhead. >> 3.1 Reader.nextBlock() was added later for file merge, so the normal reader should not use this method. >> 3.2 True.
Re Stats Publishing /Aggregation
Is there a reason why persistent stores such as JDBC and HBase are supported for temporary stats storage (IIUC), but Hadoop counters were not used as the way for tasks to 'publish' their stats for the aggregation task to pick up? Cheers, Krishna
[jira] [Commented] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583819#comment-13583819 ] Krishna commented on HIVE-4053: --- Soundex: http://en.wikipedia.org/wiki/Soundex Daitch-Mokotoff Soundex: http://en.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_Soundex New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone: http://en.wikipedia.org/wiki/Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Affects Version/s: (was: 0.10.0) > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF > Reporter: Krishna > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF > Reporter: Krishna > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585025#comment-13585025 ] Krishna commented on HIVE-4053: --- I've implemented 'Refined Soundex' algorithm using a GenericUDF and would like to share it for a review by experts as I'm a newbie. Change Details: A new java class is created: GenericUDFRefinedSoundex.java Add a entry to FunctionRegistry.java: registerGenericUDF("soundex_ref", GenericUDFRefinedSoundex.class); Both files are attached to the email. I'm planning to implement other phonetic algorithms and submit all as a single patch. I understand there are many other steps that I need to finish before a patch is ready but for now, if you could review the attached code and provide feedback, it'll be great. Here are the details of Refined Soundex algorithm: First letter is stored Subsequent letters are replaced by numbers as defined below- * B, P => 1 * F, V => 2 * C, K, S => 3 * G, J => 4 * Q, X, Z => 5 * D, T => 6 * L => 7 * M, N => 8 * R => 9 * Other letters => 0 Consecutive letters belonging to the same group are replaced by one letter Example: > SELECT soundex_ref('Carren') FROM src LIMIT 1; > C30908 > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Reporter: Krishna > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Attachment: GenericUDFRefinedSoundex.java FunctionRegistry.java > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF > Reporter: Krishna > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Attachment: HIVE-4053.1.patch.txt > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF > Reporter: Krishna > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586422#comment-13586422 ] Krishna commented on HIVE-4053: --- I've attached the patch to JIRA. How do I post it for review on reviewboard? > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Reporter: Krishna > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Fix Version/s: 0.10.0 Labels: patch (was: ) Affects Version/s: 0.10.0 Release Note: Implementation of the phonetic algorithm - Refined Soundex Status: Patch Available (was: Open) > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586499#comment-13586499 ] Krishna commented on HIVE-4053: --- I have submitted the patch; please review the code. > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex: ttp://en.wikipedia.org/wiki/Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: ttp://en.wikipedia.org/wiki/Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex: ttp://en.wikipedia.org/wiki/Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: http://en.wikipedia.org/wiki/Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS): > http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex: Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone: http://en.wikipedia.org/wiki/Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: http://en.wikipedia.org/wiki/Soundex > Refined Soundex: > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS): > http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System > Caverphone: http://en.wikipedia.org/wiki/Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex: Refer to the comment on 22/Feb/13 23:51 Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone: http://en.wikipedia.org/wiki/Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex: Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone: http://en.wikipedia.org/wiki/Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: http://en.wikipedia.org/wiki/Soundex > Refined Soundex: Refer to the comment on 22/Feb/13 23:51 > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS): > http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System > Caverphone: http://en.wikipedia.org/wiki/Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex: Refer to the comment on 22/Feb/13 23:51 Daitch–Mokotoff Soundex: http://en.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone: http://en.wikipedia.org/wiki/Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex: Refer to the comment on 22/Feb/13 23:51 Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone: http://en.wikipedia.org/wiki/Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: http://en.wikipedia.org/wiki/Soundex > Refined Soundex: Refer to the comment on 22/Feb/13 23:51 > Daitch–Mokotoff Soundex: > http://en.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS): > http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System > Caverphone: http://en.wikipedia.org/wiki/Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587827#comment-13587827 ] Krishna commented on HIVE-4053: --- There are 6 popular phonetic algorithms (as mentioned in the JIRA description). I think it's a good idea to implement all of them in Hive. There are two ways to implement this:
Option 1: Write a separate GenericUDF for each algorithm, so that each algorithm gets its own Hive function.
Option 2: Write one GenericUDF and use a parameter to determine which algorithm is invoked.
I prefer option (2), but if someone feels option (1) is better, please comment. > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: http://en.wikipedia.org/wiki/Soundex > Refined Soundex: Refer to the comment on 22/Feb/13 23:51 > Daitch–Mokotoff Soundex: > http://en.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS): > http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System > Caverphone: http://en.wikipedia.org/wiki/Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
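A minimal sketch of option (2), assuming the encoders come from Apache Commons Codec; the class name, the function name phonetic(str, algorithm), and the algorithm keys are illustrative, and Daitch–Mokotoff is absent from older commons-codec releases, so it would need a custom implementation:
{noformat}
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.codec.EncoderException;
import org.apache.commons.codec.StringEncoder;
import org.apache.commons.codec.language.Caverphone2;
import org.apache.commons.codec.language.DoubleMetaphone;
import org.apache.commons.codec.language.Metaphone;
import org.apache.commons.codec.language.Nysiis;
import org.apache.commons.codec.language.RefinedSoundex;
import org.apache.commons.codec.language.Soundex;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
import org.apache.hadoop.io.Text;

// One UDF, many algorithms: phonetic(str, 'soundex'|'refined_soundex'|...).
public class GenericUDFPhonetic extends GenericUDF {
  private static final Map<String, StringEncoder> ENCODERS = new HashMap<String, StringEncoder>();
  static {
    ENCODERS.put("soundex", new Soundex());
    ENCODERS.put("refined_soundex", new RefinedSoundex());
    ENCODERS.put("metaphone", new Metaphone());
    ENCODERS.put("double_metaphone", new DoubleMetaphone());
    ENCODERS.put("nysiis", new Nysiis());
    ENCODERS.put("caverphone", new Caverphone2());
  }

  private PrimitiveObjectInspector strOI;
  private PrimitiveObjectInspector algOI;
  private final Text result = new Text();

  @Override
  public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
    if (args.length != 2) {
      throw new UDFArgumentException("phonetic(str, algorithm) takes exactly two arguments");
    }
    strOI = (PrimitiveObjectInspector) args[0];
    algOI = (PrimitiveObjectInspector) args[1];
    return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] args) throws HiveException {
    Object str = args[0].get();
    Object alg = args[1].get();
    if (str == null || alg == null) {
      return null;
    }
    String name = PrimitiveObjectInspectorUtils.getString(alg, algOI).toLowerCase();
    StringEncoder encoder = ENCODERS.get(name);
    if (encoder == null) {
      throw new HiveException("Unsupported phonetic algorithm: " + name);
    }
    try {
      result.set(encoder.encode(PrimitiveObjectInspectorUtils.getString(str, strOI)));
      return result;
    } catch (EncoderException e) {
      throw new HiveException(e);
    }
  }

  @Override
  public String getDisplayString(String[] children) {
    return "phonetic(" + children[0] + ", " + children[1] + ")";
  }
}
{noformat}
Registered once in FunctionRegistry, this would expose all the listed algorithms through a single call such as phonetic(name, 'soundex').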
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Status: Open (was: Patch Available) I will re-submit the patch > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: http://en.wikipedia.org/wiki/Soundex > Refined Soundex: Refer to the comment on 22/Feb/13 23:51 > Daitch–Mokotoff Soundex: > http://en.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS): > http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System > Caverphone: http://en.wikipedia.org/wiki/Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
hive pull request: Kk wb 1228
GitHub user krishna-verticloud opened a pull request: https://github.com/apache/hive/pull/11 Kk wb 1228 You can merge this pull request into a Git repository by running: $ git pull https://github.com/VertiPub/hive kk-WB-1228 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/11.patch
hive pull request: Kk wb 1228
Github user krishna-verticloud closed the pull request at: https://github.com/apache/hive/pull/11
[jira] Created: (HIVE-1918) Add export/import facilities to the hive system
Add export/import facilities to the hive system --- Key: HIVE-1918 URL: https://issues.apache.org/jira/browse/HIVE-1918 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Krishna Kumar This is an enhancement request to add export/import features to hive. With this language extension, the user can export the data of the table - which may be located in different hdfs locations in case of a partitioned table - as well as the metadata of the table into a specified output location. This output location can then be moved over to another different hadoop/hive instance and imported there. This should work independent of the source and target metastore dbms used; for instance, between derby and mysql. For partitioned tables, the ability to export/import a subset of the partition must be supported. Howl will add more features on top of this: The ability to create/use the exported data even in the absence of hive, using MR or Pig. Please see http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Status: Patch Available (was: Open) Patch for adding export/import. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.txt > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983532#action_12983532 ] Krishna Kumar commented on HIVE-1918: - Design notes:
- Export/Import is modeled on the existing load functionality. No new tasks are added; the existing tasks for copy/move/create table/add partition et al are reused.
- EXPORT TABLE table [PARTITION (partition_col=partition_colval, ...)] TO location
- IMPORT [[EXTERNAL] TABLE table [PARTITION (partition_col=partition_colval, ...)]] FROM sourcelocation [LOCATION targetlocation]
- The export output consists of an XML-serialized metadata file in the target directory, plus sub-directories for the data files.
> Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984128#action_12984128 ] Krishna Kumar commented on HIVE-1918: - Ok. Will take care of this via a delegating ctor. A process question: I guess I should wait for more comments from other reviewers before I create another patch, in case others are reviewing the current patch? > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984563#action_12984563 ] Krishna Kumar commented on HIVE-1918: - Why export/import needs this change: It is not the export part, but rather the import part, which needs this change. While creating a partition as part of an import, we need to be able to create the partition along with its ancillary data, including partition parameters. But the first part of the existing "create partition" flow (AddPartitionDesc -> DDLTask.addPartition -> Hive.createPartition) did not support specifying partition params, while the second part (metastore.api.Partition -> IMetaStoreClient.add_partition -> HiveMetaStore.HMSHandler.add_partition -> ObjectStore.addPartition) does. So I added the ability to pass the partition parameters along in the first part of the flow. In terms of options for compatible changes, there are two I can see:
1. The solution suggested above. Add an additional ctor so that no existing code breaks.
{noformat}
public Partition(Table tbl, Map<String, String> partSpec, Path location) {
  this(tbl, partSpec, location, null);
}

public Partition(Table tbl, Map<String, String> partSpec, Path location,
    Map<String, String> partParams) {...}
{noformat}
2. Have only the current ctor, but in Hive.createPartition get the underlying metastore.api.Partition and set the parameters on it before passing it on to the metastore client.
Thoughts? > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
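For reference, a rough sketch of what option 2 could look like inside Hive.createPartition; the variable names and the surrounding wiring are illustrative, not the actual patch:
{noformat}
// Instead of a new ql.metadata.Partition ctor, set the params on the
// underlying Thrift object before handing it to the metastore client.
org.apache.hadoop.hive.metastore.api.Partition tpart =
    newPart.getTPartition();          // underlying metastore.api.Partition
if (partParams != null) {
  tpart.setParameters(partParams);    // Map<String, String> of partition params
}
getMSC().add_partition(tpart);        // IMetaStoreClient.add_partition
{noformat}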
[jira] Created: (HIVE-1924) checkformat implementations leak handles
checkformat implementations leak handles Key: HIVE-1924 URL: https://issues.apache.org/jira/browse/HIVE-1924 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Krishna Kumar In validateInput, Reader constructors of SequenceFile and RCFile throw exceptions to indicate that the format is incorrect, but the close is not called in a finally block. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
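A minimal sketch of the caller-side pattern the report asks for (class and method names are illustrative). Note that a failure inside the Reader constructor itself still leaks the stream it opened internally, which is the limit of any caller-side fix; see the later resolution of this issue as a duplicate.
{noformat}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public final class FormatCheck {
  private FormatCheck() {
  }

  public static boolean isSequenceFile(FileSystem fs, Path file, Configuration conf) {
    SequenceFile.Reader reader = null;
    try {
      reader = new SequenceFile.Reader(fs, file, conf); // throws if not a SequenceFile
      return true;
    } catch (IOException e) {
      return false; // wrong format
    } finally {
      // Close in finally so a failure between construction and close
      // cannot leak the handle.
      if (reader != null) {
        try {
          reader.close();
        } catch (IOException ignored) {
          // best-effort cleanup
        }
      }
    }
  }
}
{noformat}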
[jira] Updated: (HIVE-1924) checkformat implementations leak handles
[ https://issues.apache.org/jira/browse/HIVE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1924: Status: Patch Available (was: Open) Not sure how to test this automatically, since the readers are transient. The way I actually tested is by:
- running on an NFS-mounted directory (since NFS creates .nfsxx files for open files which have been deleted)
- pausing the code after a load command (with hive.fileformat.check set to true) is executed
- using lsof to list the files held open by the process
> checkformat implementations leak handles > > > Key: HIVE-1924 > URL: https://issues.apache.org/jira/browse/HIVE-1924 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Krishna Kumar > > In validateInput, Reader constructors of SequenceFile and RCFile throw > exceptions to indicate that the format is incorrect, but the close is not > called in a finally block. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1924) checkformat implementations leak handles
[ https://issues.apache.org/jira/browse/HIVE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1924: Attachment: HIVE.1924.patch.txt Tested only manually with hive.fileformat.check set to true > checkformat implementations leak handles > > > Key: HIVE-1924 > URL: https://issues.apache.org/jira/browse/HIVE-1924 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE.1924.patch.txt > > > In validateInput, Reader constructors of SequenceFile and RCFile throw > exceptions to indicate that the format is incorrect, but the close is not > called in a finally block. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985556#action_12985556 ] Krishna Kumar commented on HIVE-1918: - @Edward: Both the existing data model (prettified er diagram attached) and the object model (class org.apache.hadoop.hive.metastore.api.Partition) allow the specification of parameters on a per-partition basis. So I am not adding new fields to either of these models. By proposal 2 above, I will not be adding any ctor parameters to org.apache.hadoop.hive.ql.metadata.Partition as well. Your point re providing manageability via ddl statements to all aspects of the data/object model is taken. But I am not adding new aspects to either model, so if indeed we need to address current manageability gaps, should they not be addressed via another enhancement request, rather than this one, which aims simply to add export/import facilities? > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985557#action_12985557 ] Krishna Kumar commented on HIVE-1918: - @Carl: 1. Taken care of in the new patch. 2. Can you post some of the diffs that you get failures on? I had a problem with running the tests on NFS-mounted directories. That had to do with an existing bug in the load functionality, which used to result in a "MetaException: could not delete dir" error while trying to clean up the effects of the previous test. I have created a separate jira, HIVE-1924, for this and have attached a patch. 3. Have taken the whitelist approach; the whitelist is now set to "hdfs,pfile". > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
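A rough sketch of what such a URI-scheme whitelist check might look like; the class and method names are illustrative, not the actual patch:
{noformat}
import java.net.URI;

public final class SchemeWhitelist {
  private SchemeWhitelist() {
  }

  // True when the location's URI scheme is in the comma-separated whitelist,
  // e.g. isAllowed(URI.create("hdfs://nn/exports/t1"), "hdfs,pfile").
  public static boolean isAllowed(URI location, String whitelist) {
    String scheme = location.getScheme();
    if (scheme == null) {
      return false; // unqualified path; resolve against the default FS first
    }
    for (String allowed : whitelist.split(",")) {
      if (allowed.trim().equalsIgnoreCase(scheme)) {
        return true;
      }
    }
    return false;
  }
}
{noformat}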
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.1.txt A quick summary of the second derivative (difference between diffs):
- used --no-prefix while generating the patch
- hive.test.exim replaced by hive.exim.uri.scheme.whitelist
- schemaCompare, initializeFromUrl, validateTable all refactored to util methods
- trailing spaces in some test files removed
> Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: hive-metastore-er.pdf Prettified ER diagram of the existing data model > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985565#action_12985565 ] Krishna Kumar commented on HIVE-1918: - @Namit: 1. Do you have any ideas re how we can get a unique temporary directory name for use in the test script files? In code, of course, we can use the getScratchDir methods, but how do we solve this problem in these test scripts? 2. Export/Import, as in the case of Load, operates at file level rather than at record level. So there are no record-level filters available. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1924) checkformat implementations leak handles
[ https://issues.apache.org/jira/browse/HIVE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1924: Status: Open (was: Patch Available) Hmm, the patch is not correct. Setting hive.fileformat.check=false does seem to stop the leaks, though. Investigating... > checkformat implementations leak handles > > > Key: HIVE-1924 > URL: https://issues.apache.org/jira/browse/HIVE-1924 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE.1924.patch.txt > > > In validateInput, Reader constructors of SequenceFile and RCFile throw > exceptions to indicate that the format is incorrect, but the close is not > called in a finally block. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1924) checkformat implementations leak handles
[ https://issues.apache.org/jira/browse/HIVE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar resolved HIVE-1924. - Resolution: Duplicate Problem referenced in HADOOP-5476 and HIVE-1185 > checkformat implementations leak handles > > > Key: HIVE-1924 > URL: https://issues.apache.org/jira/browse/HIVE-1924 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE.1924.patch.txt > > > In validateInput, Reader constructors of SequenceFile and RCFile throw > exceptions to indicate that the format is incorrect, but the close is not > called in a finally block. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.2.txt Patch including - no changes to ql.metadata.Partition as per option#2 above - use relative paths in tests > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988693#comment-12988693 ] Krishna Kumar commented on HIVE-1918: - @Namit: Attached patch now uses relative paths in test scripts. (Note that some existing tests [clientpositive/insertexternal1.q, clientpositive/load_fs.q] use absolute paths even today; those need to be changed via another bug report.) @Edward: No changes to Partition.java, as proposed in option 2 above. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.3.txt Patch with all open issues addressed > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Status: Patch Available (was: Open) > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989747#comment-12989747 ] Krishna Kumar commented on HIVE-1918: - With this patch, I think all above issues are addressed. Also have added 3 bug fixes + tests for those bugs. Please review. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992036#comment-12992036 ] Krishna Kumar commented on HIVE-1918: - Thanks, Namit, for the comments. 1. Ok re moving serialization/deserialization methods to EximUtil, but did not understand the first part. Are you suggesting moving EximUtil, ImportSemanticAnalyzer and ExportSemanticAnalyzer to a new package? Does not seem to warrant it; today all parsing/semantic analysis classes are in the o.a.h.h.ql.parse package... 2. You mean Hive.java's API? The existing first createPartition remains as it is; the second createPartition, used in DDLTask, is changing to allow the creation of a partition with all the partition-specific configurations. Since AddPartitionDesc is initialized with nulls/-1 for these extra parameters, the existing behaviour is not altered. 3. Can you expand a little? What are inputs/outputs (classes? tables?) - if they are part of the existing object model/data model, I think they are exported and imported. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992373#comment-12992373 ] Krishna Kumar commented on HIVE-1918: - Importing into existing tables is now supported, but the checks (to see whether the imported table and the target table are compatible) have been kept fairly simple for now. Please see ImportSemanticAnalyzer.checkTable. The schemas (column and partition) of the two should match exactly, except for comments. Since we are just moving files (rather than rewriting records), I think there will be issues if the metadata schema does not match (in terms of types, number etc) the data serialization exactly. Re the earlier comment re outputs/inputs, got what you meant. I will add the table/partition to the inputs in ExportSemanticAnalyzer. But in the case of the imports, I see that the tasks themselves add the entity operated upon to the inputs/outputs list. Isn't that too late for authorization/concurrency, even though it may work for replication? Or are both the sem.analyzers and the tasks expected to add them? In the case of a newly created table/partition, the sem.analyzer does not have a handle? > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992606#comment-12992606 ] Krishna Kumar commented on HIVE-1918: - Hmm. LoadSemanticAnalyzer (which knows the table) does not add it to the outputs, but the MoveTask it schedules, does. Similarly, CREATE-TABLE does not add the entity but the DDLTask it schedules, does. This may be fine only because the entity does not exist at compile time? ADD-PARTITION adds the table as an *input* at compile time and the partition itself is added as an output at execution time. Should not the table be an output (at compile time) as well - for authorization/concurrency purposes? Anyway, where the import operates on existing tables/partitions, I will add them at compile time. If the entity is being created as part of the task, then the task will be adding them to inputs/outputs at runtime. Is this fine? > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
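What "add them at compile time" amounts to, sketched against the hooks API of that era (WriteEntity(Table) predates the later WriteType-based ctors; the class, method, and variables are illustrative):
{noformat}
import java.util.Set;

import org.apache.hadoop.hive.ql.hooks.ReadEntity;
import org.apache.hadoop.hive.ql.hooks.WriteEntity;
import org.apache.hadoop.hive.ql.metadata.Table;

public class EntityRegistrationSketch {
  // inputs/outputs are the semantic analyzer's entity sets in real code.
  void registerAtCompileTime(Set<ReadEntity> inputs, Set<WriteEntity> outputs,
      Table exportSource, Table existingImportTarget) {
    inputs.add(new ReadEntity(exportSource));       // export reads the source
    if (existingImportTarget != null) {             // target known at compile time
      outputs.add(new WriteEntity(existingImportTarget));
    }
    // Tables/partitions that the task itself creates are added to the
    // inputs/outputs by the task at runtime, as discussed above.
  }
}
{noformat}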
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Status: Patch Available (was: Open) Please review. Will try and see if I can update the reviewboard myself... > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.4.txt Patch with - metadata ser/deser methods moved from HiveUtils to EximUtil - inputs and outputs populated; authorization related bugfix and tests > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-17278) Incorrect output timestamp from from_utc_timestamp()/to_utc_timestamp when local timezone has DST
Leela Krishna created HIVE-17278: Summary: Incorrect output timestamp from from_utc_timestamp()/to_utc_timestamp when local timezone has DST Key: HIVE-17278 URL: https://issues.apache.org/jira/browse/HIVE-17278 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 2.0.0 Reporter: Leela Krishna HIVE-12706 is resolved but there is still a bug in this - from_utc_timestamp() is interpreting a GMT timestamp with DST. HS2 on PST timezone:
GMT timestamp            PST timestamp            PST2GMT
2012-03-11 01:30:15.332  2012-03-10 17:30:15.332  2012-03-11 01:30:15.332
2012-03-11 02:30:15.332  2012-03-10 19:30:15.332  2012-03-11 03:30:15.332  (<--- we got 1 hour more on GMT)
The PST timestamp is generated using from_utc_timestamp('2012-03-11 02:30:15.332', 'PST'). The PST2GMT timestamp is generated using to_utc_timestamp(from_utc_timestamp('2012-03-11 02:30:15.332', 'PST'), 'PST'). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
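The expected values can be cross-checked with java.time around the 2012 US DST transition (02:00 local on 2012-03-11 in America/Los_Angeles, i.e. 10:00 UTC); the class name is illustrative:
{noformat}
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class DstCheck {
  public static void main(String[] args) {
    ZoneId pst = ZoneId.of("America/Los_Angeles");
    Instant gmt = LocalDateTime.parse("2012-03-11T02:30:15.332").toInstant(ZoneOffset.UTC);
    // 02:30 UTC is still before the local DST jump, so the offset is UTC-8 and
    // the correct local time is 2012-03-10T18:30:15.332, one hour earlier than
    // the 19:30 value the buggy from_utc_timestamp() produced.
    System.out.println(gmt.atZone(pst).toLocalDateTime());
  }
}
{noformat}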
[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly
[ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094294#comment-13094294 ] Krishna Kumar commented on HIVE-2417: - Yes, the test is designed to produce the error when run without the change. Are you finding that that's not the case? I get an EOFException while running the same steps in my development environment (i.e., not as a unit test). 1. This is needed so that the rcfiles in the target table are compressed with Bzip2. Do you mean that we should be using the Default compression codec instead? Fine with me, but why is that important? 2. tgt does contain more than one file:
[before alter] +POSTHOOK: query: show table extended like `tgt_rc_merge_test` ... +totalNumberFiles:2 ...
[after alter] +POSTHOOK: query: show table extended like `tgt_rc_merge_test` ... +totalNumberFiles:1
The 'create' adds one file, and the insert adds another file. [OT: Does it make sense to append a block merge task after a non-overwrite insert? Dunno...] > Merging of compressed rcfiles fails to write the valuebuffer part correctly > --- > > Key: HIVE-2417 > URL: https://issues.apache.org/jira/browse/HIVE-2417 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-2417.v0.patch > > > The blockmerge task does not create proper rc files when merging compressed > rc files as the valuebuffer writing is incorrect. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly
[ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2417: Attachment: HIVE-2417.v1.patch Test changed after review comments - default codec instead of bzip2 - Create + 2 inserts instead of CTAS + 1 insert > Merging of compressed rcfiles fails to write the valuebuffer part correctly > --- > > Key: HIVE-2417 > URL: https://issues.apache.org/jira/browse/HIVE-2417 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-2417.v0.patch, HIVE-2417.v1.patch > > > The blockmerge task does not create proper rc files when merging compressed > rc files as the valuebuffer writing is incorrect. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2413) BlockMergeTask ignores client-specified jars
[ https://issues.apache.org/jira/browse/HIVE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2413: Attachment: HIVE-2413.v1.patch Empty string not handled correctly in JC so handling it here... > BlockMergeTask ignores client-specified jars > > > Key: HIVE-2413 > URL: https://issues.apache.org/jira/browse/HIVE-2413 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2413.v0.patch, HIVE-2413.v1.patch > > > User-specified jars are not added to the hadoop tasks while executing a > BlockMergeTask resulting in a ClassNotFoundException. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
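One plausible shape of the guard described by that note, taking "JC" to mean JobConf (an assumption; the helper and its property handling are illustrative, not the actual patch):
{noformat}
import org.apache.hadoop.mapred.JobConf;

public final class MergeJarHelper {
  private MergeJarHelper() {
  }

  // Ship client-specified jars with the merge job, but never set "tmpjars"
  // to an empty string, since an empty value is not handled correctly.
  public static void addClientJars(JobConf job, String userJars) {
    if (userJars == null || userJars.trim().isEmpty()) {
      return;
    }
    String existing = job.get("tmpjars");
    String combined = (existing == null || existing.isEmpty())
        ? userJars : existing + "," + userJars;
    job.set("tmpjars", combined); // picked up via the distributed cache
  }
}
{noformat}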
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995767#comment-12995767 ] Krishna Kumar commented on HIVE-1918: - https://reviews.apache.org/r/430/ added (with hive-git as repository). Carl, can you take down 339 as that is now superseded? > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997296#comment-12997296 ] Krishna Kumar commented on HIVE-1918: - There are a few reasons why I took this approach - The decision on compatibility (forward/backward) checks, as in EximUtil.checkCompatibility, needs to be taken consciously. That is, automatically breaking backward compatibility is not an option here, I think. - What needs to be serialized/deserialized also requires a human decision. For instance, even now, authorization details are not transferred by an export/import. - The serialization/deserialization methods are also used by the howl codebase outside of a hive context. It would be good to have this code only loosely coupled to the metastore code. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
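A hedged sketch of the explicit compatibility gate argued for above, modeled loosely on the EximUtil.checkCompatibility idea; the parameters and message are illustrative assumptions, not the actual code:
{code}
// The compatibility decision is made consciously in hand-written code,
// which is exactly the control auto-generated serialization would take away.
static void checkCompatibility(int dumpVersion, int codeVersion) throws SemanticException {
  if (dumpVersion > codeVersion) {
    // a newer dump may carry metadata this code cannot interpret safely
    throw new SemanticException("Export dump version " + dumpVersion
        + " is newer than the supported version " + codeVersion);
  }
  // older dumps are accepted: backward compatibility is preserved on purpose
}
{code}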
[jira] Created: (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it. -- Key: HIVE-2003 URL: https://issues.apache.org/jira/browse/HIVE-2003 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor The table/partition being loaded is not being added to outputs in the LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
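A hedged sketch of the shape of the fix (illustrative, not the committed patch): record the load target as a write entity during semantic analysis so the authorization checks see it. WriteEntity and outputs are the existing hooks concepts; the surrounding variables are assumed from the analyzer's context:
{code}
// In LoadSemanticAnalyzer, after resolving the target table/partition:
if (partSpec == null || partSpec.isEmpty()) {
  outputs.add(new WriteEntity(table));     // whole-table load
} else {
  outputs.add(new WriteEntity(partition)); // load into a single partition
}
{code}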
[jira] Updated: (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2003: Attachment: HIVE-2003.patch.txt Patch attached. 1. LoadSemanticAnalyzer adds the table/partition to the outputs 2. QTestUtil.cleanup() used to call setup.tearDown, resulting in the commands run during createSources being run without a zookeeper server instance. So I have moved setup.tearDown to QTestUtil.shutdown(). 3. EnforceReadOnlyTables also needs to allow outputs during initialization loads/creates, so a session boolean indicates the initialization phase. 4. TestParse.vm and TestParseNegative.vm needed to be fixed too. Setup creates a QTestUtil instance each time, but tearDown seems to treat qt as a reusable instance. Changed tearDown to shut down QTestUtil every time. 5. Test results regenerated. > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
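A hedged sketch of point 3; the flag accessor below is a hypothetical name, not the actual SessionState API, and the hook signature is abbreviated from the pre-execute interface:
{code}
// EnforceReadOnlyTables: let the initialization loads/creates through.
public void run(SessionState sess, Set<ReadEntity> inputs,
    Set<WriteEntity> outputs, UserGroupInformation ugi) throws Exception {
  if (sess != null && sess.isInitPhase()) { // isInitPhase() is hypothetical
    return; // createSources is populating the test source tables
  }
  // ... existing read-only checks over outputs ...
}
{code}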
[jira] Commented: (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998276#comment-12998276 ] Krishna Kumar commented on HIVE-2003: - 6. Loading a partitioned table without specifying partitions was being validated only if OVERWRITE was specified. This is not right IMO, so fixed this as well. > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2003: Attachment: (was: HIVE-2003.patch.txt) > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2003: Attachment: HIVE-2003.patch.txt One results file was diffed as binary so patch regenerated with --text > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998442#comment-12998442 ] Krishna Kumar commented on HIVE-1918: - Thanks Paul. [Your comments are on a superseded review board submission; I will remind Carl again to take it down. The current reviewboard submission is up at https://reviews.apache.org/r/430/, but nevertheless both your comments are still applicable.] 1. Ok. Will address it. 2. I am not seeing how compatibility checking and selective serialization/deserialization of an object graph will be possible with auto-generated code. Will look into both thrift and datanucleus serialization (that you mentioned) from this aspect, but fine-grained control over this process is required here, I think. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.5.txt - Nested ternaries expanded - thrift-based serialization for the metastore objects Please review. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.5.txt, > HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
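A hedged sketch of what thrift-based serialization of a metastore object can look like; the JSON protocol choice here is an assumption for readability, not necessarily what the patch uses:
{code}
// Round-trip a metastore Table (a thrift struct) through thrift.
TSerializer ser = new TSerializer(new TJSONProtocol.Factory());
byte[] bytes = ser.serialize(table);

TDeserializer deser = new TDeserializer(new TJSONProtocol.Factory());
Table roundTripped = new Table();
deser.deserialize(roundTripped, bytes);
{code}
Because the serialized form is produced field by field from a hand-maintained thrift definition, the export format only changes when that definition is changed deliberately.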
[jira] Updated: (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2003: Status: Patch Available (was: Open) > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.5.txt Merged with trunk > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.5.txt, > HIVE-1918.patch.5.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: (was: HIVE-1918.patch.5.txt) > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.5.txt, > HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HIVE-2065) RCFile issues
RCFile issues - Key: HIVE-2065 URL: https://issues.apache.org/jira/browse/HIVE-2065 Project: Hive Issue Type: Bug Reporter: Krishna Kumar Priority: Minor Some potential issues with RCFile 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per yongqiang he, the class is not meant to be thread-safe (and it is not). Might as well get rid of the confusing and performance-impacting lock acquisitions. 2. Record Length overstated for compressed files. IIUC, the key compression happens after we have written the record length.
{code}
int keyLength = key.getSize();
if (keyLength < 0) {
  throw new IOException("negative length keys not allowed: " + key);
}
out.writeInt(keyLength + valueLength); // total record length
out.writeInt(keyLength); // key portion length
if (!isCompressed()) {
  out.writeInt(keyLength);
  key.write(out); // key
} else {
  keyCompressionBuffer.reset();
  keyDeflateFilter.resetState();
  key.write(keyDeflateOut);
  keyDeflateOut.flush();
  keyDeflateFilter.finish();
  int compressedKeyLen = keyCompressionBuffer.getLength();
  out.writeInt(compressedKeyLen);
  out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
}
{code}
3. For sequence file compatibility, the compressed key length should be the next field after the record length, not the uncompressed key length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2065: Attachment: Slide1.png Compressed RCFile Layout > RCFile issues > - > > Key: HIVE-2065 > URL: https://issues.apache.org/jira/browse/HIVE-2065 > Project: Hive > Issue Type: Bug > Reporter: Krishna Kumar >Priority: Minor > Attachments: Slide1.png > > > Some potential issues with RCFile > 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per > yongqiang he, the class is not meant to be thread-safe (and it is not). Might > as well get rid of the confusing and performance-impacting lock acquisitions. > 2. Record Length overstated for compressed files. IIUC, the key compression > happens after we have written the record length. > {code} > int keyLength = key.getSize(); > if (keyLength < 0) { > throw new IOException("negative length keys not allowed: " + key); > } > out.writeInt(keyLength + valueLength); // total record length > out.writeInt(keyLength); // key portion length > if (!isCompressed()) { > out.writeInt(keyLength); > key.write(out); // key > } else { > keyCompressionBuffer.reset(); > keyDeflateFilter.resetState(); > key.write(keyDeflateOut); > keyDeflateOut.flush(); > keyDeflateFilter.finish(); > int compressedKeyLen = keyCompressionBuffer.getLength(); > out.writeInt(compressedKeyLen); > out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen); > } > {code} > 3. For sequence file compatibility, the compressed key length should be the > next field to record length, not the uncompressed key length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Assigned: (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar reassigned HIVE-2065: --- Assignee: Krishna Kumar > RCFile issues > - > > Key: HIVE-2065 > URL: https://issues.apache.org/jira/browse/HIVE-2065 > Project: Hive > Issue Type: Bug > Reporter: Krishna Kumar > Assignee: Krishna Kumar >Priority: Minor > Attachments: Slide1.png > > > Some potential issues with RCFile > 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per > yongqiang he, the class is not meant to be thread-safe (and it is not). Might > as well get rid of the confusing and performance-impacting lock acquisitions. > 2. Record Length overstated for compressed files. IIUC, the key compression > happens after we have written the record length. > {code} > int keyLength = key.getSize(); > if (keyLength < 0) { > throw new IOException("negative length keys not allowed: " + key); > } > out.writeInt(keyLength + valueLength); // total record length > out.writeInt(keyLength); // key portion length > if (!isCompressed()) { > out.writeInt(keyLength); > key.write(out); // key > } else { > keyCompressionBuffer.reset(); > keyDeflateFilter.resetState(); > key.write(keyDeflateOut); > keyDeflateOut.flush(); > keyDeflateFilter.finish(); > int compressedKeyLen = keyCompressionBuffer.getLength(); > out.writeInt(compressedKeyLen); > out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen); > } > {code} > 3. For sequence file compatibility, the compressed key length should be the > next field to record length, not the uncompressed key length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008741#comment-13008741 ] Krishna Kumar commented on HIVE-2065: - So should I go ahead and fix #2 and #3 as well? Note that these are incompatible changes, so the version number will need to be bumped up. My proposal: Fix the issues in the new format - up the version number to 7. - compute and store the record length as the compressed key length (i.e., 4 + compressed key contents length) plus the compressed value length - store the compressed key length as the next 4-byte field - the key contains a 4-byte uncompressed key contents length followed by the compressed key contents Provide backward compatibility - while reading version 6, interpret the fields as now but recalculate the record length from the next two fields (record length = record length - uncompressed key length + compressed key length) > RCFile issues > - > > Key: HIVE-2065 > URL: https://issues.apache.org/jira/browse/HIVE-2065 > Project: Hive > Issue Type: Bug > Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: Slide1.png > > > Some potential issues with RCFile > 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per > yongqiang he, the class is not meant to be thread-safe (and it is not). Might > as well get rid of the confusing and performance-impacting lock acquisitions. > 2. Record Length overstated for compressed files. IIUC, the key compression > happens after we have written the record length. > {code} > int keyLength = key.getSize(); > if (keyLength < 0) { > throw new IOException("negative length keys not allowed: " + key); > } > out.writeInt(keyLength + valueLength); // total record length > out.writeInt(keyLength); // key portion length > if (!isCompressed()) { > out.writeInt(keyLength); > key.write(out); // key > } else { > keyCompressionBuffer.reset(); > keyDeflateFilter.resetState(); > key.write(keyDeflateOut); > keyDeflateOut.flush(); > keyDeflateFilter.finish(); > int compressedKeyLen = keyCompressionBuffer.getLength(); > out.writeInt(compressedKeyLen); > out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen); > } > {code} > 3. For sequence file compatibility, the compressed key length should be the > next field to record length, not the uncompressed key length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
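A hedged writer-side sketch of one reading of this proposal (illustrative only; the variable names follow the snippet quoted above, and this is not the committed patch):
{code}
// The key is compressed first, so every length is known before writing.
int compressedKeyLen = keyCompressionBuffer.getLength();
int keyPortionLen = 4 + compressedKeyLen;  // plain-length field + compressed contents
out.writeInt(keyPortionLen + valueLength); // record length, now accurate
out.writeInt(keyPortionLen);               // compressed key length, the next 4-byte field
out.writeInt(keyLength);                   // uncompressed key contents length
out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
// A version-6 reader compensates instead:
//   recordLength = recordLength - uncompressedKeyLength + compressedKeyLength
{code}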
[jira] Updated: (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2065: Attachment: proposal.png Notation : Length bracket inside the dashed box means it is the uncompressed length. Length bracket outside the dashed box means it is the compressed length. > RCFile issues > - > > Key: HIVE-2065 > URL: https://issues.apache.org/jira/browse/HIVE-2065 > Project: Hive > Issue Type: Bug > Reporter: Krishna Kumar > Assignee: Krishna Kumar >Priority: Minor > Attachments: Slide1.png, proposal.png > > > Some potential issues with RCFile > 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per > yongqiang he, the class is not meant to be thread-safe (and it is not). Might > as well get rid of the confusing and performance-impacting lock acquisitions. > 2. Record Length overstated for compressed files. IIUC, the key compression > happens after we have written the record length. > {code} > int keyLength = key.getSize(); > if (keyLength < 0) { > throw new IOException("negative length keys not allowed: " + key); > } > out.writeInt(keyLength + valueLength); // total record length > out.writeInt(keyLength); // key portion length > if (!isCompressed()) { > out.writeInt(keyLength); > key.write(out); // key > } else { > keyCompressionBuffer.reset(); > keyDeflateFilter.resetState(); > key.write(keyDeflateOut); > keyDeflateOut.flush(); > keyDeflateFilter.finish(); > int compressedKeyLen = keyCompressionBuffer.getLength(); > out.writeInt(compressedKeyLen); > out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen); > } > {code} > 3. For sequence file compatibility, the compressed key length should be the > next field to record length, not the uncompressed key length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2003: Attachment: HIVE-2003.patch.1.txt Regenerated the patch. > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009202#comment-13009202 ] Krishna Kumar commented on HIVE-2003: - Added to review board: https://reviews.apache.org/r/518/ > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009203#comment-13009203 ] Krishna Kumar commented on HIVE-2003: - For #4 above, I have taken now the same approach as in TestCliDriver/TestNegativeCliDriver, reusing the QTestUtil instance. Needed to clear out inputs/outputs for this to work. > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2003: Status: Patch Available (was: Open) Please review asap as there are lots of changes to q.out files and any delay may cause another conflict/resolution cycle. > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009981#comment-13009981 ] Krishna Kumar commented on HIVE-2065: - Hmm. #3 is taking me a bit further than I originally thought. I assume being able to read an RCFile as a SequenceFile is required, while being able to write an RCFile via the SequenceFile interface is desirable. Having made changes so that the record length is correctly set, the following changes are also required for the rcfile to be handled correctly as a sequence file, IIUC. - the second field should be the key length (4 + compressed/plain key contents) - the key class (KeyBuffer) must be made responsible for reading/writing the next field - the plain key contents length - as well as for compression/decompression of the key contents - the value class (ValueBuffer) related changes will be trickier. Since the value is not compressed as a unit, we cannot use the record-compressed format. We need to mark the records as plain records, and move the codec to a metadata entry. Then the ValueBuffer class will work correctly with the sequencefile implementation. Thoughts? Worth it? > RCFile issues > - > > Key: HIVE-2065 > URL: https://issues.apache.org/jira/browse/HIVE-2065 > Project: Hive > Issue Type: Bug > Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: Slide1.png, proposal.png > > > Some potential issues with RCFile > 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per > yongqiang he, the class is not meant to be thread-safe (and it is not). Might > as well get rid of the confusing and performance-impacting lock acquisitions. > 2. Record Length overstated for compressed files. IIUC, the key compression > happens after we have written the record length. > {code} > int keyLength = key.getSize(); > if (keyLength < 0) { > throw new IOException("negative length keys not allowed: " + key); > } > out.writeInt(keyLength + valueLength); // total record length > out.writeInt(keyLength); // key portion length > if (!isCompressed()) { > out.writeInt(keyLength); > key.write(out); // key > } else { > keyCompressionBuffer.reset(); > keyDeflateFilter.resetState(); > key.write(keyDeflateOut); > keyDeflateOut.flush(); > keyDeflateFilter.finish(); > int compressedKeyLen = keyCompressionBuffer.getLength(); > out.writeInt(compressedKeyLen); > out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen); > } > {code} > 3. For sequence file compatibility, the compressed key length should be the > next field to record length, not the uncompressed key length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
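A hedged sketch of the KeyBuffer responsibility shift described above, reduced to a standalone helper (the actual change would live inside KeyBuffer.write; the helper name is illustrative):
{code}
// Write the key portion so that a plain SequenceFile reader sees an opaque,
// correctly sized key: 4-byte plain length, then the (possibly compressed) contents.
static void writeKeyPortion(DataOutput out, byte[] plainKey,
    CompressionCodec codec) throws IOException {
  out.writeInt(plainKey.length); // plain key contents length
  if (codec == null) {
    out.write(plainKey);
    return;
  }
  ByteArrayOutputStream buf = new ByteArrayOutputStream();
  CompressionOutputStream cout = codec.createOutputStream(buf);
  cout.write(plainKey);
  cout.finish();
  cout.close();
  out.write(buf.toByteArray()); // compressed key contents
}
{code}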