Review Request: Adding two interfaces for schema-aware codecs and their invocations from appropriate places in rcfile
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3051/ --- Review request for hive. Summary --- Introduces interfaces for schema-aware codecs. Actual implementations not part of this patch. One specific implementation will be added by HIVE-2604. This addresses bug HIVE-2600. https://issues.apache.org/jira/browse/HIVE-2600 Diffs - ql/src/java/org/apache/hadoop/hive/ql/io/SchemaAwareCompressionInputStream.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/SchemaAwareCompressionOutputStream.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java b15fdb8 Diff: https://reviews.apache.org/r/3051/diff Testing --- Thanks, Krishna
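The interfaces themselves live only in the attached diff, but as a hedged sketch of the shape such an interface might take (the class name matches the file in the diff; the setColumnIndex hook is my assumption, not necessarily the committed API):

import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;

// Sketch only: a "schema-aware" output stream is a CompressionOutputStream
// that is told which column's bytes it is about to receive, so the codec
// can choose a per-column compression strategy.
public abstract class SchemaAwareCompressionOutputStream extends CompressionOutputStream {
  protected SchemaAwareCompressionOutputStream(OutputStream out) {
    super(out);
  }

  // Hypothetical hook: RCFile would call this before writing each column's
  // buffer so the codec can switch strategies per column.
  public abstract void setColumnIndex(int columnIndex) throws IOException;
}

A matching SchemaAwareCompressionInputStream would presumably carry the same hook on the read path.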
Review Request: Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3075/ --- Review request for hive. Summary --- Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies - gaps - supports only certain complex types - stats This addresses bug HIVE-2604. https://issues.apache.org/jira/browse/HIVE-2604 Diffs - contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/compressors/DummyIntegerCompressor.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/dsalg/Tuple.java PRE-CREATION contrib/src/test/queries/clientpositive/ubercompressor.q PRE-CREATION contrib/src/test/results/clientpositive/ubercompressor.q.out PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorUtils.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorColumnConfig.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorConfig.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorSerde.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionOutputStream.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/InputReader.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/OutputWriter.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/TypeSpecificCompressor.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionCodec.java PRE-CREATION Diff: https://reviews.apache.org/r/3075/diff Testing --- test added Thanks, Krishna
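The diff lists a TypeSpecificCompressor; as a rough, hedged sketch of the per-column idea (method names here are illustrative, not necessarily the contrib API):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Illustrative only: a compressor that understands a column's type can use
// a type-aware scheme (e.g. delta/varint encoding for integers) instead of
// a generic byte-stream codec, which is the point of per-column strategies.
public interface TypeSpecificCompressor<T> {
  // Encode one column value.
  void compress(T value, DataOutput out) throws IOException;

  // Decode one column value written by compress().
  T decompress(DataInput in) throws IOException;
}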
Re: Review Request: Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3075/ --- (Updated 2011-12-17 10:41:45.367761) Review request for hive and Yongqiang He. Changes --- Closed the two gaps - support for arbitrary types, and stats Summary --- Add UberCompressor Serde/Codec to contrib which allows per-column compression strategies - gaps - supports only certain complex types - stats This addresses bug HIVE-2604. https://issues.apache.org/jira/browse/HIVE-2604 Diffs (updated) - contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/InputReader.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/OutputWriter.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/TypeSpecificCompressor.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionCodec.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionInputStream.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressionOutputStream.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorColumnConfig.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorConfig.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorSerde.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorSerdeField.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/UberCompressorUtils.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/compressors/DummyIntegerCompressor.java PRE-CREATION contrib/src/java/org/apache/hadoop/hive/contrib/ubercompressor/dsalg/Tuple.java PRE-CREATION contrib/src/test/queries/clientpositive/ubercompressor.q PRE-CREATION contrib/src/test/results/clientpositive/ubercompressor.q.out PRE-CREATION Diff: https://reviews.apache.org/r/3075/diff Testing --- test added Thanks, Krishna
Review Request: Patch #4 For Hive 1918 - Export / Import
/exim_17_part_managed.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_18_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_19_part_external_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_20_part_managed_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_21_export_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_22_import_exist_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_23_import_part_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_24_import_nonexist_authsuccess.q.out PRE-CREATION Diff: https://reviews.apache.org/r/430/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Patch For Hive 1918 - Export / Import
/results/clientpositive/exim_17_part_managed.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_18_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_19_part_external_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_20_part_managed_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_21_export_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_22_import_exist_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_23_import_part_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_24_import_nonexist_authsuccess.q.out PRE-CREATION Diff: https://reviews.apache.org/r/430/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Patch For Hive 1918 - Export / Import
/results/clientpositive/exim_17_part_managed.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_18_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_19_part_external_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_20_part_managed_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_21_export_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_22_import_exist_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_23_import_part_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_24_import_nonexist_authsuccess.q.out PRE-CREATION Diff: https://reviews.apache.org/r/430/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Patch For Hive 1918 - Export / Import
/exim_15_external_part.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_16_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_17_part_managed.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_18_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_19_part_external_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_20_part_managed_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_21_export_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_22_import_exist_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_23_import_part_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_24_import_nonexist_authsuccess.q.out PRE-CREATION Diff: https://reviews.apache.org/r/430/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Patch For Hive 1918 - Export / Import
/exim_16_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_17_part_managed.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_18_part_external.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_19_part_external_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_20_part_managed_location.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_21_export_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_22_import_exist_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_23_import_part_authsuccess.q.out PRE-CREATION ql/src/test/results/clientpositive/exim_24_import_nonexist_authsuccess.q.out PRE-CREATION Diff: https://reviews.apache.org/r/430/diff Testing --- Tests added Thanks, Krishna
Review Request: Patch for HIVE-2003: Load analysis should add table/partition to the outputs
4fb79dd ql/src/test/results/clientpositive/smb_mapjoin_7.q.out 19dcd1c ql/src/test/results/clientpositive/smb_mapjoin_8.q.out 5e545a4 ql/src/test/results/clientpositive/stats11.q.out d5c03c2 ql/src/test/results/clientpositive/stats3.q.out fbfa58c ql/src/test/results/clientpositive/udaf_context_ngrams.q.out 577cc71 ql/src/test/results/clientpositive/udaf_corr.q.out 58f143e ql/src/test/results/clientpositive/udaf_covar_pop.q.out 634875c ql/src/test/results/clientpositive/udaf_covar_samp.q.out 7fd2527 ql/src/test/results/clientpositive/udaf_ngrams.q.out 2255a8e ql/src/test/results/clientpositive/udf_field.q.out f3914d2 ql/src/test/results/clientpositive/udf_length.q.out 410a0a7 ql/src/test/results/clientpositive/udf_reverse.q.out e21e2f2 ql/src/test/results/clientpositive/uniquejoin.q.out a026db3 ql/src/test/results/compiler/plan/cast1.q.xml a7bb943 ql/src/test/results/compiler/plan/groupby1.q.xml 92cb203 ql/src/test/results/compiler/plan/groupby2.q.xml 0d93feb ql/src/test/results/compiler/plan/groupby3.q.xml a267968 ql/src/test/results/compiler/plan/groupby4.q.xml c33d459 ql/src/test/results/compiler/plan/groupby5.q.xml 0f18322 ql/src/test/results/compiler/plan/groupby6.q.xml 251dc11 ql/src/test/results/compiler/plan/input1.q.xml 1e085be ql/src/test/results/compiler/plan/input2.q.xml 509c2ef ql/src/test/results/compiler/plan/input20.q.xml 80365fe ql/src/test/results/compiler/plan/input3.q.xml 240bf4f ql/src/test/results/compiler/plan/input4.q.xml e149c30 ql/src/test/results/compiler/plan/input5.q.xml 7e8b3b6 ql/src/test/results/compiler/plan/input6.q.xml b1ac912 ql/src/test/results/compiler/plan/input7.q.xml a7ef270 ql/src/test/results/compiler/plan/input8.q.xml e793db1 ql/src/test/results/compiler/plan/input9.q.xml 53b6ab1 ql/src/test/results/compiler/plan/input_part1.q.xml e598c31 ql/src/test/results/compiler/plan/input_testsequencefile.q.xml 098e81a ql/src/test/results/compiler/plan/input_testxpath.q.xml 687c3f2 ql/src/test/results/compiler/plan/input_testxpath2.q.xml d1c715a ql/src/test/results/compiler/plan/join1.q.xml 535aea4 ql/src/test/results/compiler/plan/join2.q.xml c558556 ql/src/test/results/compiler/plan/join3.q.xml deb278e ql/src/test/results/compiler/plan/join4.q.xml 7227624 ql/src/test/results/compiler/plan/join5.q.xml 08a456c ql/src/test/results/compiler/plan/join6.q.xml 1f49fe2 ql/src/test/results/compiler/plan/join7.q.xml 19815fd ql/src/test/results/compiler/plan/join8.q.xml c13ca3a ql/src/test/results/compiler/plan/sample1.q.xml a53f4e6 ql/src/test/results/compiler/plan/sample2.q.xml 10775d5 ql/src/test/results/compiler/plan/sample3.q.xml 38d0d98 ql/src/test/results/compiler/plan/sample4.q.xml 8d67192 ql/src/test/results/compiler/plan/sample5.q.xml 939b852 ql/src/test/results/compiler/plan/sample6.q.xml e9f9b57 ql/src/test/results/compiler/plan/sample7.q.xml 6e3e01a ql/src/test/results/compiler/plan/subq.q.xml 1fda353 ql/src/test/results/compiler/plan/udf1.q.xml 6931b8a ql/src/test/results/compiler/plan/udf4.q.xml 2e167aa ql/src/test/results/compiler/plan/udf6.q.xml 286884a ql/src/test/results/compiler/plan/udf_case.q.xml 5b73066 ql/src/test/results/compiler/plan/udf_when.q.xml 40dfca6 ql/src/test/results/compiler/plan/union.q.xml 4bc7d89 ql/src/test/templates/TestParse.vm cf860ac ql/src/test/templates/TestParseNegative.vm 48a0031 Diff: https://reviews.apache.org/r/518/diff Testing --- Tests added. Authentication failures are now possible now that outputs are set properly. Thanks, Krishna
Review Request: Fixes for (a) removing redundant synchronized (b) calculating and writing the correct record length and (c) making the layout and the key/value classes actually sequencefile compliant
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/529/ --- Review request for hive and Yongqiang He. Summary --- Patch for HIVE-2065 This addresses bug HIVE-2065. https://issues.apache.org/jira/browse/HIVE-2065 Diffs - build-common.xml 9f21a69 data/files/test_v6_compressed.rc PRE-CREATION data/files/test_v6_uncompressed.rc PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java eb5305b ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 20d1f4e ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java f7eacdc ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java bb1e3c9 ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 8bb6f3a ql/src/test/results/clientpositive/alter_merge.q.out 25f36c0 ql/src/test/results/clientpositive/alter_merge_stats.q.out 243f7cc ql/src/test/results/clientpositive/partition_wise_fileformat.q.out cee2e72 ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out 067ab43 ql/src/test/results/clientpositive/sample10.q.out 50406c3 Diff: https://reviews.apache.org/r/529/diff Testing --- Tests added, existing tests updated Thanks, Krishna
Re: Review Request: Fixes for (a) removing redundant synchronized (b) calculating and writing the correct record length and (c) making the layout and the key/value classes actually sequencefile compliant
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/529/ --- (Updated 2011-04-06 17:13:30.910168) Review request for hive and Yongqiang He. Changes --- Updated patch where sequence file compliance is not addressed but the other two issues are. Summary --- Patch for HIVE-2065 This addresses bug HIVE-2065. https://issues.apache.org/jira/browse/HIVE-2065 Diffs (updated) - build-common.xml 9f21a69 data/files/test_v6dot0_compressed.rc PRE-CREATION data/files/test_v6dot0_uncompressed.rc PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java eb5305b ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java 20d1f4e ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileKeyBufferWrapper.java f7eacdc ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java bb1e3c9 ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java 8bb6f3a ql/src/test/results/clientpositive/alter_merge.q.out 25f36c0 ql/src/test/results/clientpositive/alter_merge_stats.q.out 243f7cc ql/src/test/results/clientpositive/partition_wise_fileformat.q.out cee2e72 ql/src/test/results/clientpositive/partition_wise_fileformat3.q.out 067ab43 ql/src/test/results/clientpositive/sample10.q.out 50406c3 Diff: https://reviews.apache.org/r/529/diff Testing --- Tests added, existing tests updated Thanks, Krishna
Review Request: Add LazyBinaryColumnarSerDe
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/806/ --- Review request for hive and Yongqiang He. Summary --- Add LazyBinaryColumnarSerDe This addresses bug HIVE-956. https://issues.apache.org/jira/browse/HIVE-956 Diffs - ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java b062460 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java e927547 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java 2e2896c serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 1440472 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java ea20b34 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 5e6bb0a serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ColumnarStructObjectInspector.java 66f4f8d serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java 90561a1 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java PRE-CREATION Diff: https://reviews.apache.org/r/806/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Add LazyBinaryColumnarSerDe
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/806/ --- (Updated 2011-06-02 12:00:23.653491) Review request for hive and Yongqiang He. Changes --- Uses a special marker for empty strings, thereby incurring no additional cost for normal (non-null, non-empty) strings. Summary --- Add LazyBinaryColumnarSerDe This addresses bug HIVE-956. https://issues.apache.org/jira/browse/HIVE-956 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java b062460 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java e927547 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java 2e2896c serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 1440472 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java ea20b34 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 5e6bb0a serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ColumnarStructObjectInspector.java 66f4f8d serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java 90561a1 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java PRE-CREATION Diff: https://reviews.apache.org/r/806/diff Testing --- Tests added Thanks, Krishna
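To make the empty-string marker concrete, here is a hedged sketch of the encoding idea (the marker byte value and class are mine; the real LazyBinaryColumnarSerDe may use a different representation):

import java.nio.charset.StandardCharsets;

// Sketch of the idea: nulls serialize to zero bytes, empty strings to a
// single reserved marker byte, and normal strings to their raw UTF-8 bytes,
// so non-null, non-empty strings pay no extra cost. Assumes the marker byte
// (a control character here) never appears as real single-byte column data.
final class StringFieldEncoding {
  private static final byte EMPTY_MARKER = 0x01; // assumed reserved value

  static byte[] encode(String s) {
    if (s == null) {
      return new byte[0];                        // null: zero-length field
    }
    if (s.isEmpty()) {
      return new byte[] { EMPTY_MARKER };        // empty: one marker byte
    }
    return s.getBytes(StandardCharsets.UTF_8);   // normal: raw bytes
  }

  static String decode(byte[] field) {
    if (field.length == 0) {
      return null;
    }
    if (field.length == 1 && field[0] == EMPTY_MARKER) {
      return "";
    }
    return new String(field, StandardCharsets.UTF_8);
  }
}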
Re: Review Request: Add LazyBinaryColumnarSerDe
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/806/ --- (Updated 2011-06-08 16:04:08.811137) Review request for hive and Yongqiang He. Changes --- Updating review comments re toString() Summary --- Add LazyBinaryColumnarSerDe This addresses bug HIVE-956. https://issues.apache.org/jira/browse/HIVE-956 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java e79021d serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java e927547 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java 2e2896c serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 1440472 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java ea20b34 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 4285ab3 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ColumnarStructObjectInspector.java 66f4f8d serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java 90561a1 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java PRE-CREATION Diff: https://reviews.apache.org/r/806/diff Testing --- Tests added Thanks, Krishna
Review Request: Patch for Hive-2209, extending ObjectInspectorUtils.compare with some map comparison implementations
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/879/ --- Review request for hive and Yongqiang He. Summary --- Patch for HIVE-2209 Diffs - serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualcomparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualcomparer.java PRE-CREATION Diff: https://reviews.apache.org/r/879/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Patch for Hive-2209, extending ObjectInspectorUtils.compare with some map comparison implementations
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/879/ --- (Updated 2011-06-17 07:52:38.058921) Review request for hive and Yongqiang He. Changes --- Added a complete compare implementation too, with sorting of the keys Summary --- Patch for HIVE-2209 Diffs (updated) - serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/FullMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualcomparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestFullMapEqualcomparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualcomparer.java PRE-CREATION Diff: https://reviews.apache.org/r/879/diff Testing --- Tests added Thanks, Krishna
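For readers skimming the class names, a stripped-down sketch of the two equality strategies, without Hive's ObjectInspector plumbing (FullMapEqualComparer additionally sorts the keys to get a total-order compare, which is omitted here):

import java.util.Map;
import java.util.Objects;

// Simplified sketch of the two map-equality strategies in the patch.
// The "simple" path probes one map with the other's keys and relies on
// consistent hashing; the "cross" path scans all entry pairs and works
// even when hash-based lookup across representations is unreliable.
final class MapComparers {
  static <K, V> boolean simpleEqual(Map<K, V> m1, Map<K, V> m2) {
    if (m1.size() != m2.size()) {
      return false;
    }
    for (Map.Entry<K, V> e : m1.entrySet()) {
      if (!m2.containsKey(e.getKey())
          || !Objects.equals(e.getValue(), m2.get(e.getKey()))) {
        return false;
      }
    }
    return true;
  }

  // O(n^2): compare every entry of m1 against every entry of m2.
  static <K, V> boolean crossEqual(Map<K, V> m1, Map<K, V> m2) {
    if (m1.size() != m2.size()) {
      return false;
    }
    outer:
    for (Map.Entry<K, V> e1 : m1.entrySet()) {
      for (Map.Entry<K, V> e2 : m2.entrySet()) {
        if (Objects.equals(e1.getKey(), e2.getKey())
            && Objects.equals(e1.getValue(), e2.getValue())) {
          continue outer;
        }
      }
      return false;
    }
    return true;
  }
}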
Re: Review Request: Patch for Hive-2209, extending ObjectInspectorUtils.compare with some map comparison implementations
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/879/ --- (Updated 2011-06-20 12:54:09.245202) Review request for hive and Yongqiang He. Changes --- Fixed a lowercase/uppercase typo in the test classes Summary --- Patch for HIVE-2209 Diffs (updated) - serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/FullMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestFullMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualComparer.java PRE-CREATION Diff: https://reviews.apache.org/r/879/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Add LazyBinaryColumnarSerDe
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/806/ --- (Updated 2011-06-20 12:56:38.943799) Review request for hive and Yongqiang He. Changes --- After separating out mapcomparer changes to its own patch Summary --- Add LazyBinaryColumnarSerDe This addresses bug HIVE-956. https://issues.apache.org/jira/browse/HIVE-956 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 77a6dc6 serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java e79021d serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java e927547 serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java 2e2896c serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObjectBase.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryFactory.java 1440472 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryObject.java ea20b34 serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 4285ab3 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ColumnarStructObjectInspector.java 66f4f8d serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorFactory.java 90561a1 serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java PRE-CREATION Diff: https://reviews.apache.org/r/806/diff Testing --- Tests added Thanks, Krishna
Re: Review Request: Patch for Hive-2209, extending ObjectInspectorUtils.compare with some map comparison implementations
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/879/ --- (Updated 2011-07-20 02:25:36.169590) Review request for hive and Yongqiang He. Summary --- Patch for HIVE-2209 This addresses bug HIVE-2209. https://issues.apache.org/jira/browse/HIVE-2209 Diffs - serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/CrossMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/FullMapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/MapEqualComparer.java PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java 2b77072 serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/SimpleMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestCrossMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestFullMapEqualComparer.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/objectinspector/TestSimpleMapEqualComparer.java PRE-CREATION Diff: https://reviews.apache.org/r/879/diff Testing --- Tests added Thanks, Krishna
HIVE-4053 | Review request
Hi, I've implemented the 'Refined Soundex' algorithm using a GenericUDF and would like to share it for review by experts, as I'm a newbie. Change details: A new Java class is created: GenericUDFRefinedSoundex.java. An entry is added to FunctionRegistry.java: registerGenericUDF("soundex_ref", GenericUDFRefinedSoundex.class); Both files are attached to the email. I'm planning to implement other phonetic algorithms and submit all of them as a single patch. I understand there are many other steps I need to finish before a patch is ready, but for now, if you could review the attached code and provide feedback, that would be great. Here are the details of the Refined Soundex algorithm: The first letter is stored. Subsequent letters are replaced by numbers as defined below: * B, P => 1 * F, V => 2 * C, K, S => 3 * G, J => 4 * Q, X, Z => 5 * D, T => 6 * L => 7 * M, N => 8 * R => 9 * Other letters => 0 Consecutive letters belonging to the same group are replaced by a single digit. Example: > SELECT soundex_ref('Carren') FROM src LIMIT 1; > C30908 Thanks, Krishna
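For reference, here is a minimal, self-contained sketch of the encoding described above; the class is mine and not the attached GenericUDFRefinedSoundex, and the Hive UDF plumbing (initialize/evaluate) is omitted:

// Refined Soundex, exactly per the grouping table in the mail above:
// B,P=1  F,V=2  C,K,S=3  G,J=4  Q,X,Z=5  D,T=6  L=7  M,N=8  R=9  others=0
final class RefinedSoundex {
  private static final char[] CODES =
      //A    B    C    D    E    F    G    H    I    J    K    L    M
      {'0', '1', '3', '6', '0', '2', '4', '0', '0', '4', '3', '7', '8',
      //N    O    P    Q    R    S    T    U    V    W    X    Y    Z
       '8', '0', '1', '5', '9', '3', '6', '0', '2', '0', '5', '0', '5'};

  static String encode(String input) {
    if (input == null) {
      return null;
    }
    String s = input.toUpperCase().replaceAll("[^A-Z]", "");
    if (s.isEmpty()) {
      return "";
    }
    StringBuilder out = new StringBuilder();
    out.append(s.charAt(0));              // first letter is kept as-is
    char last = '\0';
    for (int i = 0; i < s.length(); i++) {
      char code = CODES[s.charAt(i) - 'A'];
      if (code != last) {                 // collapse same-group runs
        out.append(code);
      }
      last = code;
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(encode("Carren")); // prints C30908, matching the example
  }
}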
dev-ow...@hive.apache.org.
Can Hive handle unstructured data, or does it handle only structured data? Please confirm. Thanks, Mohan
Re: dev-ow...@hive.apache.org.
Thanks, Alan, for the answer. So, can I conclude that Hive handles unstructured data? On Thu, Dec 4, 2014 at 10:06 PM, Alan Gates wrote: > Define unstructured. Hive can handle data such as Avro or JSON, which I > would call self-structured. I believe the SerDes for these types can even > set the schema for the table or partition you are reading based on the data > in the file. > > Alan. > > Mohan Krishna > December 3, 2014 at 17:01 > Can Hive handle unstructured data, or does it handle only structured data? > Please confirm > > Thanks > Mohan
Re: dev-ow...@hive.apache.org.
Thank you, Bill. Now it is clear to me. Thanks. On Fri, Dec 5, 2014 at 12:54 AM, Bill Busch wrote: > Mohan, > > It will handle it, but it is probably (depending on your use case) not > optimal. Hive's sweet spot is structured data. > > Bill
Re: [ANNOUNCE] New Hive PMC Member - Prasad Mujumdar
Congrats Prasad On Wed, Dec 10, 2014 at 3:47 AM, Carl Steinbach wrote: > I am pleased to announce that Prasad Mujumdar has been elected to the Hive > Project Management Committee. Please join me in congratulating Prasad! > > Thanks. > > - Carl >
What more Hive can do when compared to PIG
Hello all, Can somebody help me in getting the answer to the question below? It's regarding Pig vs. Hive. We know that Pig is for large data set analysis and Hive is good at data summarization and ad hoc queries. But I want to know of a use case that Hive can handle and that cannot be achieved with Pig. I mean to say: what more can a Hive query achieve when the same is not possible with a Pig Latin script? If possible, I'd like to know the vice versa case as well. Thanks Mohan 469-274-5677
Some queries re locking
Hello, While looking into some of the tangential issues encountered while doing the export/import related work, I have some questions: 1. Should "CREATE TABLE" take a shared lock on the database? I think so from the discussions, but I do not think it happens now. 2. Similarly, "LOAD" should take an exclusive lock on the table/partition by adding the table/partition to the outputs. 3. While trying a fix for the above, I ran into another issue. IIUC, the Test[Negative]CliDriver templates start the zkcluster via the QTestUtil ctor, but this is immediately shut down via the cleanup->teardown call, so most of the create/loads in createSources run without a ZooKeeper server, and any attempt to lock errors out. Is this intentional? Cheers Krishna
RCFile - some queries
Hello, I was looking into the RCFile format, especially when used with compression; a picture of the file layout as I understand it in this case is attached. Some queries/potential issues: 1. RCFile makes a claim of being sequence-file compatible, but the recordLength is not the actual on-disk length of the record. As shown in the picture, it is the uncompressed key length plus the compressed value length. Similarly, the next field, key length, is not the on-disk length of the compressed key. 2. Record length is also used for seeking on the input stream; see Reader.seekToNextKeyBuffer(). Since record length is overstated for compressed records, this can result in incorrect positioning. 3. Thread-safety: Is the RCFile.Reader class meant to be thread-safe? Some public methods are marked synchronized, which gives that appearance, but I think there are a few thread-safety issues: 3.1 Other public methods, such as Reader.nextBlock(), are not synchronized but operate on the same data structures. 3.2 Callbacks such as LazyDecompressionCallbackImpl.decompress operate on the value buffer currentValue, which can be simultaneously modified by the public methods on the Reader. Cheers, Krishna
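To make the record-length concern concrete, here is a tiny runnable illustration; all numbers are made up, and the field roles follow the description above rather than RCFile.java itself:

// Contrasts the two readings of "recordLength" debated in this thread.
public final class RecordLengthDemo {
  public static void main(String[] args) {
    int uncompressedKeyLen = 400;   // hypothetical
    int compressedKeyLen   = 120;   // hypothetical
    int compressedValueLen = 5_000; // hypothetical

    // Reading (a), per the mail above: recordLength mixes the two spaces.
    int recordLenAsReported = uncompressedKeyLen + compressedValueLen;

    // Reading (b), per the reply below: recordLength is pure on-disk length.
    int recordLenOnDisk = compressedKeyLen + compressedValueLen;

    // If (a) is what is written but (b) is what a seek needs, skipping by
    // recordLength overshoots by the key's compression savings:
    System.out.println("overshoot = "
        + (recordLenAsReported - recordLenOnDisk)); // 280 in this example
  }
}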
Re: RCFile - some queries
Hi Yongqiang He, I have created a bug, https://issues.apache.org/jira/browse/HIVE-2065, to carry on the discussion, and have attached the picture there too: https://issues.apache.org/jira/secure/attachment/12474055/Slide1.png (looks like attachments are stripped from posts here?). Please comment there. Cheers, Krishna On 3/18/11 11:47 PM, "yongqiang he" wrote: >> but the recordLength is not the actual on-disk length of the record. It is the actual on-disk length: it is the compressed key length plus the compressed value length. >> Similarly, the next field - key length - is not the on-disk length of the compressed key. There are two key lengths; one is the compressed key length, the other is the uncompressed key length. For 2, it won't be a problem; record length is the compressed length. >> Thread-Safety. It is not thread-safe; applications should handle that themselves. It was initially designed for Hive. Thread safety was there at first, and then removed because Hive does not need it, and 'synchronized' may add extra overhead. >> 3.1 Reader.nextBlock() was added later for file merge, so the normal reader should not use this method. >> 3.2 True.
Re Stats Publishing /Aggregation
Is there a reason why persistent stores such as JDBC and HBase are supported for temporary stats storage (IIUC), but Hadoop counters were not used as the way for tasks to 'publish' their stats for the aggregation task to pick up? Cheers, Krishna
[jira] [Commented] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583819#comment-13583819 ] Krishna commented on HIVE-4053: --- Soundex: http://en.wikipedia.org/wiki/Soundex Daitch-Mokotoff Soundex: http://en.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_Soundex New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone: http://en.wikipedia.org/wiki/Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Affects Version/s: (was: 0.10.0) > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF > Reporter: Krishna > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF > Reporter: Krishna > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585025#comment-13585025 ] Krishna commented on HIVE-4053: --- I've implemented 'Refined Soundex' algorithm using a GenericUDF and would like to share it for a review by experts as I'm a newbie. Change Details: A new java class is created: GenericUDFRefinedSoundex.java Add a entry to FunctionRegistry.java: registerGenericUDF("soundex_ref", GenericUDFRefinedSoundex.class); Both files are attached to the email. I'm planning to implement other phonetic algorithms and submit all as a single patch. I understand there are many other steps that I need to finish before a patch is ready but for now, if you could review the attached code and provide feedback, it'll be great. Here are the details of Refined Soundex algorithm: First letter is stored Subsequent letters are replaced by numbers as defined below- * B, P => 1 * F, V => 2 * C, K, S => 3 * G, J => 4 * Q, X, Z => 5 * D, T => 6 * L => 7 * M, N => 8 * R => 9 * Other letters => 0 Consecutive letters belonging to the same group are replaced by one letter Example: > SELECT soundex_ref('Carren') FROM src LIMIT 1; > C30908 > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Reporter: Krishna > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Attachment: GenericUDFRefinedSoundex.java FunctionRegistry.java > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF > Reporter: Krishna > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Attachment: HIVE-4053.1.patch.txt > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF > Reporter: Krishna > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586422#comment-13586422 ] Krishna commented on HIVE-4053: --- I've attached the patch to JIRA. How do I post it for review on reviewboard? > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Reporter: Krishna > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Fix Version/s: 0.10.0 Labels: patch (was: ) Affects Version/s: 0.10.0 Release Note: Implementation of the phonetic algorithm - Refined Soundex Status: Patch Available (was: Open) > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586499#comment-13586499 ] Krishna commented on HIVE-4053: --- I have submitted the patch; please review the code. > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex: ttp://en.wikipedia.org/wiki/Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: ttp://en.wikipedia.org/wiki/Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS) > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex: ttp://en.wikipedia.org/wiki/Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS) Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: http://en.wikipedia.org/wiki/Soundex > Refined Soundex > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS): > http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System > Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex: Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone: http://en.wikipedia.org/wiki/Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: http://en.wikipedia.org/wiki/Soundex > Refined Soundex: > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS): > http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System > Caverphone: http://en.wikipedia.org/wiki/Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex: Refer to the comment on 22/Feb/13 23:51 Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone: http://en.wikipedia.org/wiki/Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex: Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone: http://en.wikipedia.org/wiki/Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: http://en.wikipedia.org/wiki/Soundex > Refined Soundex: Refer to the comment on 22/Feb/13 23:51 > Daitch–Mokotoff Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS): > http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System > Caverphone: http://en.wikipedia.org/wiki/Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Description: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex: Refer to the comment on 22/Feb/13 23:51 Daitch–Mokotoff Soundex: http://en.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone: http://en.wikipedia.org/wiki/Caverphone was: Following phonetic algorithms should be considered, which are very useful in search: Soundex: http://en.wikipedia.org/wiki/Soundex Refined Soundex: Refer to the comment on 22/Feb/13 23:51 Daitch–Mokotoff Soundex Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone New York State Identification and Intelligence System (NYSIIS): http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System Caverphone: http://en.wikipedia.org/wiki/Caverphone > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: http://en.wikipedia.org/wiki/Soundex > Refined Soundex: Refer to the comment on 22/Feb/13 23:51 > Daitch–Mokotoff Soundex: > http://en.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS): > http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System > Caverphone: http://en.wikipedia.org/wiki/Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587827#comment-13587827 ] Krishna commented on HIVE-4053: --- There are 6 popular phonetic algorithms (as mentioned in the JIRA description). I think it's a good idea to implement all of them in Hive. There are two ways to implement this:
Option 1: Write a separate GenericUDF for each algorithm, so that each algorithm gets its own Hive function.
Option 2: Write one GenericUDF and use a parameter to determine which algorithm is invoked.
I prefer option (2), but if someone feels option (1) is better, please comment. > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: http://en.wikipedia.org/wiki/Soundex > Refined Soundex: Refer to the comment on 22/Feb/13 23:51 > Daitch–Mokotoff Soundex: > http://en.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS): > http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System > Caverphone: http://en.wikipedia.org/wiki/Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
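A minimal sketch of option (2), assuming the encoders come from Apache Commons Codec; the class name, the function name phonetic(str, algorithm), and the algorithm keys are illustrative, and Daitch–Mokotoff is absent from older commons-codec releases, so it would need a custom implementation:
{noformat}
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.codec.EncoderException;
import org.apache.commons.codec.StringEncoder;
import org.apache.commons.codec.language.Caverphone2;
import org.apache.commons.codec.language.DoubleMetaphone;
import org.apache.commons.codec.language.Metaphone;
import org.apache.commons.codec.language.Nysiis;
import org.apache.commons.codec.language.RefinedSoundex;
import org.apache.commons.codec.language.Soundex;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
import org.apache.hadoop.io.Text;

// One UDF, many algorithms: phonetic(str, 'soundex'|'refined_soundex'|...).
public class GenericUDFPhonetic extends GenericUDF {
  private static final Map<String, StringEncoder> ENCODERS = new HashMap<String, StringEncoder>();
  static {
    ENCODERS.put("soundex", new Soundex());
    ENCODERS.put("refined_soundex", new RefinedSoundex());
    ENCODERS.put("metaphone", new Metaphone());
    ENCODERS.put("double_metaphone", new DoubleMetaphone());
    ENCODERS.put("nysiis", new Nysiis());
    ENCODERS.put("caverphone", new Caverphone2());
  }

  private PrimitiveObjectInspector strOI;
  private PrimitiveObjectInspector algOI;
  private final Text result = new Text();

  @Override
  public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
    if (args.length != 2) {
      throw new UDFArgumentException("phonetic(str, algorithm) takes exactly two arguments");
    }
    strOI = (PrimitiveObjectInspector) args[0];
    algOI = (PrimitiveObjectInspector) args[1];
    return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] args) throws HiveException {
    Object str = args[0].get();
    Object alg = args[1].get();
    if (str == null || alg == null) {
      return null;
    }
    String name = PrimitiveObjectInspectorUtils.getString(alg, algOI).toLowerCase();
    StringEncoder encoder = ENCODERS.get(name);
    if (encoder == null) {
      throw new HiveException("Unsupported phonetic algorithm: " + name);
    }
    try {
      result.set(encoder.encode(PrimitiveObjectInspectorUtils.getString(str, strOI)));
      return result;
    } catch (EncoderException e) {
      throw new HiveException(e);
    }
  }

  @Override
  public String getDisplayString(String[] children) {
    return "phonetic(" + children[0] + ", " + children[1] + ")";
  }
}
{noformat}
Registered once in FunctionRegistry, this would expose all the listed algorithms through a single call such as phonetic(name, 'soundex').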
[jira] [Updated] (HIVE-4053) Add support for phonetic algorithms in Hive
[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna updated HIVE-4053: -- Status: Open (was: Patch Available) I will re-submit the patch > Add support for phonetic algorithms in Hive > --- > > Key: HIVE-4053 > URL: https://issues.apache.org/jira/browse/HIVE-4053 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.10.0 >Reporter: Krishna > Labels: patch > Fix For: 0.10.0 > > Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java, > HIVE-4053.1.patch.txt > > > Following phonetic algorithms should be considered, which are very useful in > search: > Soundex: http://en.wikipedia.org/wiki/Soundex > Refined Soundex: Refer to the comment on 22/Feb/13 23:51 > Daitch–Mokotoff Soundex: > http://en.wikipedia.org/wiki/Daitch%E2%80%93Mokotoff_Soundex > Metaphone and Double Metaphone: http://en.wikipedia.org/wiki/Metaphone > New York State Identification and Intelligence System (NYSIIS): > http://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelligence_System > Caverphone: http://en.wikipedia.org/wiki/Caverphone -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
hive pull request: Kk wb 1228
GitHub user krishna-verticloud opened a pull request: https://github.com/apache/hive/pull/11 Kk wb 1228 You can merge this pull request into a Git repository by running: $ git pull https://github.com/VertiPub/hive kk-WB-1228 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/11.patch
hive pull request: Kk wb 1228
Github user krishna-verticloud closed the pull request at: https://github.com/apache/hive/pull/11
[jira] Created: (HIVE-1918) Add export/import facilities to the hive system
Add export/import facilities to the hive system --- Key: HIVE-1918 URL: https://issues.apache.org/jira/browse/HIVE-1918 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Krishna Kumar This is an enhancement request to add export/import features to hive. With this language extension, the user can export the data of the table - which may be located in different hdfs locations in case of a partitioned table - as well as the metadata of the table into a specified output location. This output location can then be moved over to another different hadoop/hive instance and imported there. This should work independent of the source and target metastore dbms used; for instance, between derby and mysql. For partitioned tables, the ability to export/import a subset of the partition must be supported. Howl will add more features on top of this: The ability to create/use the exported data even in the absence of hive, using MR or Pig. Please see http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Status: Patch Available (was: Open) Patch for adding export/import. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.txt > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983532#action_12983532 ] Krishna Kumar commented on HIVE-1918: - Design notes:
- Export/Import is modeled on the existing load functionality. No new tasks are added; the existing tasks for copy/move/create table/add partition et al are reused.
- EXPORT TABLE table [PARTITION (partition_col=partition_colval, ...)] TO location
- IMPORT [[EXTERNAL] TABLE table [PARTITION (partition_col=partition_colval, ...)]] FROM sourcelocation [LOCATION targetlocation]
- The export output consists of an XML-serialized metadata file in the target directory, plus sub-directories for the data files.
> Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984128#action_12984128 ] Krishna Kumar commented on HIVE-1918: - Ok. Will take care of this via a delegating ctor. A process question: I guess I should wait for more comments from other reviewers before I create another patch, in case others are reviewing the current patch? > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984563#action_12984563 ] Krishna Kumar commented on HIVE-1918: - Why export/import needs this change: It is not the export part, but rather the import part, which needs this change. While creating a partition as part of an import, we need to be able to create the partition along with its ancillary data, including partition parameters. But the first part of the existing "create partition" flow (AddPartitionDesc -> DDLTask.addPartition -> Hive.createPartition) did not support specifying partition params, while the second part (metastore.api.Partition -> IMetaStoreClient.add_partition -> HiveMetaStore.HMSHandler.add_partition -> ObjectStore.addPartition) does. So I added the ability to pass the partition parameters along in the first part of the flow. In terms of options for compatible changes, there are two I can see:
1. The solution suggested above. Add an additional ctor so that no existing code breaks.
{noformat}
public Partition(Table tbl, Map<String, String> partSpec, Path location) {
  this(tbl, partSpec, location, null);
}

public Partition(Table tbl, Map<String, String> partSpec, Path location,
    Map<String, String> partParams) {...}
{noformat}
2. Have only the current ctor, but in Hive.createPartition get the underlying metastore.api.Partition and set the parameters on it before passing it on to the metastore client.
Thoughts? > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
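For reference, a rough sketch of what option 2 could look like inside Hive.createPartition; the variable names and the surrounding wiring are illustrative, not the actual patch:
{noformat}
// Instead of a new ql.metadata.Partition ctor, set the params on the
// underlying Thrift object before handing it to the metastore client.
org.apache.hadoop.hive.metastore.api.Partition tpart =
    newPart.getTPartition();          // underlying metastore.api.Partition
if (partParams != null) {
  tpart.setParameters(partParams);    // Map<String, String> of partition params
}
getMSC().add_partition(tpart);        // IMetaStoreClient.add_partition
{noformat}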
[jira] Created: (HIVE-1924) checkformat implementations leak handles
checkformat implementations leak handles Key: HIVE-1924 URL: https://issues.apache.org/jira/browse/HIVE-1924 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Krishna Kumar In validateInput, Reader constructors of SequenceFile and RCFile throw exceptions to indicate that the format is incorrect, but the close is not called in a finally block. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
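A minimal sketch of the caller-side pattern the report asks for (class and method names are illustrative). Note that a failure inside the Reader constructor itself still leaks the stream it opened internally, which is the limit of any caller-side fix; see the later resolution of this issue as a duplicate.
{noformat}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public final class FormatCheck {
  private FormatCheck() {
  }

  public static boolean isSequenceFile(FileSystem fs, Path file, Configuration conf) {
    SequenceFile.Reader reader = null;
    try {
      reader = new SequenceFile.Reader(fs, file, conf); // throws if not a SequenceFile
      return true;
    } catch (IOException e) {
      return false; // wrong format
    } finally {
      // Close in finally so a failure between construction and close
      // cannot leak the handle.
      if (reader != null) {
        try {
          reader.close();
        } catch (IOException ignored) {
          // best-effort cleanup
        }
      }
    }
  }
}
{noformat}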
[jira] Updated: (HIVE-1924) checkformat implementations leak handles
[ https://issues.apache.org/jira/browse/HIVE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1924: Status: Patch Available (was: Open) Not sure how to test this automatically, since the readers are transient. The way I actually tested is by:
- running on an NFS-mounted directory (since NFS creates .nfsxx files for open files which have been deleted)
- pausing the code after a load command (with hive.fileformat.check set to true) is executed
- using lsof to list the files held open by the process
> checkformat implementations leak handles > > > Key: HIVE-1924 > URL: https://issues.apache.org/jira/browse/HIVE-1924 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Krishna Kumar > > In validateInput, Reader constructors of SequenceFile and RCFile throw > exceptions to indicate that the format is incorrect, but the close is not > called in a finally block. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1924) checkformat implementations leak handles
[ https://issues.apache.org/jira/browse/HIVE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1924: Attachment: HIVE.1924.patch.txt Tested only manually with hive.fileformat.check set to true > checkformat implementations leak handles > > > Key: HIVE-1924 > URL: https://issues.apache.org/jira/browse/HIVE-1924 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE.1924.patch.txt > > > In validateInput, Reader constructors of SequenceFile and RCFile throw > exceptions to indicate that the format is incorrect, but the close is not > called in a finally block. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985556#action_12985556 ] Krishna Kumar commented on HIVE-1918: - @Edward: Both the existing data model (prettified er diagram attached) and the object model (class org.apache.hadoop.hive.metastore.api.Partition) allow the specification of parameters on a per-partition basis. So I am not adding new fields to either of these models. By proposal 2 above, I will not be adding any ctor parameters to org.apache.hadoop.hive.ql.metadata.Partition as well. Your point re providing manageability via ddl statements to all aspects of the data/object model is taken. But I am not adding new aspects to either model, so if indeed we need to address current manageability gaps, should they not be addressed via another enhancement request, rather than this one, which aims simply to add export/import facilities? > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985557#action_12985557 ] Krishna Kumar commented on HIVE-1918: - @Carl: 1. Taken care of in the new patch. 2. Can you post some of the diffs that you get failures on? I had a problem with running the tests on NFS-mounted directories. That had to do with an existing bug in the load functionality, which used to result in a "MetaException: could not delete dir" error while trying to clean up the effects of the previous test. I have created a separate jira, HIVE-1924, for this and have attached a patch. 3. Have taken the whitelist approach; the whitelist is now set to "hdfs,pfile". > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
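A rough sketch of what such a URI-scheme whitelist check might look like; the class and method names are illustrative, not the actual patch:
{noformat}
import java.net.URI;

public final class SchemeWhitelist {
  private SchemeWhitelist() {
  }

  // True when the location's URI scheme is in the comma-separated whitelist,
  // e.g. isAllowed(URI.create("hdfs://nn/exports/t1"), "hdfs,pfile").
  public static boolean isAllowed(URI location, String whitelist) {
    String scheme = location.getScheme();
    if (scheme == null) {
      return false; // unqualified path; resolve against the default FS first
    }
    for (String allowed : whitelist.split(",")) {
      if (allowed.trim().equalsIgnoreCase(scheme)) {
        return true;
      }
    }
    return false;
  }
}
{noformat}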
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.1.txt A quick summary of the second derivative (difference between diffs):
- used --no-prefix while generating the patch
- hive.test.exim replaced by hive.exim.uri.scheme.whitelist
- schemaCompare, initializeFromUrl, validateTable all refactored to util methods
- trailing spaces in some test files removed
> Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.txt > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: hive-metastore-er.pdf Prettified ER diagram of the existing data model > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985565#action_12985565 ] Krishna Kumar commented on HIVE-1918: - @Namit: 1. Do you have any ideas re how we can get a unique temporary directory name for use in the test script files? In code, of course, we can use the getScratchDir methods, but how do we solve this problem in these test scripts? 2. Export/Import, as in the case of Load, operates at file level rather than at record level. So there are no record-level filters available. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1924) checkformat implementations leak handles
[ https://issues.apache.org/jira/browse/HIVE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1924: Status: Open (was: Patch Available) Hmm, the patch is not correct. Setting hive.fileformat.check=false does seem to stop the leaks, though. Investigating... > checkformat implementations leak handles > > > Key: HIVE-1924 > URL: https://issues.apache.org/jira/browse/HIVE-1924 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE.1924.patch.txt > > > In validateInput, Reader constructors of SequenceFile and RCFile throw > exceptions to indicate that the format is incorrect, but the close is not > called in a finally block. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1924) checkformat implementations leak handles
[ https://issues.apache.org/jira/browse/HIVE-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar resolved HIVE-1924. - Resolution: Duplicate Problem referenced in HADOOP-5476 and HIVE-1185 > checkformat implementations leak handles > > > Key: HIVE-1924 > URL: https://issues.apache.org/jira/browse/HIVE-1924 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE.1924.patch.txt > > > In validateInput, Reader constructors of SequenceFile and RCFile throw > exceptions to indicate that the format is incorrect, but the close is not > called in a finally block. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.2.txt Patch including - no changes to ql.metadata.Partition as per option#2 above - use relative paths in tests > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988693#comment-12988693 ] Krishna Kumar commented on HIVE-1918: - @Namit: Attached patch now uses relative paths in test scripts. (Note that some existing tests [clientpositive/insertexternal1.q, clientpositive/load_fs.q] use absolute paths even today; those need to be changed via another bug report.) @Edward: No changes to Partition.java, as proposed in option 2 above. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.3.txt Patch with all open issues addressed > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Status: Patch Available (was: Open) > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989747#comment-12989747 ] Krishna Kumar commented on HIVE-1918: - With this patch, I think all above issues are addressed. Also have added 3 bug fixes + tests for those bugs. Please review. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992036#comment-12992036 ] Krishna Kumar commented on HIVE-1918: - Thanks, Namit, for the comments. 1. Ok re moving serialization/deserialization methods to EximUtil, but did not understand the first part. Are you suggesting moving EximUtil, ImportSemanticAnalyzer and ExportSemanticAnalyzer to a new package? Does not seem to warrant it; today all parsing/semantic analysis classes are in the o.a.h.h.ql.parse package... 2. You mean Hive.java's API? The existing first createPartition remains as it is; the second createPartition, used in DDLTask, is changing to allow the creation of a partition with all the partition-specific configurations. Since AddPartitionDesc is initialized with nulls/-1 for these extra parameters, the existing behaviour is not altered. 3. Can you expand a little? What are inputs/outputs (classes? tables?) - if they are part of the existing object model/data model, I think they are exported and imported. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992373#comment-12992373 ] Krishna Kumar commented on HIVE-1918: - Importing into existing tables is now supported, but the checks (to see whether the imported table and the target table are compatible) have been kept fairly simple for now. Please see ImportSemanticAnalyzer.checkTable. The schemas (column and partition) of the two should match exactly, except for comments. Since we are just moving files (rather than rewriting records), I think there will be issues if the metadata schema does not match (in terms of types, number etc) the data serialization exactly. Re the earlier comment re outputs/inputs, got what you meant. I will add the table/partition to the inputs in ExportSemanticAnalyzer. But in the case of the imports, I see that the tasks themselves add the entity operated upon to the inputs/outputs list. Isn't that too late for authorization/concurrency, even though it may work for replication? Or are both the sem.analyzers and the tasks expected to add them? In the case of a newly created table/partition, the sem.analyzer does not have a handle? > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992606#comment-12992606 ] Krishna Kumar commented on HIVE-1918: - Hmm. LoadSemanticAnalyzer (which knows the table) does not add it to the outputs, but the MoveTask it schedules, does. Similarly, CREATE-TABLE does not add the entity but the DDLTask it schedules, does. This may be fine only because the entity does not exist at compile time? ADD-PARTITION adds the table as an *input* at compile time and the partition itself is added as an output at execution time. Should not the table be an output (at compile time) as well - for authorization/concurrency purposes? Anyway, where the import operates on existing tables/partitions, I will add them at compile time. If the entity is being created as part of the task, then the task will be adding them to inputs/outputs at runtime. Is this fine? > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
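What "add them at compile time" amounts to, sketched against the hooks API of that era (WriteEntity(Table) predates the later WriteType-based ctors; the class, method, and variables are illustrative):
{noformat}
import java.util.Set;

import org.apache.hadoop.hive.ql.hooks.ReadEntity;
import org.apache.hadoop.hive.ql.hooks.WriteEntity;
import org.apache.hadoop.hive.ql.metadata.Table;

public class EntityRegistrationSketch {
  // inputs/outputs are the semantic analyzer's entity sets in real code.
  void registerAtCompileTime(Set<ReadEntity> inputs, Set<WriteEntity> outputs,
      Table exportSource, Table existingImportTarget) {
    inputs.add(new ReadEntity(exportSource));       // export reads the source
    if (existingImportTarget != null) {             // target known at compile time
      outputs.add(new WriteEntity(existingImportTarget));
    }
    // Tables/partitions that the task itself creates are added to the
    // inputs/outputs by the task at runtime, as discussed above.
  }
}
{noformat}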
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Status: Patch Available (was: Open) Please review. Will try and see if I can update the reviewboard myself... > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.4.txt Patch with - metadata ser/deser methods moved from HiveUtils to EximUtil - inputs and outputs populated; authorization related bugfix and tests > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-17278) Incorrect output timestamp from from_utc_timestamp()/to_utc_timestamp when local timezone has DST
Leela Krishna created HIVE-17278: Summary: Incorrect output timestamp from from_utc_timestamp()/to_utc_timestamp when local timezone has DST Key: HIVE-17278 URL: https://issues.apache.org/jira/browse/HIVE-17278 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 2.0.0 Reporter: Leela Krishna HIVE-12706 is resolved but there is still a bug in this - from_utc_timestamp() is interpreting a GMT timestamp with DST. HS2 on PST timezone:
GMT timestamp            PST timestamp            PST2GMT
2012-03-11 01:30:15.332  2012-03-10 17:30:15.332  2012-03-11 01:30:15.332
2012-03-11 02:30:15.332  2012-03-10 19:30:15.332  2012-03-11 03:30:15.332  (<--- we got 1 hour more on GMT)
The PST timestamp is generated using from_utc_timestamp('2012-03-11 02:30:15.332', 'PST'). The PST2GMT timestamp is generated using to_utc_timestamp(from_utc_timestamp('2012-03-11 02:30:15.332', 'PST'), 'PST'). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
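The expected values can be cross-checked with java.time around the 2012 US DST transition (02:00 local on 2012-03-11 in America/Los_Angeles, i.e. 10:00 UTC); the class name is illustrative:
{noformat}
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class DstCheck {
  public static void main(String[] args) {
    ZoneId pst = ZoneId.of("America/Los_Angeles");
    Instant gmt = LocalDateTime.parse("2012-03-11T02:30:15.332").toInstant(ZoneOffset.UTC);
    // 02:30 UTC is still before the local DST jump, so the offset is UTC-8 and
    // the correct local time is 2012-03-10T18:30:15.332, one hour earlier than
    // the 19:30 value the buggy from_utc_timestamp() produced.
    System.out.println(gmt.atZone(pst).toLocalDateTime());
  }
}
{noformat}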
[jira] [Commented] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly
[ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094294#comment-13094294 ] Krishna Kumar commented on HIVE-2417: - Yes, the test is designed to produce the error when run without the change. Are you finding that that's not the case? I get an EOFException while running the same steps in my development environment (i.e., not as a unit test). 1. This is needed so that the rcfiles in the target table are compressed with Bzip2. Do you mean that we should be using the Default compression codec instead? Fine with me, but why is that important? 2. tgt does contain more than one file:
[before alter] +POSTHOOK: query: show table extended like `tgt_rc_merge_test` ... +totalNumberFiles:2 ...
[after alter] +POSTHOOK: query: show table extended like `tgt_rc_merge_test` ... +totalNumberFiles:1
The 'create' adds one file, and the insert adds another file. [OT: Does it make sense to append a block merge task after a non-overwrite insert? Dunno...] > Merging of compressed rcfiles fails to write the valuebuffer part correctly > --- > > Key: HIVE-2417 > URL: https://issues.apache.org/jira/browse/HIVE-2417 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-2417.v0.patch > > > The blockmerge task does not create proper rc files when merging compressed > rc files as the valuebuffer writing is incorrect. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2417) Merging of compressed rcfiles fails to write the valuebuffer part correctly
[ https://issues.apache.org/jira/browse/HIVE-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2417: Attachment: HIVE-2417.v1.patch Test changed after review comments - default codec instead of bzip2 - Create + 2 inserts instead of CTAS + 1 insert > Merging of compressed rcfiles fails to write the valuebuffer part correctly > --- > > Key: HIVE-2417 > URL: https://issues.apache.org/jira/browse/HIVE-2417 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-2417.v0.patch, HIVE-2417.v1.patch > > > The blockmerge task does not create proper rc files when merging compressed > rc files as the valuebuffer writing is incorrect. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2413) BlockMergeTask ignores client-specified jars
[ https://issues.apache.org/jira/browse/HIVE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2413: Attachment: HIVE-2413.v1.patch Empty string not handled correctly in JC so handling it here... > BlockMergeTask ignores client-specified jars > > > Key: HIVE-2413 > URL: https://issues.apache.org/jira/browse/HIVE-2413 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2413.v0.patch, HIVE-2413.v1.patch > > > User-specified jars are not added to the hadoop tasks while executing a > BlockMergeTask resulting in a ClassNotFoundException. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
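One plausible shape of the guard described by that note, taking "JC" to mean JobConf (an assumption; the helper and its property handling are illustrative, not the actual patch):
{noformat}
import org.apache.hadoop.mapred.JobConf;

public final class MergeJarHelper {
  private MergeJarHelper() {
  }

  // Ship client-specified jars with the merge job, but never set "tmpjars"
  // to an empty string, since an empty value is not handled correctly.
  public static void addClientJars(JobConf job, String userJars) {
    if (userJars == null || userJars.trim().isEmpty()) {
      return;
    }
    String existing = job.get("tmpjars");
    String combined = (existing == null || existing.isEmpty())
        ? userJars : existing + "," + userJars;
    job.set("tmpjars", combined); // picked up via the distributed cache
  }
}
{noformat}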
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995767#comment-12995767 ] Krishna Kumar commented on HIVE-1918: - https://reviews.apache.org/r/430/ added (with hive-git as repository). Carl, can you take down 339 as that is now superseded? > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997296#comment-12997296 ] Krishna Kumar commented on HIVE-1918: - There are a few reasons why I took this approach - The decision on compatibility (forward/backward) checks, as in EximUtil.checkCompatibility, needs to be taken consciously. That is, automatically breaking backward compatibility is not an option here, I think. - What needs to be serialized/deserialized also requires a human decision. For instance, even now, authorization details are not transferred by an export/import. - The serialization/deserialization methods are also used by the howl codebase outside of a hive context. It would be good to have this code only loosely coupled to the metastore code. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
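A hedged sketch of the explicit compatibility gate argued for above, modeled loosely on the EximUtil.checkCompatibility idea; the parameters and message are illustrative assumptions, not the actual code:
{code}
// The compatibility decision is made consciously in hand-written code,
// which is exactly the control auto-generated serialization would take away.
static void checkCompatibility(int dumpVersion, int codeVersion) throws SemanticException {
  if (dumpVersion > codeVersion) {
    // a newer dump may carry metadata this code cannot interpret safely
    throw new SemanticException("Export dump version " + dumpVersion
        + " is newer than the supported version " + codeVersion);
  }
  // older dumps are accepted: backward compatibility is preserved on purpose
}
{code}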
[jira] Created: (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it. -- Key: HIVE-2003 URL: https://issues.apache.org/jira/browse/HIVE-2003 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Krishna Kumar Assignee: Krishna Kumar Priority: Minor The table/partition being loaded is not being added to outputs in the LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
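A hedged sketch of the shape of the fix (illustrative, not the committed patch): record the load target as a write entity during semantic analysis so the authorization checks see it. WriteEntity and outputs are the existing hooks concepts; the surrounding variables are assumed from the analyzer's context:
{code}
// In LoadSemanticAnalyzer, after resolving the target table/partition:
if (partSpec == null || partSpec.isEmpty()) {
  outputs.add(new WriteEntity(table));     // whole-table load
} else {
  outputs.add(new WriteEntity(partition)); // load into a single partition
}
{code}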
[jira] Updated: (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2003: Attachment: HIVE-2003.patch.txt Patch attached. 1. LoadSemanticAnalyzer adds the table/partition to the outputs 2. QTestUtil.cleanup() used to call setup.tearDown, resulting in the commands run during createSources being run without a zookeeper server instance. So I have moved setup.tearDown to QTestUtil.shutdown(). 3. EnforceReadOnlyTables also needs to allow outputs during initialization loads/creates, so a session boolean indicates the initialization phase. 4. TestParse.vm and TestParseNegative.vm needed to be fixed too. Setup creates a QTestUtil instance each time, but tearDown seems to treat qt as a reusable instance. Changed tearDown to shut down QTestUtil every time. 5. Test results regenerated. > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
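A hedged sketch of point 3; the flag accessor below is a hypothetical name, not the actual SessionState API, and the hook signature is abbreviated from the pre-execute interface:
{code}
// EnforceReadOnlyTables: let the initialization loads/creates through.
public void run(SessionState sess, Set<ReadEntity> inputs,
    Set<WriteEntity> outputs, UserGroupInformation ugi) throws Exception {
  if (sess != null && sess.isInitPhase()) { // isInitPhase() is hypothetical
    return; // createSources is populating the test source tables
  }
  // ... existing read-only checks over outputs ...
}
{code}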
[jira] Commented: (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998276#comment-12998276 ] Krishna Kumar commented on HIVE-2003: - 6. Loading a partitioned table without specifying partitions was being validated only if OVERWRITE was specified. This is not right IMO, so fixed this as well. > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2003: Attachment: (was: HIVE-2003.patch.txt) > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2003: Attachment: HIVE-2003.patch.txt One results file was diffed as binary so patch regenerated with --text > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998442#comment-12998442 ] Krishna Kumar commented on HIVE-1918: - Thanks Paul. [Your comments are on a superseded review board submission; I will remind Carl again to take it down. The current reviewboard submission is up at https://reviews.apache.org/r/430/, but nevertheless both your comments are still applicable.] 1. Ok. Will address it. 2. I am not seeing how compatibility checking and selective serialization/deserialization of an object graph will be possible with auto-generated code. Will look into both thrift and datanucleus serialization (that you mentioned) from this aspect, but fine-grained control over this process is required here, I think. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.txt, > hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.5.txt - Nested ternaries expanded - thrift-based serialization for the metastore objects Please review. > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.5.txt, > HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
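A hedged sketch of what thrift-based serialization of a metastore object can look like; the JSON protocol choice here is an assumption for readability, not necessarily what the patch uses:
{code}
// Round-trip a metastore Table (a thrift struct) through thrift.
TSerializer ser = new TSerializer(new TJSONProtocol.Factory());
byte[] bytes = ser.serialize(table);

TDeserializer deser = new TDeserializer(new TJSONProtocol.Factory());
Table roundTripped = new Table();
deser.deserialize(roundTripped, bytes);
{code}
Because the serialized form is produced field by field from a hand-maintained thrift definition, the export format only changes when that definition is changed deliberately.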
[jira] Updated: (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2003: Status: Patch Available (was: Open) > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: HIVE-1918.patch.5.txt Merged with trunk > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.5.txt, > HIVE-1918.patch.5.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-1918: Attachment: (was: HIVE-1918.patch.5.txt) > Add export/import facilities to the hive system > --- > > Key: HIVE-1918 > URL: https://issues.apache.org/jira/browse/HIVE-1918 > Project: Hive > Issue Type: New Feature > Components: Query Processor > Reporter: Krishna Kumar >Assignee: Krishna Kumar > Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, > HIVE-1918.patch.3.txt, HIVE-1918.patch.4.txt, HIVE-1918.patch.5.txt, > HIVE-1918.patch.txt, hive-metastore-er.pdf > > > This is an enhancement request to add export/import features to hive. > With this language extension, the user can export the data of the table - > which may be located in different hdfs locations in case of a partitioned > table - as well as the metadata of the table into a specified output > location. This output location can then be moved over to another different > hadoop/hive instance and imported there. > This should work independent of the source and target metastore dbms used; > for instance, between derby and mysql. > For partitioned tables, the ability to export/import a subset of the > partition must be supported. > Howl will add more features on top of this: The ability to create/use the > exported data even in the absence of hive, using MR or Pig. Please see > http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (HIVE-2065) RCFile issues
RCFile issues - Key: HIVE-2065 URL: https://issues.apache.org/jira/browse/HIVE-2065 Project: Hive Issue Type: Bug Reporter: Krishna Kumar Priority: Minor Some potential issues with RCFile 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per yongqiang he, the class is not meant to be thread-safe (and it is not). Might as well get rid of the confusing and performance-impacting lock acquisitions. 2. Record Length overstated for compressed files. IIUC, the key compression happens after we have written the record length.
{code}
int keyLength = key.getSize();
if (keyLength < 0) {
  throw new IOException("negative length keys not allowed: " + key);
}
out.writeInt(keyLength + valueLength); // total record length
out.writeInt(keyLength); // key portion length
if (!isCompressed()) {
  out.writeInt(keyLength);
  key.write(out); // key
} else {
  keyCompressionBuffer.reset();
  keyDeflateFilter.resetState();
  key.write(keyDeflateOut);
  keyDeflateOut.flush();
  keyDeflateFilter.finish();
  int compressedKeyLen = keyCompressionBuffer.getLength();
  out.writeInt(compressedKeyLen);
  out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
}
{code}
3. For sequence file compatibility, the compressed key length should be the next field after the record length, not the uncompressed key length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2065: Attachment: Slide1.png Compressed RCFile Layout > RCFile issues > - > > Key: HIVE-2065 > URL: https://issues.apache.org/jira/browse/HIVE-2065 > Project: Hive > Issue Type: Bug > Reporter: Krishna Kumar >Priority: Minor > Attachments: Slide1.png > > > Some potential issues with RCFile > 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per > yongqiang he, the class is not meant to be thread-safe (and it is not). Might > as well get rid of the confusing and performance-impacting lock acquisitions. > 2. Record Length overstated for compressed files. IIUC, the key compression > happens after we have written the record length. > {code} > int keyLength = key.getSize(); > if (keyLength < 0) { > throw new IOException("negative length keys not allowed: " + key); > } > out.writeInt(keyLength + valueLength); // total record length > out.writeInt(keyLength); // key portion length > if (!isCompressed()) { > out.writeInt(keyLength); > key.write(out); // key > } else { > keyCompressionBuffer.reset(); > keyDeflateFilter.resetState(); > key.write(keyDeflateOut); > keyDeflateOut.flush(); > keyDeflateFilter.finish(); > int compressedKeyLen = keyCompressionBuffer.getLength(); > out.writeInt(compressedKeyLen); > out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen); > } > {code} > 3. For sequence file compatibility, the compressed key length should be the > next field to record length, not the uncompressed key length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Assigned: (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar reassigned HIVE-2065: --- Assignee: Krishna Kumar > RCFile issues > - > > Key: HIVE-2065 > URL: https://issues.apache.org/jira/browse/HIVE-2065 > Project: Hive > Issue Type: Bug > Reporter: Krishna Kumar > Assignee: Krishna Kumar >Priority: Minor > Attachments: Slide1.png > > > Some potential issues with RCFile > 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per > yongqiang he, the class is not meant to be thread-safe (and it is not). Might > as well get rid of the confusing and performance-impacting lock acquisitions. > 2. Record Length overstated for compressed files. IIUC, the key compression > happens after we have written the record length. > {code} > int keyLength = key.getSize(); > if (keyLength < 0) { > throw new IOException("negative length keys not allowed: " + key); > } > out.writeInt(keyLength + valueLength); // total record length > out.writeInt(keyLength); // key portion length > if (!isCompressed()) { > out.writeInt(keyLength); > key.write(out); // key > } else { > keyCompressionBuffer.reset(); > keyDeflateFilter.resetState(); > key.write(keyDeflateOut); > keyDeflateOut.flush(); > keyDeflateFilter.finish(); > int compressedKeyLen = keyCompressionBuffer.getLength(); > out.writeInt(compressedKeyLen); > out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen); > } > {code} > 3. For sequence file compatibility, the compressed key length should be the > next field to record length, not the uncompressed key length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008741#comment-13008741 ] Krishna Kumar commented on HIVE-2065: - So should I go ahead and fix #2 and #3 as well? Note that these are incompatible changes, so the version number will need to be bumped up. My proposal: Fix the issues in the new format - up the version number to 7. - compute and store the record length as the compressed key length (i.e., 4 + compressed key contents length) plus the compressed value length - store the compressed key length as the next 4-byte field - the key contains a 4-byte uncompressed key contents length followed by the compressed key contents Provide backward compatibility - while reading version 6, interpret the fields as now but recalculate the record length from the next two fields (record length = record length - uncompressed key length + compressed key length) > RCFile issues > - > > Key: HIVE-2065 > URL: https://issues.apache.org/jira/browse/HIVE-2065 > Project: Hive > Issue Type: Bug > Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: Slide1.png > > > Some potential issues with RCFile > 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per > yongqiang he, the class is not meant to be thread-safe (and it is not). Might > as well get rid of the confusing and performance-impacting lock acquisitions. > 2. Record Length overstated for compressed files. IIUC, the key compression > happens after we have written the record length. > {code} > int keyLength = key.getSize(); > if (keyLength < 0) { > throw new IOException("negative length keys not allowed: " + key); > } > out.writeInt(keyLength + valueLength); // total record length > out.writeInt(keyLength); // key portion length > if (!isCompressed()) { > out.writeInt(keyLength); > key.write(out); // key > } else { > keyCompressionBuffer.reset(); > keyDeflateFilter.resetState(); > key.write(keyDeflateOut); > keyDeflateOut.flush(); > keyDeflateFilter.finish(); > int compressedKeyLen = keyCompressionBuffer.getLength(); > out.writeInt(compressedKeyLen); > out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen); > } > {code} > 3. For sequence file compatibility, the compressed key length should be the > next field to record length, not the uncompressed key length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
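A hedged writer-side sketch of one reading of this proposal (illustrative only; the variable names follow the snippet quoted above, and this is not the committed patch):
{code}
// The key is compressed first, so every length is known before writing.
int compressedKeyLen = keyCompressionBuffer.getLength();
int keyPortionLen = 4 + compressedKeyLen;  // plain-length field + compressed contents
out.writeInt(keyPortionLen + valueLength); // record length, now accurate
out.writeInt(keyPortionLen);               // compressed key length, the next 4-byte field
out.writeInt(keyLength);                   // uncompressed key contents length
out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
// A version-6 reader compensates instead:
//   recordLength = recordLength - uncompressedKeyLength + compressedKeyLength
{code}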
[jira] Updated: (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2065: Attachment: proposal.png Notation : Length bracket inside the dashed box means it is the uncompressed length. Length bracket outside the dashed box means it is the compressed length. > RCFile issues > - > > Key: HIVE-2065 > URL: https://issues.apache.org/jira/browse/HIVE-2065 > Project: Hive > Issue Type: Bug > Reporter: Krishna Kumar > Assignee: Krishna Kumar >Priority: Minor > Attachments: Slide1.png, proposal.png > > > Some potential issues with RCFile > 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per > yongqiang he, the class is not meant to be thread-safe (and it is not). Might > as well get rid of the confusing and performance-impacting lock acquisitions. > 2. Record Length overstated for compressed files. IIUC, the key compression > happens after we have written the record length. > {code} > int keyLength = key.getSize(); > if (keyLength < 0) { > throw new IOException("negative length keys not allowed: " + key); > } > out.writeInt(keyLength + valueLength); // total record length > out.writeInt(keyLength); // key portion length > if (!isCompressed()) { > out.writeInt(keyLength); > key.write(out); // key > } else { > keyCompressionBuffer.reset(); > keyDeflateFilter.resetState(); > key.write(keyDeflateOut); > keyDeflateOut.flush(); > keyDeflateFilter.finish(); > int compressedKeyLen = keyCompressionBuffer.getLength(); > out.writeInt(compressedKeyLen); > out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen); > } > {code} > 3. For sequence file compatibility, the compressed key length should be the > next field to record length, not the uncompressed key length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2003: Attachment: HIVE-2003.patch.1.txt Regenerated the patch. > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009202#comment-13009202 ] Krishna Kumar commented on HIVE-2003: - Added to review board: https://reviews.apache.org/r/518/ > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009203#comment-13009203 ] Krishna Kumar commented on HIVE-2003: - For #4 above, I have taken now the same approach as in TestCliDriver/TestNegativeCliDriver, reusing the QTestUtil instance. Needed to clear out inputs/outputs for this to work. > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2003) LOAD compilation does not set the outputs during semantic analysis resulting in no authorization checks being done for it.
[ https://issues.apache.org/jira/browse/HIVE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Kumar updated HIVE-2003: Status: Patch Available (was: Open) Please review asap as there are lots of changes to q.out files and any delay may cause another conflict/resolution cycle. > LOAD compilation does not set the outputs during semantic analysis resulting > in no authorization checks being done for it. > -- > > Key: HIVE-2003 > URL: https://issues.apache.org/jira/browse/HIVE-2003 > Project: Hive > Issue Type: Bug > Components: Query Processor >Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: HIVE-2003.patch.1.txt, HIVE-2003.patch.txt > > > The table/partition being loaded is not being added to outputs in the > LoadSemanticAnalyzer. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2065) RCFile issues
[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009981#comment-13009981 ] Krishna Kumar commented on HIVE-2065: - Hmm. #3 is taking me a bit further than I originally thought. I assume being able to read an RCFile as a SequenceFile is required, while being able to write an RCFile via the SequenceFile interface is desirable. Having made changes so that the record length is correctly set, the following changes are also required for the rcfile to be handled correctly as a sequence file, IIUC. - the second field should be the key length (4 + compressed/plain key contents) - the key class (KeyBuffer) must be made responsible for reading/writing the next field - the plain key contents length - as well as for compression/decompression of the key contents - the value class (ValueBuffer) related changes will be trickier. Since the value is not compressed as a unit, we cannot use the record-compressed format. We need to mark the records as plain records, and move the codec to a metadata entry. Then the ValueBuffer class will work correctly with the sequencefile implementation. Thoughts? Worth it? > RCFile issues > - > > Key: HIVE-2065 > URL: https://issues.apache.org/jira/browse/HIVE-2065 > Project: Hive > Issue Type: Bug > Reporter: Krishna Kumar >Assignee: Krishna Kumar >Priority: Minor > Attachments: Slide1.png, proposal.png > > > Some potential issues with RCFile > 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per > yongqiang he, the class is not meant to be thread-safe (and it is not). Might > as well get rid of the confusing and performance-impacting lock acquisitions. > 2. Record Length overstated for compressed files. IIUC, the key compression > happens after we have written the record length. > {code} > int keyLength = key.getSize(); > if (keyLength < 0) { > throw new IOException("negative length keys not allowed: " + key); > } > out.writeInt(keyLength + valueLength); // total record length > out.writeInt(keyLength); // key portion length > if (!isCompressed()) { > out.writeInt(keyLength); > key.write(out); // key > } else { > keyCompressionBuffer.reset(); > keyDeflateFilter.resetState(); > key.write(keyDeflateOut); > keyDeflateOut.flush(); > keyDeflateFilter.finish(); > int compressedKeyLen = keyCompressionBuffer.getLength(); > out.writeInt(compressedKeyLen); > out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen); > } > {code} > 3. For sequence file compatibility, the compressed key length should be the > next field to record length, not the uncompressed key length. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
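A hedged sketch of the KeyBuffer responsibility shift described above, reduced to a standalone helper (the actual change would live inside KeyBuffer.write; the helper name is illustrative):
{code}
// Write the key portion so that a plain SequenceFile reader sees an opaque,
// correctly sized key: 4-byte plain length, then the (possibly compressed) contents.
static void writeKeyPortion(DataOutput out, byte[] plainKey,
    CompressionCodec codec) throws IOException {
  out.writeInt(plainKey.length); // plain key contents length
  if (codec == null) {
    out.write(plainKey);
    return;
  }
  ByteArrayOutputStream buf = new ByteArrayOutputStream();
  CompressionOutputStream cout = codec.createOutputStream(buf);
  cout.write(plainKey);
  cout.finish();
  cout.close();
  out.write(buf.toByteArray()); // compressed key contents
}
{code}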