XML Serde

2012-06-24 Thread Sumit Kumar
Hi,

So I looked for a generic approach to handling XML files in Hive but found
none, and thought I could use the concepts from json-serde
(http://code.google.com/p/hive-json-serde/) to create a generic XML serde.
XPath immediately came to mind and should work the same way JSON paths work
for json-serde. The problem is the use case where a single XML file contains
multiple rows of interest. Example shown below.


<books>
  <book> ... </book>
  <book> ... </book>
  <book> ... </book>
</books>


In this case, the serde is supposed to generate one row for each book node,
i.e. three rows. I looked at the json-serde implementation, but there the
deserialize step returns a single ArrayList instance with the column values
set at the list's indices, and that one instance maps to exactly one row. I do
see that the deserialize step can return any Java Object, but I'm not sure
what the appropriate way would be to return multiple rows, one per book node.
I'm going to give it a shot anyway, but thought I'd seek help from the
community in case somebody has already tried this or has a better approach.
Would really appreciate any input; if I succeed, I will share my code, and if
not, I will come back anyway :-)
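
To make the mismatch concrete, here is a rough standalone sketch of the
extraction side (plain JAXP XPath, nothing Hive-specific; the element and
column names are made up for illustration). Each iteration of the loop is what
I'd want to come out as one Hive row:

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathRowsDemo {
  public static void main(String[] args) throws Exception {
    String xml = "<books>"
        + "<book><title>A</title><author>X</author></book>"
        + "<book><title>B</title><author>Y</author></book>"
        + "<book><title>C</title><author>Z</author></book>"
        + "</books>";

    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
    XPath xpath = XPathFactory.newInstance().newXPath();

    // One input document, but several row-like nodes of interest.
    NodeList books = (NodeList) xpath.evaluate("/books/book", doc, XPathConstants.NODESET);
    for (int i = 0; i < books.getLength(); i++) {
      Node book = books.item(i);
      String title = xpath.evaluate("title", book);
      String author = xpath.evaluate("author", book);
      System.out.println("row " + i + ": title=" + title + ", author=" + author);
    }
  }
}

As far as I can tell, a SerDe's deserialize() only gets to return one of those
per call, which is exactly the gap I'm asking about.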

Thanks in advance.
-Sumit


Re: XML Serde

2012-06-25 Thread Sumit Kumar
So I found this discussion on the topic:
http://mail-archives.apache.org/mod_mbox/hive-user/201006.mbox/%3caanlktikyl3hinowfo36yeyid9vojyh_6pe3slorhy...@mail.gmail.com%3E.
It makes more sense now. I will post my final resolution.
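
For anyone who finds this thread later: my reading of that discussion (an
assumption on my part, so please correct me) is that the per-book splitting
should happen before the SerDe ever runs, e.g. in a custom
InputFormat/RecordReader, so that each record handed to deserialize() is a
single <book> element and maps to exactly one row. A minimal, non-Hive sketch
of just that splitting step:

import java.util.ArrayList;
import java.util.List;

public class BookRecordSplitter {
  /** Returns each <book>...</book> element of the input as its own record. */
  public static List<String> split(String xml) {
    final String start = "<book>";
    final String end = "</book>";
    List<String> records = new ArrayList<String>();
    int from = 0;
    while (true) {
      int s = xml.indexOf(start, from);
      if (s < 0) {
        break;                       // no more book elements
      }
      int e = xml.indexOf(end, s);
      if (e < 0) {
        break;                       // unterminated element; stop rather than guess
      }
      records.add(xml.substring(s, e + end.length()));
      from = e + end.length();       // continue after the element just emitted
    }
    return records;                  // one entry per row handed to the SerDe
  }
}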



"desc database extended " doesn't print dbproperties?

2014-06-25 Thread Sumit Kumar
Hey guys,

I just discovered that this syntax no longer prints the dbproperties. I have
two Hive versions on which I'm testing the following queries:

  create database test2 with dbproperties ('key1' = 'value1', 'key2' = 'value2');
  desc database extended test2;


The output on hive 11 is:

hive> desc database extended test2;
OK
test2   hdfs://:9000/warehouse/test2.db   {key2=value2, key1=value1}
Time taken: 0.021 seconds, Fetched: 1 row(s)


The output on hive 13 is:

hive> desc database extended test2;
OK
test2   hdfs://:9000/warehouse/test2.db   hadoop
Time taken: 0.023 seconds, Fetched: 1 row(s)


If you look closely, you'll notice that in the hive 13 case no key/value
information from dbproperties was printed, and "hadoop" (I guess it's my
userid) somehow appeared instead.

Did this functionality change after hive 11? Do we have a reference jira? I
searched the wikis and JIRAs but couldn't find one; I'm surprised that the
language manual wiki
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL) no
longer even mentions this functionality. Would appreciate any input on this.
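
In the meantime, the properties do still appear to be stored in the metastore;
below is a rough, untested sketch of pulling them out directly with the
metastore client (assumes hive.metastore.uris etc. are already configured;
only for illustration):

import java.util.Map;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Database;

public class ShowDbProperties {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf();
    HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
    try {
      // Fetch the database object and dump its dbproperties key/value pairs.
      Database db = client.getDatabase("test2");
      for (Map.Entry<String, String> e : db.getParameters().entrySet()) {
        System.out.println(e.getKey() + "=" + e.getValue());
      }
    } finally {
      client.close();
    }
  }
}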


Thanks,
-Sumit


[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-11-11 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207168#comment-14207168
 ] 

Sumit Kumar commented on HIVE-7136:
---

I still don't seem to have "write" access; will you please grant it to me? 

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>    Affects Versions: 0.13.0
>    Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
>  Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-7136.01.patch, HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-11-12 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208278#comment-14208278
 ] 

Sumit Kumar commented on HIVE-7136:
---

[~leftylev] I already have a confluence id: "ksumit" (without quotes). 
Apologies for not mentioning it earlier.

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>    Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
>  Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-7136.01.patch, HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-11-12 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208671#comment-14208671
 ] 

Sumit Kumar commented on HIVE-7136:
---

Thank you. I just updated both the wikis. 

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
>  Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-7136.01.patch, HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-05-28 Thread Sumit Kumar (JIRA)
Sumit Kumar created HIVE-7136:
-

 Summary: Allow Hive to read hive scripts from any of the supported 
file systems in hadoop eco-system
 Key: HIVE-7136
 URL: https://issues.apache.org/jira/browse/HIVE-7136
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.13.0
Reporter: Sumit Kumar
Priority: Minor


Currently the hive cli assumes that the source file (the hive script) is always 
on the local file system. This patch adds support for reading source files from 
the other file systems in the hadoop eco-system (hdfs, s3, etc.) as well, while 
keeping the default behavior intact: when no scheme is given in the source 
file's URL, it is read from the default (local) filesystem.
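
Roughly the intent, as an illustrative sketch only (the class and method below
are made up for this description, not taken from the attached patch):

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ScriptReader {
  /** Opens a hive script from whatever file system its URL names; local when no scheme is given. */
  public static BufferedReader open(String fileName, Configuration conf) throws Exception {
    Path path = new Path(fileName);
    FileSystem fs = (path.toUri().getScheme() == null)
        ? FileSystem.getLocal(conf)     // no scheme -> keep the current local-file behavior
        : path.getFileSystem(conf);     // hdfs://, s3://, ... -> resolve via the scheme
    return new BufferedReader(new InputStreamReader(fs.open(path)));
  }
}
{code}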



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-05-28 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7136:
--

Attachment: HIVE-7136.patch

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-05-28 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7136:
--

Status: Patch Available  (was: Open)

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7137) Add progressable to writer interfaces so they could report progress while different operations are in progress

2014-05-28 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7137:
--

Affects Version/s: 0.13.0

> Add progressable to writer interfaces so they could report progress while 
> different operations are in progress
> --
>
> Key: HIVE-7137
> URL: https://issues.apache.org/jira/browse/HIVE-7137
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.13.0
>    Reporter: Sumit Kumar
>Priority: Minor
>
> This patch is to pass Progressable instance along with different Writer 
> implementations. Without this jobs fail whenever a bulk write operation takes 
> longer than usual. With this patch, writers keep sending heartbeat and job 
> keeps running fine. Hive already provided support for this so this is a minor 
> addition.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7137) Add progressable to writer interfaces so they could report progress while different operations are in progress

2014-05-28 Thread Sumit Kumar (JIRA)
Sumit Kumar created HIVE-7137:
-

 Summary: Add progressable to writer interfaces so they could 
report progress while different operations are in progress
 Key: HIVE-7137
 URL: https://issues.apache.org/jira/browse/HIVE-7137
 Project: Hive
  Issue Type: Improvement
Reporter: Sumit Kumar
Priority: Minor


This patch passes a Progressable instance along to the different Writer 
implementations. Without it, jobs fail whenever a bulk write operation takes 
longer than usual; with it, the writers keep sending heartbeats and the job 
keeps running fine. Hive already provides support for this, so it is a minor 
addition.
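
Illustrative sketch only (the wrapper below is made up for this description,
not code from the patch): the writer is handed a Progressable and pings it so
that long-running writes keep reporting progress.

{code:java}
import java.io.IOException;

import org.apache.hadoop.util.Progressable;

public class ProgressReportingWriter {
  private final Progressable progressable;

  public ProgressReportingWriter(Progressable progressable) {
    this.progressable = progressable;
  }

  /** Stand-in for a bulk write; the important part is the heartbeat, not the I/O. */
  public void write(byte[] chunk) throws IOException {
    // ... perform the (possibly slow) write of 'chunk' here ...
    if (progressable != null) {
      progressable.progress();   // keeps the task from being killed as unresponsive
    }
  }
}
{code}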



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7137) Add progressable to writer interfaces so they could report progress while different operations are in progress

2014-05-28 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7137:
--

Component/s: Query Processor

> Add progressable to writer interfaces so they could report progress while 
> different operations are in progress
> --
>
> Key: HIVE-7137
> URL: https://issues.apache.org/jira/browse/HIVE-7137
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Priority: Minor
>
> This patch is to pass Progressable instance along with different Writer 
> implementations. Without this jobs fail whenever a bulk write operation takes 
> longer than usual. With this patch, writers keep sending heartbeat and job 
> keeps running fine. Hive already provided support for this so this is a minor 
> addition.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7137) Add progressable to writer interfaces so they could report progress while different operations are in progress

2014-05-28 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7137:
--

Attachment: HIVE-7137.patch

> Add progressable to writer interfaces so they could report progress while 
> different operations are in progress
> --
>
> Key: HIVE-7137
> URL: https://issues.apache.org/jira/browse/HIVE-7137
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7137.patch
>
>
> This patch is to pass Progressable instance along with different Writer 
> implementations. Without this jobs fail whenever a bulk write operation takes 
> longer than usual. With this patch, writers keep sending heartbeat and job 
> keeps running fine. Hive already provided support for this so this is a minor 
> addition.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7137) Add progressable to writer interfaces so they could report progress while different operations are in progress

2014-05-28 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7137:
--

Affects Version/s: (was: 0.13.0)
   0.13.1
 Release Note: This patch has been rebased to current state of hive 
0.13 branch (0.13.1)
   Status: Patch Available  (was: Open)

> Add progressable to writer interfaces so they could report progress while 
> different operations are in progress
> --
>
> Key: HIVE-7137
> URL: https://issues.apache.org/jira/browse/HIVE-7137
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.13.1
>Reporter: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7137.patch
>
>
> This patch is to pass Progressable instance along with different Writer 
> implementations. Without this jobs fail whenever a bulk write operation takes 
> longer than usual. With this patch, writers keep sending heartbeat and job 
> keeps running fine. Hive already provided support for this so this is a minor 
> addition.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-7137) Add progressable to writer interfaces so they could report progress while different operations are in progress

2014-05-29 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar reassigned HIVE-7137:
-

Assignee: Sumit Kumar

> Add progressable to writer interfaces so they could report progress while 
> different operations are in progress
> --
>
> Key: HIVE-7137
> URL: https://issues.apache.org/jira/browse/HIVE-7137
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.13.1
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7137.patch
>
>
> This patch is to pass Progressable instance along with different Writer 
> implementations. Without this jobs fail whenever a bulk write operation takes 
> longer than usual. With this patch, writers keep sending heartbeat and job 
> keeps running fine. Hive already provided support for this so this is a minor 
> addition.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-05-29 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012827#comment-14012827
 ] 

Sumit Kumar commented on HIVE-7136:
---

That should be easy to do. I'll run these failed tests locally to ensure they 
pass before submitting the new patch.

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-05-31 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014767#comment-14014767
 ] 

Sumit Kumar commented on HIVE-7136:
---

[~ashutoshc] IOUtils.close(bufferedReader) is already doing that in the finally 
block, right? 
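
For context, the pattern being discussed is simply close-in-finally; a minimal
sketch, using Hadoop's IOUtils.closeStream as a stand-in for whatever helper
the patch actually calls:

{code:java}
import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.hadoop.io.IOUtils;

public class CloseInFinally {
  public static String readFirstLine(String file) throws Exception {
    BufferedReader bufferedReader = null;
    try {
      bufferedReader = new BufferedReader(new FileReader(file));
      return bufferedReader.readLine();
    } finally {
      IOUtils.closeStream(bufferedReader);   // null-safe close that swallows close() errors
    }
  }
}
{code}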

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-05-31 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014779#comment-14014779
 ] 

Sumit Kumar commented on HIVE-7136:
---

Thanks for confirming that, [~ashutoshc]. I have the patch ready and will submit it ASAP.

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-02 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7136:
--

Attachment: HIVE-7136-1.patch

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7136-1.patch, HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-02 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7136:
--

Status: Patch Available  (was: Open)

Updated the patch to use the FileSystem API instead of FileContext, so that 
hadoop-1 is supported as well.

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7136-1.patch, HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-02 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015489#comment-14015489
 ] 

Sumit Kumar commented on HIVE-7136:
---

[~ashutoshc] I wonder if there is a list of test cases that are known to fail; 
it would be helpful for new contributors. 

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7136-1.patch, HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-05 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7136:
--

Attachment: (was: HIVE-7136-1.patch)

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7136.01.patch, HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-05 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7136:
--

Attachment: HIVE-7136.01.patch

Renaming the patch file to meet the ptest naming requirements.

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7136.01.patch, HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-05 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7136:
--

Status: Patch Available  (was: Open)

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
> Attachments: HIVE-7136.01.patch, HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-2777) ability to add and drop partitions atomically

2014-06-05 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-2777:
--

Status: Open  (was: Patch Available)

> ability to add and drop partitions atomically
> -
>
> Key: HIVE-2777
> URL: https://issues.apache.org/jira/browse/HIVE-2777
> Project: Hive
>  Issue Type: New Feature
>  Components: Metastore
>Affects Versions: 0.13.0
>Reporter: Aniket Mokashi
>Assignee: Aniket Mokashi
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch, 
> hive-2777.patch
>
>
> Hive should have ability to atomically add and drop partitions. This way 
> admins can change partitions atomically without breaking the running jobs. It 
> allows admin to merge several partitions into one.
> Essentially, we would like to have an api- add_drop_partitions(String db, 
> String tbl_name, List addParts, List> dropParts, 
> boolean deleteData);
> This jira covers changes required for metastore and thrift.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system

2014-06-07 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021109#comment-14021109
 ] 

Sumit Kumar commented on HIVE-7136:
---

Sure, will update the documentation.

> Allow Hive to read hive scripts from any of the supported file systems in 
> hadoop eco-system
> ---
>
> Key: HIVE-7136
> URL: https://issues.apache.org/jira/browse/HIVE-7136
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 0.13.0
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>Priority: Minor
> Fix For: 0.14.0
>
> Attachments: HIVE-7136.01.patch, HIVE-7136.patch
>
>
> Current hive cli assumes that the source file (hive script) is always on the 
> local file system. This patch implements support for reading source files 
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping 
> the default behavior intact to be reading from default filesystem (local) in 
> case scheme is not provided in the url for the source file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files

2014-06-16 Thread Sumit Kumar (JIRA)
Sumit Kumar created HIVE-7239:
-

 Summary: Fix bug in HiveIndexedInputFormat implementation that 
causes incorrect query result when input backed by Sequence/RC files
 Key: HIVE-7239
 URL: https://issues.apache.org/jira/browse/HIVE-7239
 Project: Hive
  Issue Type: Bug
  Components: Indexing
Affects Versions: 0.13.1
Reporter: Sumit Kumar
Assignee: Sumit Kumar


In the case of sequence files, it's crucial that splits are calculated around 
the boundaries enforced by the input sequence file. However, by default hadoop 
creates input splits based on configuration parameters, which may not match the 
boundaries of the input sequence file. Hive provides HiveIndexedInputFormat, 
which adds extra logic and recalculates the boundaries of each split based on 
the sequence file's own boundaries.

However, we noticed "over"-reporting of results from data backed by sequence 
files. We have sample data on which we experimented and fixed this bug, and we 
have verified the fix by comparing query output across sequence file, RC file, 
and plain text inputs. However, we have not been able to find the right place 
to include this as a unit test that would execute as part of the Hive tests. We 
tried writing a "clientpositive" test in the ql module, but the output seems 
quite verbose and I couldn't interpret it that well. Can someone please review 
this change and advise on how to write a test that will execute as part of Hive 
testing?
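
To illustrate the boundary issue, here is a small standalone sketch (not the
patch itself; just for discussion) of snapping an arbitrary split offset to a
record boundary that the sequence file itself enforces:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class SnapToSyncPoint {
  /** Returns the first valid record boundary at or after the requested split start. */
  public static long adjustedStart(Configuration conf, Path file, long requestedStart)
      throws Exception {
    SequenceFile.Reader reader =
        new SequenceFile.Reader(conf, SequenceFile.Reader.file(file));
    try {
      reader.sync(requestedStart);   // seek to the next sync mark in the file
      return reader.getPosition();   // a boundary the sequence file itself defines
    } finally {
      reader.close();
    }
  }
}
{code}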



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files

2014-06-16 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7239:
--

Status: Patch Available  (was: Open)

> Fix bug in HiveIndexedInputFormat implementation that causes incorrect query 
> result when input backed by Sequence/RC files
> --
>
> Key: HIVE-7239
> URL: https://issues.apache.org/jira/browse/HIVE-7239
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 0.13.1
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
> Attachments: HIVE-7239.patch
>
>
> In case of sequence files, it's crucial that splits are calculated around the 
> boundaries enforced by the input sequence file. However by default hadoop 
> creates input splits depending on the configuration parameters which may not 
> match the boundaries for the input sequence file. Hive provides 
> HiveIndexedInputFormat that provides extra logic and recalculates the split 
> boundaries for each split depending on the sequence file's boundaries.
> However we noticed this behavior of "over" reporting from data backed by 
> sequence file. We've a sample data on which we experimented and fixed this 
> bug, we have verified this fix by comparing the query output for input being 
> sequence file format, rc file and regular format. However we have not able to 
> find the right place to include this as a unit test that would execute as 
> part of hive tests. We tried writing a "clientpositive" test as part of ql 
> module but the output seems quite verbose and i couldn't interpret it that 
> well. Can someone please review this change and guide on how to write a test 
> that will execute as part of Hive testing?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files

2014-06-16 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar updated HIVE-7239:
--

Attachment: HIVE-7239.patch

Please review, and recommend a way to test this patch as part of Hive unit 
tests, CLI tests, or otherwise.

> Fix bug in HiveIndexedInputFormat implementation that causes incorrect query 
> result when input backed by Sequence/RC files
> --
>
> Key: HIVE-7239
> URL: https://issues.apache.org/jira/browse/HIVE-7239
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 0.13.1
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
> Attachments: HIVE-7239.patch
>
>
> In case of sequence files, it's crucial that splits are calculated around the 
> boundaries enforced by the input sequence file. However by default hadoop 
> creates input splits depending on the configuration parameters which may not 
> match the boundaries for the input sequence file. Hive provides 
> HiveIndexedInputFormat that provides extra logic and recalculates the split 
> boundaries for each split depending on the sequence file's boundaries.
> However we noticed this behavior of "over" reporting from data backed by 
> sequence file. We've a sample data on which we experimented and fixed this 
> bug, we have verified this fix by comparing the query output for input being 
> sequence file format, rc file and regular format. However we have not able to 
> find the right place to include this as a unit test that would execute as 
> part of hive tests. We tried writing a "clientpositive" test as part of ql 
> module but the output seems quite verbose and i couldn't interpret it that 
> well. Can someone please review this change and guide on how to write a test 
> that will execute as part of Hive testing?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files

2014-06-16 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033440#comment-14033440
 ] 

Sumit Kumar commented on HIVE-7239:
---

[~ashutoshc] Is this something you could review and advise on? The test results 
seem to be known failures (please correct me if I'm wrong). If you are not the 
right person, could you please loop in whoever is?

> Fix bug in HiveIndexedInputFormat implementation that causes incorrect query 
> result when input backed by Sequence/RC files
> --
>
> Key: HIVE-7239
> URL: https://issues.apache.org/jira/browse/HIVE-7239
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 0.13.1
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
> Attachments: HIVE-7239.patch
>
>
> In case of sequence files, it's crucial that splits are calculated around the 
> boundaries enforced by the input sequence file. However by default hadoop 
> creates input splits depending on the configuration parameters which may not 
> match the boundaries for the input sequence file. Hive provides 
> HiveIndexedInputFormat that provides extra logic and recalculates the split 
> boundaries for each split depending on the sequence file's boundaries.
> However we noticed this behavior of "over" reporting from data backed by 
> sequence file. We've a sample data on which we experimented and fixed this 
> bug, we have verified this fix by comparing the query output for input being 
> sequence file format, rc file and regular format. However we have not able to 
> find the right place to include this as a unit test that would execute as 
> part of hive tests. We tried writing a "clientpositive" test as part of ql 
> module but the output seems quite verbose and i couldn't interpret it that 
> well. Can someone please review this change and guide on how to write a test 
> that will execute as part of Hive testing?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7298) desc database extended does not show properties of the database

2014-06-26 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044918#comment-14044918
 ] 

Sumit Kumar commented on HIVE-7298:
---

+1

> desc database extended does not show properties of the database
> ---
>
> Key: HIVE-7298
> URL: https://issues.apache.org/jira/browse/HIVE-7298
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-7298.1.patch.txt
>
>
> HIVE-6386 added owner information to desc, but did not update its schema.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7097) The Support for REGEX Column Broken in HIVE 0.13

2014-06-26 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045065#comment-14045065
 ] 

Sumit Kumar commented on HIVE-7097:
---

[~sunrui] I hit this today and found the following references useful:

# https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterColumn
# https://issues.apache.org/jira/secure/attachment/12618321/QuotedIdentifier.html

In short, the functionality is still there, but you need to set 
hive.support.quoted.identifiers to none to get the pre-0.13 behavior. I was 
able to run my query after:
{code:sql}
hive> set hive.support.quoted.identifiers=none;
{code}

My query was something like:
{code:sql}
hive> select `(col1|col2|col3)?+.+` from testTable1;
{code}


> The Support for REGEX Column Broken in HIVE 0.13
> 
>
> Key: HIVE-7097
> URL: https://issues.apache.org/jira/browse/HIVE-7097
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Sun Rui
>
> The Support for REGEX Column is OK in HIVE 0.12, but is broken in HIVE 0.13.
> For example:
> {code:sql}
> select `key.*` from src limit 1;
> {code}
> will fail in HIVE 0.13 with the following error from SemanticAnalyzer:
> {noformat}
> FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or 
> column reference 'key.*': (possible column names are: key, value)
> {noformat}
> This issue is related to HIVE-6037. When set 
> "hive.support.quoted.identifiers=none", the issue will be gone.
> I am not sure the configuration was intended to break regex column. But at 
> least the documentation needs to be updated: 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification
> I would argue backward compatibility is more important.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf

2014-06-26 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045070#comment-14045070
 ] 

Sumit Kumar commented on HIVE-6037:
---

[~leftylev] Here is the JIRA where it was decided to remove hive-default.xml 
and fold all configuration definitions into HiveConf itself.

> Synchronize HiveConf with hive-default.xml.template and support show conf
> -
>
> Key: HIVE-6037
> URL: https://issues.apache.org/jira/browse/HIVE-6037
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Fix For: 0.14.0
>
> Attachments: CHIVE-6037.3.patch.txt, HIVE-6037-0.13.0, 
> HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, 
> HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, 
> HIVE-6037.16.patch.txt, HIVE-6037.17.patch, HIVE-6037.2.patch.txt, 
> HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, HIVE-6037.6.patch.txt, 
> HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, HIVE-6037.9.patch.txt, 
> HIVE-6037.patch
>
>
> see HIVE-5879



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7097) The Support for REGEX Column Broken in HIVE 0.13

2014-06-26 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045084#comment-14045084
 ] 

Sumit Kumar commented on HIVE-7097:
---

Basically this doesn't seem to be an issue, but it would help if we clarified 
it in the [Select 
documentation|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]
 as well.

> The Support for REGEX Column Broken in HIVE 0.13
> 
>
> Key: HIVE-7097
> URL: https://issues.apache.org/jira/browse/HIVE-7097
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Sun Rui
>
> The Support for REGEX Column is OK in HIVE 0.12, but is broken in HIVE 0.13.
> For example:
> {code:sql}
> select `key.*` from src limit 1;
> {code}
> will fail in HIVE 0.13 with the following error from SemanticAnalyzer:
> {noformat}
> FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or 
> column reference 'key.*': (possible column names are: key, value)
> {noformat}
> This issue is related to HIVE-6037. When set 
> "hive.support.quoted.identifiers=none", the issue will be gone.
> I am not sure the configuration was intended to break regex column. But at 
> least the documentation needs to be updated: 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification
> I would argue backward compatibility is more important.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7097) The Support for REGEX Column Broken in HIVE 0.13

2014-06-27 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar resolved HIVE-7097.
---

Resolution: Not a Problem

Thank you [~leftylev]. Marking this "Resolved/Not a problem"

> The Support for REGEX Column Broken in HIVE 0.13
> 
>
> Key: HIVE-7097
> URL: https://issues.apache.org/jira/browse/HIVE-7097
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Sun Rui
>
> The Support for REGEX Column is OK in HIVE 0.12, but is broken in HIVE 0.13.
> For example:
> {code:sql}
> select `key.*` from src limit 1;
> {code}
> will fail in HIVE 0.13 with the following error from SemanticAnalyzer:
> {noformat}
> FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or 
> column reference 'key.*': (possible column names are: key, value)
> {noformat}
> This issue is related to HIVE-6037. When set 
> "hive.support.quoted.identifiers=none", the issue will be gone.
> I am not sure the configuration was intended to break regex column. But at 
> least the documentation needs to be updated: 
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification
> I would argue backward compatibility is more important.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-2089) Add a new input format to be able to combine multiple .gz text files

2014-06-27 Thread Sumit Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Kumar resolved HIVE-2089.
---

Resolution: Won't Fix

I verified [~slider]'s observation. It indeed works. Marking this JIRA as 
"Won't Fix"

> Add a new input format to be able to combine multiple .gz text files
> 
>
> Key: HIVE-2089
> URL: https://issues.apache.org/jira/browse/HIVE-2089
> Project: Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
> Attachments: HIVE-2089.1.patch
>
>
> For files that are not splittable, CombineHiveInputFormat won't help. This 
> jira is to add a new input format to support this feature. This is very useful 
> for partitions with tens of thousands of .gz files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)