XML Serde
Hi,

So I looked for a generic approach for handling XML files in Hive but found none, and thought I could use the concepts from json-serde (http://code.google.com/p/hive-json-serde/) to create a generic XML SerDe. XPath was something that came to mind immediately, and it should work the same way JSON paths work for json-serde. The problem is with the use case where one XML file contains multiple rows of interest. Example shown below (the contents of the book elements are elided):

<books>
  <book>...</book>
  <book>...</book>
  <book>...</book>
</books>

In this case, the SerDe is supposed to generate three rows, one for each book node. I looked at the json-serde implementation, but there the deserialize step returns an ArrayList instance with column values set at indices of the ArrayList, and that one instance maps to one row. I do see that the deserialize step can return any Java Object, but I'm not sure what the appropriate way would be to return multiple rows corresponding to the book nodes. I'm going to give it a shot anyway, but thought I'd seek help from the community in case somebody has already tried this or has a better approach. Would really appreciate any input; if I succeed, I will share my code; if not, I will come back anyway :-)

Thanks in advance.
-Sumit
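For what it's worth, the XPath part of the idea can be sketched with the JDK's built-in javax.xml.xpath alone (a standalone illustration, not Hive SerDe code; class and method names here are made up for the example): evaluating a column expression such as /books/book/title against the document yields one matching node per intended row.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathRows {
    // Evaluate an XPath expression against an XML string and return the
    // text content of every matching node -- one entry per intended row.
    public static List<String> extract(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>A</title></book>"
                   + "<book><title>B</title></book>"
                   + "<book><title>C</title></book></books>";
        // Three book nodes -> three values, i.e. three candidate rows.
        System.out.println(extract(xml, "/books/book/title")); // [A, B, C]
    }
}
```

The open question in the mail remains untouched by this sketch: deserialize() is called once per record handed over by the input format, so the three values still have to be mapped to three records somewhere upstream.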
Re: XML Serde
So I found this discussion on the topic: http://mail-archives.apache.org/mod_mbox/hive-user/201006.mbox/%3caanlktikyl3hinowfo36yeyid9vojyh_6pe3slorhy...@mail.gmail.com%3E. Makes more sense now. Will post my final resolution.

On Sun, Jun 24, 2012 at 10:39 PM, Sumit Kumar wrote:
> [original message snipped]
"desc database extended " doesn't print dbproperties?
Hey guys, I just discovered that this syntax no longer prints the dbproperties. I have two Hive versions on which I'm testing the following queries:

create database test2 with dbproperties ('key1' = 'value1', 'key2' = 'value2');
desc database extended test2;

The output on Hive 11 is:

hive> desc database extended test2;
OK
test2	hdfs://:9000/warehouse/test2.db	{key2=value2, key1=value1}
Time taken: 0.021 seconds, Fetched: 1 row(s)

The output on Hive 13 is:

hive> desc database extended test2;
OK
test2	hdfs://:9000/warehouse/test2.db	hadoop
Time taken: 0.023 seconds, Fetched: 1 row(s)

If you look closely, you will notice that in the Hive 13 case no key/value information from dbproperties was printed, and "hadoop" (I guess it's my user id) somehow magically appeared. Any idea whether this functionality changed since Hive 11? Do we have a reference JIRA? I searched the wikis and JIRAs but couldn't find a reference; I'm surprised that the language manual wiki (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL) doesn't even mention this functionality any more. Would appreciate input on this.

Thanks,
-Sumit
[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207168#comment-14207168 ]

Sumit Kumar commented on HIVE-7136:
-----------------------------------

I still don't seem to have "write" access, will you please grant me the same?

> Allow Hive to read hive scripts from any of the supported file systems in
> hadoop eco-system
> -------------------------------------------------------------------------
>
>                 Key: HIVE-7136
>                 URL: https://issues.apache.org/jira/browse/HIVE-7136
>             Project: Hive
>          Issue Type: Improvement
>          Components: CLI
>    Affects Versions: 0.13.0
>            Reporter: Sumit Kumar
>            Assignee: Sumit Kumar
>            Priority: Minor
>              Labels: TODOC14
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7136.01.patch, HIVE-7136.patch
>
> Current hive cli assumes that the source file (hive script) is always on the
> local file system. This patch implements support for reading source files
> from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping
> the default behavior intact to be reading from default filesystem (local) in
> case scheme is not provided in the url for the source file.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208278#comment-14208278 ]

Sumit Kumar commented on HIVE-7136:
-----------------------------------

[~leftylev] I already have a confluence id: "ksumit" (without quotes). Apologies for not mentioning the same earlier.
[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208671#comment-14208671 ]

Sumit Kumar commented on HIVE-7136:
-----------------------------------

Thank you. I just updated both the wikis.
[jira] [Created] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
Sumit Kumar created HIVE-7136:
------------------------------

             Summary: Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
                 Key: HIVE-7136
                 URL: https://issues.apache.org/jira/browse/HIVE-7136
             Project: Hive
          Issue Type: Improvement
          Components: CLI
    Affects Versions: 0.13.0
            Reporter: Sumit Kumar
            Priority: Minor

Current hive cli assumes that the source file (hive script) is always on the local file system. This patch implements support for reading source files from other file systems in hadoop eco-system (hdfs, s3 etc) as well keeping the default behavior intact to be reading from default filesystem (local) in case scheme is not provided in the url for the source file.
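The scheme-dispatch behavior described above can be sketched with nothing but java.net.URI (an illustration under assumptions, not the actual patch; the real implementation would go through Hadoop's FileSystem API, and the hostnames below are invented):

```java
import java.net.URI;

public class SourceScheme {
    // Decide which filesystem a script path should be read from.
    // A path with no scheme falls back to "file", i.e. the local
    // filesystem, mirroring the default behavior described above.
    public static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? "file" : scheme;
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("/tmp/query.hql"));        // file
        System.out.println(schemeOf("hdfs://nn:9000/q.hql"));   // hdfs
        System.out.println(schemeOf("s3://bucket/q.hql"));      // s3
    }
}
```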
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-7136:
------------------------------

    Attachment: HIVE-7136.patch
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-7136:
------------------------------

    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-7137) Add progressable to writer interfaces so they could report progress while different operations are in progress
[ https://issues.apache.org/jira/browse/HIVE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-7137:
------------------------------

    Affects Version/s: 0.13.0

> Add progressable to writer interfaces so they could report progress while
> different operations are in progress
> --------------------------------------------------------------------------
>
>                 Key: HIVE-7137
>                 URL: https://issues.apache.org/jira/browse/HIVE-7137
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.13.0
>            Reporter: Sumit Kumar
>            Priority: Minor
>
> This patch is to pass Progressable instance along with different Writer
> implementations. Without this jobs fail whenever a bulk write operation takes
> longer than usual. With this patch, writers keep sending heartbeat and job
> keeps running fine. Hive already provided support for this so this is a minor
> addition.
[jira] [Created] (HIVE-7137) Add progressable to writer interfaces so they could report progress while different operations are in progress
Sumit Kumar created HIVE-7137:
------------------------------

             Summary: Add progressable to writer interfaces so they could report progress while different operations are in progress
                 Key: HIVE-7137
                 URL: https://issues.apache.org/jira/browse/HIVE-7137
             Project: Hive
          Issue Type: Improvement
            Reporter: Sumit Kumar
            Priority: Minor

This patch is to pass a Progressable instance along to the different Writer implementations. Without this, jobs fail whenever a bulk write operation takes longer than usual. With this patch, writers keep sending heartbeats and the job keeps running fine. Hive already provided support for this, so this is a minor addition.
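The heartbeat idea described above can be shown in miniature (a self-contained sketch: the nested Progressable interface below is a stand-in for org.apache.hadoop.util.Progressable, and the writer is hypothetical, not the patched Hive code): the writer calls progress() once per chunk so the framework knows the task is alive during a long bulk write.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ProgressDemo {
    // Stand-in for org.apache.hadoop.util.Progressable (illustrative only).
    interface Progressable {
        void progress();
    }

    // A writer that reports progress once per chunk written, so a long
    // bulk write keeps sending heartbeats instead of appearing hung and
    // getting killed by the framework's task timeout.
    static void writeChunks(int chunks, Progressable reporter) {
        for (int i = 0; i < chunks; i++) {
            // ... write one chunk of data here ...
            reporter.progress();  // heartbeat to the framework
        }
    }

    public static void main(String[] args) {
        AtomicInteger beats = new AtomicInteger();
        writeChunks(5, beats::incrementAndGet);
        System.out.println("heartbeats: " + beats.get()); // heartbeats: 5
    }
}
```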
[jira] [Updated] (HIVE-7137) Add progressable to writer interfaces so they could report progress while different operations are in progress
[ https://issues.apache.org/jira/browse/HIVE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-7137:
------------------------------

    Component/s: Query Processor
[jira] [Updated] (HIVE-7137) Add progressable to writer interfaces so they could report progress while different operations are in progress
[ https://issues.apache.org/jira/browse/HIVE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-7137:
------------------------------

    Attachment: HIVE-7137.patch
[jira] [Updated] (HIVE-7137) Add progressable to writer interfaces so they could report progress while different operations are in progress
[ https://issues.apache.org/jira/browse/HIVE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-7137:
------------------------------

    Affects Version/s:     (was: 0.13.0)
                       0.13.1
         Release Note: This patch has been rebased to current state of hive 0.13 branch (0.13.1)
               Status: Patch Available  (was: Open)
[jira] [Assigned] (HIVE-7137) Add progressable to writer interfaces so they could report progress while different operations are in progress
[ https://issues.apache.org/jira/browse/HIVE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar reassigned HIVE-7137:
---------------------------------

    Assignee: Sumit Kumar
[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012827#comment-14012827 ]

Sumit Kumar commented on HIVE-7136:
-----------------------------------

That should be easy to do. I'll run these failed tests locally to ensure they pass before submitting the new patch.
[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014767#comment-14014767 ]

Sumit Kumar commented on HIVE-7136:
-----------------------------------

[~ashutoshc] IOUtils.close(bufferedReader) is already doing that in the finally block, right?
[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014779#comment-14014779 ]

Sumit Kumar commented on HIVE-7136:
-----------------------------------

Thanks for confirming that, [~ashutoshc]. I have the patch ready; will submit ASAP.
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-7136:
------------------------------

    Attachment: HIVE-7136-1.patch
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-7136:
------------------------------

    Status: Patch Available  (was: Open)

Updated the patch to use the FileSystem api instead of FileContext to support hadoop-1 as well.
[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015489#comment-14015489 ]

Sumit Kumar commented on HIVE-7136:
-----------------------------------

[~ashutoshc] I wonder if there is a list of test cases that are known to fail. It would be helpful for new contributions.
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-7136:
------------------------------

    Attachment:     (was: HIVE-7136-1.patch)
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-7136:
------------------------------

    Attachment: HIVE-7136.01.patch

Renaming the patch file name to meet ptest requirements.
[jira] [Updated] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-7136:
------------------------------

    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-2777) ability to add and drop partitions atomically
[ https://issues.apache.org/jira/browse/HIVE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-2777:
------------------------------

    Status: Open  (was: Patch Available)

> ability to add and drop partitions atomically
> ---------------------------------------------
>
>                 Key: HIVE-2777
>                 URL: https://issues.apache.org/jira/browse/HIVE-2777
>             Project: Hive
>          Issue Type: New Feature
>          Components: Metastore
>    Affects Versions: 0.13.0
>            Reporter: Aniket Mokashi
>            Assignee: Aniket Mokashi
>         Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2777.D2271.1.patch, hive-2777.patch
>
> Hive should have ability to atomically add and drop partitions. This way
> admins can change partitions atomically without breaking the running jobs. It
> allows admin to merge several partitions into one.
> Essentially, we would like to have an api- add_drop_partitions(String db,
> String tbl_name, List addParts, List> dropParts,
> boolean deleteData);
> This jira covers changes required for metastore and thrift.
[jira] [Commented] (HIVE-7136) Allow Hive to read hive scripts from any of the supported file systems in hadoop eco-system
[ https://issues.apache.org/jira/browse/HIVE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021109#comment-14021109 ]

Sumit Kumar commented on HIVE-7136:
-----------------------------------

Sure, will update the documentation.
[jira] [Created] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files
Sumit Kumar created HIVE-7239:
------------------------------

             Summary: Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files
                 Key: HIVE-7239
                 URL: https://issues.apache.org/jira/browse/HIVE-7239
             Project: Hive
          Issue Type: Bug
          Components: Indexing
    Affects Versions: 0.13.1
            Reporter: Sumit Kumar
            Assignee: Sumit Kumar

In the case of sequence files, it's crucial that splits are calculated around the boundaries enforced by the input sequence file. However, by default hadoop creates input splits depending on the configuration parameters, which may not match the boundaries of the input sequence file. Hive provides HiveIndexedInputFormat, which adds extra logic and recalculates the split boundaries for each split depending on the sequence file's boundaries. However, we noticed this behavior of "over" reporting from data backed by a sequence file. We have sample data on which we experimented and fixed this bug, and we have verified the fix by comparing the query output for input in sequence file format, RC file format, and the regular format. However, we have not been able to find the right place to include this as a unit test that would execute as part of Hive's tests. We tried writing a "clientpositive" test as part of the ql module, but the output seems quite verbose and I couldn't interpret it that well. Can someone please review this change and guide us on how to write a test that will execute as part of Hive testing?
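The split/boundary mismatch described above can be illustrated in miniature (a standalone sketch under assumptions, not the HiveIndexedInputFormat code or the actual fix): given the record boundaries a sequence file enforces, each arbitrary byte-range split is snapped forward to the next boundary, so no record is counted twice ("over" reporting) or skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitAlign {
    // Snap an offset forward to the nearest record boundary at or after it
    // (boundaries must be sorted ascending and cover the file).
    static long snap(long offset, long[] boundaries) {
        for (long b : boundaries) {
            if (b >= offset) return b;
        }
        return boundaries[boundaries.length - 1];
    }

    // Recompute [start, end) splits so both ends sit on record boundaries.
    // A split that collapses to zero length after snapping is dropped;
    // since adjacent splits share an edge, every record lands in exactly
    // one aligned split.
    static List<long[]> align(List<long[]> splits, long[] boundaries) {
        List<long[]> out = new ArrayList<>();
        for (long[] s : splits) {
            long start = snap(s[0], boundaries);
            long end = snap(s[1], boundaries);
            if (end > start) out.add(new long[] {start, end});
        }
        return out;
    }

    public static void main(String[] args) {
        long[] boundaries = {0, 100, 250, 400};           // record starts
        List<long[]> raw = List.of(new long[] {0, 128},    // default splits
                                   new long[] {128, 400}); // cut at byte 128
        for (long[] s : align(raw, boundaries)) {
            System.out.println(s[0] + ".." + s[1]);
        }
    }
}
```

The default cut at byte 128 falls inside the record starting at 100; snapping both split edges to boundary 250 keeps that record in exactly one split.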
[jira] [Updated] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files
[ https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-7239:
------------------------------

    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files
[ https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar updated HIVE-7239:
------------------------------
    Attachment: HIVE-7239.patch

Please review and recommend a way to test this patch as part of Hive unit tests, CLI tests, or otherwise.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-7239) Fix bug in HiveIndexedInputFormat implementation that causes incorrect query result when input backed by Sequence/RC files
[ https://issues.apache.org/jira/browse/HIVE-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033440#comment-14033440 ]

Sumit Kumar commented on HIVE-7239:
-----------------------------------
[~ashutoshc] Is this something you could review and advise on? The test failures appear to be known failures (please correct me if I'm wrong). In case you are not the right person, could you please loop in whoever is?

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-7298) desc database extended does not show properties of the database
[ https://issues.apache.org/jira/browse/HIVE-7298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044918#comment-14044918 ]

Sumit Kumar commented on HIVE-7298:
-----------------------------------
+1

> desc database extended does not show properties of the database
> ---------------------------------------------------------------
>
>                 Key: HIVE-7298
>                 URL: https://issues.apache.org/jira/browse/HIVE-7298
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-7298.1.patch.txt
>
> HIVE-6386 added owner information to desc, but not updated schema of it.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-7097) The Support for REGEX Column Broken in HIVE 0.13
[ https://issues.apache.org/jira/browse/HIVE-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045065#comment-14045065 ]

Sumit Kumar commented on HIVE-7097:
-----------------------------------
[~sunrui] I hit this today and found the following references useful:
# https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterColumn
# https://issues.apache.org/jira/secure/attachment/12618321/QuotedIdentifier.html

In short, the functionality is still there, but you need to set hive.support.quoted.identifiers to none to get the pre-0.13 behavior. I was able to run my query after:
{code:sql}
hive> set hive.support.quoted.identifiers=none;
{code}
My query was something like:
{code:sql}
hive> select `(col1|col2|col3)?+.+` from testTable1;
{code}

> The Support for REGEX Column Broken in HIVE 0.13
> ------------------------------------------------
>
>                 Key: HIVE-7097
>                 URL: https://issues.apache.org/jira/browse/HIVE-7097
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.13.0
>            Reporter: Sun Rui
>
> The Support for REGEX Column is OK in HIVE 0.12, but is broken in HIVE 0.13.
> For example:
> {code:sql}
> select `key.*` from src limit 1;
> {code}
> will fail in HIVE 0.13 with the following error from SemanticAnalyzer:
> {noformat}
> FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or
> column reference 'key.*': (possible column names are: key, value)
> {noformat}
> This issue is related to HIVE-6037. When set
> "hive.support.quoted.identifiers=none", the issue will be gone.
> I am not sure the configuration was intended to break regex column. But at
> least the documentation needs to be updated:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification
> I would argue backward compatibility is more important.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
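As an aside, the possessive-quantifier trick in that column pattern can be checked with plain java.util.regex, independent of Hive. This is a standalone sketch, not Hive code; the column names col1 through col4 are illustrative:

```java
// `(col1|col2|col3)?+` matches possessively (no backtracking), so when an
// excluded name is consumed by the group, nothing is left for the trailing
// `.+` and the overall match fails; any other name matches.
import java.util.regex.Pattern;

public class RegexColumnDemo {
    public static void main(String[] args) {
        Pattern exclude = Pattern.compile("(col1|col2|col3)?+.+");
        System.out.println(exclude.matcher("col1").matches()); // false: excluded
        System.out.println(exclude.matcher("col4").matches()); // true: kept
    }
}
```

Only the exact names are excluded: a column such as col1_extra still matches, because the group consumes "col1" and `.+` consumes the rest.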
[jira] [Commented] (HIVE-6037) Synchronize HiveConf with hive-default.xml.template and support show conf
[ https://issues.apache.org/jira/browse/HIVE-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045070#comment-14045070 ]

Sumit Kumar commented on HIVE-6037:
-----------------------------------
[~leftylev] Here is the JIRA that decided to remove hive-default.xml and move all configuration changes into HiveConf itself.

> Synchronize HiveConf with hive-default.xml.template and support show conf
> -------------------------------------------------------------------------
>
>                 Key: HIVE-6037
>                 URL: https://issues.apache.org/jira/browse/HIVE-6037
>             Project: Hive
>          Issue Type: Improvement
>          Components: Configuration
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>             Fix For: 0.14.0
>
>         Attachments: CHIVE-6037.3.patch.txt, HIVE-6037-0.13.0, HIVE-6037.1.patch.txt, HIVE-6037.10.patch.txt, HIVE-6037.11.patch.txt, HIVE-6037.12.patch.txt, HIVE-6037.14.patch.txt, HIVE-6037.15.patch.txt, HIVE-6037.16.patch.txt, HIVE-6037.17.patch, HIVE-6037.2.patch.txt, HIVE-6037.4.patch.txt, HIVE-6037.5.patch.txt, HIVE-6037.6.patch.txt, HIVE-6037.7.patch.txt, HIVE-6037.8.patch.txt, HIVE-6037.9.patch.txt, HIVE-6037.patch
>
> see HIVE-5879

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (HIVE-7097) The Support for REGEX Column Broken in HIVE 0.13
[ https://issues.apache.org/jira/browse/HIVE-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045084#comment-14045084 ]

Sumit Kumar commented on HIVE-7097:
-----------------------------------
Basically this doesn't seem to be an issue, but it would help if we clarified it in the [Select documentation|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select] as well.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Resolved] (HIVE-7097) The Support for REGEX Column Broken in HIVE 0.13
[ https://issues.apache.org/jira/browse/HIVE-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar resolved HIVE-7097.
-------------------------------
    Resolution: Not a Problem

Thank you [~leftylev]. Marking this "Resolved/Not a Problem".

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Resolved] (HIVE-2089) Add a new input format to be able to combine multiple .gz text files
[ https://issues.apache.org/jira/browse/HIVE-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumit Kumar resolved HIVE-2089.
-------------------------------
    Resolution: Won't Fix

I verified [~slider]'s observation. It indeed works. Marking this JIRA as "Won't Fix".

> Add a new input format to be able to combine multiple .gz text files
> --------------------------------------------------------------------
>
>                 Key: HIVE-2089
>                 URL: https://issues.apache.org/jira/browse/HIVE-2089
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>         Attachments: HIVE-2089.1.patch
>
> For files that are not splittable, CombineHiveInputFormat won't help. This
> jira is to add a new input format to support this feature. This is very useful
> for partitions with tens of thousands of .gz files.

--
This message was sent by Atlassian JIRA
(v6.2#6252)