Re: [ANNOUNCE] New Hive Committer - Eugene Koifman

2014-09-15 Thread kulkarni.swar...@gmail.com
Congratulations! Nice Job!

On Mon, Sep 15, 2014 at 2:54 AM, Damien Carol  wrote:

>  Congratulations, Eugene.
>
>  Damien CAROL
>
>- tél : +33 (0)4 74 96 88 14
>- fax : +33 (0)4 74 96 31 88
>- email : dca...@blitzbs.com
>
> BLITZ BUSINESS SERVICE
>  On 14/09/2014 at 09:23, Thejas Nair wrote:
>
> Congrats Eugene!
>
>
> On Sat, Sep 13, 2014 at 8:26 AM, Ted Yu  
>  wrote:
>
>  Congratulations, Eugene.
>
>
>


-- 
Swarnim


Re: [ANNOUNCE] New Hive PMC Member - Sergey Shelukhin

2015-02-27 Thread kulkarni.swar...@gmail.com
Congratulations Sergey! Well deserved!

On Fri, Feb 27, 2015 at 1:51 AM, Vinod Kumar Vavilapalli <
vino...@hortonworks.com> wrote:

> Congratulations and keep up the great work!
>
> +Vinod
>
> On Feb 25, 2015, at 8:43 AM, Carl Steinbach  wrote:
>
> > I am pleased to announce that Sergey Shelukhin has been elected to the
> Hive Project Management Committee. Please join me in congratulating Sergey!
> >
> > Thanks.
> >
> > - Carl
> >
>
>


-- 
Swarnim


Re: Can anyone review dayofyear UDF (HIVE-3378)?

2015-04-09 Thread kulkarni.swar...@gmail.com
Alexander,

I reviewed your code and left a few suggestions on how to possibly simplify
it (if I understood your implementation correctly). Let me know if they
don't make sense to you.

On Wed, Apr 8, 2015 at 12:34 PM, Alexander Pivovarov 
wrote:

> https://issues.apache.org/jira/browse/HIVE-3378
>
> https://reviews.apache.org/r/32732/
>



-- 
Swarnim


Re: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan

2015-04-15 Thread kulkarni.swar...@gmail.com
Congratulations!!

On Wed, Apr 15, 2015 at 10:57 AM, Viraj Bhat 
wrote:

> Mithun Congrats!!
> Viraj
>
>   From: Carl Steinbach 
>  To: dev@hive.apache.org; u...@hive.apache.org; mit...@apache.org
>  Sent: Tuesday, April 14, 2015 2:54 PM
>  Subject: [ANNOUNCE] New Hive Committer - Mithun Radhakrishnan
>
> The Apache Hive PMC has voted to make Mithun Radhakrishnan a committer on
> the Apache Hive Project.
> Please join me in congratulating Mithun.
> Thanks.
> - Carl
>
>
>
>



-- 
Swarnim


Re: [ANNOUNCE] New Hive Committer - Alex Pivovarov

2015-05-04 Thread kulkarni.swar...@gmail.com
Congratulations Alex!!

On Thu, Apr 30, 2015 at 2:49 PM, Sergey Shelukhin 
wrote:

> Congratulations!
>
> On 15/4/29, 17:57, "Jimmy Xiang"  wrote:
>
> >Congrats!!
> >
> >On Wed, Apr 29, 2015 at 5:48 PM, Xu, Cheng A 
> wrote:
> >
> >> Congratulations Alex!
> >>
> >> -Original Message-
> >> From: Lefty Leverenz [mailto:leftylever...@gmail.com]
> >> Sent: Thursday, April 30, 2015 8:46 AM
> >> To: dev@hive.apache.org
> >> Subject: Re: [ANNOUNCE] New Hive Committer - Alex Pivovarov
> >>
> >> Congratulations Alex!
> >>
> >> -- Lefty
> >>
> >> On Wed, Apr 29, 2015 at 8:41 PM, Vaibhav Gumashta <
> >> vgumas...@hortonworks.com
> >> > wrote:
> >>
> >> > Congrats Alex!
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Apr 29, 2015 at 5:26 PM -0700, "Alexander Pivovarov" <
> >> > apivova...@gmail.com> wrote:
> >> >
> >> > Thank you Everyone!
> >> > Do you know where I can get my lightsaber?
> >> >
> >> > On Wed, Apr 29, 2015 at 1:19 PM, Thejas Nair 
> >> > wrote:
> >> >
> >> > > Congrats Alex!
> >> > >
> >> > > On Wed, Apr 29, 2015 at 12:37 PM, Jason Dere  >
> >> > > wrote:
> >> > > > Congrats Alex!
> >> > > >
> >> > > > On Apr 29, 2015, at 12:35 PM, Chao Sun 
> >> > > >  wrote:
> >> > > >
> >> > > >> Congrats Alex! Well done!
> >> > > >>
> >> > > >> On Wed, Apr 29, 2015 at 12:32 PM, Prasanth Jayachandran <
> >> > > >> pjayachand...@hortonworks.com> wrote:
> >> > > >>
> >> > > >>> Congratulations Alex!
> >> > > >>>
> >> > >  On Apr 29, 2015, at 12:17 PM, Eugene Koifman <
> >> > > ekoif...@hortonworks.com>
> >> > > >>> wrote:
> >> > > 
> >> > >  Congratulations!
> >> > > 
> >> > >  On 4/29/15, 12:14 PM, "Carl Steinbach"  wrote:
> >> > > 
> >> > > > The Apache Hive PMC has voted to make Alex Pivovarov a
> >> > > > committer on
> >> > > the
> >> > > > Apache Hive Project.
> >> > > >
> >> > > > Please join me in congratulating Alex!
> >> > > >
> >> > > > Thanks.
> >> > > >
> >> > > > - Carl
> >> > > 
> >> > > >>>
> >> > > >>>
> >> > > >>
> >> > > >>
> >> > > >> --
> >> > > >> Best,
> >> > > >> Chao
> >> > > >
> >> > >
> >> >
> >>
>
>


-- 
Swarnim


[DISCUSS] Hive/HBase Integration

2015-05-09 Thread kulkarni.swar...@gmail.com
Hello all,

So last week, Brock Noland, Nick Dimiduk, and I got a chance to present
some of the work we have been doing in the Hive/HBase integration space at
HBaseCon 2015 (slides here[1] for anyone interested). One of the
interesting things we noted was that even though this was an HBase
conference, *SQL on HBase* was by far the most popular theme, with talks
on Apache Phoenix, Trafodion, Apache Kylin, and Apache Drill, and a
SQL-on-HBase panel to compare these and other technologies.

I personally feel that with the existing work we have come a long way, but
we still have work to do, and this integration would need more love to
become a top-notch feature of Hive. However, I was curious what the
community thinks about it, and where they see this integration standing in
the time to come compared with all the other up-and-coming technologies.

Thanks,
Swarnim

[1]
https://docs.google.com/presentation/d/1K2A2NMsNbmKWuG02aUDxsLo0Lal0lhznYy8SB6HjC9U/edit#slide=id.p


JIRA notifications

2015-05-13 Thread kulkarni.swar...@gmail.com
I noticed that I haven't been getting notifications (or they are really
delayed) on any of the new JIRAs created or comments added. Anyone else
noticing similar issues?

-- 
Swarnim


Re: Questions related to HBase general use

2015-05-14 Thread kulkarni.swar...@gmail.com
+ hive-dev

Thanks for your question. We have recently been busy adding quite a few
features on top of the Hive/HBase integration to make it more stable and
easy to use. We also did a talk very recently at HBaseCon 2015 showing off
the latest improvements; slides here[1]. Like Jerry mentioned, if you run a
regular query from Hive on an HBase table with billions of rows, it is
going to be slow, as it would trigger a full table scan. However, Hive has
smarts around filter pushdown, where the predicates in a "where" clause are
pushed down and converted to scan ranges and filters to optimize the scan.
Plus, with the recent Hive on Spark uplift, I see this integration taking
advantage of that as well.
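To make the pushdown concrete, here is a minimal sketch; the table, column
family, and qualifier names are invented for illustration:

```sql
-- Hypothetical HBase-backed Hive table (names are made up).
CREATE EXTERNAL TABLE events (
  row_key STRING,
  speed   DOUBLE
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:speed')
TBLPROPERTIES ('hbase.table.name' = 'events');

-- A predicate on the row key can be pushed down and turned into an
-- HBase scan range rather than a full table scan.
SELECT row_key, speed
FROM events
WHERE row_key >= '2015-05-01' AND row_key < '2015-05-02';
```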

That said, we here use this integration daily over billions of rows to run
hundreds of queries without any issues. Since you mentioned that you are
already a big consumer of Hive, I would highly recommend giving this a spin
and reporting back with whatever issues you face so we can work on making
this more stable.

Hope that helps.

Swarnim

[1]
https://docs.google.com/presentation/d/1K2A2NMsNbmKWuG02aUDxsLo0Lal0lhznYy8SB6HjC9U/edit#slide=id.p

On Wed, May 13, 2015 at 6:26 PM, Nick Dimiduk  wrote:

> + Swarnim, who's expert on HBase/Hive integration.
>
> Yes, snapshots may be interesting for you. I believe Hive can access HBase
> timestamps, exposed as a "virtual" column. It's assumed across the whole
> row however, not per cell.
>
> On Sun, May 10, 2015 at 9:14 PM, Jerry He  wrote:
>
>> Hi, Yong
>>
>> You have a good understanding of the benefit of HBase already.
>> Generally speaking, HBase is suitable for real time read/write to your big
>> data set.
>> Regarding the HBase performance evaluation tool: the 'read' test uses
>> HBase 'get'. For 1M rows, the test would issue 1M 'get' calls (and RPCs)
>> to the server.
>> The 'scan' test scans the table and transfers the rows to the client in
>> batches (e.g. 100 rows at a time), which takes less time for the
>> whole test to complete for the same number of rows.
>> The hive/hbase integration, as you said, needs more consideration.
>> 1) The performance. Hive accesses HBase via the HBase client API, which
>> involves going to the HBase server for all the data access. This will
>> slow things down.
>> There are a couple of things you can explore. e.g. Hive/HBase snapshot
>> integration. This would provide direct access to HBase hfiles.
>> 2) In your email, you are interested in HBase's capability of storing
>> multiple versions of data. You need to consider whether Hive supports
>> this HBase feature, i.e. whether it provides you access to multiple
>> versions. As far as I can remember, it does not fully.
>>
>> Jerry
>>
>>
>> On Thu, May 7, 2015 at 6:18 PM, java8964  wrote:
>>
>> > Hi,
>> > I am kind of new to HBase. Currently our production run IBM BigInsight
>> V3,
>> > comes with Hadoop 2.2 and HBase 0.96.0.
>> > We are mostly using HDFS and Hive/Pig for our BigData project, it works
>> > very good for our big datasets. Right now, we have a one dataset needs
>> to
>> > be loaded from Mysql, about 100G, and will have about Gs change daily.
>> This
>> > is a very important slow change dimension data, we like to sync between
>> > Mysql and BigData platform.
>> > I am thinking of using HBase to store it, instead of refreshing the
>> whole
>> > dataset in HDFS, due to:
>> > 1) HBase makes merging the changes very easy.
>> > 2) HBase could store all the changes in the history, as a function out
>> > of the box. We will replicate all the changes from the binlog level
>> > from Mysql, and we could keep all changes in HBase (or a long history);
>> > that can give us some insight that cannot be obtained easily in HDFS.
>> > 3) HBase could give us the benefit of fast access to the data by key,
>> > for some cases.
>> > 4) HBase is available out of the box.
>> > What I am not sure is the Hive/HBase integration. Hive is the top tool
>> in
>> > our environment. If one dataset stored in Hbase (even only about 100G as
>> > now), the join between it with the other Big datasets in HDFS worries
>> me. I
>> > read quite some information about Hive/HBase integration, and feel that
>> it
>> > is not really mature, as not too many usage cases I can find online,
>> > especially on performance. There are quite some JIRAs related to make
>> Hive
>> > utilize the HBase for performance in MR job are still pending.
>> > I want to know other people experience to use HBase in this way. I
>> > understand HBase is not designed as a storage system for Data Warehouse
>> > component or analytics engine. But the benefits to use HBase in this
>> case
>> > still attractive me. If my use cases of HBase is mostly read or full
>> scan
>> > the data, how bad it is compared to HDFS in the same cluster? 3x? 5x?
>> > To help me understand the read throughput of HBase, I use the HBase
>> > performance evaluation tool, but the output is quite confusing. I have 2
>> > clusters, one is with 5 nodes with 3 slaves all running on VM (Each with
>> > 24G + 4 cores, s

Re: JIRA notifications

2015-05-14 Thread kulkarni.swar...@gmail.com
Also, not sure if it's related, but it seems like RB has been pretty
sluggish lately for me too. It takes forever for a patch to be submitted
and a review request created (the latest one has been running for the past
30 minutes with no output).

On Wed, May 13, 2015 at 4:26 PM, Lefty Leverenz 
wrote:

> By the way, we still need to add iss...@hive.apache.org to the
> website's Mailing
> Lists <http://hive.apache.org/mailing_lists.html> page -- see HIVE-10124
> <https://issues.apache.org/jira/browse/HIVE-10124>.
>
> -- Lefty
>
> On Wed, May 13, 2015 at 2:16 PM, Lefty Leverenz 
> wrote:
>
> > But some notifications and comments aren't making it onto any Hive
> mailing
> > list -- see INFRA-9221 <https://issues.apache.org/jira/browse/INFRA-9221>
> (please
> > add your own comments and examples).  This means the mail archives don't
> > have a complete record of JIRA activity.
> >
> > -- Lefty
> >
> > On Wed, May 13, 2015 at 10:03 AM, Thejas Nair 
> > wrote:
> >
> >> comments now added go to iss...@hive.apache.org .
> >> emails for JIRAs created should still go to dev@
> >>
> >>
> >> On Wed, May 13, 2015 at 9:25 AM, kulkarni.swar...@gmail.com
> >>  wrote:
> >> > I noticed that I haven't been getting notifications(or they are really
> >> > delayed) on any of the new JIRAs created/ comments added. Anyone else
> >> > noticing similar issues as well?
> >> >
> >> > --
> >> > Swarnim
> >>
> >
> >
>



-- 
Swarnim


[DISCUSS] Hive API passivity

2015-05-14 Thread kulkarni.swar...@gmail.com
While reviewing some of the recent patches, I came across a few with
non-passive changes and/or discussion around them. I was wondering what
kind of passivity guarantees we should provide to our consumers. I
understand that the Hive API is probably not as widely used as some of its
peers in the ecosystem like HBase. But should that be something we should
start thinking about, especially around user-facing interfaces like UDFs,
SerDes, StorageHandlers, etc.? More so given that we are 1.0 now?
IMO we should avoid making any such changes, or, if we have to, do so with
a major version bump for the next release.

Thoughts?

-- 
Swarnim


Re: JIRA notifications

2015-05-14 Thread kulkarni.swar...@gmail.com
Yeah, I was having issues with both the manual method and with rbt.
But it seems like things are back to normal now.

Thanks guys!
On May 14, 2015 12:51 PM, "Alexander Pivovarov" 
wrote:

> You can use the following command to create new review. It takes about 3-5
> sec
> $ rbt post -g yes
>
> To update the review you can run.
> $ rbt post -u -g yes
>
> On Thu, May 14, 2015 at 10:48 AM, Prasanth Jayachandran <
> pjayachand...@hortonworks.com> wrote:
>
> > @Swarnim..
> > Generating patch with git diff needs to include the full index for it to
> > be uploaded to review board. “git diff —full-index”.
> > https://code.google.com/p/reviewboard/issues/detail?id=3115
> >
> > - Prasanth
> >
> > > On May 14, 2015, at 9:14 AM, Thejas Nair 
> wrote:
> > >
> > > Now that we have moved to git, you can try using github pull request
> > instead.
> > > It also  integrates with jira.
> > > More git instructions - http://accumulo.apache.org/git.html
> > >
> > >
> > > On Thu, May 14, 2015 at 8:01 AM, kulkarni.swar...@gmail.com
> > >  wrote:
> > >> Also not sure if it's related but seems like RB has been pretty
> sluggish
> > >> lately too for me. It takes forever for a patch to submitted and a
> > review
> > >> request created(the latest one is still running for past 30 minutes
> > with no
> > >> output)
> > >>
> > >> On Wed, May 13, 2015 at 4:26 PM, Lefty Leverenz <
> > leftylever...@gmail.com>
> > >> wrote:
> > >>
> > >>> By the way, we still need to add iss...@hive.apache.org to the
> > >>> website's Mailing
> > >>> Lists <http://hive.apache.org/mailing_lists.html> page -- see
> > HIVE-10124
> > >>> <https://issues.apache.org/jira/browse/HIVE-10124>.
> > >>>
> > >>> -- Lefty
> > >>>
> > >>> On Wed, May 13, 2015 at 2:16 PM, Lefty Leverenz <
> > leftylever...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> But some notifications and comments aren't making it onto any Hive
> > >>> mailing
> > >>>> list -- see INFRA-9221 <
> > https://issues.apache.org/jira/browse/INFRA-9221>
> > >>> (please
> > >>>> add your own comments and examples).  This means the mail archives
> > don't
> > >>>> have a complete record of JIRA activity.
> > >>>>
> > >>>> -- Lefty
> > >>>>
> > >>>> On Wed, May 13, 2015 at 10:03 AM, Thejas Nair <
> thejas.n...@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>>> comments now added go to iss...@hive.apache.org .
> > >>>>> emails for JIRAs created should still go to dev@
> > >>>>>
> > >>>>>
> > >>>>> On Wed, May 13, 2015 at 9:25 AM, kulkarni.swar...@gmail.com
> > >>>>>  wrote:
> > >>>>>> I noticed that I haven't been getting notifications(or they are
> > really
> > >>>>>> delayed) on any of the new JIRAs created/ comments added. Anyone
> > else
> > >>>>>> noticing similar issues as well?
> > >>>>>>
> > >>>>>> --
> > >>>>>> Swarnim
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Swarnim
> >
> >
>


Re: [ANNOUNCE] New Hive Committer - Chaoyu Tang

2015-05-21 Thread kulkarni.swar...@gmail.com
Congrats Chaoyu!

On Thu, May 21, 2015 at 9:17 AM, Sergio Pena 
wrote:

> Congratulations Chaoyu !!!
>
> On Wed, May 20, 2015 at 5:29 PM, Carl Steinbach  wrote:
>
> > The Apache Hive PMC has voted to make Chaoyu Tang a committer on the
> Apache
> > Hive Project.
> >
> > Please join me in congratulating Chaoyu!
> >
> > Thanks.
> >
> > - Carl
> >
>



-- 
Swarnim


Re: [DISCUSS] Supporting Hadoop-1 and experimental features

2015-05-22 Thread kulkarni.swar...@gmail.com
+1 on the new proposal. Feedback below:

> New features must be put into master.  Whether to put them into branch-1
is at the discretion of the developer.

How about we change this to "*All* features must be put into master.
Whether to put them into branch-1 is at the discretion of the *committer*."?
The reason is that, going forward, for us to sustain a happy and healthy
community, it is imperative that we make it easy not only for users but
also for developers and committers to contribute and commit patches. As a
Hive contributor, it would be hard for me to determine which branch my code
belongs in. Also, IMO (and I might be wrong), many committers have their
own areas of expertise, and it is very hard for them to immediately
determine which branch a patch should go to unless that is very well
documented somewhere. Putting all code into master would be an easy
approach to follow, and cherry-picking to other branches can then be done.
So even if people forget to do that, we can always go back to master and
port the patches out to these branches. So we have a master branch, a
branch-1 for stable code, a branch-2 for experimental and "bleeding edge"
code, and so on. Once branch-2 is stable, we deprecate branch-1, create
branch-3, and move on.

Another reason I say this is that, in my experience, a pretty significant
amount of the work in Hive is still bug fixes, and I think that is what
users care about most (correctness above anything else). So with this
approach, it might be very obvious which branches to commit those to.

On Fri, May 22, 2015 at 1:11 PM, Alan Gates  wrote:

> Thanks for your feedback Chris.  It sounds like there are a couple of
> reasonable concerns being voiced repeatedly:
> 1) Fragmentation, the two branches will drift too far apart.
> 2) Stagnation, branch-1 will effectively become a dead-end.
>
> So I modify the proposal as follows to deal with those:
>
> 1) New features must be put into master.  Whether to put them into
> branch-1 is at the discretion of the developer.  The exception would be
> features that would not apply in master (e.g. say someone developed a way
> to double the speed of map reduce jobs Hive produces).  For example, I
> might choose to put the materialized view work I'm doing in both branch-1
> and master, but the HBase metastore work only in master.  This should avoid
> fragmentation by keeping branch-1 a subset of master.
>
> 2) For the next 12 months we will port critical bug fixes (crashes,
> security issues, wrong results) to branch-1 as well as fixing them on
> master.  We might choose to lengthen this time depending on how stable
> master is and how fast the uptake is.  This avoids branch-1 being
> immediately abandoned by developers while users are still depending on it.
>
> Alan.
>
>   Chris Drome 
>  May 22, 2015 at 0:49
> I understand the motivation and benefits of creating a branch-2 where more
> disruptive work can go on without affecting branch-1. While not necessarily
> against this approach, from Yahoo's standpoint, I do have some questions
> (concerns).
> Upgrading to a new version of Hive requires a significant commitment of
> time and resources to stabilize and certify a build for deployment to our
> clusters. Given the size of our clusters and scale of datasets, we have to
> be particularly careful about adopting new functionality. However, at the
> same time we are interested in new testing and making available new
> features and functionality. That said, we would have to rely on branch-1
> for the immediate future.
> One concern is that branch-1 would be left to stagnate, at which point
> there would be no option but for users to move to branch-2 as branch-1
> would be effectively end-of-lifed. I'm not sure how long this would take,
> but it would eventually happen as a direct result of the very reason for
> creating branch-2.
> A related concern is how disruptive the code changes will be in branch-2.
> I imagine that changes in early in branch-2 will be easy to backport to
> branch-1, while this effort will become more difficult, if not impractical,
> as time goes. If the code bases diverge too much then this could lead to
> more pressure for users of branch-1 to add features just to branch-1, which
> has been mentioned as undesirable. By the same token, backporting any code
> in branch-2 will require an increasing amount of effort, which contributors
> to branch-2 may not be interested in committing to.
> These questions affect us directly because, while we require a certain
> amount of stability, we also like to pull in new functionality that will be
> of value to our users. For example, our current 0.13 release is probably
> closer to 0.14 at this point. Given the lifespan of a release, it is often
> more palatable to backport features and bugfixes than to jump to a new
> version.
>
> The good thing about this proposal is the opportunity to evaluate and
> clean up alot of the old code.
> Thanks,
> chris
>
>
>
> On Monday, May 18, 2015 11:48 AM,

Re: hbase column without prefix

2015-07-23 Thread kulkarni.swar...@gmail.com
Hey,

Just so that I understand your issue better: why do you think it should be

key: one, value: 0.5
key: two, value: 0.5

instead of

key: tag_one, value: 0.5
key: tag_two, value: 0.5

when you know that the prefix for your columns is tag_? Hive won't
really do anything but simply pull all the columns that start with the
given prefix and add them as the keys of your map, which is exactly what
you are seeing here.
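As a concrete sketch of this behavior (table and family names are
invented):

```sql
-- Prefix mapping (HIVE-3725): all qualifiers starting with tag_ land in
-- the map column, with the full qualifier, prefix included, as the key.
CREATE EXTERNAL TABLE t (rk STRING, tags MAP<STRING, STRING>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,fam:tag_.*');

-- A row with HBase cells fam:tag_one=0.5 and fam:tag_two=0.5 comes back
-- with the prefix retained in the map keys, e.g. {"tag_one":"0.5", ...}.
SELECT tags FROM t;
```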


On Wed, Jul 22, 2015 at 10:03 AM, Wojciech Indyk 
wrote:

> Hi!
> I've created an issue https://issues.apache.org/jira/browse/HIVE-11329
> and need an advice is it a bug or should it be a new feature, e.g. a
> flag to enable somewhere in a table definition?
> I am eager to create a patch, however I need some help with design a
> work to do (e.g. which modules affect this thing).
>
> Kindly regards
> Wojciech Indyk
>



-- 
Swarnim


Re: hbase column without prefix

2015-07-23 Thread kulkarni.swar...@gmail.com
So let me ask you this. If we did not have the support for pulling data via
prefixes, there would be two options for us to pull this data. One, we
provide just the column family name, like "fam:", and let Hive pull
everything under that column family and stuff it in a map with the keys
being the column names. Or, the other option would be to provide the column
names individually. In either case, the column prefixes would end up in the
Hive column name. My intent behind adding this support was to have a
shortcut that extends the existing support for pulling all columns via a
"family_name:" mapping to pulling just the columns that start with a given
prefix. Everything else should stay the same and consistent. That said,
while I am OK with adding a flag to hide the prefix in the column name, IMO
it would be confusing for someone to understand why the prefix needs to be
hidden in this particular case but not in any other case.

Does that make sense?
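The two options above can be sketched side by side (names invented); the
prefix form is just a narrower version of the whole-family form, which is
why the keys look the same in both:

```sql
-- Option 1: map the entire column family; map keys are the full
-- HBase qualifiers (e.g. 'tag_one', 'other').
CREATE EXTERNAL TABLE t_all (rk STRING, cols MAP<STRING, STRING>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,fam:');

-- Option 2: map only qualifiers matching the prefix; the keys still
-- carry the full qualifier, consistent with Option 1.
CREATE EXTERNAL TABLE t_tags (rk STRING, tags MAP<STRING, STRING>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,fam:tag_.*');
```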

On Thu, Jul 23, 2015 at 9:46 AM, Wojciech Indyk 
wrote:

> Hello!
>
> Yes, but if I define a map prefix "tag_" I don't want to receive the
> prefix for each element of the map. I know what the prefix for the map
> is. It is hard to join such data with another structures which doesn't
> have prefixes. All in all it's easier to integrate data without
> prefixes. IMO Prefixes are artificial structure (like 'super-column')
> to optimize queries and be able to store a map in hbase. That's why i
> want to cut prefixes.
>
> What do you think about it? Does it make sense for you? Even if it's
> not a bug it would be nice to have option to hide prefixes in keys of
> map.
>
> Kindly regards
> Wojciech Indyk
>
>
> 2015-07-23 16:32 GMT+02:00 kulkarni.swar...@gmail.com
> :
> > Hey,
> >
> > Just so that I understand your issue better, why do you think it should
> be
> >
> > key: one, value: 0.5
> > key: two: value: 0.5
> >
> > instead of
> >
> > key: tag_one, value: 0.5
> > key: tag_two, value: 0.5
> >
> > when you know that the prefixes for your columns are tag_. Hive won't
> > really do anything but simply pull all the columns that start with the
> > given prefix and add them to the key for your map which is exactly what
> you
> > are seeing here.
> >
> >
> > On Wed, Jul 22, 2015 at 10:03 AM, Wojciech Indyk <
> wojciechin...@gmail.com>
> > wrote:
> >
> >> Hi!
> >> I've created an issue https://issues.apache.org/jira/browse/HIVE-11329
> >> and need an advice is it a bug or should it be a new feature, e.g. a
> >> flag to enable somewhere in a table definition?
> >> I am eager to create a patch, however I need some help with design a
> >> work to do (e.g. which modules affect this thing).
> >>
> >> Kindly regards
> >> Wojciech Indyk
> >>
> >
> >
> >
> > --
> > Swarnim
>



-- 
Swarnim


Re: [ANNOUNCE] New Hive PMC Member - Sushanth Sowmyan

2015-07-23 Thread kulkarni.swar...@gmail.com
Congrats Sushanth!

On Thu, Jul 23, 2015 at 3:40 PM, Eugene Koifman 
wrote:

> Congratulations!
>
> On 7/22/15, 9:45 AM, "Carl Steinbach"  wrote:
>
> >I am pleased to announce that Sushanth Sowmyan has been elected to the
> >Hive
> >Project Management Committee. Please join me in congratulating Sushanth!
> >
> >Thanks.
> >
> >- Carl
>
>


-- 
Swarnim


Re: Hive column mapping to hbase

2015-08-05 Thread kulkarni.swar...@gmail.com
Sunile,

Starting with Hive 0.12, you can use prefixes to pull the columns
corresponding to a column family[1]. So in your case, as long as you have
sensible prefixes (for example, if everything in July uses
"july-DATE-speed"), then you can simply do something like WITH
SERDEPROPERTIES ('hbase.columns.mapping' = ':key,columnfamily:july.*') and
it will automatically pull everything related to the july prefix. That way
you don't have to define things statically.
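As a hedged sketch of the resulting DDL (table and column family names are
invented to match the scenario above):

```sql
-- Every qualifier starting with 'july' is pulled into one map column,
-- so new dates or measurements need no DDL change.
CREATE EXTERNAL TABLE speeds (rk STRING, july MAP<STRING, STRING>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,columnfamily:july.*');
```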

Hope that helps.

[1] https://issues.apache.org/jira/browse/HIVE-3725

On Tue, Aug 4, 2015 at 9:06 PM, Manjee, Sunile 
wrote:

>
> I would appreciate any assistance.
>
> Hive forces me to predefine column mappings :  WITH SERDEPROPERTIES
> ('hbase.columns.mapping' = ':key,columnfamily:july8-speed')
>
> I need to create columns in hbase based on dates (which are values in my
> source) and append some other field like measurement. This will then make
> up my column name i.e. July8-speed. Predefining these seem senseless as I
> do not know which dates and/or measurements I will get from source data.
> Hive forces me to create a static mapping like I have shown above.
>
> Any assistance or insights would be appreciated.
>



-- 
Swarnim


[DISCUSS] Hive and HBase dependency

2015-08-12 Thread kulkarni.swar...@gmail.com
Hi all,

It seems like our current dependency on HBase is a little fuzzy, to say the
least. And with more features relying on HBase (HBase integration, HBase
metastore, etc.), I think it would be worth giving some thought to how we
want to manage this dependency. I have also seen regressions[1][2] come up
recently because this dependency was not managed properly. Plus, we need to
think about moving to HBase 1.0 soon as well, to take advantage of the
backwards compatibility guarantees that HBase is providing.

Our current HBase dependency is 0.98.9. Also, with our current bifurcation
of branches to create a 1.x branch for stability and 2.x for the bleeding
edge, I propose that we keep the version at 0.98.9 on the 1.x branch and
move to HBase 1.0 in our 2.0 branch. That way we can start taking advantage
of the latest updates to the HBase API in our 2.x branch and still keep 1.x
backwards compatible by avoiding a direct jump to HBase 1.0. If we decide
to go this route, we might need to revert some of the compatibility-breaking
changes[2] that sneaked into 1.x and move them over to 2.x.

Thoughts?

Thanks,
Swarnim


[1] https://issues.apache.org/jira/browse/HIVE-10990
[2] https://issues.apache.org/jira/browse/HIVE-8898


Re: [DISCUSS] Hive and HBase dependency

2015-08-14 Thread kulkarni.swar...@gmail.com
Thanks Alan. I created [1] to revert the non-passive changes from 1.x.

Out of curiosity, what are your plans on merging the metastore branch to
master? It seems like some coordination might be needed as some of the
stuff in the hive hbase integration might need some massaging before that
is done.

[1] https://issues.apache.org/jira/browse/HIVE-11559

On Thu, Aug 13, 2015 at 12:52 PM, Alan Gates  wrote:

> On the hbase-metastore branch I've actually already moved to HBase 1.1.
> I'm +1 for moving to 1.1 or 1.0 on master and staying at 0.98 on branch-1.
>
> Alan.
>
> kulkarni.swar...@gmail.com
> August 12, 2015 at 8:43
> Hi all,
>
> It seems like our current dependency on HBase is a little fuzzy to say the
> least. And with increased features relying on HBase(HBase integration,
> HBase metastore etc), I think it would be worth giving a thought into how
> we want to manage this dependency. I have also seen regressions[1][2] come
> up recently due to this dependency not managed properly. Plus we need to
> think about moving to HBase 1.0 soon as well to take advantage of the
> backwards compatibility guarantees that HBase is providing.
>
> Our current HBase dependency is 0.98.9. Also with out current bifurcation
> of branches to create a 1.x branch for stability and 2.x for bleeding edge,
> I propose that we still keep the version to 0.98.9 on the 1.x branch and
> move to HBase 1.0 in our 2.0 branch. In that way we can start taking
> advantage of the latest updates to the HBase API in our 2.x branch and
> still keep 1.x backwards compatible by avoiding a direct jump to HBase 1.0.
> If we decide to go this route, we might need to revert back some of the
> compatibility breaking changes[2] that sneaked into 1.x and move them over
> to 2.x.
>
> Thoughts?
>
> Thanks,
> Swarnim
>
>
> [1] https://issues.apache.org/jira/browse/HIVE-10990
> [2] https://issues.apache.org/jira/browse/HIVE-8898
>
>


-- 
Swarnim


Re: [DISCUSS] Hive and HBase dependency

2015-08-14 Thread kulkarni.swar...@gmail.com
Yeah, I don't think HBase 1.0 vs 1.1 should really make a difference. It's
just that issues like [1] had me concerned that something within the
existing codebase might not work really well with 1.0.

[1] https://issues.apache.org/jira/browse/HIVE-10990

On Fri, Aug 14, 2015 at 1:47 PM, Alan Gates  wrote:

> My hope is to have it ready to merge into master soon (like in a few
> weeks). I don't think it will affect anything in the hive hbase integration
> other than we need to make sure we can work with the same version of
> hbase.  If we needed to move back to HBase 1.0 for that I think that would
> be ok.
>
> Alan.
>
> kulkarni.swar...@gmail.com
> August 14, 2015 at 11:12
> Thanks Alan. I created [1] to revert the non-passive changes from 1.x.
>
> Out of curiosity, what are your plans on merging the metastore branch to
> master? It seems like some coordination might be needed as some of the
> stuff in the hive hbase integration might need some massaging before that
> is done.
>
> [1] https://issues.apache.org/jira/browse/HIVE-11559
>
>
>
>
> --
> Swarnim
> Alan Gates 
> August 13, 2015 at 10:52
> On the hbase-metastore branch I've actually already moved to HBase 1.1.
> I'm +1 for moving to 1.1 or 1.0 on master and staying at 0.98 on branch-1.
>
> Alan.
>
> kulkarni.swar...@gmail.com
> August 12, 2015 at 8:43
> Hi all,
>
> It seems like our current dependency on HBase is a little fuzzy to say the
> least. And with increased features relying on HBase(HBase integration,
> HBase metastore etc), I think it would be worth giving a thought into how
> we want to manage this dependency. I have also seen regressions[1][2] come
> up recently due to this dependency not managed properly. Plus we need to
> think about moving to HBase 1.0 soon as well to take advantage of the
> backwards compatibility guarantees that HBase is providing.
>
> Our current HBase dependency is 0.98.9. Also with our current bifurcation
> of branches to create a 1.x branch for stability and 2.x for bleeding edge,
> I propose that we still keep the version to 0.98.9 on the 1.x branch and
> move to HBase 1.0 in our 2.0 branch. In that way we can start taking
> advantage of the latest updates to the HBase API in our 2.x branch and
> still keep 1.x backwards compatible by avoiding a direct jump to HBase 1.0.
> If we decide to go this route, we might need to revert back some of the
> compatibility breaking changes[2] that sneaked into 1.x and move them over
> to 2.x.
>
> Thoughts?
>
> Thanks,
> Swarnim
>
>
> [1] https://issues.apache.org/jira/browse/HIVE-10990
> [2] https://issues.apache.org/jira/browse/HIVE-8898
>
>


-- 
Swarnim
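For what it's worth, the branch split discussed above mechanically boils down to pinning a single version property in the build. A rough sketch in Maven terms (the property name and module wiring here are illustrative assumptions, not Hive's actual pom):

```xml
<!-- Sketch only: property/artifact wiring is an assumption, not Hive's real pom. -->
<!-- branch-1 root pom: stay on the 0.98 line -->
<properties>
  <hbase.version>0.98.9-hadoop2</hbase.version>
</properties>

<!-- master (2.x) root pom: move to the 1.x line -->
<properties>
  <hbase.version>1.1.1</hbase.version>
</properties>

<!-- modules then reference the property instead of a literal version -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>${hbase.version}</version>
</dependency>
```

Keeping the literal version in exactly one place per branch is what makes this kind of per-branch divergence manageable.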


Re: hiveserver2 hangs

2015-08-20 Thread kulkarni.swar...@gmail.com
Sanjeev,

Can you tell me more details about your hive version/hadoop version etc.

On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma 
wrote:

> Can somebody gives me some pointer to looked upon?
>
> On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma 
> wrote:
>
>> Hi
>> We are experiencing a strange problem with the hiveserver2, in one of the
>> job it gets the GC limit exceed from mapred task and hangs even having
>> enough heap available.we are not able to identify what causing this issue.
>> Could anybody help me identify the issue and let me know what pointers I
>> need to looked up.
>>
>> Thanks
>>
>
>


-- 
Swarnim


Re: hiveserver2 hangs

2015-08-20 Thread kulkarni.swar...@gmail.com
Sanjeev,

One possibility is that you are running into[1] which affects hive 0.13. Is
it possible for you to apply the patch on [1] and see if it fixes your
problem?

[1] https://issues.apache.org/jira/browse/HIVE-10410

On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma 
wrote:

> We are using hive-0.13 with hadoop1.
>
> On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Sanjeev,
>>
>> Can you tell me more details about your hive version/hadoop version etc.
>>
>> On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma > > wrote:
>>
>>> Can somebody gives me some pointer to looked upon?
>>>
>>> On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma <
>>> sanjeev.verm...@gmail.com> wrote:
>>>
>>>> Hi
>>>> We are experiencing a strange problem with the hiveserver2, in one of
>>>> the job it gets the GC limit exceed from mapred task and hangs even having
>>>> enough heap available.we are not able to identify what causing this issue.
>>>> Could anybody help me identify the issue and let me know what pointers
>>>> I need to looked up.
>>>>
>>>> Thanks
>>>>
>>>
>>>
>>
>>
>> --
>> Swarnim
>>
>
>


-- 
Swarnim


Patches needing review love

2015-08-21 Thread kulkarni.swar...@gmail.com
Hey all,

I have a couple of patches currently in review which are either ready to
merge or awaiting a first review. If someone can help me out with these, I
would really appreciate it.

HIVE-11513 (Ready to merge)
HIVE-5277 (Ready to merge)
HIVE-11559 (Needs review)
HIVE-11469 (Needs review)

Swarnim


Re: [ANNOUNCE] New Hive Committer - Lars Francke

2015-09-07 Thread kulkarni.swar...@gmail.com
Congrats!

On Mon, Sep 7, 2015 at 3:54 AM, Carl Steinbach  wrote:

> The Apache Hive PMC has voted to make Lars Francke a committer on the
> Apache Hive Project.
>
> Please join me in congratulating Lars!
>
> Thanks.
>
> - Carl
>
>


-- 
Swarnim


Re: hiveserver2 hangs

2015-09-08 Thread kulkarni.swar...@gmail.com
How much memory have you currently provided to HS2? Have you tried bumping
that up?

On Mon, Sep 7, 2015 at 1:09 AM, Sanjeev Verma 
wrote:

> *I am getting the following exception when the HS2 is crashing, any idea
> why it has happening*
>
> "pool-1-thread-121" prio=4 tid=19283 RUNNABLE
> at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
> at java.util.Arrays.copyOf(Arrays.java:2271)
> Local Variable: byte[]#1
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutput
> Stream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
> Local Variable: org.apache.thrift.TByteArrayOutputStream#42
> Local Variable: byte[]#5378
> at org.apache.thrift.transport.TSaslTransport.write(TSaslTransp
> ort.java:446)
> at org.apache.thrift.transport.TSaslServerTransport.write(TSasl
> ServerTransport.java:41)
> at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryP
> rotocol.java:163)
> at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBina
> ryProtocol.java:186)
> Local Variable: byte[]#2
> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
> mnStandardScheme.write(TStringColumn.java:490)
> Local Variable: java.util.ArrayList$Itr#1
> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
> mnStandardScheme.write(TStringColumn.java:433)
> Local Variable: org.apache.hive.service.cli.th
> rift.TStringColumn$TStringColumnStandardScheme#1
> at org.apache.hive.service.cli.thrift.TStringColumn.write(TStri
> ngColumn.java:371)
> at org.apache.hive.service.cli.thrift.TColumn.standardSchemeWri
> teValue(TColumn.java:381)
> Local Variable: org.apache.hive.service.cli.thrift.TColumn#504
> Local Variable: org.apache.hive.service.cli.thrift.TStringColumn#453
> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:244)
> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
> at org.apache.thrift.TUnion.write(TUnion.java:152)
>
>
>
> On Fri, Aug 21, 2015 at 6:16 AM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> Sanjeev,
>>
>> One possibility is that you are running into[1] which affects hive 0.13.
>> Is it possible for you to apply the patch on [1] and see if it fixes your
>> problem?
>>
>> [1] https://issues.apache.org/jira/browse/HIVE-10410
>>
>> On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma > > wrote:
>>
>>> We are using hive-0.13 with hadoop1.
>>>
>>> On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com <
>>> kulkarni.swar...@gmail.com> wrote:
>>>
>>>> Sanjeev,
>>>>
>>>> Can you tell me more details about your hive version/hadoop version etc.
>>>>
>>>> On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma <
>>>> sanjeev.verm...@gmail.com> wrote:
>>>>
>>>>> Can somebody gives me some pointer to looked upon?
>>>>>
>>>>> On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma <
>>>>> sanjeev.verm...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>> We are experiencing a strange problem with the hiveserver2, in one of
>>>>>> the job it gets the GC limit exceed from mapred task and hangs even 
>>>>>> having
>>>>>> enough heap available.we are not able to identify what causing this 
>>>>>> issue.
>>>>>> Could anybody help me identify the issue and let me know what
>>>>>> pointers I need to looked up.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Swarnim
>>>>
>>>
>>>
>>
>>
>> --
>> Swarnim
>>
>
>


-- 
Swarnim


Re: [DISCUSS] github integration

2015-09-08 Thread kulkarni.swar...@gmail.com
I personally am a big fan of pull requests, which is primarily why I made a
similar proposal almost a year and a half ago[1] :). I think the consensus we
reached at the time was to move the primary source code from svn to git
(which we did) but still use patches submitted to JIRAs, both to maintain a
permalink to the changes and because it's a little harder to treat a pull
request as a patch.

[1] http://qnalist.com/questions/4754349/proposal-to-switch-to-pull-requests

On Tue, Sep 8, 2015 at 5:53 PM, Owen O'Malley  wrote:

> All,
>I think we should use the github integrations that Apache infra has
> introduced. You can read about it here:
>
>
> https://blogs.apache.org/infra/entry/improved_integration_between_apache_and
>
> The big win from my point of view is that you can use github pull requests
> for doing reviews. All of the traffic from the pull request is sent to
> Apache email lists and vice versa.
>
> Thoughts?
>
>Owen
>



-- 
Swarnim


Re: hiveserver2 hangs

2015-09-08 Thread kulkarni.swar...@gmail.com
Sanjeev,

I am going off this exception in the stacktrace that you posted.

"at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)"

which definitely indicates that it's not very happy memory-wise. I would
recommend bumping up the memory and seeing if it helps. If not, we can debug
further from there.
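For reference, the heap itself is usually raised in hive-env.sh. A sketch, with the caveat that the exact variable wiring ($SERVICE, HADOOP_OPTS) varies across Hive versions and distributions, so check your own hive-env.sh.template:

```shell
# hive-env.sh (sketch): larger heap for HiveServer2 only, plus a heap dump
# on OOM so the retained objects can be inspected with a tool like jhat/MAT.
if [ "$SERVICE" = "hiveserver2" ]; then
  export HADOOP_OPTS="$HADOOP_OPTS -Xmx12g \
    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hs2-oom.hprof"
fi
```

The heap dump is often more useful than the bump itself: with a trace like the one above it will show whether one huge Thrift result buffer, rather than overall load, is consuming the heap.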

On Tue, Sep 8, 2015 at 12:17 PM, Sanjeev Verma 
wrote:

> What this exception implies here? how to identify the problem here.
> Thanks
>
> On Tue, Sep 8, 2015 at 10:44 PM, Sanjeev Verma 
> wrote:
>
>> We have 8GB HS2 java heap, we have not tried any bumping.
>>
>> On Tue, Sep 8, 2015 at 8:14 PM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>>> How much memory have you currently provided to HS2? Have you tried
>>> bumping that up?
>>>
>>> On Mon, Sep 7, 2015 at 1:09 AM, Sanjeev Verma >> > wrote:
>>>
>>>> *I am getting the following exception when the HS2 is crashing, any
>>>> idea why it has happening*
>>>>
>>>> "pool-1-thread-121" prio=4 tid=19283 RUNNABLE
>>>> at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
>>>> at java.util.Arrays.copyOf(Arrays.java:2271)
>>>> Local Variable: byte[]#1
>>>> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
>>>> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutput
>>>> Stream.java:93)
>>>> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
>>>> Local Variable: org.apache.thrift.TByteArrayOutputStream#42
>>>> Local Variable: byte[]#5378
>>>> at org.apache.thrift.transport.TSaslTransport.write(TSaslTransp
>>>> ort.java:446)
>>>> at org.apache.thrift.transport.TSaslServerTransport.write(TSasl
>>>> ServerTransport.java:41)
>>>> at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryP
>>>> rotocol.java:163)
>>>> at org.apache.thrift.protocol.TBinaryProtocol.writeString(TBina
>>>> ryProtocol.java:186)
>>>> Local Variable: byte[]#2
>>>> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
>>>> mnStandardScheme.write(TStringColumn.java:490)
>>>> Local Variable: java.util.ArrayList$Itr#1
>>>> at org.apache.hive.service.cli.thrift.TStringColumn$TStringColu
>>>> mnStandardScheme.write(TStringColumn.java:433)
>>>> Local Variable: org.apache.hive.service.cli.th
>>>> rift.TStringColumn$TStringColumnStandardScheme#1
>>>> at org.apache.hive.service.cli.thrift.TStringColumn.write(TStri
>>>> ngColumn.java:371)
>>>> at org.apache.hive.service.cli.thrift.TColumn.standardSchemeWri
>>>> teValue(TColumn.java:381)
>>>> Local Variable: org.apache.hive.service.cli.thrift.TColumn#504
>>>> Local Variable: org.apache.hive.service.cli.thrift.TStringColumn#453
>>>> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:244)
>>>> at org.apache.thrift.TUnion$TUnionStandardScheme.write(TUnion.java:213)
>>>> at org.apache.thrift.TUnion.write(TUnion.java:152)
>>>>
>>>>
>>>>
>>>> On Fri, Aug 21, 2015 at 6:16 AM, kulkarni.swar...@gmail.com <
>>>> kulkarni.swar...@gmail.com> wrote:
>>>>
>>>>> Sanjeev,
>>>>>
>>>>> One possibility is that you are running into[1] which affects hive
>>>>> 0.13. Is it possible for you to apply the patch on [1] and see if it fixes
>>>>> your problem?
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/HIVE-10410
>>>>>
>>>>> On Thu, Aug 20, 2015 at 6:12 PM, Sanjeev Verma <
>>>>> sanjeev.verm...@gmail.com> wrote:
>>>>>
>>>>>> We are using hive-0.13 with hadoop1.
>>>>>>
>>>>>> On Thu, Aug 20, 2015 at 11:49 AM, kulkarni.swar...@gmail.com <
>>>>>> kulkarni.swar...@gmail.com> wrote:
>>>>>>
>>>>>>> Sanjeev,
>>>>>>>
>>>>>>> Can you tell me more details about your hive version/hadoop version
>>>>>>> etc.
>>>>>>>
>>>>>>> On Wed, Aug 19, 2015 at 1:35 PM, Sanjeev Verma <
>>>>>>> sanjeev.verm...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Can somebody gives me some pointer to looked upon?
>>>>>>>>
>>>>>>>> On Wed, Aug 19, 2015 at 9:26 AM, Sanjeev Verma <
>>>>>>>> sanjeev.verm...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>> We are experiencing a strange problem with the hiveserver2, in one
>>>>>>>>> of the job it gets the GC limit exceed from mapred task and hangs even
>>>>>>>>> having enough heap available.we are not able to identify what causing 
>>>>>>>>> this
>>>>>>>>> issue.
>>>>>>>>> Could anybody help me identify the issue and let me know what
>>>>>>>>> pointers I need to looked up.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Swarnim
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Swarnim
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Swarnim
>>>
>>
>>
>


-- 
Swarnim


Patches needing review

2015-09-10 Thread kulkarni.swar...@gmail.com
Hello all,

I have a couple of patches that have been submitted and out for review for
some time. If I can get some help getting them reviewed and merged, I would
highly appreciate it!

HIVE-11691 (Wiki update for developer debugging. Already one +1 from Lefty)
HIVE-11647 (HBase dependency bump to 1.1.1)
HIVE-11609 (10-100x perf improvement on HBase comp key queries)
HIVE-11590 (Log updates to AvroSerDe. Already one +1)
HIVE-11560 (Fixing a passivity issue introduced by HIVE-8898)
HIVE-10708 (Support to proactively check for avro reader/writer schema
compatibility)

Thanks again for help,
Swarnim


Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan

2015-09-17 Thread kulkarni.swar...@gmail.com
Congratulations! Well deserved!

On Thu, Sep 17, 2015 at 12:03 AM, Vikram Dixit K 
wrote:

> Congrats Ashutosh!
>
> On Wed, Sep 16, 2015 at 9:01 PM, Chetna C  wrote:
>
>> Congrats Ashutosh !
>>
>> Thanks,
>> Chetna Chaudhari
>>
>> On 17 September 2015 at 06:53, Navis Ryu  wrote:
>>
>> > Congratulations!
>> >
>> > 2015-09-17 9:35 GMT+09:00 Xu, Cheng A :
>> > > Congratulations, Ashutosh!
>> > >
>> > > -Original Message-
>> > > From: Mohammad Islam [mailto:misla...@yahoo.com.INVALID]
>> > > Sent: Thursday, September 17, 2015 8:23 AM
>> > > To: u...@hive.apache.org; Hive
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > >
>> > > Congratulations Asutosh!
>> > >
>> > >
>> > >  On Wednesday, September 16, 2015 4:51 PM, Bright Ling <
>> > brig...@hostworks.com.au> wrote:
>> > >
>> > >
>> > > Congratulations Asutosh!
>> > > From: Sathi Chowdhury [mailto:sathi.chowdh...@lithium.com]
>> > > Sent: Thursday, 17 September 2015 8:04 AM
>> > > To: u...@hive.apache.org
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congrats Asutosh!
>> > > From: Sergey Shelukhin
>> > > Reply-To: "u...@hive.apache.org"
>> > > Date: Wednesday, September 16, 2015 at 2:31 PM
>> > > To: "u...@hive.apache.org"
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congrats!
>> > > From: Alpesh Patel 
>> > > Reply-To: "u...@hive.apache.org" 
>> > > Date: Wednesday, September 16, 2015 at 13:24
>> > > To: "u...@hive.apache.org" 
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congratulations Ashutosh
>> > On Wed, Sep 16, 2015 at 1:23 PM, Pengcheng Xiong  wrote: Congratulations Ashutosh!
>> > On Wed, Sep 16, 2015 at 1:17 PM, John Pullokkaran  wrote: Congrats Ashutosh!
>> > > From: Vaibhav Gumashta 
>> > > Reply-To: "u...@hive.apache.org" 
>> > > Date: Wednesday, September 16, 2015 at 1:01 PM
>> > > To: "u...@hive.apache.org" , "dev@hive.apache.org" 
>> > > Cc: Ashutosh Chauhan 
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congrats Ashutosh! —Vaibhav
>> > > From: Prasanth Jayachandran 
>> > > Reply-To: "u...@hive.apache.org" 
>> > > Date: Wednesday, September 16, 2015 at 12:50 PM
>> > > To: "dev@hive.apache.org" , "u...@hive.apache.org" 
>> > > Cc: "dev@hive.apache.org" , Ashutosh Chauhan 
>> > > Subject: Re: [ANNOUNCE] New Hive PMC Chair - Ashutosh Chauhan
>> > Congratulations Ashutosh!
>> > >
>> > >  On Wed, Sep 16, 2015 at 12:48 PM -0700, "Xuefu Zhang" <
>> > xzh...@cloudera.com> wrote: Congratulations, Ashutosh!. Well-deserved.
>> > >
>> > > Thanks to Carl also for the hard work in the past few years!
>> > >
>> > > --Xuefu
>> > >
>> > > On Wed, Sep 16, 2015 at 12:39 PM, Carl Steinbach 
>> wrote:
>> > >
>> > >> I am very happy to announce that Ashutosh Chauhan is taking over as
>> > >> the new VP of the Apache Hive project. Ashutosh has been a longtime
>> > >> contributor to Hive and has played a pivotal role in many of the
>> major
>> > >> advances that have been made over the past couple of years. Please
>> > >> join me in congratulating Ashutosh on his new role!
>> > >>
>> > >
>> > >
>> >
>>
>
>
>
> --
> Nothing better than when appreciated for hard work.
> -Mark
>



-- 
Swarnim


Re: Derby version used by Hive

2015-09-28 Thread kulkarni.swar...@gmail.com
Richard,

A quick eye-balling of the code doesn't show anything that could
potentially be a blocker for this upgrade. Also +1 on staying on the latest
and greatest. Please feel free to open up a JIRA and submit the patch.

Also just out of curiosity, what are you really using a Derby-backed store
for?
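For the patch itself, the change should be small if the build already centralizes the version; something along these lines (the property name is an assumption about Hive's pom, so verify it before submitting):

```xml
<!-- root pom: bump the pinned Derby version (sketch; verify the actual property name) -->
<properties>
  <derby.version>10.12.1.1</derby.version>
</properties>
```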

On Mon, Sep 28, 2015 at 11:02 AM, Richard Hillegas 
wrote:

>
>
> I haven't received a response to the following message, which I posted last
> week. Maybe my message rambled too much. Here is an attempt to pose my
> question more succinctly:
>
> Q: Does anyone know of any reason why we can't upgrade Hive's Derby version
> to 10.12.1.1, the new version being vetted by the Derby community right
> now?
>
> Thanks,
> -Rick
>
> > I am following the Hive build instructions here:
> >
>
> https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration
> > .
> >
> > I noticed that Hive development seems to be using an old version of
> Derby:
> > 10.10.2.0. Is there some defect in the most recent Derby version
> > (10.11.1.1) which prevents Hive from upgrading to 10.11.1.1? The only
> > Hive-tagged Derby bug which I can find is
> > https://issues.apache.org/jira/browse/DERBY-6358. That issue doesn't
> seem
> > to be version-specific and it mentions a resolved Hive issue:
> > https://issues.apache.org/jira/browse/HIVE-8739.
> >
> > Staying with 10.10.2.0 makes sense if you need to run on some ancient
> JVMs:
> > Java SE 5 or Java ME CDC/Foundation Profile 1.1. Hadoop, however,
> requires
> > at least Java 6 according to
> > https://wiki.apache.org/hadoop/HadoopJavaVersions.
> >
> > Note that the Derby community expects to release version 10.12.1.1 soon:
> > https://wiki.apache.org/db-derby/DerbyTenTwelveOneRelease. This might be
> a
> > good opportunity for Hive to upgrade to a more capable version of Derby.
> >
> > I mention this because the Derby version used by Hive ends up on the
> > classpath used by downstream projects (like Spark). That makes it awkward
> > for downstream projects to use more current Derby versions. Do you know
> of
> > any reason that downstream projects shouldn't override the Derby version
> > currently preferred by Hive?
> >
> > Thanks,
> > -Rick
>



-- 
Swarnim


Re: Avro column type in Hive

2015-09-28 Thread kulkarni.swar...@gmail.com
Sergey,

Is your table a partitioned or a non-partitioned one? I have usually seen
this problem manifest itself for partitioned tables, and that is mostly
where the truncation bites. So if you now try to add a partition to this
table, you might see an exception like:

java.sql.BatchUpdateException: Data truncation: Data too long for column
'TYPE_NAME' at row 1)

The "TYPE_NAME" is not actually a definition of the Avro schema.  Instead,
it is a definition of the type structure in Hive terms.  I assume it is
used for things such as validating the query before it is executed, etc.
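A workaround that sometimes gets applied for the truncation itself is widening the metastore column. This is outside anything Hive manages for you, so treat the following as a hedged sketch for a MySQL-backed metastore only, run against a backed-up database:

```sql
-- MySQL metastore only; back up the metastore DB first.
-- Widens the 4000-char column that truncates very large Hive type definitions.
ALTER TABLE COLUMNS_V2 MODIFY TYPE_NAME MEDIUMTEXT;
```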

On Mon, Sep 28, 2015 at 7:38 PM, Chaoyu Tang  wrote:

> Yes, when you described the avro table, what you get back was actually from
> your avro schema instead of database table. The avro table is NOT
> considered as a metastore backed SerDe. But that it has its columns
> populated to DB (e.g. HIVE-6308
> ) is mainly for column
> statistics purpose, which obviously is not applicable to your case which
> has a type name > 100kb.
>
> Chaoyu
>
> On Mon, Sep 28, 2015 at 8:12 PM, Sergey Shelukhin 
> wrote:
>
> > Hi.
> > I noticed that when I create an Avro table using a very large schema
> file,
> > mysql metastore silently truncates the TYPE_NAME in COLUMNS_V2 table to
> > the size of varchar (4000); however, when I do describe on the table, it
> > still displays the whole type name (around 100Kb long) that I presume it
> > gets from deserializer.
> > Is the value in TYPE_NAME used for anything for Avro tables?
> >
> >
>



-- 
Swarnim


Re: Hive join does not execute

2012-05-10 Thread kulkarni.swar...@gmail.com
It looks more like a permissions problem to me. Just make sure that
whatever directories hadoop is writing to are owned by hadoop itself.

Also it looks a little weird to me that it is using the
"RawLocalFileSystem" instead of the "DistributedFileSystem". You might want
to look at the "fs.default.name" property in core-site.xml and see if it is
pointing to your HDFS location.
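For a typical single-node setup of that era, the property would look something like the following in core-site.xml (host and port are placeholders for your own NameNode address):

```xml
<!-- core-site.xml: point the default filesystem at HDFS, not the local FS -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
```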

Hope that helps.

On Thu, May 10, 2012 at 11:29 AM, Mahsa Mofidpoor wrote:

> Hi,
>
> When I want to join two tables, I receive the following error:
>
> 12/05/10 12:03:31 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
> use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
> files.
> Execution log at:
> /tmp/umroot/umroot_20120510120303_4d0145bb-27fa-4d4a-8cbc-95d8353fccaf.log
> ENOENT: No such file or directory
> at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
>  at org.apache.hadoop.fs.FileUtil.execSetPermission(FileUtil.java:692)
> at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:647)
>  at
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
> at
> org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
>  at
> org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
> at
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
>  at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
>  at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
>  at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:693)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Job Submission failed with exception
> 'org.apache.hadoop.io.nativeio.NativeIOException(No such file or directory)'
> Execution failed with exit status: 2
> Obtaining error information
>
> Task failed!
> Task ID:
>   Stage-1
>
> Logs:
>
> /tmp/umroot/hive.log
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
>
>
> I use hadoop-0.20.2 (single-node setup) and I have build Hive through  the
> latest source code.
>
>
> Thank you in advance  for your help,
>
> Mahsa
>



-- 
Swarnim


getStructFieldData method on StructObjectInspector

2012-05-25 Thread kulkarni.swar...@gmail.com
I am trying to write a custom ObjectInspector extending the
StructObjectInspector and got a little confused about the use of the
getStructFieldData method on the inspector. Looking at the definition of
the method:

public Object getStructFieldData(Object data, StructField fieldRef);

I understand that the use of this method is to retrieve the specific given
field from the buffer. However, what I don't understand is what is it
expected to return. I looked around the tests and related code, and most of
the stuff returned was either a LazyPrimitive or a LazyNonPrimitive, but I
couldn't find anything that enforces this (especially given that the return
type is a plain "Object")! Does this mean that I am free to return even my
custom object as a return type of this method? If so, what is the guarantee
that it will be interpreted correctly down the pipeline?

Thanks,
-- 
Swarnim


Re: Developing Hive UDF in eclipse

2012-06-05 Thread kulkarni.swar...@gmail.com
Did you try this[1]? It had got me most of my way through the process.

[1] https://cwiki.apache.org/Hive/gettingstarted-eclipsesetup.html
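If the project is Maven-based (including m2e inside Eclipse), the missing import usually just means hive-exec isn't on the classpath; something like the following should resolve it, with the version a placeholder to match your cluster:

```xml
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>0.9.0</version> <!-- placeholder: match your installed Hive -->
</dependency>
```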

On Tue, Jun 5, 2012 at 8:49 AM, Arun Prakash wrote:

> Hi Friends,
> I tried to develop udf for hive but i am getting package import error
> in eclipse.
>
> import org.apache.hadoop.hive.ql.exec.UDF;
>
>
> How to import hive package in eclipse?
>
>
> Any inputs much appreciated.
>
>
>
> Best Regards
>  Arun Prakash C.K
>
> Keep On Sharing Your Knowledge with Others
>



-- 
Swarnim


Re: Casting exception while converting from "LazyDouble" to "LazyString"

2012-07-10 Thread kulkarni.swar...@gmail.com
Hi Kanna,

This might just mean that your query declares a STRING type for a field
which is actually a DOUBLE.
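If that is the case, aligning the declared type with the data is usually the fix. For the row in the trace it would be something like the following (the table name is hypothetical):

```sql
-- assuming gpa was declared STRING but the underlying data is numeric
ALTER TABLE students CHANGE gpa gpa DOUBLE;
```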

On Tue, Jul 10, 2012 at 3:05 PM, Kanna Karanam wrote:

>  Has anyone seen this error before? Am I missing anything here?
>
>
>
> 2012-07-10 11:11:02,203 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201207091248_0107_m_00_0:
> java.lang.RuntimeException:
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
> processing row {"name":"zach johnson","age":77,"gpa":3.27}
>
> at
> org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
>
> at
> org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>
> at
> org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>
>
> at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
>
>
> at java.security.AccessController.doPrivileged(Native
> Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:396)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1124)
> 
>
> at org.apache.hadoop.mapred.Child.main(Child.java:265)
>
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime
> Error while processing row {"name":"zach johnson","age":77,"gpa":3.27}
>
> at
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
>
>
> at
> org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
>
> ... 8 more
>
> Caused by: java.lang.ClassCastException:
> org.apache.hadoop.hive.serde2.lazy.LazyDouble cannot be cast to
> org.apache.hadoop.hive.serde2.lazy.LazyString
>
> at
> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveWritableObject(LazyStringObjectInspector.java:47)
> 
>
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:351)
> 
>
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:255)
> 
>
> at
> org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:202)
> 
>
> at
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:236)
> 
>
> at
> org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>
> at
> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
>
> at
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
> 
>
> at
> org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>
> at
> org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
>
> at
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
>
>
>
>
>
>
> Thanks,
>
> Kanna
>



-- 
Swarnim


Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread kulkarni.swar...@gmail.com
Have you tried using EXPLAIN[1] on your query? I usually like to use it to
get a better understanding of what my query is actually doing, and for
debugging at other times.

[1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain
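Separately from the query plan, the UDF's own date arithmetic can be sanity-checked outside Hive entirely. A small standalone sketch of the same logic (assuming the 'yyyyMMdd' format implied by the expected value '20120806'; it is parameterized on "now" so the result is deterministic):

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class YesterdayDemo {

    // Same arithmetic as the evaluate() method in the thread, but taking
    // "now" as a parameter so the result can be checked deterministically.
    public static String yesterday(String format, Calendar now) {
        Calendar cal = (Calendar) now.clone();
        cal.add(Calendar.DATE, -1);
        return new SimpleDateFormat(format).format(cal.getTime());
    }

    public static void main(String[] args) {
        // Reproduces the thread's expectation: yesterday of Aug 7, 2012
        Calendar aug7 = Calendar.getInstance();
        aug7.set(2012, Calendar.AUGUST, 7);
        System.out.println(yesterday("yyyyMMdd", aug7)); // 20120806
    }
}
```

If this prints the expected partition value, the remaining suspects are the data's actual dt format and the partition metadata, which is where EXPLAIN helps.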

On Tue, Aug 7, 2012 at 12:20 PM, Raihan Jamal  wrote:

> Hi Jan,
>
>
> I figured that out, it is working fine for me now. The only question I
> have is, if I am doing like this-
>
>
>
> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>
>
>
> Then the above query will be evaluated as below right?
>
>
>
> SELECT * FROM REALTIME where dt= ‘20120806’ LIMIT 10;
>
>
>
> So that means it will look for data in the corresponding dt partition 
> *(20120806)
> *only right as above table is partitioned on dt column ? And it will not
> scan the whole table right?**
>
>
>
> *Raihan Jamal*
>
>
>
> On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár  wrote:
>
>> Hi Jamal,
>>
>> Check if the function really returns what it should and that your data
>> are really in MMdd format. You can do this by simple query like this:
>>
>> SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1;
>>
>> I don't see anything wrong with the function itself, it works well for me
>> (although I tested it in hive 0.7.1). The only thing I would change about
>> it would be to optimize it by calling 'new' only at the time of
>> construction and reusing the object when the function is called, but that
>> should not affect the functionality at all.
>>
>> Best regards,
>> Jan
>>
>>
>>
>>
>> On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal wrote:
>>
>>> *Problem*
>>>
>>> I created the below UserDefinedFunction to get the yesterday's day in
>>> the format I wanted as I will be passing the format into this below method
>>> from the query.
>>>
>>>
>>>
>>> public final class YesterdayDate extends UDF {
>>>
>>>     public String evaluate(final String format) {
>>>         DateFormat dateFormat = new SimpleDateFormat(format);
>>>         Calendar cal = Calendar.getInstance();
>>>         cal.add(Calendar.DATE, -1);
>>>         return dateFormat.format(cal.getTime()).toString();
>>>     }
>>> }
>>>
>>>
>>>
>>>
>>>
>>> So whenever I try to run the query like below, after adding the jar to the
>>> classpath and creating the temporary function yesterdaydate, I always get
>>> zero results back:
>>>
>>>
>>>
>>> hive> create temporary function yesterdaydate as
>>> 'com.example.hive.udf.YesterdayDate';
>>>
>>> OK
>>>
>>> Time taken: 0.512 seconds
>>>
>>>
>>>
>>> Below is the query I am running-
>>>
>>>
>>>
>>> hive> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>>>
>>> OK
>>>
>>> And I always get zero results back, but the data is there in that table
>>> for Aug 5th.
>>>
>>>
>>>
>>> What am I doing wrong? Any suggestions will be appreciated.
>>>
>>>
>>>
>>>
>>>
>>> NOTE: As I am working with Hive 0.6, which doesn't support variable
>>> substitution, I cannot use hiveconf here. The above table has been
>>> partitioned on the dt (date) column.
>>>
>>
>>
>


-- 
Swarnim
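As a footnote to Jan's optimization suggestion above, here is a minimal sketch of a formatter-reusing variant. It is JDK-only — the Hive UDF superclass is omitted so the sketch stands alone — and the per-pattern cache is an assumption for the case where different format strings are passed in:

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.HashMap;
import java.util.Map;

// Sketch only: in the real UDF this class would extend
// org.apache.hadoop.hive.ql.exec.UDF, as in the thread above.
class YesterdayDate {

    // One formatter per pattern, built lazily on first use. A Hive UDF
    // instance is invoked from a single task thread, so an unsynchronized
    // map is acceptable; SimpleDateFormat itself is not thread-safe, so
    // this map must not be shared across threads.
    private final Map<String, DateFormat> formatters =
            new HashMap<String, DateFormat>();

    public String evaluate(final String pattern) {
        DateFormat df = formatters.get(pattern);
        if (df == null) {
            df = new SimpleDateFormat(pattern);
            formatters.put(pattern, df);
        }
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.DATE, -1);  // yesterday
        return df.format(cal.getTime());
    }

    public static void main(String[] args) {
        System.out.println(new YesterdayDate().evaluate("yyyyMMdd"));
    }
}
```

The reuse is safe here only because each UDF instance is confined to one task thread; sharing a SimpleDateFormat across threads would require per-thread instances or synchronization.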


Re: Some Weird Behavior

2012-08-07 Thread kulkarni.swar...@gmail.com
What is the hive version that you are using?

On Tue, Aug 7, 2012 at 12:57 PM, Techy Teck  wrote:

> I am not sure about the data, but when we do
>
> SELECT count(*) from data_realtime where dt='20120730' and uid is null
>
> I get the count
>
> but If I do-
>
> SELECT * from data_realtime where dt='20120730' and uid is null
>
> I get zero records back. But if all the records are NULL then I should be
> getting NULL records back right?
>
>
> But I am not getting anything back and that is the reason it is making me
> more confused.
>
>
>
>
>
>
> On Tue, Aug 7, 2012 at 10:31 AM, Yue Guan  wrote:
>
> > Just in case, all Record is null when uid is null?
> >
> > On Tue, Aug 7, 2012 at 1:14 PM, Techy Teck 
> > wrote:
> > > SELECT count(*) from data_realtime where dt='20120730' and uid is null
> > >
> > >
> > >
> > > I get the count as 1509
> > >
> > >
> > >
> > > So that means If I will be doing
> > >
> > >
> > >
> > > SELECT * from data_realtime where dt='20120730' and uid is null
> > >
> > >
> > >
> > > I should be seeing those records in which uid is null? right?
> > >
> > > But I get zero records back with the above query. Why is it so? It's very
> > > strange and why is it happening like this. Something wrong with the
> Hive?
> > >
> > >
> > >
> > > Can anyone suggest me what is happening?
> > >
> > >
> > >
> > >
> >
>



-- 
Swarnim


Re: Some Weird Behavior

2012-08-07 Thread kulkarni.swar...@gmail.com
In that case you might want to try "count(1)" instead of "count(*)" and see
if that makes any difference. [1]

[1] https://issues.apache.org/jira/browse/HIVE-287

On Tue, Aug 7, 2012 at 1:07 PM, Techy Teck  wrote:

> I am running Hive 0.6.
>
>
>
>
>
> On Tue, Aug 7, 2012 at 11:04 AM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
>> What is the hive version that you are using?
>>
>>
>> On Tue, Aug 7, 2012 at 12:57 PM, Techy Teck wrote:
>>
>>> I am not sure about the data, but when we do
>>>
>>> SELECT count(*) from data_realtime where dt='20120730' and uid is null
>>>
>>> I get the count
>>>
>>> but If I do-
>>>
>>> SELECT * from data_realtime where dt='20120730' and uid is null
>>>
>>> I get zero records back. But if all the records are NULL then I should be
>>> getting NULL records back right?
>>>
>>>
>>> But I am not getting anything back and that is the reason it is making me
>>> more confused.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Aug 7, 2012 at 10:31 AM, Yue Guan  wrote:
>>>
>>> > Just in case, all Record is null when uid is null?
>>> >
>>> > On Tue, Aug 7, 2012 at 1:14 PM, Techy Teck 
>>> > wrote:
>>> > > SELECT count(*) from data_realtime where dt='20120730' and uid is
>>> null
>>> > >
>>> > >
>>> > >
>>> > > I get the count as 1509
>>> > >
>>> > >
>>> > >
>>> > > So that means If I will be doing
>>> > >
>>> > >
>>> > >
>>> > > SELECT * from data_realtime where dt='20120730' and uid is null
>>> > >
>>> > >
>>> > >
>>> > > I should be seeing those records in which uid is null? right?
>>> > >
>>> > > But I get zero record back with the above query. Why is it so? Its
>>> very
>>> > > strange and why is it happening like this. Something wrong with the
>>> Hive?
>>> > >
>>> > >
>>> > >
>>> > > Can anyone suggest me what is happening?
>>> > >
>>> > >
>>> > >
>>> > >
>>> >
>>>
>>
>>
>>
>> --
>> Swarnim
>>
>
>


-- 
Swarnim


Re: Logging info is not present in console output

2012-08-07 Thread kulkarni.swar...@gmail.com
Are you running via console? The default logging level is WARN.

$HIVE_HOME/bin/hive -hiveconf hive.root.logger=INFO,console


This should print the INFO messages onto the console.
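For a persistent change rather than a per-invocation flag, the same level can be set in Hive's log4j configuration; a sketch, assuming the stock template layout:

```properties
# conf/hive-log4j.properties (copied from hive-log4j.properties.template);
# switch the root logger from the default WARN level to INFO:
hive.root.logger=INFO,console
```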


On Tue, Aug 7, 2012 at 4:07 PM, Ablimit Aji  wrote:

> Hi,
>
> I have put some LOG.info() statements inside the join operator and I'm not
> seeing them by running a join statement.
> How can I see it? or is there any better way of debugging ?
>
> Thanks,
> Ablimit
>



-- 
Swarnim


Access to trigger jobs on jenkins

2013-08-04 Thread kulkarni.swar...@gmail.com
Hello,

I was wondering if it is possible to get access to be able to trigger jobs
on the jenkins server? Or is that access limited to committers?

Thanks,

-- 
Swarnim


Re: Access to trigger jobs on jenkins

2013-08-05 Thread kulkarni.swar...@gmail.com
Hi Brock,

Yes, I was looking to trigger the pre-commit builds without having to
check in a new patch every time to auto-trigger them. I assumed they were
similar to the *regular* builds?


On Mon, Aug 5, 2013 at 7:43 AM, Brock Noland  wrote:

> Hi,
>
> Are you looking to trigger the pre-commit builds?
>
> Unfortunately to trigger *regular* builds you'd need an Apache username
> according the Apache Infra Jenkins <http://wiki.apache.org/general/Jenkins
> >page.
>
> Brock
>
>
> On Sun, Aug 4, 2013 at 1:37 PM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
> > Hello,
> >
> > I was wondering if it is possible to get access to be able to trigger
> jobs
> > on the jenkins server? Or is that access limited to committers?
> >
> > Thanks,
> >
> > --
> > Swarnim
> >
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>



-- 
Swarnim


Re: [Discuss] project chop up

2013-08-07 Thread kulkarni.swar...@gmail.com
> I'd like to propose we move towards Maven.

Big +1 on this. Most of the major Apache projects (Hadoop, HBase, Avro, etc.)
are Maven-based.

Also, I can't agree more that the current build system is frustrating, to say
the least. Another issue I had with the existing Ant-based system is that
there are no checkpointing capabilities[1]. So if a 6-hour build fails
after 5 hours 30 minutes, most of the work, even though successful, has to
be rebuilt, which is very time consuming. Maven reactors have built-in
support for a lot of this.

[1] https://issues.apache.org/jira/browse/HIVE-3449.
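For comparison, Maven's reactor can resume a failed multi-module build from the module that failed instead of starting over; a sketch, where the `:hive-exec` module name is hypothetical:

```sh
# First run fails in some module after several hours:
mvn clean install

# Resume from the failing module, skipping modules that already succeeded:
mvn install -rf :hive-exec
```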


On Wed, Aug 7, 2013 at 2:06 PM, Brock Noland  wrote:

> Thus far there hasn't been any dissent to managing our modules with maven.
>  In addition there have been several comments positive on a move towards
> maven. I'd like to add Ivy seems to have issues managing multiple versions
> of libraries. For example, in HIVE-3632 the Ivy cache had to be cleared when
> testing patches that installed the new version of DataNucleus. I have had
> the same issue on HIVE-4388. Requiring the deletion of the Ivy cache
> is extremely painful for developers that don't have access to high
> bandwidth connections or live in areas far from California where most of
> these jars are hosted.
>
> I'd like to propose we move towards Maven.
>
>
> On Sat, Jul 27, 2013 at 1:19 PM, Mohammad Islam 
> wrote:
>
> >
> >
> > Yes hive build and test cases got convoluted as the project scope
> > gradually increased. This is the time to take action!
> >
> > Based on my other Apache experiences, I prefer the option #3 "Breakup the
> > projects within our own source tree". Make multiple modules or
> > sub-projects. By default, only key modules will be built.
> >
> > Maven could be a possible candidate.
> >
> > Regards,
> > Mohammad
> >
> >
> >
> > 
> >  From: Edward Capriolo 
> > To: "dev@hive.apache.org" 
> > Sent: Saturday, July 27, 2013 7:03 AM
> > Subject: Re: [Discuss] project chop up
> >
> >
> > Or feel free to suggest different approach. I am used to managing
> software
> > as multi-module maven projects.
> > From a development standpoint if I was working on beeline, it would be
> nice
> > to only require some of the sub-projects to be open in my IDE to do that.
> > Also managing everything globally is not ideal.
> >
> > Hive's project layout, build, and test infrastructure is just funky. It
> has
> > to do a few interesting things (shims, testing), but I do not think what
> we
> > are doing justifies the massive ant build system we have. Ant is so ten
> > years ago.
> >
> >
> >
> > On Sat, Jul 27, 2013 at 12:04 AM, Alan Gates 
> > wrote:
> >
> > > But I assume they'd still be a part of targets like package, tar, and
> > > binary?  Making them compile and test separately and explicitly load
> the
> > > core Hive jars from maven/ivy seems reasonable.
> > >
> > > Alan.
> > >
> > > On Jul 26, 2013, at 8:40 PM, Brock Noland wrote:
> > >
> > > > Hi,
> > > >
> > > > I think thats part of it but I'd like to decouple the downstream
> > projects
> > > > even further so that the only connection is the dependency on the
> hive
> > > jars.
> > > >
> > > > Brock
> > > > On Jul 26, 2013 10:10 PM, "Alan Gates" 
> wrote:
> > > >
> > > >> I'm not sure how this is different from what hcat does today.  It
> > needs
> > > >> Hive's jars to compile, so it's one of the last things in the
> compile
> > > step.
> > > >> Would moving the other modules you note to be in the same category
> be
> > > >> enough?  Did you want to also make it so that the default ant target
> > > >> doesn't compile those?
> > > >>
> > > >> Alan.
> > > >>
> > > >> On Jul 26, 2013, at 4:09 PM, Edward Capriolo wrote:
> > > >>
> > > >>> My mistake on saying hcat was a fork metastore. I had a brain fart
> > for
> > > a
> > > >>> moment.
> > > >>>
> > > >>> One way we could do this is create a folder called downstream. In
> our
> > > >>> release step we can execute the downstream builds and then copy the
> > > files
> > > >>> we need back. So nothing downstream will be on the classpath of the
> > > main
> > > >>> project.
> > > >>>
> > > >>> This could help us breakup ql as well. Things like exotic file
> > formats
> > > ,
> > > >>> and things that are pluggable like zk locking can go here. That
> might
> > > be
> > > >>> overkill.
> > > >>>
> > > >>> For now we can focus on building downstream and hivethrift1might be
> > the
> > > >>> first thing to try to downstream.
> > > >>>
> > > >>>
> > > >>> On Friday, July 26, 2013, Thejas Nair 
> > wrote:
> > >  +1 to the idea of making the build of core hive and other
> downstream
> > >  components independent.
> > > 
> > >  bq.  I was under the impression that Hcat and hive-metastore was
> > >  supposed to merge up somehow.
> > > 
> > >  The metastore code was never forked. Hcat was just using
> > >  hive-metastore and making the metadata available to rest of hadoop
> > >  (pig, java MR..).
> > >  A lot o

Re: Fixing Trunk tests and getting stable nightly build on b.a.o

2014-05-30 Thread kulkarni.swar...@gmail.com
Hi Lewis,

Are there any specific tests that you are seeing trouble with? If so,
please feel free to log appropriate JIRAs to get them fixed (and submit
patches ;) )

There is a developer guide[1] that explains in quite some detail how to run
tests.

> Are there any suggested EXTRA_PARAMETERS to be passing in to the JVM when
> invoking a build?

I think it depends. If you are running the full suite, most probably yes.
You might need to do something like export MAVEN_OPTS="-Xmx2g
-XX:MaxPermSize=256M".

Hope that helps.

[1] https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
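Instead of the full suite, it can also help to run a narrower target; a sketch based on the developer FAQ, where the qfile name is a placeholder:

```sh
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=256M"

# Run a single CLI driver test with one .q file rather than the whole suite.
# Depending on the branch, a -Phadoop-1 or -Phadoop-2 profile may also be
# required.
mvn test -Dtest=TestCliDriver -Dqfile=sample.q
```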


On Thu, May 29, 2014 at 6:28 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Hi Folks,
> Is there any interest in getting a stable build on builds.apache.org?
> I just checked out trunk code and it appears broken out of the box...
> It also eats up memory like mad.
> Are there any suggested EXTRA_PARAMETERS to be passing in to the JVM when
> invoking a build?
> Thanks
> Lewis
>
> --
> *Lewis*
>



-- 
Swarnim


Re: Documentation Policy

2014-06-10 Thread kulkarni.swar...@gmail.com
> Writing documentation sooner rather than later is likely to increases the
chances of things getting documented.

Big +1 on this. I think missing documentation is one of our major
technical debts (I personally have quite a bit for the patches I
contributed). IMHO committers may choose to reject patches that don't have
usage documentation if they include significant work which practically
cannot be consumed without proper documentation.

Slightly tangential, but how do we go about adding this to some of the
already resolved JIRAs that are missing documentation? I can volunteer to
dig through our JIRA queue and find some of these, but I would probably
need some help from the contributors on them to be sure that they are
doc'ed properly. :)


On Tue, Jun 10, 2014 at 5:33 PM, Thejas Nair  wrote:

> > Also, I don't think we need to wait for end of the release cycle to start
> >> documenting features for the next release.
> >
> >
> > Agreed, but I think we should wait until the next release is less than
> two
> > months away.  What do other people think?
>
> We have been releasing almost every 3-4 months. So that is the longest
> un-released version documentation would be in the docs.
> Writing documentation sooner rather than later is likely to increases
> the chances of things getting documented. It is easier to get details
> from developers while the details are still fresh in their minds. It
> would also even the load on documentation volunteers over the time.
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>



-- 
Swarnim


Re: Documentation Policy

2014-06-11 Thread kulkarni.swar...@gmail.com
> Feel free to label such jiras with this keyword and ask the contributors
for more information if you need any.

Cool. I'll start chugging through the queue today adding labels as apt.


On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair  wrote:

> > Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13?
> Sounds good to me.
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>



-- 
Swarnim


Re: Documentation Policy

2014-06-11 Thread kulkarni.swar...@gmail.com
Going through the issues, I think overall Lefty did an awesome job catching
and documenting most of them in time. Following are some of the 0.13 and
0.14 ones I found which either do not have documentation or have outdated
documentation, and probably need it to be consumable. Contributors, feel
free to remove the label if you disagree.

*TODOC13:*
https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed)

*TODOC14:*
https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed)

I'll continue digging through the queue going backwards to 0.12 and 0.11
and see if I find similar stuff there as well.



On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> > Feel free to label such jiras with this keyword and ask the contributors
> for more information if you need any.
>
> Cool. I'll start chugging through the queue today adding labels as apt.
>
>
> On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair 
> wrote:
>
>> > Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13?
>> Sounds good to me.
>>
>> --
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified
>> that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender
>> immediately
>> and delete it from your system. Thank You.
>>
>
>
>
> --
> Swarnim
>



-- 
Swarnim


Re: Documentation Policy

2014-06-13 Thread kulkarni.swar...@gmail.com
+1 on deleting the TODOC tag as I think it's assumed by default that once
an enhancement is done, it will be doc'ed. We may consider adding an
additional "docdone" tag but I think we can instead just wait for a +1 from
the contributor that the documentation is satisfactory (and assume an
implicit +1 for no reply) before deleting the TODOC tag.


On Fri, Jun 13, 2014 at 1:32 PM, Szehon Ho  wrote:

> Yea, I'd imagine the TODOC tag pollutes the query of TODOC's and confuses
> the state of a JIRA, so its probably best to remove it.
>
> The idea of "docdone" is to query what docs got produced and needs review?
>  It might be nice to have a tag for that, to easily signal to contributor
> or interested parties to take a look.
>
> On a side note, I already find very helpful your JIRA comments with links
> to doc-wikis, both to inform the contributor and just as reference for
> others.  Thanks again for the great work.
>
>
> On Fri, Jun 13, 2014 at 1:33 AM, Lefty Leverenz 
> wrote:
>
> > One more question:  what should we do after the documentation is done
> for a
> > JIRA ticket?
> >
> > (a) Just remove the TODOC## label.
> > (b) Replace TODOC## with docdone (no caps, no version number).
> > (c) Add a docdone label but keep TODOC##.
> > (d) Something else.
> >
> >
> > -- Lefty
> >
> >
> > On Thu, Jun 12, 2014 at 12:54 PM, Brock Noland 
> wrote:
> >
> > > Thank you guys! This is great work.
> > >
> > >
> > > On Wed, Jun 11, 2014 at 6:20 PM, kulkarni.swar...@gmail.com <
> > > kulkarni.swar...@gmail.com> wrote:
> > >
> > > > Going through the issues, I think overall Lefty did an awesome job
> > > catching
> > > > and documenting most of them in time. Following are some of the 0.13
> > and
> > > > 0.14 ones which I found which either do not have documentation or
> have
> > > > outdated one and probably need one to be consumeable. Contributors,
> > feel
> > > > free to remove the label if you disagree.
> > > >
> > > > *TODOC13:*
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed)
> > > >
> > > > *TODOC14:*
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed)
> > > >
> > > > I'll continue digging through the queue going backwards to 0.12 and
> > 0.11
> > > > and see if I find similar stuff there as well.
> > > >
> > > >
> > > >
> > > > On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com <
> > > > kulkarni.swar...@gmail.com> wrote:
> > > >
> > > > > > Feel free to label such jiras with this keyword and ask the
> > > > contributors
> > > > > for more information if you need any.
> > > > >
> > > > > Cool. I'll start chugging through the queue today adding labels as
> > apt.
> > > > >
> > > > >
> > > > > On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair <
> the...@hortonworks.com
> > >
> > > > > wrote:
> > > > >
> > > > >> > Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13?
> > > > >> Sounds good to me.
> > > > >>
> > > > >> --
> > > > >> CONFIDENTIALITY NOTICE
> > > > >> NOTICE: This message is intended for the use of the individual or
> > > entity
> > > > >> to
> > > > >> which it is addressed and may contain information that is
> > > confidential,
> > > > >> privileged and exempt from disclosure under applicable law. If the
> > > > reader
> > > > >> of this message is not the intended recipient, you are hereby
> > notified
> > > > >> that
> > > > >> any printing, copying, dissemination, distribution, disclosure or
> > > > >> forwarding of this communication is strictly prohibited. If you
> have
> > > > >> received this communication in error, please contact the sender
> > > > >> immediately
> > > > >> and delete it from your system. Thank You.
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Swarnim
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Swarnim
> > > >
> > >
> >
>



-- 
Swarnim


Re: Documentation Policy

2014-06-14 Thread kulkarni.swar...@gmail.com
A few more from older releases:

*0.10*:
https://issues.apache.org/jira/browse/HIVE-2397?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC10%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC

*0.11:*
https://issues.apache.org/jira/browse/HIVE-3073?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC11%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC

*0.12:*
https://issues.apache.org/jira/browse/HIVE-5161?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC12%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC

Should we create JIRAs for these so that the work to be done on them does
not get lost?



On Fri, Jun 13, 2014 at 5:59 PM, Lefty Leverenz 
wrote:

> Agreed, deleting TODOC## simplifies the labels field, so we should just use
> comments to keep track of docs done.
>
> Besides, doc tasks can get complicated -- my gmail inbox has a few messages
> with simultaneous done and to-do labels -- so comments are best for
> tracking progress.  Also, as Szehon noticed, links in the comments make it
> easy to find the docs.
>
> +1 on (a):  delete TODOCs when done; don't add any new labels.
>
> -- Lefty
>
>
> On Fri, Jun 13, 2014 at 1:31 PM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
> > +1 on deleting the TODOC tag as I think it's assumed by default that once
> > an enhancement is done, it will be doc'ed. We may consider adding an
> > additional "docdone" tag but I think we can instead just wait for a +1
> from
> > the contributor that the documentation is satisfactory (and assume a
> > implicit +1 for no reply) before deleting the TODOC tag.
> >
> >
> > On Fri, Jun 13, 2014 at 1:32 PM, Szehon Ho  wrote:
> >
> > > Yea, I'd imagine the TODOC tag pollutes the query of TODOC's and
> confuses
> > > the state of a JIRA, so its probably best to remove it.
> > >
> > > The idea of "docdone" is to query what docs got produced and needs
> > review?
> > >  It might be nice to have a tag for that, to easily signal to
> contributor
> > > or interested parties to take a look.
> > >
> > > On a side note, I already find very helpful your JIRA comments with
> links
> > > to doc-wikis, both to inform the contributor and just as reference for
> > > others.  Thanks again for the great work.
> > >
> > >
> > > On Fri, Jun 13, 2014 at 1:33 AM, Lefty Leverenz <
> leftylever...@gmail.com
> > >
> > > wrote:
> > >
> > > > One more question:  what should we do after the documentation is done
> > > for a
> > > > JIRA ticket?
> > > >
> > > > (a) Just remove the TODOC## label.
> > > > (b) Replace TODOC## with docdone (no caps, no version number).
> > > > (c) Add a docdone label but keep TODOC##.
> > > > (d) Something else.
> > > >
> > > >
> > > > -- Lefty
> > > >
> > > >
> > > > On Thu, Jun 12, 2014 at 12:54 PM, Brock Noland 
> > > wrote:
> > > >
> > > > > Thank you guys! This is great work.
> > > > >
> > > > >
> > > > > On Wed, Jun 11, 2014 at 6:20 PM, kulkarni.swar...@gmail.com <
> > > > > kulkarni.swar...@gmail.com> wrote:
> > > > >
> > > > > > Going through the issues, I think overall Lefty did an awesome
> job
> > > > > catching
> > > > > > and documenting most of them in time. Following are some of the
> > 0.13
> > > > and
> > > > > > 0.14 ones which I found which either do not have documentation or
> > > have
> > > > > > outdated one and probably need one to be consumeable.
> Contributors,
> > > > feel
> > > > > > free to remove the label if you disagree.
> > > > > >
> > > > > > *TODOC13:*
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed)
> > > > > >
> > > > > > *TODOC14:*
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed)
> > > > > >
> &g

Re: [ANNOUNCE] New Hive Committers - Gopal Vijayaraghavan and Szehon Ho

2014-06-23 Thread kulkarni.swar...@gmail.com
Congratulations guys!


On Mon, Jun 23, 2014 at 2:09 AM, Lefty Leverenz 
wrote:

> Bravo, Szehon and Gopal!
>
> -- Lefty
>
>
> On Mon, Jun 23, 2014 at 12:53 AM, Gopal V  wrote:
>
> > On 6/22/14, 8:42 PM, Carl Steinbach wrote:
> >
> >> The Apache Hive PMC has voted to make Gopal Vijayaraghavan and Szehon Ho
> >> committers on the Apache Hive Project.
> >>
> >
> > Thanks everyone! And congrats Szehon!
> >
> > Cheers,
> > Gopal
> >
>



-- 
Swarnim


Hive JIRA slow/dead

2012-10-08 Thread kulkarni.swar...@gmail.com
Seems like the Hive JIRA has been extremely slow to respond since this
morning. Is there any way to maybe cycle the instance to fix the issue?

Thanks,

-- 
Swarnim


Re: hive 0.10 release

2012-11-19 Thread kulkarni.swar...@gmail.com
There are a couple of enhancements that I have been working on, mainly
related to the Hive/HBase integration. It would be awesome if it is possible
at all to include them in this release. None of them should really be high
risk. I have patches submitted for a few of them and will try to get the
others submitted in the next couple of days. Is there any specific deadline
that I should be looking out for?

[1] https://issues.apache.org/jira/browse/HIVE-2599 (Patch Available)
[2] https://issues.apache.org/jira/browse/HIVE-3553 (Patch Available)
[3] https://issues.apache.org/jira/browse/HIVE-3211
[4] https://issues.apache.org/jira/browse/HIVE-3555
[5] https://issues.apache.org/jira/browse/HIVE-3725


On Mon, Nov 19, 2012 at 4:55 PM, Ashutosh Chauhan wrote:

> Another quick update. I have created a hive-0.10 branch. At this point,
> HIVE-3678 is a blocker to do a 0.10 release. There are few others nice to
> have which were there in my previous email. I will be happy to merge new
> patches between now and RC if folks request for it and are low risk.
>
> Thanks,
> Ashutosh
> On Thu, Nov 15, 2012 at 2:29 PM, Ashutosh Chauhan  >wrote:
>
> > Good progress. Looks like folks are on board. I propose to cut the branch
> > in next couple of days. There are few jiras which are patch ready which I
> > want to get into the hive-0.10 release, including HIVE-3255 HIVE-2517
> > HIVE-3400 HIVE-3678
> > Ed has already made a request for HIVE-3083.  If folks have other patches
> > they want see in 0.10, please chime in.
> > Also, request to other committers to help in review patches. There are
> > quite a few in Patch Available state.
> >
> > Thanks,
> > Ashutosh
> >
> >
> > On Thu, Nov 8, 2012 at 3:22 PM, Owen O'Malley 
> wrote:
> >
> >> +1
> >>
> >>
> >> On Thu, Nov 8, 2012 at 3:18 PM, Carl Steinbach 
> wrote:
> >>
> >> > +1
> >> >
> >> > On Wed, Nov 7, 2012 at 11:23 PM, Alexander Lorenz <
> wget.n...@gmail.com
> >> > >wrote:
> >> >
> >> > > +1, good karma
> >> > >
> >> > > On Nov 8, 2012, at 4:58 AM, Namit Jain  wrote:
> >> > >
> >> > > > +1 to the idea
> >> > > >
> >> > > > On 11/8/12 6:33 AM, "Edward Capriolo" 
> >> wrote:
> >> > > >
> >> > > >> That sounds good. I think this issue needs to be solved as well
> as
> >> > > >> anything else that produces a bugus query result.
> >> > > >>
> >> > > >> https://issues.apache.org/jira/browse/HIVE-3083
> >> > > >>
> >> > > >> Edward
> >> > > >>
> >> > > >> On Wed, Nov 7, 2012 at 7:50 PM, Ashutosh Chauhan <
> >> > hashut...@apache.org>
> >> > > >> wrote:
> >> > > >>> Hi,
> >> > > >>>
> >> > > >>> Its been a while since we released 0.9 more than six months
> ago.
> >> All
> >> > > >>> this
> >> > > >>> while, lot of action has happened with various cool features
> >> landing
> >> > in
> >> > > >>> trunk. Additionally, I am looking forward to HiveServer2 landing
> >> in
> >> > > >>> trunk.  So, I propose that we cut the branch for 0.10 soon
> >> afterwards
> >> > > >>> and
> >> > > >>> than release it. Thoughts?
> >> > > >>>
> >> > > >>> Thanks,
> >> > > >>> Ashutosh
> >> > > >
> >> > >
> >> > > --
> >> > > Alexander Alten-Lorenz
> >> > > http://mapredit.blogspot.com
> >> > > German Hadoop LinkedIn Group: http://goo.gl/N8pCF
> >> > >
> >> > >
> >> >
> >>
> >
> >
>



-- 
Swarnim


Proposal to switch to pull requests

2014-03-05 Thread kulkarni.swar...@gmail.com
Hello,

Since we have a nice mirrored git repository for hive[1], any specific
reason why we can't switch to doing pull requests instead of patches? IMHO
pull requests are awesome for peer review, plus it makes it very easy to keep
track of JIRAs with open pull requests instead of looking for JIRAs in a
"Patch Available" state. Also, since they get updated automatically, it is
easy to see whether a review comment made by a reviewer was addressed
properly or not.

Thoughts?

Thanks,

[1] https://github.com/apache/hive

-- 
Swarnim


Re: Proposal to switch to pull requests

2014-03-07 Thread kulkarni.swar...@gmail.com
+1


On Fri, Mar 7, 2014 at 1:05 AM, Thejas Nair  wrote:

> Should we start with moving our primary source code repository from
> svn to git ? I feel git is more powerful and easy to use (once you go
> past the learning curve!).
>
>
> On Wed, Mar 5, 2014 at 7:39 AM, Brock Noland  wrote:
> > Personally I prefer the Github workflow, but I believe there have been
> > some challenges with that since the source for apache projects must be
> > stored in apache source control (git or svn).
> >
> > Relevent:
> https://blogs.apache.org/infra/entry/improved_integration_between_apache_and
> >
> > On Wed, Mar 5, 2014 at 9:19 AM, kulkarni.swar...@gmail.com
> >  wrote:
> >> Hello,
> >>
> >> Since we have a nice mirrored git repository for hive[1], any specific
> >> reason why we can't switch to doing pull requests instead of patches?
> IMHO
> >> pull requests are awesome for peer review plus it is also very easy to
> keep
> >> track of JIRAs with open pull requests instead of looking for JIRAs in a
> >> "Patch Available" state. Also since they get updated automatically, it
> is
> >> also very easy to see if a review comment made by a reviewer was
> addressed
> >> properly or not.
> >>
> >> Thoughts?
> >>
> >> Thanks,
> >>
> >> [1] https://github.com/apache/hive
> >>
> >> --
> >> Swarnim
> >
> >
> >
> > --
> > Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>



-- 
Swarnim


Re: Bumping a few JIRAs

2014-03-20 Thread kulkarni.swar...@gmail.com
I am also definitely willing to help out with reviewing the JIRAs. Just my
+1 won't matter much as it is non-binding. :)


On Thu, Mar 20, 2014 at 1:21 PM, Lefty Leverenz wrote:

> I gave HIVE-6331  a +1
> and
> asked for a trivial fix in
> HIVE-5652,
> then I'll give it +1 also.
>
> But I can't vouch for the technical information except as far as common
> sense takes me.  So can someone else take a look too?
>
> -- Lefty
>
>
> On Thu, Mar 20, 2014 at 12:46 PM, Xuefu Zhang  wrote:
>
> > Thanks for reaching out. Since the first two were initially reviewed by
> > Lefty, it's better to get her +1 in order to be committed if she's
> > available.
> >
> > I can take a look at HIVE-6510.
> >
> > Thanks,
> > Xuefu
> >
> >
> > > On Thu, Mar 20, 2014 at 9:32 AM, Lars Francke wrote:
> >
> > > Hi,
> > >
> > > I have submitted a couple of minor JIRAs with patches but nothing has
> > > happened for months. I'd like to get these in if possible.
> > >
> > > Is there anything I can do to help that process?
> > >
> > > 
> > > 
> > > 
> > >
> > > Thanks for your help.
> > >
> > > Cheers,
> > > Lars
> > >
> >
>



-- 
Swarnim


Re: Bumping a few JIRAs

2014-03-20 Thread kulkarni.swar...@gmail.com
Left a few minor comments on the JIRAs.


On Thu, Mar 20, 2014 at 2:42 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> I am also definitely willing to help out with reviewing the JIRAs. Just my
> +1 won't matter much as it is non-binding. :)
>
>
> On Thu, Mar 20, 2014 at 1:21 PM, Lefty Leverenz 
> wrote:
>
>> I gave HIVE-6331 <https://issues.apache.org/jira/browse/HIVE-6331> a +1
>> and
>> asked for a trivial fix in
>> HIVE-5652<https://issues.apache.org/jira/browse/HIVE-5652>,
>> then I'll give it +1 also.
>>
>> But I can't vouch for the technical information except as far as common
>> sense takes me.  So can someone else take a look too?
>>
>> -- Lefty
>>
>>
>> On Thu, Mar 20, 2014 at 12:46 PM, Xuefu Zhang 
>> wrote:
>>
>> > Thanks for reaching out. Since the first two were initially reviewed by
>> > Lefty, it's better to get her +1 in order to be committed if she's
>> > available.
>> >
>> > I can take a look at HIVE-6510.
>> >
>> > Thanks,
>> > Xuefu
>> >
>> >
>> > On Thu, Mar 20, 2014 at 9:32 AM, Lars Francke wrote:
>> >
>> > > Hi,
>> > >
>> > > I have submitted a couple of minor JIRAs with patches but nothing has
>> > > happened for months. I'd like to get these in if possible.
>> > >
>> > > Is there anything I can do to help that process?
>> > >
>> > > <https://issues.apache.org/jira/browse/HIVE-5652>
>> > > <https://issues.apache.org/jira/browse/HIVE-6331>
>> > > <https://issues.apache.org/jira/browse/HIVE-6510>
>> > >
>> > > Thanks for your help.
>> > >
>> > > Cheers,
>> > > Lars
>> > >
>> >
>>
>
>
>
> --
> Swarnim
>



-- 
Swarnim


Review Requests

2013-02-04 Thread kulkarni.swar...@gmail.com
Hello,

I opened up two reviews for small issues, HIVE-3553[1] and HIVE-3725[2]. If
you guys get a chance to review and provide feedback on it, I will really
appreciate.

Thanks,

[1] https://reviews.apache.org/r/9275/
[2] https://reviews.apache.org/r/9276/

-- 
Swarnim


Re: Review Requests

2013-02-05 Thread kulkarni.swar...@gmail.com
Thanks Mark. Appreciate that. I'll take a look.


On Mon, Feb 4, 2013 at 10:23 PM, Mark Grover wrote:

> Swarnim,
> I left some comments on  reviewboard.
>
> On Mon, Feb 4, 2013 at 8:00 AM, kulkarni.swar...@gmail.com <
> kulkarni.swar...@gmail.com> wrote:
>
> > Hello,
> >
> > I opened up two reviews for small issues, HIVE-3553[1] and HIVE-3725[2].
> If
> > you guys get a chance to review and provide feedback on it, I will really
> > appreciate.
> >
> > Thanks,
> >
> > [1] https://reviews.apache.org/r/9275/
> > [2] https://reviews.apache.org/r/9276/
> >
> > --
> > Swarnim
> >
>



-- 
Swarnim


Re: Review Requests

2013-02-20 Thread kulkarni.swar...@gmail.com
Would someone have a chance to take a quick look at these review
requests[1][2]?

[1] https://reviews.apache.org/r/9275/
[2] https://reviews.apache.org/r/9276/

Thanks,


On Tue, Feb 5, 2013 at 10:00 AM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> Thanks Mark. Appreciate that. I'll take a look.
>
>
> On Mon, Feb 4, 2013 at 10:23 PM, Mark Grover 
> wrote:
>
>> Swarnim,
>> I left some comments on  reviewboard.
>>
>> On Mon, Feb 4, 2013 at 8:00 AM, kulkarni.swar...@gmail.com <
>> kulkarni.swar...@gmail.com> wrote:
>>
>> > Hello,
>> >
>> > I opened up two reviews for small issues, HIVE-3553[1] and
>> HIVE-3725[2]. If
>> > you guys get a chance to review and provide feedback on it, I will
>> really
>> > appreciate.
>> >
>> > Thanks,
>> >
>> > [1] https://reviews.apache.org/r/9275/
>> > [2] https://reviews.apache.org/r/9276/
>> >
>> > --
>> > Swarnim
>> >
>>
>
>
>
> --
> Swarnim
>



-- 
Swarnim


Preferred way to run unit tests

2013-04-12 Thread kulkarni.swar...@gmail.com
Hello,

I have been trying to run the unit tests for the last Hive release (0.10).
For me they have been taking in excess of 10 hours to run (not to mention
the occasional failures with some of the flaky tests).

Currently I am just doing an "ant clean package test". Is there a better way
to run these? Also, is it possible for the build to ignore any test failures
and complete?

Thanks for any help.

-- 
Swarnim
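For readers digging this thread out of the archive, a sketch of some narrower
invocations with the 0.10-era ant build. The `-Dtestcase`/`-Dqfile` property
names are assumptions from memory of that build and should be checked against
the `build.xml` in your checkout:

```shell
# Sketch only: these commands target a Hive 0.10 source checkout, and the
# property names (-Dtestcase, -Dqfile) are assumptions -- verify them
# against build.xml before relying on them.

# Build the package without sitting through the full test suite.
FULL_BUILD="ant clean package"

# Run a single test class rather than everything.
ONE_CLASS="ant test -Dtestcase=TestCliDriver"

# Narrow further to a single .q file within that driver.
ONE_QFILE="ant test -Dtestcase=TestCliDriver -Dqfile=groupby1.q"

printf '%s\n' "$FULL_BUILD" "$ONE_CLASS" "$ONE_QFILE"
```

Iterating on one query file this way is usually far faster than re-running the
whole suite after every change.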


Re: is there set of queries, which can be used to benchmark the hive performance?

2013-04-16 Thread kulkarni.swar...@gmail.com
Hi Rob,

HiBench[1] is the one I have most commonly seen used.

[1] https://github.com/intel-hadoop/HiBench/tree/master/hivebench


On Tue, Apr 16, 2013 at 6:42 PM, ur lops  wrote:

> I am looking to benchmark my database with hive. but before I do that,
> I want to run a set of tests on hive to benchmark hive. Is there
> something exists in hive, similar to pig gridmix?
> Thanks in advance
> Rob.
>



-- 
Swarnim