Brock,

Could someone look into reviewing this? We're hoping to get parquet rolled out internally and this is a pretty important feature for us.

Thanks,
-Dan

On Mon, May 19, 2014 at 10:50 AM, Daniel Weeks <dwe...@netflix.com> wrote:

> No, my test passed and I don't think any of the others are related.
>
> -Dan
>
> On Mon, May 19, 2014 at 10:44 AM, Brock Noland <br...@cloudera.com> wrote:
>
>> Hi,
>>
>> Did any of your tests fail? If not, then we should be able to review.
>>
>> Brock
>>
>> On Mon, May 19, 2014 at 12:12 PM, Daniel Weeks <dwe...@netflix.com> wrote:
>>
>>> Brock,
>>>
>>> I'm not sure where we stand at this point. Do I need to resubmit after the problems with Java 7 are worked through or is it ok to leave it in its current state?
>>>
>>> Thanks,
>>> Dan
>>>
>>> On Fri, May 16, 2014 at 10:41 AM, Brock Noland <br...@cloudera.com> wrote:
>>>
>>>> I believe the results for the latest patch have just been posted. You'll see a bunch of unrelated failures since we just switched to Java 7.
>>>>
>>>> On Fri, May 16, 2014 at 12:39 PM, Daniel Weeks <dwe...@netflix.com> wrote:
>>>>
>>>>> I updated the patch on the JIRA ticket, but the Hive QA hasn't triggered yet. I had problems with this previously and was just wondering if I hit the same issue again.
>>>>>
>>>>> Thanks,
>>>>> Dan
>>>>>
>>>>> On Tue, May 13, 2014 at 4:21 PM, Xuefu Zhang <xzh...@cloudera.com> wrote:
>>>>>
>>>>>> Yeah. I saw some doubts on HIVE-6936 as well. Not sure whether or when it can get through. I'm fine with the global config approach, which, once in place, will probably stay unless it's changed before it's released.
>>>>>>
>>>>>> On Tue, May 13, 2014 at 4:07 PM, Daniel Weeks <dwe...@netflix.com> wrote:
>>>>>>
>>>>>>> That would be nice, but I didn't see a lot of movement on that issue in the last few weeks. Since the parquet integration can be done in two steps, it isn't really dependent on 6936 for the many who want to use column based index as the default.
>>>>>>>
>>>>>>> Any idea what the timeline is for 6936? Is this even a priority?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dan
>>>>>>>
>>>>>>> On Tue, May 13, 2014 at 3:43 PM, Xuefu Zhang <xzh...@cloudera.com> wrote:
>>>>>>>
>>>>>>>> I actually meant pushing https://issues.apache.org/jira/browse/HIVE-6936 forward first.
>>>>>>>>
>>>>>>>> On Tue, May 13, 2014 at 3:41 PM, Xuefu Zhang <xzh...@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks, Daniel. It might be better if we can push HIVE-6938 forward so that we can do it once and for all. It's hard to remove a config once it has been released.
>>>>>>>>>
>>>>>>>>> --Xuefu
>>>>>>>>>
>>>>>>>>> On Tue, May 13, 2014 at 2:59 PM, Daniel Weeks <dwe...@netflix.com> wrote:
>>>>>>>>>
>>>>>>>>>> I've updated the patch for HIVE-6938 <https://issues.apache.org/jira/browse/HIVE-6938> to be a global setting (maintaining the default behavior for existing parquet-hive users). When HIVE-6936 gets sorted out and a path is determined for exposing table properties to input formats, I'll update to also allow a table level switch.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> -Dan
>>>>>>>>>>
>>>>>>>>>> On Tue, May 13, 2014 at 12:28 PM, Daniel Weeks <dwe...@netflix.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Xuefu,
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately, parquet can't simply try by name and fall back to index. The two approaches are orthogonal and mixing modes can cause all sorts of problems. You can read a little more about the various access schemes here: https://github.com/Parquet/parquet-format/issues/91
>>>>>>>>>>>
>>>>>>>>>>> The JIRA you indicated is exactly what we need to make this configurable at the table level.
>>>>>>>>>>>
>>>>>>>>>>> I can modify my patch to use the global setting and ignore the table setting until 6936 is resolved.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Dan
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 13, 2014 at 10:59 AM, Xuefu Zhang <xzh...@cloudera.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> My preference is fewer configurations. Could parquet first access by name, and retry by index upon failure? As long as we clearly document the behavior, we should be okay.
>>>>>>>>>>>>
>>>>>>>>>>>> If configuration turns out to be most viable, does this help in any way? https://issues.apache.org/jira/browse/HIVE-6936
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Xuefu
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, May 12, 2014 at 7:20 PM, Brock Noland <br...@cloudera.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you for the information. Nong, Szehon, or Xuefu, do you have any thoughts on this? If we are going to have a global flag, my thought would be to default this to on.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Brock
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, May 12, 2014 at 6:09 PM, Daniel Weeks <dwe...@netflix.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Brock and Szehon,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I took a look at the failures and it was the missing test output file. However, I discovered a larger issue that may result in a change to the approach.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Initially, I wanted to make column index access a toggle using table properties, but the interaction necessary between the Serde (to get the table property) and the input format (to read the records) doesn't allow me to pass the hint along. I was using the config loaded by the Serde to check a table property and then set a property in the config that eventually is used to init the input format. In most cases, that change is available to the input format, but there seem to be a few edge cases where the config used is different.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I can revert this to a global setting (which does work), but I was hoping you might have an idea as to how this might work as a table property.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any ideas would help, Thanks,
>>>>>>>>>>>>>> -Dan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, May 8, 2014 at 5:17 PM, Brock Noland <br...@cloudera.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have re-uploaded the patch since we had trouble with our patch testing infra and assigned it to Daniel. I'll take a look after tests pass.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers!
>>>>>>>>>>>>>>> Brock
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, May 8, 2014 at 4:49 PM, Daniel Weeks <dwe...@netflix.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Brock,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I was wondering if you could take a look at the patch for HIVE-6938 <https://issues.apache.org/jira/browse/HIVE-6938> which adds support for column rename with parquet files. The patch has been available for a while now without review and it's important for us to transition to parquet.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Dan