It seems that ROW__OFFSET__INSIDE__BLOCK is meaningful only with SequenceFileFormat (with block compression) or RCFileFormat.
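
A minimal sketch of how this could be checked, not a verified recipe. It assumes hive.exec.rowoffset is the setting that enables the row-offset virtual column, and testresult_seq is a hypothetical name for a copy of the table stored as a block-compressed SequenceFile:

```sql
-- Assumption: this property controls whether ROW__OFFSET__INSIDE__BLOCK is populated.
SET hive.exec.rowoffset=true;

-- Ask Hive to write the copy as a block-compressed SequenceFile.
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;

-- testresult_seq is a hypothetical table name used only for this example.
CREATE TABLE testresult_seq STORED AS SEQUENCEFILE
AS SELECT * FROM testresult;

-- Same query as in the original question, run against the SequenceFile copy.
SELECT url, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, ROW__OFFSET__INSIDE__BLOCK
FROM testresult_seq
WHERE url = 'http://www.domain022.tl04/page035.html';
```

If the guess above is right, rows from the same compressed block would share one BLOCK__OFFSET__INSIDE__FILE value and differ in ROW__OFFSET__INSIDE__BLOCK, whereas on the plain CSV table the row offset would stay 0.
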
2012/10/3 Edward Capriolo <edlinuxg...@gmail.com>

> Make sure virtual column support is turned on in your hive-site.xml. I
> have a feeling that this field is only supported inside certain input
> formats because I was unable to get a non-zero number out of it. (I
> think it only works with index files)
>
> On Wed, Oct 3, 2012 at 4:20 AM, afancy <grou...@gmail.com> wrote:
> > Hi,
> >
> > Could anybody explain to me what ROW__OFFSET__INSIDE__BLOCK means?
> > For example, I run the following query, and it returns two rows. But
> > why does the ROW__OFFSET__INSIDE__BLOCK column show 0?
> > From the name of the column, I understand that it should return the
> > line number of the record in the block file, but both values are 0.
> > So, what are the BLOCK, the BLOCK offset, and the row offset in a block?
> > The Hive bitmap document is very confusing.
> >
> > hive> SELECT `url`, INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE,
> >     > ROW__OFFSET__INSIDE__BLOCK FROM `testresult` WHERE
> >     > url='http://www.domain022.tl04/page035.html';
> >
> > http://www.domain022.tl04/page035.html hdfs://pc01:54310/user/hive/warehouse/testresult/testresults.csv 0 0
> > http://www.domain022.tl04/page035.html hdfs://pc01:54310/user/hive/warehouse/testresult/testresults.csv 3200250 0
> > Time taken: 19.653 seconds
> > hive>
> >
> > Regards,
> > afancy