But HDP 2.2 uses HDFS 2.6.0... very hard to convince our admins to upgrade.

Would you recommend that we upgrade to 2.6.0? I'll ask them to consult HWX if
you say yes. :)

Jianshi

On Fri, Nov 14, 2014 at 9:42 AM, Ted Yu <[email protected]> wrote:

> No.
> The upcoming HDP 2.2 does have that fix.
>
> Cheers
>
> On Thu, Nov 13, 2014 at 5:38 PM, Jianshi Huang <[email protected]>
> wrote:
>
> > Oh, btw, does the latest HDP 2.1 (0.98.0.2.1.7.0-784-hadoop2) have this fix?
> >
> > Jianshi
> >
> > On Fri, Nov 14, 2014 at 9:37 AM, Jianshi Huang <[email protected]>
> > wrote:
> >
> > > Thanks Ted.
> > >
> > > I think the fix you mentioned is this one: HBASE-12078
> > > <https://issues.apache.org/jira/browse/HBASE-12078>.
> > >
> > > Not sure when our Hadoop admin would upgrade it, ahhh....
> > >
> > > Jianshi
> > >
> > > On Thu, Nov 13, 2014 at 11:15 PM, Ted Yu <[email protected]> wrote:
> > >
> > >> Keep in mind that Prefix Tree encoding has higher overhead in the write
> > >> path compared to other data block encoding methods.
> > >>
> > >> Please use 0.98.7 which has the latest fixes for Prefix Tree encoding.
> > >>
> > >> Cheers
> > >>
> > >> On Thu, Nov 13, 2014 at 1:27 AM, Jianshi Huang <[email protected]>
> > >> wrote:
> > >>
> > >> > Thanks Ram,
> > >> >
> > >> > How about Prefix Tree based encoding then? HBASE-4676
> > >> > <https://issues.apache.org/jira/browse/HBASE-4676> says it's also possible
> > >> > to do suffix tries? Then it could be a nice fit for JSON String (or any
> > >> > long value where changes are small).
> > >> >
> > >> > Maybe I should just flatten JSON to columns, hmm...what's the overhead for
> > >> > a column?
> > >> >
> > >> > Jianshi
> > >> >
> > >> > On Thu, Nov 13, 2014 at 4:49 PM, ramkrishna vasudevan <
> > >> > [email protected]> wrote:
> > >> >
> > >> > > >>So is it possible to specify FASTDIFF for rowkey/column and DIFF for
> > >> > > >>value cell?
> > >> > > No, that is not possible now. All the encoding is per KV only.
> > >> > > But what you say is definitely worth trying.
> > >> > >
> > >> > > >>So would you recommend storing JSON flattened as many columns?
> > >> > > Maybe yes. But I have not really used JSON formats in practice, so I may
> > >> > > not be the best person to comment on this.
> > >> > >
> > >> > > Regards
> > >> > > Ram
> > >> > >
> > >> > > On Thu, Nov 13, 2014 at 2:01 PM, Jianshi Huang <[email protected]>
> > >> > > wrote:
> > >> > >
> > >> > > > Thanks Ram,
> > >> > > >
> > >> > > > So is it possible to specify FASTDIFF for rowkey/column and DIFF for
> > >> > > > value cell?
> > >> > > >
> > >> > > > So would you recommend storing JSON flattened as many columns?
> > >> > > >
> > >> > > > Jianshi
> > >> > > >
> > >> > > > On Thu, Nov 13, 2014 at 2:08 PM, ramkrishna vasudevan <
> > >> > > > [email protected]> wrote:
> > >> > > >
> > >> > > > > Hi
> > >> > > > >
> > >> > > > > >> Since I'm storing historical data (snapshot data) and changes between
> > >> > > > > >> adjacent value cells are relatively small.
> > >> > > > >
> > >> > > > > If the values change at all, even slightly, FASTDIFF will rewrite the
> > >> > > > > value part. It skips the value part only when the values match exactly.
> > >> > > > > JFYI.
> > >> > > > >
> > >> > > > > Regards
> > >> > > > > Ram
> > >> > > > >
> > >> > > > > On Thu, Nov 13, 2014 at 11:23 AM, Jianshi Huang <[email protected]>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > I thought FASTDIFF was only for rowkey and columns, great if it also
> > >> > > > > > works in value cell.
> > >> > > > > >
> > >> > > > > > And thanks for the bjson link!
> > >> > > > > >
> > >> > > > > > Jianshi
> > >> > > > > >
> > >> > > > > > On Thu, Nov 13, 2014 at 1:18 PM, Ted Yu <[email protected]> wrote:
> > >> > > > > >
> > >> > > > > > > There is FASTDIFF data block encoding.
> > >> > > > > > >
> > >> > > > > > > See also http://bjson.org/
> > >> > > > > > >
> > >> > > > > > > Cheers
> > >> > > > > > >
> > >> > > > > > > On Nov 12, 2014, at 9:08 PM, Jianshi Huang <[email protected]>
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hi,
> > >> > > > > > > >
> > >> > > > > > > > I'm currently saving JSON in pure String format in the value cell and
> > >> > > > > > > > relying on HBase's block compression to reduce the overhead of JSON.
> > >> > > > > > > >
> > >> > > > > > > > I'm wondering if there's a more space efficient way to store JSON?
> > >> > > > > > > > (there're lots of 0s and 1s, JSON String actually is an OK format)
> > >> > > > > > > >
> > >> > > > > > > > I want to keep the value as a Map since the schema of source data might
> > >> > > > > > > > change over time.
> > >> > > > > > > >
> > >> > > > > > > > Also is there a DIFF based encoding for values? Since I'm storing
> > >> > > > > > > > historical data (snapshot data) and changes between adjacent value cells
> > >> > > > > > > > are relatively small.
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > Thanks,
> > >> > > > > > > > --
> > >> > > > > > > > Jianshi Huang
> > >> > > > > > > >
> > >> > > > > > > > LinkedIn: jianshi
> > >> > > > > > > > Twitter: @jshuang
> > >> > > > > > > > Github & Blog: http://huangjs.github.com/



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
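
On the "flatten JSON to columns" idea raised in this thread, here is a minimal sketch in Python of what the flattening step could look like. It is only an illustration under stated assumptions: the column family name "d", the dotted qualifier scheme, and the helper names flatten, to_columns and changed_columns are all hypothetical, and nothing here calls an HBase client API. It just produces the family:qualifier -> value byte pairs one might then write with whatever client is in use.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {dotted.path: scalar} pairs.

    Assumption: keys do not themselves contain "." (otherwise paths
    become ambiguous and a different separator would be needed).
    """
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(value, path))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            path = f"{prefix}.{index}" if prefix else str(index)
            flat.update(flatten(value, path))
    else:
        # Scalar leaf: string, number, bool or None.
        flat[prefix] = obj
    return flat


def to_columns(json_text, family="d"):
    """Return {b"family:qualifier": b"json-encoded scalar"} for one document."""
    flat = flatten(json.loads(json_text))
    return {
        f"{family}:{qualifier}".encode("utf-8"): json.dumps(value).encode("utf-8")
        for qualifier, value in flat.items()
    }


def changed_columns(previous, current):
    """Columns in `current` whose bytes differ from (or are absent in) `previous`."""
    return {k: v for k, v in current.items() if previous.get(k) != v}


if __name__ == "__main__":
    old = to_columns('{"a": {"b": 1, "c": [0, 1]}, "flag": true}')
    new = to_columns('{"a": {"b": 1, "c": [0, 0]}, "flag": true}')
    for column, value in sorted(changed_columns(old, new).items()):
        print(column, value)
```

With the values split per column, a small change between two snapshots touches only a few cells, and the unchanged cells are exact matches. Whether that actually saves space depends on the per-cell overhead asked about above (each cell carries its own row key, family, qualifier and timestamp), so it would need to be measured on real data.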
