Thanks Ted.

I think the fix you mentioned is this one: HBASE-12078
<https://issues.apache.org/jira/browse/HBASE-12078>.

Not sure when our Hadoop admin would upgrade it, ahhh....
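
On the "flatten JSON to columns" idea from earlier in the thread, here is a
rough sketch of what the flattening step could look like. It is plain Python
with no HBase client involved; the function names flatten and to_cells, the
dotted-path qualifier scheme, and storing each leaf as JSON-encoded bytes are
all assumptions made for illustration, not anything HBase prescribes.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested JSON into {dotted.path: scalar} pairs.

    Assumption: keys do not themselves contain '.', otherwise paths
    become ambiguous. Empty dicts and lists produce no entries.
    """
    out = {}
    if isinstance(obj, dict):
        for key, val in obj.items():
            out.update(flatten(val, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for i, val in enumerate(obj):
            out.update(flatten(val, f"{prefix}{i}."))
    else:
        # Leaf value: drop the trailing '.' from the accumulated path.
        out[prefix[:-1]] = obj
    return out


def to_cells(doc):
    """Map a JSON string to {qualifier bytes: value bytes} for one row.

    Each leaf is re-encoded as JSON so its type (number, string, bool,
    null) survives a round trip.
    """
    flat = flatten(json.loads(doc))
    return {
        q.encode("utf-8"): json.dumps(v).encode("utf-8")
        for q, v in flat.items()
    }
```

Each entry of the returned dict would then become one column in a Put for the
row. Whether this beats a single JSON string depends on the per-column
overhead asked about below, since every cell repeats the row key, family and
qualifier before encoding or compression is applied.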

Jianshi

On Thu, Nov 13, 2014 at 11:15 PM, Ted Yu <[email protected]> wrote:

> Keep in mind that Prefix Tree encoding has higher overhead in write path
> compared to other data block encoding methods.
>
> Please use 0.98.7 which has the latest fixes for Prefix Tree encoding.
>
> Cheers
>
> On Thu, Nov 13, 2014 at 1:27 AM, Jianshi Huang <[email protected]>
> wrote:
>
> > Thanks Ram,
> >
> > How about Prefix Tree based encoding then? HBASE-4676
> > <https://issues.apache.org/jira/browse/HBASE-4676> says it's also possible
> > to do suffix tries? Then it could be a nice fit for JSON String (or any
> > long value where changes are small).
> >
> > Maybe I should just flatten JSON to columns, hmm...what's the overhead for
> > a column?
> >
> > Jianshi
> >
> > On Thu, Nov 13, 2014 at 4:49 PM, ramkrishna vasudevan <
> > [email protected]> wrote:
> >
> > > >> So is it possible to specify FASTDIFF for rowkey/column and DIFF for
> > > >> value cell?
> > > No, that is not possible now. All the encoding is per KV only.
> > > But what you say is definitely worth trying.
> > >
> > > >> So would you recommend storing JSON flattened as many columns?
> > > Maybe yes. But I have practically not used JSON formats, so I may not be
> > > the best person to comment on this.
> > >
> > > Regards
> > > Ram
> > >
> > > On Thu, Nov 13, 2014 at 2:01 PM, Jianshi Huang <[email protected]>
> > > wrote:
> > >
> > > > Thanks Ram,
> > > >
> > > > So is it possible to specify FASTDIFF for rowkey/column and DIFF for
> > > > value cell?
> > > >
> > > > So would you recommend storing JSON flattened as many columns?
> > > >
> > > > Jianshi
> > > >
> > > > On Thu, Nov 13, 2014 at 2:08 PM, ramkrishna vasudevan <
> > > > [email protected]> wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > >> Since I'm storing historical data (snapshot data) and changes
> > > > > >> between adjacent value cells are relatively small.
> > > > >
> > > > > If the values change at all, even slightly, FASTDIFF will rewrite
> > > > > the value part. Only if there is an exact match would it skip the
> > > > > value part. JFYI.
> > > > >
> > > > > Regards
> > > > > Ram
> > > > >
> > > > > On Thu, Nov 13, 2014 at 11:23 AM, Jianshi Huang <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > I thought FASTDIFF was only for rowkey and columns, great if it also
> > > > > > works in value cell.
> > > > > >
> > > > > > And thanks for the bjson link!
> > > > > >
> > > > > > Jianshi
> > > > > >
> > > > > > On Thu, Nov 13, 2014 at 1:18 PM, Ted Yu <[email protected]> wrote:
> > > > > >
> > > > > > > There is FASTDIFF data block encoding.
> > > > > > >
> > > > > > > See also http://bjson.org/
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On Nov 12, 2014, at 9:08 PM, Jianshi Huang <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I'm currently saving JSON in pure String format in the value cell
> > > > > > > > and depend on HBase's block compression to reduce the overhead of
> > > > > > > > JSON.
> > > > > > > >
> > > > > > > > I'm wondering if there's a more space-efficient way to store JSON?
> > > > > > > > (there are lots of 0s and 1s, JSON String actually is an OK format)
> > > > > > > >
> > > > > > > > I want to keep the value as a Map since the schema of source data
> > > > > > > > might change over time.
> > > > > > > >
> > > > > > > > Also is there a DIFF based encoding for values? Since I'm storing
> > > > > > > > historical data (snapshot data) and changes between adjacent value
> > > > > > > > cells are relatively small.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > --
> > > > > > > > Jianshi Huang
> > > > > > > >
> > > > > > > > LinkedIn: jianshi
> > > > > > > > Twitter: @jshuang
> > > > > > > > Github & Blog: http://huangjs.github.com/
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Jianshi Huang
> > > > > >
> > > > > > LinkedIn: jianshi
> > > > > > Twitter: @jshuang
> > > > > > Github & Blog: http://huangjs.github.com/
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Jianshi Huang
> > > >
> > > > LinkedIn: jianshi
> > > > Twitter: @jshuang
> > > > Github & Blog: http://huangjs.github.com/
> > > >
> > >
> >
> >
> >
> > --
> > Jianshi Huang
> >
> > LinkedIn: jianshi
> > Twitter: @jshuang
> > Github & Blog: http://huangjs.github.com/
> >
>



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
