Hi,

This regex seems to work:

JsonAgent.sinks.Hbase-sink.serializer.regex =[^_]*"(.+).{1},(.+),(.+),(.+).{1}

Remember that beforehand we were getting the following as the ROW, which is incorrect:

{"rowkey":"eff0bdc7-d6b1-40b5-ad0a-b8181173b806"

The first positional column is the ROW_KEY. *We need to strip everything except the UUID itself.*

[^_]*"(.+).{1}

means: get rid of everything *from the start up to and including the quote that opens the UUID* ([^_]* is greedy, so it also consumes {"rowkey": before that quote) and also *get rid of the closing quote*, leaving just the ROW_KEY itself:

eff0bdc7-d6b1-40b5-ad0a-b8181173b806

We also wanted to *get rid of the '}'* at the end of the last column, in this case the price column.

(.+).{1}

means: get rid of the last character.

Now the search via ROW_KEY works:

hbase(main):483:0> get 'trading:MARKETDATAHBASEBATCH', '19735b2e-91b6-4cc8-afcb-f02c00bd52a3'
COLUMN                  CELL
 PRICE_INFO:key         timestamp=1581883743642, value=19735b2e-91b6-4cc8-afcb-f02c00bd52a3
 PRICE_INFO:partition   timestamp=1581883743642, value=6
 PRICE_INFO:price       timestamp=1581883743642, value= "price":108.7
 PRICE_INFO:ticker      timestamp=1581883743642, value="ticker":"IBM"
 PRICE_INFO:timeissued  timestamp=1581883743642, value= "timeissued":"2020-02-16T20:19:43"
 PRICE_INFO:timestamp   timestamp=1581883743642, value=1581883739646
 PRICE_INFO:topic       timestamp=1581883743642, value=md
7 row(s) in 0.0040 seconds

Hope this helps

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

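For anyone who wants to check the working regex outside Flume, here is a minimal Python sketch that applies it to the sample record from the original post. It rests on several assumptions: that Python's re module treats this particular pattern the same way Java's regex engine does, that the event body is exactly the JSON value shown, and that the serializer matches the whole body (hence fullmatch). The helper name parse_event is hypothetical and only for illustration.

```python
import re

# Regex copied from the Flume config above. regexIgnoreCase = true is
# approximated with re.IGNORECASE (an assumption about equivalent behaviour).
PATTERN = re.compile(r'[^_]*"(.+).{1},(.+),(.+),(.+).{1}', re.IGNORECASE)

# Column names from serializer.colNames; index 0 is the row key (rowKeyIndex = 0).
COL_NAMES = ["ROW_KEY", "ticker", "timeissued", "price"]


def parse_event(body: str) -> dict:
    """Hypothetical helper: map capture groups to column names.

    Returns an empty dict when the regex does not match the whole body.
    """
    match = PATTERN.fullmatch(body)
    if match is None:
        return {}
    return dict(zip(COL_NAMES, match.groups()))


# Sample record from the original post (JSON value of the Kafka message).
record = (
    '{"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928","ticker":"IBM", '
    '"timeissued":"2020-02-14T20:32:29", "price":140.11}'
)

parsed = parse_event(record)
for name, value in parsed.items():
    print(f"{name}={value!r}")
```

With the three commas in the record pinned to the three literal commas in the pattern, the first group can only be the bare UUID, and the trailing .{1} swallows the closing brace. The other columns still carry their JSON key text (for example "ticker":"IBM"), which matches what the get output above shows.
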
On Sun, 16 Feb 2020 at 10:47, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> BTW
>
> When I turn on headers in the conf file
>
> JsonAgent.sinks.Hbase-sink.serializer.depositHeaders=true
>
> I get
>
> {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"  *column=PRICE_INFO:key*, timestamp=1581849565330, *value=f8a6e006-35bb-4470-9a7b-9273b8aa83f1*
> {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"  column=PRICE_INFO:partition, timestamp=1581849565330, value=5
> {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"  column=PRICE_INFO:price, timestamp=1581849565330, value= "price":202.74}
> {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"  column=PRICE_INFO:ticker, timestamp=1581849565330, value="ticker":"IBM"
> {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"  column=PRICE_INFO:timeissued, timestamp=1581849565330, value= "timeissued":"2020-02-16T10:50:05"
> {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"  column=PRICE_INFO:timestamp, timestamp=1581849565330, value=1581849561330
> {"rowkey":"f8a6e006-35bb-4470-9a7b-9273b8aa83f1"  column=PRICE_INFO:topic, timestamp=1581849565330, value=md
>
> So it displays the key all right: value=f8a6e006-35bb-4470-9a7b-9273b8aa83f1
>
> But I cannot search on that key!
>
> hbase(main):333:0> get 'trading:MARKETDATAHBASEBATCH', 'f8a6e006-35bb-4470-9a7b-9273b8aa83f1'
> COLUMN                  CELL
> 0 row(s) in 0.0540 seconds
>
> On Sat, 15 Feb 2020 at 15:12, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a Kafka stream that sends data to Flume in the following JSON format.
>>
>> This is the record sent via Kafka:
>>
>> 7d645a0f-0386-4405-8af1-7fca908fe928
>> {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928","ticker":"IBM", "timeissued":"2020-02-14T20:32:29", "price":140.11}
>>
>> Note that "7d645a0f-0386-4405-8af1-7fca908fe928" is the key and there are
>> 4 columns in the value, including the key itself as another column.
>>
>> The Flume configuration file is as follows:
>>
>> # Describing/Configuring the sink
>> JsonAgent.channels.hdfs-channel-1.type = memory
>> JsonAgent.channels.hdfs-channel-1.capacity = 300
>> JsonAgent.channels.hdfs-channel-1.transactionCapacity = 100
>> JsonAgent.sinks.Hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink
>> JsonAgent.sinks.Hbase-sink.channel =hdfs-channel-1
>> JsonAgent.sinks.Hbase-sink.table =trading:MARKETDATAHBASEBATCH
>> JsonAgent.sinks.Hbase-sink.columnFamily=PRICE_INFO
>> JsonAgent.sinks.Hbase-sink.serializer=org.apache.flume.sink.hbase.RegexHbaseEventSerializer
>> JsonAgent.sinks.Hbase-sink.serializer.regex =(.+),(.+),(.+),(.+)
>> JsonAgent.sinks.Hbase-sink.serializer.rowKeyIndex = 0
>> JsonAgent.sinks.Hbase-sink.serializer.colNames =ROW_KEY,ticker,timeissued,price
>> JsonAgent.sinks.Hbase-sink.serializer.regexIgnoreCase = true
>> JsonAgent.sinks.Hbase-sink.batchSize =100
>>
>> This works and posts records to HBase as follows:
>>
>> ROW                                               COLUMN+CELL
>>  {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"  column=PRICE_INFO:price, timestamp=1581711715292, value= "price":140.11}
>>  {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"  column=PRICE_INFO:ticker, timestamp=1581711715292, value="ticker":"IBM"
>>  {"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928"  column=PRICE_INFO:timeissued, timestamp=1581711715292, value= "timeissued":"2020-02-14T20:32:29"
>> 1 row(s) in 0.0050 seconds
>>
>> However, there is a problem. The rowkey value includes the redundant
>> characters {"rowkey": which prevent records from being searched in HBase
>> by rowkey value! When I try to ignore the redundant characters by
>> tweaking the regex, unfortunately no rows are added to the HBase table.
>> Example as follows:
>>
>> JsonAgent.sinks.Hbase-sink.serializer.regex = (?<=^.{9}).+,(.+),(.+),(.+)
>>
>> Appreciate any advice.
>>
>> Thanks,
>>
>> Mich
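
A note on why the lookbehind attempt quoted above probably added no rows. As far as I can tell, RegexHbaseEventSerializer requires the regex to match the whole event body and skips the event when the number of capture groups differs from the number of entries in colNames; both points are assumptions to verify against your Flume version. The Python sketch below checks the attempted regex on both counts, assuming Python's re module behaves like Java's regex engine for this pattern.

```python
import re

# Sample record from the original post (JSON value of the Kafka message).
record = (
    '{"rowkey":"7d645a0f-0386-4405-8af1-7fca908fe928","ticker":"IBM", '
    '"timeissued":"2020-02-14T20:32:29", "price":140.11}'
)

# Column names from serializer.colNames in the quoted config.
COL_NAMES = ["ROW_KEY", "ticker", "timeissued", "price"]

# The attempted regex. The lookbehind (?<=^.{9}) demands that nine characters
# already sit BEFORE the point where the match starts.
LOOKBEHIND = re.compile(r'(?<=^.{9}).+,(.+),(.+),(.+)')

# Check 1: the leading .+ is not wrapped in parentheses, so there are only
# three capture groups for four column names.
group_count = LOOKBEHIND.groups

# Check 2: a whole-body match must start at offset 0, where nothing precedes
# the match, so the lookbehind cannot succeed.
full = LOOKBEHIND.fullmatch(record)

# A partial search does succeed, but only from offset 9 onwards, which is
# not a match of the whole body.
partial = LOOKBEHIND.search(record)

print("capture groups:", group_count, "column names:", len(COL_NAMES))
print("whole-body match:", full)
print("partial match starts at offset:", partial.start())
```

Either condition on its own would be enough to drop every event under the assumptions stated above, which is consistent with the table staying empty. The regex at the top of this thread avoids both: it consumes the prefix with ordinary (non-lookaround) syntax and keeps four capture groups.
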