Andrey, thanks. You are right that I am using Thrift v1. I was following the example under hbase-examples/src/main/cpp/DemoClient.cpp. It looks pretty old, and its scan example:

> scanner = client.scannerOpenWithStop(t, "00020", "00040", columnNames, dummyAttributes);

doesn't work. I googled a bit, and it looks like HBase recommends Thrift2 now?

Demai

On Mon, Mar 9, 2015 at 3:41 PM, Andrey Stepachev <oct...@gmail.com> wrote:
> Sorry Demai, I have no access to that code currently.
>
> But what you described seems that you use
> thrift v1. I'd recommend to use thrift2.
>
> Also it is a good idea to check thrift server configuration:
> 1. blocking/nonblocking/hsha, and framed or not
> 2. size of thread pool
>
> On Mon, Mar 9, 2015 at 9:26 PM, Demai Ni <nid...@gmail.com> wrote:
>
> > Andrey and all,
> >
> > thanks for the input. Andrey, if possible, do you mind share your code
> > segment so I can follow the setting on your side?
> >
> > I have exactly the same thought when face the result first time. I was
> > expecting a little bit performance issue (10~20%) when using Thrift(C++),
> > and not as much.
> >
> > Now I am looking into the C++ api call. Original, I used
> > "client.scannerGet(value, scanner)" ,which will do a lot of prepare
> > work(like flush) for each call. I just changed the code to use
> > "client.scannerGetList(value,scanner, 10000);". Sure enough, the
> > performance improved. However, for a similiar comparison, I did set java
> > client to 10000 batch/cache.
> > Here is the new code:
> >
> > > *C++*
> > > TScan tscan;
> > > int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
> > > int count = 0;
> > > try {
> > >   while (true) {
> > >     std::vector<TRowResult> value;
> > >
> > >     client.scannerGetList(value,scanner, *10000*);
> > >     if (value.size() == 0) {
> > >       break;
> > >     } else count+=value.size();
> > >   }
> >
> > *Java*
> > int total = 0;
> >
> > scan = new Scan();
> >
> > *scan.setCaching(10000); scan.setBatch(10000);*
> > resScanner = table.getScanner(scan);
> > int count = 0;
> > for (Result res: resScanner) {
> >   count ++;
> > }
> >
> > so both client code improved as expected, and the Thrift C++ still take 3X
> > time comparing to Java:
> > C++ : real 6m46.845s, user 1m59.636s, sys 0m11.984s
> > Java: real 2m27.245s, user 0m17.624s, sys 0m4.779s
> >
> > To be fair, I am able to setCaching on Java Client, but didn't find a way
> > to do the same through the C++ API, which also make some difference
> >
> > Demai
> >
> > On Sun, Mar 8, 2015 at 1:40 PM, Andrey Stepachev <oct...@gmail.com> wrote:
> >
> > > Hi Demai.
> > >
> > > Thats seems odd for me, in my tests I got very similar performance.
> > > I'd like to suggest to check that scans have identical parameters
> > > (cache size in particular). That can bring very different performance
> > > in you case.
> > >
> > > Thanks.
> > >
> > > On Sun, Mar 8, 2015 at 6:50 PM, Mike Axiak <m...@axiak.net> wrote:
> > >
> > > > If you're going the JNI route, the best bet is to embed a VM in your C
> > > > project. You use "java -s -p" to create the required header files and
> > > > compile linking against the java library.
> > > > This article talks about how to talk from C to Java:
> > > >
> > > > http://www.codeproject.com/Articles/22881/How-to-Call-Java-Functions-from-C-Using-JNI
> > > >
> > > > Best,
> > > > Mike
> > > >
> > > > On Sun, Mar 8, 2015 at 10:29 AM, Michael Segel <michael_se...@hotmail.com> wrote:
> > > > > JNI example?
> > > > >
> > > > > I don’t have one… my client’s own the code so I can’t take it with me and share.
> > > > > (The joys of being a consultant means you can’t take it with you and you need to make sure you don’t xfer IP accidentally. )
> > > > >
> > > > > Maybe in one of the HBase books? Or just google for a JNI example on the web since its straight forward Java code to connect to HBase and then straight JNI t talk to C/C++
> > > > >
> > > > >> On Mar 7, 2015, at 5:56 PM, Demai Ni <nid...@gmail.com> wrote:
> > > > >>
> > > > >> Nick, thanks. I will give REST a try. However, if it use the same design,
> > > > >> the result probably will be the same.
> > > > >>
> > > > >> Michael, I was thinking about the same thing through JNI. Is there an
> > > > >> example I can follow?
> > > > >>
> > > > >> Mike (Axiak), I run the C++ client on the same linux machine as the hbase
> > > > >> and thrift. The HBase uses ip 127.0.0.1 and thrift uses 0.0.0.0. It doesn't
> > > > >> make a difference, does it?
> > > > >>
> > > > >> Anyway, considering Thrift will get the scan result from HBase first, then
> > > > >> my c++ client the same data from Thrift. It definitely cost(probably)
> > > > >> double the time/cpu. So JNI may be the right way to go. Is there an example
> > > > >> I can use? thanks
> > > > >>
> > > > >> Demai
> > > > >>
> > > > >> On Sat, Mar 7, 2015 at 1:54 PM, Mike Axiak <m...@axiak.net> wrote:
> > > > >>
> > > > >>> What if you install the thrift server locally on every C++ client
> > > > >>> machine?
> > > > >>> I'd imagine performance should be similar to native java
> > > > >>> performance at that point.
> > > > >>>
> > > > >>> -Mike
> > > > >>>
> > > > >>> On Sat, Mar 7, 2015 at 4:49 PM, Michael Segel <michael_se...@hotmail.com> wrote:
> > > > >>>> Or you could try a java connection wrapped by JNI so you can call it from your C++ app.
> > > > >>>>
> > > > >>>>> On Mar 7, 2015, at 1:00 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>> You can try the REST gateway, though it has the same basic architecture as
> > > > >>>>> the thrift gateway. May be the details work out in your favor over rest.
> > > > >>>>>
> > > > >>>>> On Fri, Mar 6, 2015 at 11:31 PM, nidmgg <nid...@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>>> Stack,
> > > > >>>>>>
> > > > >>>>>> Thanks for the quick response. Well, the extra layer really kill the
> > > > >>>>>> Performance. The 'hop' is so expensive
> > > > >>>>>>
> > > > >>>>>> Is there another C/C++ api to try out? I saw there is a jira Hbase-1015,
> > > > >>>>>> but was inactive for a while.
> > > > >>>>>>
> > > > >>>>>> Demai
> > > > >>>>>>
> > > > >>>>>> Stack <st...@duboce.net> wrote:
> > > > >>>>>>
> > > > >>>>>>> Is it because of the 'hop'? Java goes against RS. The thrift C++ goes to a
> > > > >>>>>>> thriftserver which hosts a java client and then it goes to the RS?
> > > > >>>>>>> St.Ack
> > > > >>>>>>>
> > > > >>>>>>> On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni <nid...@gmail.com> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> hi, guys,
> > > > >>>>>>>>
> > > > >>>>>>>> I am trying to get a rough idea about the performance comparison between
> > > > >>>>>>>> c++ and java client when access HBase table, and is surprised to find out
> > > > >>>>>>>> that Thrift (c++) is 4X slower
> > > > >>>>>>>>
> > > > >>>>>>>> The performance result is:
> > > > >>>>>>>> C++: real *16m11.313s*; user 5m3.642s; sys 2m21.388s
> > > > >>>>>>>> Java: real *4m6.012s*; user 0m31.228s; sys 0m8.018s
> > > > >>>>>>>>
> > > > >>>>>>>> I have a single node HBase(98.6) cluster, with 1X TPCH loaded, and use the
> > > > >>>>>>>> largest table : lineitem, which has 6M rows, roughly 600MB data.
> > > > >>>>>>>>
> > > > >>>>>>>> For c++ client, I used the thrift example provided by hbase-examples, the
> > > > >>>>>>>> C++ code looks like:
> > > > >>>>>>>>
> > > > >>>>>>>>> std::string t("lineitem");
> > > > >>>>>>>>> int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
> > > > >>>>>>>>> int count = 0;
> > > > >>>>>>>>> ..
> > > > >>>>>>>>> while (true) {
> > > > >>>>>>>>>   std::vector<TRowResult> value;
> > > > >>>>>>>>>   client.scannerGet(value, scanner);
> > > > >>>>>>>>>   if (value.size() == 0) break;
> > > > >>>>>>>>>   count ++;
> > > > >>>>>>>>> }
> > > > >>>>>>>>>
> > > > >>>>>>>>> std::cout << count << " rows scanned"<< std::endl;
> > > > >>>>>>>>
> > > > >>>>>>>> For java client is the most simple one:
> > > > >>>>>>>>
> > > > >>>>>>>>> HTable table = new HTable(conf,"lineitem");
> > > > >>>>>>>>>
> > > > >>>>>>>>> Scan scan = new Scan();
> > > > >>>>>>>>> ResultScanner resScanner;
> > > > >>>>>>>>> resScanner = table.getScanner(scan);
> > > > >>>>>>>>> int count = 0;
> > > > >>>>>>>>> for (Result res: resScanner) {
> > > > >>>>>>>>>   count ++;
> > > > >>>>>>>>> }
> > > > >>>>>>>>
> > > > >>>>>>>> Since most of the time should be on I/O, I don't expect any significant
> > > > >>>>>>>> difference between Thrift(C++) and Java. Any ideas? Many thanks
> > > > >>>>>>>>
> > > > >>>>>>>> Demai
> > > > >>>>
> > > > >>>> The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
> > > > >>>> Use at your own risk.
> > > > >>>> Michael Segel
> > > > >>>> michael_segel (AT) hotmail.com
> > > > >
> > > > > The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
> > > > > Use at your own risk.
> > > > > Michael Segel
> > > > > michael_segel (AT) hotmail.com
> > >
> > > --
> > > Andrey.
>
> --
> Andrey.
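
For reference, the effect of the batch size discussed in this thread (one row per scannerGet call versus up to 10000 rows per scannerGetList call) can be illustrated with a small self-contained sketch. It does not use the HBase Thrift API: MockScanner, getList, ScanStats and scanAll are hypothetical names invented for illustration, and the sketch only counts how many client/server round trips a scan loop of the same shape would make. It says nothing about actual Thrift or HBase timings.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a remote scanner. This is NOT the HBase Thrift
// client; it only models "each call is one round trip to the server".
class MockScanner {
public:
    explicit MockScanner(std::size_t totalRows) : remaining_(totalRows) {}

    // Returns up to n rows in `out`; an empty result means the scan is done.
    void getList(std::vector<int>& out, std::size_t n) {
        ++roundTrips_;                               // one call == one round trip
        const std::size_t take = std::min(n, remaining_);
        out.assign(take, 0);                         // dummy row payloads
        remaining_ -= take;
    }

    std::size_t roundTrips() const { return roundTrips_; }

private:
    std::size_t remaining_;
    std::size_t roundTrips_ = 0;
};

struct ScanStats {
    std::size_t rows;        // rows received by the client
    std::size_t roundTrips;  // calls made, including the final empty one
};

// Same loop shape as the C++ client quoted above:
// keep fetching until an empty batch comes back.
ScanStats scanAll(std::size_t totalRows, std::size_t batchSize) {
    MockScanner scanner(totalRows);
    std::vector<int> batch;
    std::size_t count = 0;
    while (true) {
        scanner.getList(batch, batchSize);
        if (batch.empty()) break;
        count += batch.size();
    }
    return ScanStats{count, scanner.roundTrips()};
}
```

With the 6M-row table size mentioned above, scanAll(6000000, 1) makes 6000001 calls, while scanAll(6000000, 10000) makes 601 (600 full batches plus the final empty call). How much of the remaining gap to the Java client comes from the extra hop through the Thrift server is not something this sketch can show.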