Re: Does spark utilize the sorted order of hbase keys, when using hbase as data source

2015-04-07 Ted Yu
Then splitting according to user IDs is out of the question :-)

On Tue, Apr 7, 2015 at 8:12 AM, Юра wrote:
> There are 500 million distinct users...
>
> 2015-04-07 17:45 GMT+03:00 Ted Yu:
>> How many distinct users are stored in HBase?
>>
>> TableInputFormat produces splits where the number of ...

Re: Does spark utilize the sorted order of hbase keys, when using hbase as data source

2015-04-07 Юра
There are 500 million distinct users...

2015-04-07 17:45 GMT+03:00 Ted Yu:
> How many distinct users are stored in HBase?
>
> TableInputFormat produces splits where the number of splits matches the number
> of regions in a table. You can write your own InputFormat which splits
> according to user ...

Re: Does spark utilize the sorted order of hbase keys, when using hbase as data source

2015-04-07 Ted Yu
How many distinct users are stored in HBase?

TableInputFormat produces splits where the number of splits matches the number of regions in a table. You can write your own InputFormat which splits according to user ID.

FYI

On Tue, Apr 7, 2015 at 7:36 AM, Юра wrote:
> Hello, guys!
>
> I am a newbie ...
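With 500 million users, one split per user is impractical. The workable reading of the "split by user ID" suggestion is to group users into a modest number of contiguous key ranges. Below is a minimal Python sketch of that range arithmetic only. It assumes something the thread does not state: that every row key begins with the user ID encoded as an 8-byte big-endian unsigned integer. `user_id_key` and `user_range_splits` are hypothetical helpers, not HBase or Spark APIs. Each (start, stop) pair is what a custom InputFormat could assign to one split's scan. Wiring the ranges into an actual `InputFormat` is not shown.

```python
from typing import List, Tuple

# Assumption (not stated in the thread): every row key starts with the user id
# encoded as a fixed-width, 8-byte, big-endian unsigned integer. With that
# layout, HBase's lexicographic key order matches numeric user-id order.
USER_ID_BYTES = 8


def user_id_key(user_id: int) -> bytes:
    """Encode a user id as the hypothetical fixed-width row-key prefix."""
    return user_id.to_bytes(USER_ID_BYTES, "big")


def user_range_splits(min_user: int, max_user: int,
                      num_splits: int) -> List[Tuple[bytes, bytes]]:
    """Cut the user ids [min_user, max_user] into num_splits contiguous ranges.

    Returns (start_row, stop_row) pairs: start inclusive, stop exclusive,
    matching how an HBase scan's start and stop rows are interpreted.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    # Keep max_user + 1 representable so the last exclusive stop row fits.
    if not 0 <= min_user <= max_user < 2 ** (8 * USER_ID_BYTES) - 1:
        raise ValueError("user id range does not fit the key prefix")
    total = max_user - min_user + 1
    num_splits = min(num_splits, total)  # never emit empty ranges
    splits = []
    for i in range(num_splits):
        lo = min_user + total * i // num_splits
        hi = min_user + total * (i + 1) // num_splits  # exclusive bound
        splits.append((user_id_key(lo), user_id_key(hi)))
    return splits


if __name__ == "__main__":
    # 500 million users grouped into 4 scan ranges instead of 500 million splits.
    for start, stop in user_range_splits(0, 499_999_999, 4):
        print(int.from_bytes(start, "big"), int.from_bytes(stop, "big"))
```

Each range covers a contiguous, sorted slice of row keys, so a split built from it would read its users' rows in key order. How many ranges to create is a tuning choice that the thread leaves open.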