Got it. Thanks Jake. Will do. Safdar
On Mon, Jun 25, 2012 at 4:16 PM, Jake Luciani <jak...@gmail.com> wrote:

> Hi Safdar,
>
> Yes, you should make it a multiple. The issue is that each shard "sticks" to a
> given node, but there is no way to guarantee that 5 random keys will distribute
> equally across 5 nodes. The idea is that they eventually will as you add more
> and more keys, so increasing the number of shards at once can make that happen
> faster. You can change this parameter and restart the nodes without affecting
> your old data.
>
> If you have more issues, raise them on the GitHub issues tab for Solandra.
>
> -Jake
>
> On Mon, Jun 25, 2012 at 2:23 AM, Safdar Kureishy <safdar.kurei...@gmail.com> wrote:
>
>> Hi Jake,
>>
>> Thanks. Yes, I forgot to mention that I had also raised the
>> solandra.shards.at.once param from 4 to 5 (to match the number of nodes).
>> Should I have raised it to 10 or 15 (a multiple of 5)? I have now added all
>> the documents that I needed to the index. It appears the distribution became
>> more even at a later stage, after indexing 12 million Nutch documents. The
>> distribution is now 35G / 35G / 56G / 324M / 51G, but there is still one node
>> that holds only a small fraction (i.e., 324M) of what the other nodes have.
>> In addition, some nodes hold about double the data of others (e.g., 56G vs.
>> 35G). If you think that increasing the solandra.shards.at.once param will
>> further improve the distribution, what would I need to do to enforce that
>> change on a running cluster, now that all the data has already been added to
>> the index? And on the flip side, if the change cannot be made for existing
>> data, what would happen (to existing + new data) if the setting were changed
>> and the servers were restarted?
>>
>> Lastly, is there another mailing list I should be using for Solandra
>> questions? I couldn't find one....
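[Editor's note: Jake's point above, that a handful of random keys won't land evenly on 5 nodes but many keys will, can be sketched with a quick simulation. This is illustrative only; the "myindex~N" shard-key naming and the ring-placement helper are assumptions for the sketch, not Solandra's actual internals.]

```python
import bisect
import hashlib

NUM_NODES = 5

# Evenly spaced node tokens on RandomPartitioner's 0 .. 2^127 - 1 ring.
TOKENS = [i * (2 ** 127 // NUM_NODES) for i in range(NUM_NODES)]

def key_token(key: str) -> int:
    # RandomPartitioner derives a key's ring position from its MD5 hash.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 127)

def owning_node(key: str) -> int:
    # A key belongs to the node with the smallest token >= the key's
    # token, wrapping around past the last node back to the first.
    return bisect.bisect_left(TOKENS, key_token(key)) % NUM_NODES

def distribution(num_shards: int) -> list:
    # Count how many shard keys land on each node.
    counts = [0] * NUM_NODES
    for i in range(num_shards):
        counts[owning_node(f"myindex~{i}")] += 1  # hypothetical key scheme
    return counts

# With only 5 shard keys the per-node counts are typically lumpy
# (some nodes get 0); with many more shards they even out.
print(distribution(5))
print(distribution(500))
```

Running this a few times with different key prefixes shows the same effect Safdar saw: a small number of hashed shard keys can easily miss one or two token ranges entirely.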
>>
>> Thanks,
>> Safdar
>>
>> On Mon, Jun 25, 2012 at 4:16 AM, Jake Luciani <jak...@gmail.com> wrote:
>>
>>> Hi Safdar,
>>>
>>> If you want better utilization of the cluster, raise the
>>> solandra.shards.at.once param in solandra.properties.
>>>
>>> -Jake
>>>
>>> On Sun, Jun 24, 2012 at 11:00 AM, Safdar Kureishy <safdar.kurei...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've searched online but was unable to find any leads on the problem
>>>> below. This mailing list seemed the most appropriate place; apologies in
>>>> advance if that isn't the case.
>>>>
>>>> I'm running a 5-node Solandra cluster (Solr + Cassandra). I've set up the
>>>> nodes with tokens *evenly distributed across the token space* for a
>>>> 5-node cluster (as evidenced by the "Effective-Ownership" column of the
>>>> "nodetool ring" output below). My data is a set of a few million web
>>>> pages, crawled using Nutch and indexed using the "solrindex" command
>>>> available through Nutch. AFAIK, the key for each document generated from
>>>> the crawled data is the URL.
>>>>
>>>> Based on the "Load" values for the nodes below, despite adding about 3
>>>> million web pages to this index via the HTTP REST API (e.g.,
>>>> http://9.9.9.x:8983/solandra/index/update....), some nodes are still
>>>> "empty". Specifically, nodes 9.9.9.1 and 9.9.9.3 hold just a few
>>>> kilobytes (shown in *bold* below) of the index, while the remaining 3
>>>> nodes are consistently getting hammered by all the data. If the
>>>> RandomPartitioner (which is what I'm using for this cluster) is supposed
>>>> to achieve an even distribution of keys across the token space, why is
>>>> the data below skewed in this fashion? Literally no key has yet been
>>>> hashed to nodes 9.9.9.1 and 9.9.9.3 below. Could someone shed some light
>>>> on this?
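[Editor's note: the change Jake recommends above lives in solandra.properties. A minimal fragment might look like the following; the value 20 is only an example of "a multiple of the node count" (4 shards per node on a 5-node ring), not a recommended setting, and the surrounding file contents are omitted.]

```properties
# solandra.properties (fragment; other settings omitted)
# Number of index shards created at once. Per the thread above, use a
# multiple of the node count so new shards can spread over all nodes.
solandra.shards.at.once = 20
```

Per Jake's first reply, this parameter can be changed and the nodes restarted without affecting data already indexed.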
>>>>
>>>> [me@hm1 solandra-app]$ bin/nodetool -h hm1 ring
>>>> Address   DC           Rack   Status  State   Load         Effective-Ownership  Token
>>>>                                                                                 136112946768375385385349842972707284580
>>>> 9.9.9.0   datacenter1  rack1  Up      Normal  7.57 GB      20.00%               0
>>>> 9.9.9.1   datacenter1  rack1  Up      Normal  *21.44 KB*   20.00%               34028236692093846346337460743176821145
>>>> 9.9.9.2   datacenter1  rack1  Up      Normal  14.99 GB     20.00%               68056473384187692692674921486353642290
>>>> 9.9.9.3   datacenter1  rack1  Up      Normal  *50.79 KB*   20.00%               102084710076281539039012382229530463435
>>>> 9.9.9.4   datacenter1  rack1  Up      Normal  15.22 GB     20.00%               136112946768375385385349842972707284580
>>>>
>>>> Thanks in advance.
>>>>
>>>> Regards,
>>>> Safdar
>>>
>>> --
>>> http://twitter.com/tjake
>>
>
> --
> http://twitter.com/tjake
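[Editor's note: the Token column in the ring output above does show an evenly balanced ring. The five initial tokens can be reproduced with the standard even-spacing formula for RandomPartitioner's 0 .. 2^127 - 1 space, token_i = i * (2^127 // N). This is a quick sketch for verification, not part of any Solandra tooling.]

```python
NUM_NODES = 5
RING = 2 ** 127  # RandomPartitioner token space: 0 .. 2^127 - 1

# Evenly spaced initial tokens for a 5-node ring.
tokens = [i * (RING // NUM_NODES) for i in range(NUM_NODES)]
for t in tokens:
    print(t)
```

The five printed values match the Token column above (e.g., node 9.9.9.1 at 34028236692093846346337460743176821145), which confirms the skew is in key placement across shards, not in the node token assignment.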