Hi, and thank you for the reply.
My comments follow:

> Your tests are not close to what you are going to have in production

My tests are exactly what we will have in production: 1 node, or in the best case 2 nodes. We don't care about durability here. Our request rate will be extremely low too (300 per day?).

> IMHO, here are few recommendations:
>
> 1. Build a cluster with at least 5 nodes with N=3 and R=W=2 (You can
> update your bucket properties via PBC with Java)
> 2. Use PBC instead of HTTP.

Hmm, I ran some tests that showed PBC to be slower. Keep in mind that the import script runs on the same node as Riak.
We also use 2i indexes. The docs say that 2i is emulated using PBC (whatever that means). I don't know if this is a problem:
Secondary Indexes (emulated, native) ✓✗

I will try this, though...

> 3. If you are only importing data call
> .store()....withoutFetch().execute() to avoid unnecessary roundtrips.

I already do this. The problem goes deeper, as I am not only inserting but also updating some keys, i.e.:
1. Fetch
2. Merge results
3. Store back

>
> If you test using unrealistic scenarios you will find unpleasant
> surprises when you are about to go live, so better to set your
> expectations right at the beginning.
>
> HTH,
>
> Guido.
>
> On 29/10/13 14:59, Georgi Ivanov wrote:
> > Hello,
> > I am importing some big data to Riak.
> > I am importing about 10GB per day and I have to import one year of data.
> > The task is to speed up the initial import. After that I will import on
> > a daily basis, so the speed is not very important.
> >
> > I am using the Java HTTP client. So far my tests show that the fastest setup is
> > to use n_val 1 and import to a single server.
> >
> > I tested importing on 2 servers (with n_val:2), but it is actually slower.
> > My Java client is multi-threaded.
> >
> > My idea is to use n_val:1 on a single node, then increase to n_val:2 and
> > add one more node to the cluster. The problem is that I don't see the storage
> > grow when I change to n_val:2.
> > I was looking at the Riak Active Anti-Entropy feature and I am expecting my
> > storage to grow after I increase the n_val. Unfortunately this is not the
> > case, or I don't understand the AAE feature ....
> > I can't see any changes in storage size at all. I don't want to go in
> > the direction of a forced repair, as it would take forever.
> >
> > Can anyone shed some light on AAE? Or any tips for speeding up the import
> > in general?
> >
> > To summarize the situation:
> > 1. One Riak node with n_val:1, eLevelDB as back-end
> > 2. Import data.
> > 3. Change n_val to 2
> > 4. Join one more node to the cluster.
> >
> > What I expect to happen:
> > To have all the keys distributed to 2 Riak nodes with n_val:2.
> > So if I had 1TB of data on node1 with n_val:1, after changing to n_val 2
> > and joining one more node, I would have 1TB of data on each node.
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
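
As a rough check on the storage expectation quoted above, the arithmetic can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, and it assumes replicas spread evenly across the nodes and ignores backend overhead, which is not a statement about how Riak actually places or repairs replicas after an n_val change.

```java
// Back-of-the-envelope storage estimate for the scenario in this thread.
// Hypothetical helper, not part of any Riak client library.
class ReplicaEstimate {

    // Total data held by the cluster: each object is stored nVal times.
    static long clusterTotalGb(long dataGb, int nVal) {
        return dataGb * nVal;
    }

    // Per-node share, ASSUMING replicas are spread evenly over all nodes.
    static double perNodeGb(long dataGb, int nVal, int nodes) {
        return (double) clusterTotalGb(dataGb, nVal) / nodes;
    }

    public static void main(String[] args) {
        long yearGb = 10L * 365; // about 10 GB per day for one year

        // Initial import: one node, n_val 1.
        System.out.println("1 node,  n_val 1: " + perNodeGb(yearGb, 1, 1) + " GB per node");

        // Expected end state: two nodes, n_val 2, full copy on each node.
        System.out.println("2 nodes, n_val 2: " + perNodeGb(yearGb, 2, 2) + " GB per node");
    }
}
```

Under these assumptions the per-node figure is the same before and after, which matches the "1TB on each node" expectation, while the cluster-wide total doubles.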