On July 17, 2013 08:45:01 AM Kelly McLaughlin wrote:
> Matthew,
> 
> I find it really surprising that you don't see any difference in behavior
> when you set delete_mode to keep. I think it would be helpful if you could
> outline your specific setup and give the steps to reproduce what you're
> seeing to be able to make a determination if this represents a bug or not.
> Thanks.
> 
> Kelly
Hi Kelly,

Sure, no problem.  Hardware-wise, I have:
 - An AMD Phenom II X6 desktop with 16G memory and an HDD with an SSD cache.
 - An Intel Ivy Bridge dual core (+HT) laptop with 16G memory and an SSD.
Both have lots of free memory and disk space for running my tests, and the 
desktop never seems to be I/O bound.  Both machines are connected over Ethernet 
on the same LAN.

On top of that hardware, each machine runs two instances of Riak, all 
forming one 4-node cluster.  I'm using the default ring size of 64.  I've also 
upgraded all the nodes to the latest release, 1.4, using the 1.4 tag from Git.  
I'm not using this to seriously benchmark Riak, so I don't think this setup 
should cause any issues.  I'm also going to set up a real cluster for 
production use, so ring size is not a concern.
Each Riak instance uses LevelDB as the datastore, and Riak Search is disabled.  
I'm using Riak's PB API for access, and I've bumped the backlog parameter up 
to 1024 for now.  Originally my program would connect to a single node, but 
recently I've been playing with HAProxy locally, and I now use it to connect 
to all four instances.  The problem existed before I introduced HAProxy.  
Riak Control is also enabled on one node per computer.
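
For reference, the access path from my code looks roughly like this 
(localhost:9000 is the local HAProxy frontend; without HAProxy it would just 
be a single node's PB listener, 8087 by default):

package main

import (
	"fmt"

	riak "github.com/tpjg/goriakpbc"
)

func main() {
	// localhost:9000 is assumed to be the HAProxy frontend balancing across
	// the four nodes' PB listeners; 100 is the connection pool size.
	con := riak.NewClientPool("localhost:9000", 100)
	if err := con.Connect(); err != nil {
		panic(err)
	}
	bucket, err := con.NewBucket("test_bucket_no_one_has")
	if err != nil {
		panic(err)
	}
	// allow_mult is enabled on every bucket the tests touch.
	if err := bucket.SetAllowMult(true); err != nil {
		panic(err)
	}
	fmt.Println("connected via the PB API")
}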

My application effectively stores two pieces of information in Riak.  First it 
stores a list of keys associated with an object, and then it stores an 
individual item at each of those keys.  I limit the number of keys to 10000 per 
object.
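
Roughly, that layout looks like this (the bucket and key names are made up for 
illustration; the fetch-then-store pattern matches the attached program):

package main

import (
	"fmt"
	"strconv"
	"strings"

	riak "github.com/tpjg/goriakpbc"
)

func main() {
	con := riak.NewClientPool("localhost:9000", 100)
	if err := con.Connect(); err != nil {
		panic(err)
	}
	bucket, err := con.NewBucket("objects") // illustrative bucket name
	if err != nil {
		panic(err)
	}

	// Key list for one object: a single Riak key holding the (<=10000) item keys.
	itemKeys := make([]string, 3)
	for i := range itemKeys {
		itemKeys[i] = "object-1/item-" + strconv.Itoa(i)
	}
	listObj, err := bucket.Get("object-1/keys")
	if listObj == nil {
		panic(err)
	}
	listObj.Data = []byte(strings.Join(itemKeys, "\n"))
	if err := listObj.Store(); err != nil {
		panic(err)
	}

	// An individual item stored at each key from the list.
	for _, k := range itemKeys {
		item, err := bucket.Get(k)
		if item == nil {
			panic(err)
		}
		item.Data = []byte("item payload")
		if err := item.Store(); err != nil {
			panic(err)
		}
	}
	fmt.Println("stored key list and", len(itemKeys), "items")
}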

My test suite automatically cleans up after each test by listing all the keys 
in a bucket and then deleting each key individually.  I only store items in 
two buckets, so this cleans the slate before each run.
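
The cleanup is essentially the following (the same list-then-delete pattern as 
the delete phase of the attached program, minus the goroutines):

package cleanup

import riak "github.com/tpjg/goriakpbc"

// CleanBucket removes every key in the bucket: list the keys, fetch each
// object and delete through it rather than issuing a blind delete by key.
func CleanBucket(bucket *riak.Bucket) error {
	keys, err := bucket.ListKeys()
	if err != nil {
		return err
	}
	for _, key := range keys {
		obj, err := bucket.Get(string(key))
		if obj == nil {
			return err
		}
		if err := obj.Destroy(); err != nil {
			return err
		}
	}
	return nil
}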

The test with the highest chance of failing is the one that inserts 10000 items 
against one object.  The key list remains below 1M.  Occasionally I see other 
tests fail, but I think this one fails more often because it stresses the 
entire system the most.  If I stop the automatic cleanup, the key that comes 
back "not found" is not findable via curl either.
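
The curl check is just a GET against a node's HTTP interface; the same check 
in Go looks roughly like this (default HTTP port 8098 and an example key 
assumed):

package main

import (
	"fmt"
	"net/http"
)

func main() {
	// 404 means the key is not found via the HTTP API either;
	// 200 (or 300 when siblings exist) means it resolved.
	url := "http://localhost:8098/buckets/test_bucket_no_one_has/keys/42"
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(url, "->", resp.Status)
}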

Before posting, I would delete and insert keys without using a vclock.  I had 
figured this was safe since I run with allow_mult=true on both buckets and I 
implemented conflict resolution first.  As suggested on this list, the 10000 
item test suite now uses vclocks from start to finish.  However, I still see 
this behaviour.
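
Concretely, every write in the test now fetches first so the vclock rides 
along with the store; roughly (the key and value here are placeholders):

package vclockwrite

import riak "github.com/tpjg/goriakpbc"

// Overwrite fetches the current object first (even a deleted key still yields
// an object with the updated library), then stores through it so the write
// carries a vclock instead of going in blind.
func Overwrite(bucket *riak.Bucket, key string, value []byte) error {
	obj, err := bucket.Get(key)
	if obj == nil {
		return err // only a nil object is fatal; "not found" still returns an object
	}
	obj.Data = value
	return obj.Store()
}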

I've attached a program (written in Go, since that is what I'm using) to this 
email which triggers the behaviour.  As far as I understand Riak, it properly 
fetches vclocks whenever possible.  The library I'm using (located at: 
github.com/tpjg/goriakpbc) was just recently updated to ensure that vclocks 
are fetched even if the item is deleted, and I am using an up-to-date version 
of the library.  The program behaves similarly to my app, but is pared down as 
far as possible.  Note that this behaviour is unpredictable, and the program 
will sometimes execute fine.
I only tested this program against the default delete_mode setting.  Also, 
using HAProxy seems to trigger the issue far more readily, but it still 
happens without it.


If there is any other information I can provide to help, let me know.

Thanks,
-- 
Matthew
package main

import (
	"fmt"
	"strconv"
	"sync"

	riak "github.com/tpjg/goriakpbc"
)

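// setupBucket enables allow_mult on a bucket; main below inlines the same
// steps against the client pool, so this helper is not actually called.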
func setupBucket(cli *riak.Client, bucketName string) error {
	bucket, err := cli.NewBucket(bucketName)
	if err != nil {
		return err
	}
	err = bucket.SetAllowMult(true)
	if err != nil {
		return err
	}
	return nil
}

const do_keys = 10000

func main() {
	// Connect + setup bucket
	con := riak.NewClientPool("localhost:9000", 100)
	err := con.Connect()
	if err != nil {
		panic(err)
	}
	bucket, err := con.NewBucket("test_bucket_no_one_has")
	if err != nil {
		panic(err)
	}
	err = bucket.SetAllowMult(true)
	if err != nil {
		panic(err)
	}
	
	// Ok, first insert 10000 items.
	wg := sync.WaitGroup{}
	wg.Add(do_keys)
	for i := 0; i < do_keys; i++ {
		go func(i int) {
			defer wg.Done()
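			// Get is used even for keys that don't exist yet: per the library
			// update mentioned above, an object (with any vclock) is still
			// returned on "not found", so only a nil object is treated as
			// fatal and the Store below carries whatever vclock was fetched.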
			item, err := bucket.Get(strconv.Itoa(i))
			if item == nil {
				panic(err)
			}
			item.Data = []byte("ASDF")
			err = item.Store()
			if err != nil {
				panic(err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("Done insert")
	
	// Verify items exist
	wg.Add(do_keys)
	for i := 0; i < do_keys; i++ {
		go func(i int) {
			defer wg.Done()
			_, err := bucket.Get(strconv.Itoa(i))
			if err != nil {
				fmt.Printf("Failed to fetch item %v err %s\n", i, err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("Done fetch")
	
	// And Delete
	keys, err := bucket.ListKeys()
	if err != nil {
		panic(err)
	}
	wg.Add(len(keys))
	for _, key := range keys {
		go func(key string) {
			defer wg.Done()
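			// Fetch the object first and delete through it, rather than
			// issuing a blind delete by key.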
			obj, err := bucket.Get(key)
			if obj == nil {
				panic(err)
			}
			err = obj.Destroy()
			if err != nil {
				panic(err)
			}
		}(string(key))
	}
	wg.Wait()
	fmt.Println("Done Delete")
	
	fmt.Println("DONE")
}
