On July 17, 2013 08:45:01 AM Kelly McLaughlin wrote:
> Matthew,
>
> I find it really surprising that you don't see any difference in behavior
> when you set delete_mode to keep. I think it would be helpful if you could
> outline your specific setup and give the steps to reproduce what you're
> seeing to be able to make a determination if this represents a bug or not.
> Thanks.
>
> Kelly

Hi Kelly,
Sure, no problem.

Hardware-wise, I have:
- An AMD Phenom II X6 desktop with 16G memory and an HDD with an SSD cache.
- An Intel Ivy Bridge dual core (+HT) laptop with 16G memory and an SSD.

Both have plenty of free memory and disk space for running my tests, and my desktop never seems to be IO bound. Both machines are connected over Ethernet on the same LAN.

On top of that hardware, each machine runs two instances of Riak, all forming one 4-node cluster. I'm using the default ring size of 64. I've also upgraded all the nodes to the latest release, 1.4, using the 1.4 tag from Git. I'm not using this to seriously benchmark Riak, so I don't think this setup should cause any issues. I'm also going to set up a real cluster for production use, so ring size is not a concern.

Each Riak instance uses LevelDB as the datastore, and Riak Search is disabled. I'm using Riak's PB API for access, and I've bumped up the backlog parameter to 1024 for now. Originally my program would connect to a single node, but recently I've been playing with HAProxy locally, and now I use that to connect to all four instances. The problem existed before I introduced HAProxy. Riak Control is also enabled on one node per computer.

My application effectively stores two pieces of information in Riak: a list of keys associated with an object, and an individual item at each of those keys. I limit the number of keys to 10000 per object. My test suite automatically cleans up after each test by listing all the keys associated with a bucket and then deleting each key individually. I only store items in two buckets, so this cleans the slate before each run.

The test with the highest chance of failing is the one that exercises inserting 10000 items against one object. The key list remains below 1M. Occasionally I see other tests fail, but I think this one fails more often because it stresses the entire system the most. If I stop the automatic cleanup, the not-found key is not findable via curl either.

Before posting, I would delete and insert keys without using a vclock. I had figured this was safe since I run with allow_mult=true on both buckets and had implemented conflict resolution first. As suggested on this list, I now have the 10000-item test use vclocks from start to finish. However, I still see this behaviour.

I've attached a program (written in Go, as that is what I'm using) to this email which triggers the behaviour. As far as I understand Riak, it properly fetches vclocks whenever possible. The library I'm using (github.com/tpjg/goriakpbc) was just recently updated to ensure that vclocks are fetched even if the item is deleted, and I am using an up-to-date version of it. The program acts similarly to my app, but is pared down as far as possible. Note that this behaviour is unpredictable, and the program will sometimes execute fine. I only tested it against the default delete_mode setting. Also, using HAProxy seems to trigger the issue far more readily, but it still happens without it.

If there is any other information I can provide to help, let me know.

Thanks,

--
Matthew
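Before the full listing, a minimal sketch of just the delete path it uses, with the same goriakpbc calls (NewClientPool, Get, Destroy). The address, bucket name, and key are placeholders copied from the full program below, and the sketch assumes the library's recent behaviour of returning the object and its vclock even when the item has been deleted.

package main

import (
	"fmt"

	riak "github.com/tpjg/goriakpbc"
)

func main() {
	// Same connection setup as the full program: a pool of 100
	// connections to whatever speaks the PB API on localhost:9000.
	con := riak.NewClientPool("localhost:9000", 100)
	if err := con.Connect(); err != nil {
		panic(err)
	}
	bucket, err := con.NewBucket("test_bucket_no_one_has")
	if err != nil {
		panic(err)
	}

	// Fetch before deleting: with the recent goriakpbc update the
	// object comes back with its vclock even if the key currently
	// resolves to a tombstone.
	obj, err := bucket.Get("42")
	if obj == nil {
		panic(err)
	}

	// Destroy then issues the delete carrying the vclock from that
	// fetch, rather than a blind delete with no vclock at all.
	if err := obj.Destroy(); err != nil {
		panic(err)
	}
	fmt.Println("deleted key 42")
}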
package main

import (
	"fmt"
	"strconv"
	"sync"

	riak "github.com/tpjg/goriakpbc"
)

func setupBucket(cli *riak.Client, bucketName string) error {
	bucket, err := cli.NewBucket(bucketName)
	if err != nil {
		return err
	}
	err = bucket.SetAllowMult(true)
	if err != nil {
		return err
	}
	return nil
}

const do_keys = 10000

func main() {
	// Connect + setup bucket
	con := riak.NewClientPool("localhost:9000", 100)
	err := con.Connect()
	if err != nil {
		panic(err)
	}
	bucket, err := con.NewBucket("test_bucket_no_one_has")
	if err != nil {
		panic(err)
	}
	err = bucket.SetAllowMult(true)
	if err != nil {
		panic(err)
	}

	// Ok, first insert 10000 items.
	wg := sync.WaitGroup{}
	wg.Add(do_keys)
	for i := 0; i < do_keys; i++ {
		go func(i int) {
			defer wg.Done()
			item, err := bucket.Get(strconv.Itoa(i))
			if item == nil {
				panic(err)
			}
			item.Data = []byte("ASDF")
			err = item.Store()
			if err != nil {
				panic(err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("Done insert")

	// Verify items exist
	wg.Add(do_keys)
	for i := 0; i < do_keys; i++ {
		go func(i int) {
			defer wg.Done()
			_, err := bucket.Get(strconv.Itoa(i))
			if err != nil {
				fmt.Printf("Failed to fetch item %v err %s\n", i, err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("Done fetch")

	// And Delete
	keys, err := bucket.ListKeys()
	if err != nil {
		panic(err)
	}
	wg.Add(len(keys))
	for _, key := range keys {
		go func(key string) {
			defer wg.Done()
			obj, err := bucket.Get(string(key))
			if obj == nil {
				panic(err)
			}
			err = obj.Destroy()
			if err != nil {
				panic(err)
			}
		}(string(key))
	}
	wg.Wait()
	fmt.Println("Done Delete")
	fmt.Println("DONE")
}
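To run it, save the listing and use something like "go run repro.go" (the file name is just an example), with localhost:9000 reachable as a PB endpoint; in my setup that is the HAProxy frontend in front of the four nodes, so adjust the address if you point it at a node directly. When the problem reproduces, the verification pass prints "Failed to fetch item ..." lines or the delete pass panics on a key that was just listed; other runs go all the way through to "DONE".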