Thanks for your comments. Since this is part of my thesis, I do have to define the experiments properly. At this stage, however, I am trying to work out which tools and environment variables I need to take into account. So I am first trying to isolate the query from OS influences (and from other background writers in PostgreSQL itself) as much as possible, and then to explain precisely which influences I can't change. I want to understand the mechanisms of the environment that would influence my experiments and ultimately document them for future work in the area. That would be invaluable to future researchers, who would otherwise have to do the same work twice.
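To make the kind of per-process measurement under discussion concrete, here is a minimal sketch, not a tested tool. It assumes a kernel that exposes per-task I/O counters in /proc/<pid>/io, which only exists on later kernels built with task I/O accounting, not on the kernels discussed in this thread. The helper names (parse_proc_io, read_proc_io, io_delta, measure) are made up for illustration.

```python
def parse_proc_io(text):
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def read_proc_io(pid="self"):
    """Read the counters of one process (assumes /proc/<pid>/io is available)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


def io_delta(before, after):
    """Per-counter difference between two snapshots of the same process."""
    return {key: after[key] - before[key] for key in after if key in before}


def measure(workload, pid="self"):
    """Snapshot the counters, run workload(), snapshot again, return the delta."""
    before = read_proc_io(pid)
    workload()
    after = read_proc_io(pid)
    return io_delta(before, after)
```

For a PostgreSQL query, pid would have to be the backend process serving the connection rather than the client. Writes flushed later by the background writer or by the kernel would still not show up against that pid, so this only narrows the problem and does not remove it.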
I was googling for more info about circumventing kupdated and its 2.2 and 2.6 counterparts, and there seem to be quite a few complaints that O_SYNC or O_DIRECT don't do the trick, but I guess I'll have to try them to see for myself.

Regards,
tzahi.

> -----Original Message-----
> From: guy keren [mailto:[EMAIL PROTECTED]
> Sent: Thursday, January 13, 2005 11:37 PM
> To: Tzahi Fadida
> Cc: 'Muli Ben-Yehuda'; linux-il@cs.huji.ac.il
> Subject: RE: Getting io statistics on processes.
>
> On Thu, 13 Jan 2005, Tzahi Fadida wrote:
>
> > I am implementing a new SQL operator algorithm called
> > full-disjunction. I need to experiment with different environment
> > conditions for it and different cost considerations. I can estimate
> > the io cost by counting the operations of reading and writing tuples
> > from the database, but it would be very inaccurate. I am working on
> > PostgreSQL and it turns out it only guesstimates the io cost of a
> > query. Just a few problems:
> > - PostgreSQL is MVCC and thus the tables could be larger than the
> >   visible tuples they actually contain.
> > - the tuples could occupy 4 times their size on the disk.
> > - what about caching, how does it help or hinder?
> > - debugging, to see I am not double reading.
> > and the list goes on...
> > PostgreSQL can compile on Windows where, at least on the surface,
> > I can access a process's read/write io using the Task Manager
> > related libraries. However, I can never know what is going on with
> > the background writer, and it probably has the same issues as on
> > Linux with counting those.
> > I am better off with Linux, where I can prove it's not doing
> > anything more than it claims.
>
> you should still define things in a more precise manner (not
> for us - for your work to be meaningful).
>
> then you should define the manner of measuring things.
>
> run it before the changes, run it after the changes, and see
> the difference.
>
> what i've learned from my attempts at performance measurement
> is that it is very hard to define the proper experiments that
> will lead to results that re-create "real life usage" of applications.
>
> just some examples:
>
> you have cache. you need to decide - should i measure when
> the cache is empty (i.e. running the test right after a
> reboot), or after the cache is populated with valid data
> (i.e. run the test several times, and only then run it
> once more while measuring)?
>
> and cache is just one example - you'll probably have other
> things too (e.g. running a single transaction at a time or
> several? how large should the database tables get? large
> enough to overflow the cache, or small enough to fit into the
> page cache and thus avoid any real hard-disk access? should i
> perform the tests several times with different amounts of
> RAM? with different CPU types? with different types of
> storage arrays, which would make a tremendous difference if
> your test is I/O bound?)...
>
> --
> guy
>
> "For world domination - press 1,
> or dial 0, and please hold, for the creator." -- nob o. dy