Paul Eggert wrote:
> > The module is, so far, not optimized for asymptotic performance.
> > How many temporary files does 'sort' use? Is more than 10000 temp files
> > realistic (each of size > 10 MB, giving a total of 100 GB) ?
>
> Yes, I'm afraid it is.  People are sorting larger files these days,
> and GNU sort doesn't always pick good sizes for temporaries.

OK, I'll rewrite the data structures to have average O(1) insertion and
removal cost.

> The function names use the verbs "enqueue" and "dequeue", not the noun
> "queue".  So we don't need a noun; we need two verbs.  How about
> "register" and "unregister"?

Yes, thanks. I'll use these function names.

> After looking at the code a bit more, I have another comment -- more a
> question, really.  Are all those "volatile"s really needed?
>
> The C and POSIX standards are basically useless when it comes to
> "volatile", so I was wondering what informal rules of thumb you were
> using when programming, when deciding when to use "volatile" and when
> not to.

The rule of thumb is: If a data field can be written by the main program
and read by the signal handler, or vice versa, it needs to be marked as
'volatile'. Data fields which are read-only from the moment they are
entered into the data structure (such as the contents of strings) are not
marked 'volatile'; I hope 'volatile' is not needed there.

1) The practical reason is that while developing libsigsegv, I noticed
   that C does not have the notion of a "barrier". In code like this:

       set_up_datastructures();
       /* HERE I would like a barrier */
       signal(sig,my_signal_handler);

   I would like a barrier to ensure that all memory stores of
   set_up_datastructures() have been performed before the signal handler
   is activated.

       __asm__("" : : : "memory");

   is not portable (works only with GCC). Calls to an empty global
   function are optimized away by GCC 4.

   For a concrete example, look at libsigsegv/sigsegv2.c:

       static volatile unsigned int logcount = 0;
       static void barrier () { }
       ...
         /* This access should call the handler.  */
         ((volatile int *)area2)[230] = 22;
         /* This access should call the handler.  */
         ((volatile int *)area3)[412] = 33;
         /* This access should not give a signal.  */
         ((volatile int *)area2)[135] = 22;
         /* This access should call the handler.  */
         ((volatile int *)area1)[612] = 11;
         barrier();
         /* Check that the handler was called three times.  */
         if (logcount != 3)
           exit (1);

   If I omit the 'volatile' markers, gcc's scheduler will on some
   platforms move the read of the 'logcount' variable to before the four
   memory accesses. But at that moment its value is still 0 (since it is
   incremented by the signal handler).

2) Another example is when a new list element is added to a list that is
   accessed from the signal handler. If the compiler reordered the memory
   stores such that the list element is added to the list before it is
   fully initialized, the signal handler would likely crash. Similarly
   here:

       tmpdir->file[tmpdir->file_count] = xstrdup (absolute_file_name);
       tmpdir->file_count++;

   The slot must be stored before the count that makes it visible to the
   handler is incremented.

3) The theoretical reason is that in ISO C, section 5.1.2.3.(5) guarantees
   that memory stores have been performed at sequence points only if I
   mark them 'volatile'. (I don't understand what 5.1.2.3.(4) is talking
   about, since GCC 4 takes the freedom to reorder memory loads and stores
   arbitrarily across sequence points, if it can prove that there is no
   aliasing.)

> I still don't understand "volatile", to be honest.

To me, "volatile" means: "Perform the memory access here and now, without
merging several memory accesses, without changing the width of the memory
access, and without reordering." On multiprocessor systems with CPUs which
don't have automatic memory bus synchronization (bus watching logic), it
probably means that the compiler needs to emit instructions for explicit
synchronization among the CPUs.

ISO C++ says:
  "volatile is a hint to the implementation to avoid aggressive
   optimization involving the object because the value of the object
   might be changed by means undetectable by an implementation. See
   _intro.execution_ for detailed semantics. In general, the semantics
   of volatile are intended to be the same in C++ as they are in C."

Bruno
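
For illustration, here is a minimal sketch of the "initialize first, publish second" ordering from point 2), following the rule of thumb above. It is not the module's actual code: register_file, cleanup_handler, file_slot, file_count, seen and MAX_FILES are hypothetical names chosen for this example. It also assumes that the compiler keeps volatile accesses in program order relative to each other; strictly speaking, ISO C only blesses objects of type 'volatile sig_atomic_t' for access from a handler that interrupts the program asynchronously.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#define MAX_FILES 16  /* hypothetical capacity, for illustration only */

/* Written by the main program, read by the signal handler:
   per the rule of thumb, both the slots and the count are volatile.
   The strings themselves are read-only once entered, so they are not. */
static const char *volatile file_slot[MAX_FILES];
static volatile sig_atomic_t file_count = 0;

/* Set by the handler so that the main program can inspect the result. */
static volatile sig_atomic_t seen = 0;

/* Main-program side: fill the slot first, publish it second. */
static int register_file (const char *name)
{
  sig_atomic_t n = file_count;

  assert (name != NULL);
  if (n >= MAX_FILES)
    return -1;
  file_slot[n] = name;   /* volatile store 1: initialize the element */
  file_count = n + 1;    /* volatile store 2: make it visible */
  return 0;
}

/* Handler side: looks only at slots below the published count. */
static void cleanup_handler (int sig)
{
  sig_atomic_t n = file_count;
  sig_atomic_t i;
  sig_atomic_t initialized = 0;

  (void) sig;
  for (i = 0; i < n; i++)
    if (file_slot[i] != NULL)
      initialized++;
  seen = initialized;
}
```

If the two stores in register_file were not volatile, the compiler would be free to increment the count before the slot is filled, and a signal arriving in between would make the handler look at an uninitialized slot.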