On Friday 17 February 2012 06:28 am, David Xu wrote:
On 2012/2/17 16:06, Julian Elischer wrote:
On 2/16/12 11:41 PM, Julian Elischer wrote:
adding jkim as he seems to be the last person working with TSC.
On 2/16/12 6:42 PM, David Xu wrote:
On 2012/2/17 10:19, Julian Elischer wrote:
On 2/16/12 5:56 PM, David Xu wrote:
On 2012/2/17 8:42, Julian Elischer wrote:
Adding David Xu for his thoughts since he reqrote the code
in quesiton in revision 213098
On 2/16/12 2:57 PM, Julian Elischer wrote:
On 2/16/12 1:06 PM, Julian Elischer wrote:
On 2/16/12 9:34 AM, Andriy Gapon wrote:
on 15/02/2012 23:41 Julian Elischer said the following:
The program fio (an IO test in ports) uses pthreads
the following code (from fio-2.0.3, but its in earlier
code too) has suddenly started misbehaving.
clock_gettime(CLOCK_REALTIME,&t);
t.tv_sec += seconds + 10;
pthread_mutex_lock(&mutex->lock);
while (!mutex->value&& !ret) {
mutex->waiters++;
ret =
pthread_cond_timedwait(&mutex->cond,&mutex->lock,&t);
mutex->waiters--;
}
if (!ret) {
mutex->value--;
pthread_mutex_unlock(&mutex->lock);
}
It turns out that 'ret' sometimes comes back instantly
(on my machine) with a
value of 60 (ETIMEDOUT)
despite the fact that we set the timeout 10 seconds into
the future.
Has anyone else seen anything like this?
(and yes the condition variable attribute have been set
to use the REALTIME clock).
But why?
Just a hypothesis that maybe there is some issue with
time keeping on that system.
How would that code work out for you with MONOTONIC?
Jens Axboe, (CC'd) tried both CLOCK_REALTIME and
CLOCK_MONOTONIC, and they both had the same problem..
i.e. random early returns with ETIMEDOUT.
I think we will try move out machine forward to a newer
-stable to see if it resolves.
Kan upgraded the machine today to today's 9.x branch tip
and the problem still occurs.
8.x does not have this problem.
I have not got a 9-RELEASE machine to test on.. so I can
not tell if this came in with the burst of stuff
that came in after the 9.x branch was unfrozen after the
release of 9.0.
I am trying to reproduce the problem, do you have complete
sample code to test ?
I'm still looking the exact set
but on my machine (4 cpus) the program from ports sysutils/fio
exhibits the problem when used with
kern.timecounter.hardware=TSC-low and with the following
config file:
pu05 # cat config.fio
[global]
#clocksource=cpu
direct=1
rw=randread
bs=4096
fill_device=1
numjobs=16
iodepth=16
#ioengine=posixaio
#ioengine=psync
ioengine=psync
group_reporting
norandommap
time_based
runtime=60000
randrepeat=0
[file1]
filename=/dev/ada0
pu05 #
pu05 # fio config.fio
fio: this platform does not support process shared mutexes,
forcing use of threads. Use the 'thread' option to get rid of
this warning. file1: (g=0): rw=randread, bs=4K-4K/4K-4K,
ioengine=psync, iodepth=16 ...
file1: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=psync,
iodepth=16 fio 2.0.3
Starting 15 threads and 1 process
fio: job startup hung? exiting.
fio: 5 jobs failed to start
Segmentation fault (core dumped)
pu05#
The reason 5 jobs failed to start is because the parent timed
out on them immediately.
It didn't time out on 10 of them apparently.
if I set the timer to ACPI-fast it works as expected..
maybe following code can check to see if TSC-LOW works by let
the thread run
on each cpu.
gettimeofday(&prev, NULL);
int cpu = 0;
for (;;) {
cpuset_t set;
cpu = ++cpu % 4;
CPU_ZERO(&set);
CPU_SET(cpu,&set);
pthread_setaffinity_np(pthread_self(), sizeof(set),&set);
gettimeofday(&cur, NULL);
if ( timercmp(&prev,&cur,>=)) {
abort();
}
}
pu05# sysctl kern.timecounter.hardware=TSC-low
kern.timecounter.hardware: ACPI-fast -> TSC-low
pu05# ./test
^C
pu05# cat test.c
#include<stdlib.h>
#include<sys/param.h>
#include<sys/cpuset.h>
#include<pthread_np.h>
#include<sys/time.h>
main()
{
int cpu = 0;
struct timeval prev, cur;
gettimeofday(&prev, NULL);
for (;;) {
cpuset_t set;
cpu = ++cpu % 4;
CPU_ZERO(&set);
CPU_SET(cpu,&set);
pthread_setaffinity_np(pthread_self(), sizeof(set),
&set); gettimeofday(&cur, NULL);
if ( timercmp(&prev,&cur,>)) {
abort();
}
prev = cur;
}
}
pu05# ./test
minutes pass.......
^C
pu05#
so it looks as if the TSC is working ok..
I'm just going to check that the program is actually moving
CPU... yes it is moving around but I can't tell at what speed.
(according to top).
so we are still left with a question of "where is the problem?"
kernel TSC driver?
generic gettimeofday() code?
pthreads cond code?
the application?
I am running the fio test on my notebook which is using TSC-low,
it is on 9.0-RC3, I can not reproduce the problem for
minutes, then I interrupt it with ctrl-c:
http://people.freebsd.org/~davidxu/tsc_pthread/dmesg.txt
http://people.freebsd.org/~davidxu/tsc_pthread/tc.txt
http://people.freebsd.org/~davidxu/tsc_pthread/fio.txt
Your CPU is single-package, dual-core, and SMT-enabled. All cores
should be in perfect sync.
Jung-uk Kim
mine is too, yet it still has problems..
CPU: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (2500.14-MHz
K8-class CPU)
Origin = "GenuineIntel" Id = 0x10676 Family = 6 Model = 17
Stepping = 6
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0xce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1>
AMD Features=0x20100800<SYSCALL,NX,LM>
AMD Features2=0x1<LAHF>
TSC: P-state invariant, performance statistics
real memory = 8589934592 (8192 MB)
avail memory = 8214368256 (7833 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <PTLTD APIC >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
cpu2 (AP): APIC ID: 2
cpu3 (AP): APIC ID: 3
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"