Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Andriy Gapon
on 25/07/2012 01:10 Jim Harris said the following:
> Author: jimharris
> Date: Tue Jul 24 22:10:11 2012
> New Revision: 238755
> URL: http://svn.freebsd.org/changeset/base/238755
> 
> Log:
>   Add rmb() to tsc_read_##x to enforce serialization of rdtsc captures.
>   
>   Intel Architecture Manual specifies that rdtsc instruction is not serialized,
>   so without this change, TSC synchronization test would periodically fail,
>   resulting in use of HPET timecounter instead of TSC-low.  This caused
>   severe performance degradation (40-50%) when running high IO/s workloads due to
>   HPET MMIO reads and GEOM stat collection.
>   
>   Tests on Xeon E5-2600 (Sandy Bridge) 8C systems were seeing TSC synchronization
>   fail approximately 20% of the time.

Should rather the synchronization test be fixed if it's the culprit?
Or is this change universally good for the real uses of TSC?

>   Sponsored by: Intel
>   Reviewed by: kib
>   MFC after: 3 days
> 
> Modified:
>   head/sys/x86/x86/tsc.c
> 
> Modified: head/sys/x86/x86/tsc.c
> ==
> --- head/sys/x86/x86/tsc.cTue Jul 24 20:15:41 2012(r238754)
> +++ head/sys/x86/x86/tsc.cTue Jul 24 22:10:11 2012(r238755)
> @@ -328,6 +328,7 @@ init_TSC(void)
>  
>  #ifdef SMP
>  
> +/* rmb is required here because rdtsc is not a serializing instruction. */
>  #define  TSC_READ(x) \
>  static void  \
>  tsc_read_##x(void *arg)  \
> @@ -335,6 +336,7 @@ tsc_read_##x(void *arg)   \
>   uint32_t *tsc = arg;\
>   u_int cpu = PCPU_GET(cpuid);\
>   \
> + rmb();  \
>   tsc[cpu * 3 + x] = rdtsc32();   \
>  }
>  TSC_READ(0)
> 


-- 
Andriy Gapon


___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r238722 - in head/lib/msun: . ld128 ld80 man src

2012-07-25 Thread Peter Jeremy
On 2012-Jul-24 13:57:12 -0400, David Schultz  wrote:
>On Tue, Jul 24, 2012, Steve Kargl wrote:
>> On Tue, Jul 24, 2012 at 08:43:35AM +, Alexey Dokuchaev wrote:
>> > On Mon, Jul 23, 2012 at 07:13:56PM +, Steve Kargl wrote:
>> > >   Compute the exponential of x for Intel 80-bit format and IEEE 128-bit
>> > >   format.  These implementations are based on
>> > I believe some ports could benefit from OSVERSION bump for this one.
...
>against.  In this case, it would help any ports that have
>workarounds for the lack of expl() to compile both before and
>after this change.  But it's also important not to bump the
>version gratuitously if there's no reason to believe the change
>might introduce incompatibilities.

Hopefully, this is just the first of a series of similar commits over
the next 4-5 months so if we bump OSVERSION for this, we are probably
looking at another half-dozen or so bumps.  Do any ports actually have
a hard-wired decision for expl() (or other libm functions)?  I would
hope most ports that are interested in complex and/or long double
functions have some sort of configure-time test that will automatically
detect their presence or absence.

-- 
Peter Jeremy




Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Konstantin Belousov
On Wed, Jul 25, 2012 at 10:20:02AM +0300, Andriy Gapon wrote:
> on 25/07/2012 01:10 Jim Harris said the following:
> > Author: jimharris
> > Date: Tue Jul 24 22:10:11 2012
> > New Revision: 238755
> > URL: http://svn.freebsd.org/changeset/base/238755
> > 
> > Log:
> >   Add rmb() to tsc_read_##x to enforce serialization of rdtsc captures.
> >   
> >   Intel Architecture Manual specifies that rdtsc instruction is not 
> > serialized,
> >   so without this change, TSC synchronization test would periodically fail,
> >   resulting in use of HPET timecounter instead of TSC-low.  This caused
> >   severe performance degradation (40-50%) when running high IO/s workloads 
> > due to
> >   HPET MMIO reads and GEOM stat collection.
> >   
> >   Tests on Xeon E5-2600 (Sandy Bridge) 8C systems were seeing TSC 
> > synchronization
> >   fail approximately 20% of the time.
> 
> Should rather the synchronization test be fixed if it's the culprit?
Synchronization test for what ?

> Or is this change universally good for the real uses of TSC?

What I understood from the Intel SDM, and also from additional experiments
which Jim kindly made despite me being annoying as usual, is that 'read
memory barrier' AKA LFENCE there is used for its secondary implementation
effects, not for load/load barrier as you might assume.

According to SDM, LFENCE fully drains execution pipeline (but comparing
with MFENCE, does not drain write buffers). The result is that RDTSC is
not started before previous instructions are finished.

For tsc test, this means that after the change RDTSC executions are not
reordered on the single core among themselves. As I understand, CPU has
no dependency noted between two reads of tsc by RDTSC, which allows
later read to give lower value of counter. This is fixed by Intel by
introduction of RDTSCP instruction, which is defined to be serialization
point, and use of which (instead of LFENCE; RDTSC sequence) also fixes
test, as confirmed by Jim.
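
As a rough userland illustration of the LFENCE; RDTSC sequence described above (this is not the kernel's rmb()/rdtsc32() code), the sketch below uses GCC/Clang inline assembly on x86-64. The function names are made up for this example, and the non-x86 branch is only a placeholder counter so that the file compiles elsewhere. The ordering comments follow the Intel SDM description of LFENCE; other vendors may need a different fence.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__)
/* Plain RDTSC: nothing stops the CPU from executing it early. */
uint64_t
tsc_read_plain(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
	return (((uint64_t)hi << 32) | lo);
}

/* LFENCE; RDTSC: earlier instructions complete before the counter is read. */
uint64_t
tsc_read_fenced(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("lfence; rdtsc"
	    : "=a" (lo), "=d" (hi) : : "memory");
	return (((uint64_t)hi << 32) | lo);
}
#else
/* Placeholder (assumption): no TSC here, so return a fake increasing count. */
static uint64_t fake_tsc;

uint64_t
tsc_read_plain(void)
{
	return (++fake_tsc);
}

uint64_t
tsc_read_fenced(void)
{
	return (++fake_tsc);
}
#endif

/*
 * Two back-to-back fenced reads should not go backwards, assuming the
 * thread stays on one core between the two reads.
 */
void
tsc_fence_demo(void)
{
	uint64_t a = tsc_read_fenced();
	uint64_t b = tsc_read_fenced();

	assert(b >= a);
}
```

The fence costs extra cycles on every read, which is the trade-off argued over in the rest of this thread.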

In fact, I now think that we should also apply the following patch.
Otherwise, consecutive calls to e.g. binuptime(9) could return decreased
time stamps. Note that libc __vdso_gettc.c already has LFENCE nearby the
tsc reads, which was done not for this reason, but apparently needed for
the reason too.

diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c
index 085c339..229b351 100644
--- a/sys/x86/x86/tsc.c
+++ b/sys/x86/x86/tsc.c
@@ -594,6 +594,7 @@ static u_int
 tsc_get_timecount(struct timecounter *tc __unused)
 {
 
+   rmb();
return (rdtsc32());
 }
 
@@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter *tc)
 {
uint32_t rv;
 
+   rmb();
__asm __volatile("rdtsc; shrd %%cl, %%edx, %0"
-   : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
+   : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
return (rv);
 }
 




svn commit: r238765 - head/sys/dev/e1000

2012-07-25 Thread Luigi Rizzo
Author: luigi
Date: Wed Jul 25 11:28:15 2012
New Revision: 238765
URL: http://svn.freebsd.org/changeset/base/238765

Log:
  Use legacy interrupts as a default. This gives up to 10% speedup
  when used in qemu (and this driver is for non-PCIe cards,
  so probably its largest use is in virtualized environments).
  
  Approved by:  Jack Vogel
  MFC after:3 days

Modified:
  head/sys/dev/e1000/if_lem.c

Modified: head/sys/dev/e1000/if_lem.c
==
--- head/sys/dev/e1000/if_lem.c Wed Jul 25 10:55:14 2012(r238764)
+++ head/sys/dev/e1000/if_lem.c Wed Jul 25 11:28:15 2012(r238765)
@@ -239,6 +239,7 @@ static void lem_enable_wakeup(device
 static int lem_enable_phy_wakeup(struct adapter *);
 static voidlem_led_func(void *, int);
 
+#define EM_LEGACY_IRQ  /* slightly faster, at least in qemu */
 #ifdef EM_LEGACY_IRQ
 static voidlem_intr(void *);
 #else /* FAST IRQ */
@@ -1549,6 +1550,13 @@ lem_xmit(struct adapter *adapter, struct
u32 txd_upper, txd_lower, txd_used, txd_saved;
int error, nsegs, i, j, first, last = 0;
 
+extern int netmap_drop;
+   if (netmap_drop == 95) {
+dropme:
+   m_freem(*m_headp);
+   *m_headp = NULL;
+   return (ENOBUFS);
+   }
m_head = *m_headp;
txd_upper = txd_lower = txd_used = txd_saved = 0;
 
@@ -1688,6 +1696,9 @@ lem_xmit(struct adapter *adapter, struct
}
}
 
+   if (netmap_drop == 96)
+   goto dropme;
+
adapter->next_avail_tx_desc = i;
 
if (adapter->pcix_82544)
@@ -1715,6 +1726,16 @@ lem_xmit(struct adapter *adapter, struct
  */
 ctxd->lower.data |=
htole32(E1000_TXD_CMD_EOP | E1000_TXD_CMD_RS);
+
+if (netmap_drop == 97) {
+   static int count=0;
+   if (count++ & 63 != 0)
+ctxd->lower.data &=
+~htole32(E1000_TXD_CMD_RS);
+   else
+   D("preserve RS");
+
+}
/*
 * Keep track in the first buffer which
 * descriptor will be written back
@@ -1733,6 +1754,12 @@ lem_xmit(struct adapter *adapter, struct
adapter->link_duplex == HALF_DUPLEX)
lem_82547_move_tail(adapter);
else {
+extern int netmap_repeat;
+   if (netmap_repeat) {
+   int x;
+   for (x = 0; x < netmap_repeat; x++)
+   E1000_WRITE_REG(&adapter->hw, E1000_TDT(0), i);
+   }
E1000_WRITE_REG(&adapter->hw, E1000_TDT(0), i);
if (adapter->hw.mac.type == e1000_82547)
lem_82547_update_fifo_head(adapter,
@@ -2986,6 +3013,13 @@ lem_txeof(struct adapter *adapter)
return;
}
 #endif /* DEV_NETMAP */
+{
+   static int drops = 0;
+   if (netmap_copy && drops++ < netmap_copy)
+   return;
+   drops = 0;
+}
+
 if (adapter->num_tx_desc_avail == adapter->num_tx_desc)
 return;
 


svn commit: r238766 - in head/sys/dev/usb: . serial

2012-07-25 Thread Gavin Atkinson
Author: gavin
Date: Wed Jul 25 11:33:43 2012
New Revision: 238766
URL: http://svn.freebsd.org/changeset/base/238766

Log:
  Update the list of devices supported by uplcom.  Although this only adds
  one device (support for Motorola cables), this synchronises us with:
  
  OpenBSD src/sys/dev/usb/uplcom.c 1.56
  NetBSD  src/sys/dev/usb/uplcom.c 1.73
  Linux   kernel.org HEAD
  
  MFC after:1 week

Modified:
  head/sys/dev/usb/serial/uplcom.c
  head/sys/dev/usb/usbdevs

Modified: head/sys/dev/usb/serial/uplcom.c
==
--- head/sys/dev/usb/serial/uplcom.cWed Jul 25 11:28:15 2012
(r238765)
+++ head/sys/dev/usb/serial/uplcom.cWed Jul 25 11:33:43 2012
(r238766)
@@ -279,6 +279,7 @@ static const STRUCT_USB_HOST_ID uplcom_d
UPLCOM_DEV(PROLIFIC, DCU11),/* DCU-11 Phone Cable */
UPLCOM_DEV(PROLIFIC, HCR331),   /* HCR331 Card Reader */
UPLCOM_DEV(PROLIFIC, MICROMAX_610U),/* Micromax 610U modem */
+   UPLCOM_DEV(PROLIFIC, MOTOROLA), /* Motorola cable */
UPLCOM_DEV(PROLIFIC, PHAROS),   /* Prolific Pharos */
UPLCOM_DEV(PROLIFIC, PL2303),   /* Generic adapter */
UPLCOM_DEV(PROLIFIC, RSAQ2),/* I/O DATA USB-RSAQ2 */

Modified: head/sys/dev/usb/usbdevs
==
--- head/sys/dev/usb/usbdevsWed Jul 25 11:28:15 2012(r238765)
+++ head/sys/dev/usb/usbdevsWed Jul 25 11:33:43 2012(r238766)
@@ -2667,6 +2667,7 @@ product PRIMAX HP_RH304AA 0x4d17  HP RH30
 /* Prolific products */
 product PROLIFIC PL23010x  PL2301 Host-Host interface
 product PROLIFIC PL23020x0001  PL2302 Host-Host interface
+product PROLIFIC MOTOROLA  0x0307  Motorola Cable
 product PROLIFIC RSAQ2 0x04bb  PL2303 Serial (IODATA USB-RSAQ2)
 product PROLIFIC ALLTRONIX_GPRS0x0609  Alltronix ACM003U00 modem
 product PROLIFIC ALDIGA_AL11U  0x0611  AlDiga AL-11U modem


svn commit: r238769 - head/sys/netinet

2012-07-25 Thread Bjoern A. Zeeb
Author: bz
Date: Wed Jul 25 12:14:39 2012
New Revision: 238769
URL: http://svn.freebsd.org/changeset/base/238769

Log:
  Fix a problem when CARP is enabled on the interface for IPv4
  but not for IPv6.  The current checks in nd6_nbr.c along with the
  old version will result in ifa being NULL and subsequently the
  packet will be dropped.  This prevented NS/NA from working, and
  with that IPv6.
  
  Now return the ifa from the carp lookup function in two cases:
  1) if the address matches, is a carp address, and we are MASTER
 (as before),
  2) if the address matches but it is not a carp address at all (new).
  
  Reported by:  Peter Wemm (new Y! FreeBSD cluster, eating our own dogfood)
  Tested on:New Y! FreeBSD cluster machines
  Reviewed by:  glebius

Modified:
  head/sys/netinet/ip_carp.c

Modified: head/sys/netinet/ip_carp.c
==
--- head/sys/netinet/ip_carp.c  Wed Jul 25 12:06:52 2012(r238768)
+++ head/sys/netinet/ip_carp.c  Wed Jul 25 12:14:39 2012(r238769)
@@ -1027,23 +1027,31 @@ carp_send_na(struct carp_softc *sc)
}
 }
 
+/*
+ * Returns ifa in case it's a carp address and it is MASTER, or if the address
+ * matches and is not a carp address.  Returns NULL otherwise.
+ */
 struct ifaddr *
 carp_iamatch6(struct ifnet *ifp, struct in6_addr *taddr)
 {
struct ifaddr *ifa;
 
+   ifa = NULL;
IF_ADDR_RLOCK(ifp);
-   IFNET_FOREACH_IFA(ifp, ifa)
-   if (ifa->ifa_addr->sa_family == AF_INET6 &&
-   ifa->ifa_carp->sc_state == MASTER &&
-   IN6_ARE_ADDR_EQUAL(taddr, IFA_IN6(ifa))) {
+   TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
+   if (ifa->ifa_addr->sa_family != AF_INET6)
+   continue;
+   if (!IN6_ARE_ADDR_EQUAL(taddr, IFA_IN6(ifa)))
+   continue;
+   if (ifa->ifa_carp && ifa->ifa_carp->sc_state != MASTER)
+   ifa = NULL;
+   else
ifa_ref(ifa);
-   IF_ADDR_RUNLOCK(ifp);
-   return (ifa);
-   }
+   break;
+   }
IF_ADDR_RUNLOCK(ifp);
 
-   return (NULL);
+   return (ifa);
 }
 
 caddr_t
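
The two cases listed in the log above reduce to a small decision rule. The sketch below restates it outside the kernel; the struct, enum and function names are hypothetical stand-ins for illustration, not the real ip_carp.c types, and locking and reference counting are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the CARP state (assumption, not the kernel enum). */
enum demo_carp_state { DEMO_INIT, DEMO_BACKUP, DEMO_MASTER };

/* Simplified stand-in for one interface address. */
struct demo_addr {
	bool matches;                       /* equals the lookup target */
	const enum demo_carp_state *carp;   /* NULL if not a CARP address */
};

/* Returns true when the lookup should hand this address back. */
bool
demo_iamatch(const struct demo_addr *a)
{
	if (!a->matches)
		return (false);
	/* A CARP address only counts while we are MASTER (case 1). */
	if (a->carp != NULL && *a->carp != DEMO_MASTER)
		return (false);
	/* MASTER CARP address, or a plain non-CARP address (case 2). */
	return (true);
}

/* Self-check of the cases named in the log message. */
void
demo_iamatch_check(void)
{
	enum demo_carp_state master = DEMO_MASTER, backup = DEMO_BACKUP;
	struct demo_addr plain = { true, NULL };
	struct demo_addr carp_master = { true, &master };
	struct demo_addr carp_backup = { true, &backup };

	assert(demo_iamatch(&plain));
	assert(demo_iamatch(&carp_master));
	assert(!demo_iamatch(&carp_backup));
}
```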


Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Andriy Gapon
on 25/07/2012 13:21 Konstantin Belousov said the following:
> On Wed, Jul 25, 2012 at 10:20:02AM +0300, Andriy Gapon wrote:
>> on 25/07/2012 01:10 Jim Harris said the following:
>>> Author: jimharris
>>> Date: Tue Jul 24 22:10:11 2012
>>> New Revision: 238755
>>> URL: http://svn.freebsd.org/changeset/base/238755
>>>
>>> Log:
>>>   Add rmb() to tsc_read_##x to enforce serialization of rdtsc captures.
>>>   
>>>   Intel Architecture Manual specifies that rdtsc instruction is not 
>>> serialized,
>>>   so without this change, TSC synchronization test would periodically fail,
>>>   resulting in use of HPET timecounter instead of TSC-low.  This caused
>>>   severe performance degradation (40-50%) when running high IO/s workloads 
>>> due to
>>>   HPET MMIO reads and GEOM stat collection.
>>>   
>>>   Tests on Xeon E5-2600 (Sandy Bridge) 8C systems were seeing TSC 
>>> synchronization
>>>   fail approximately 20% of the time.
>>
>> Should rather the synchronization test be fixed if it's the culprit?
> Synchronization test for what ?

The synchronization test mentioned above.
So, oops, very sorry - I missed the fact that the change was precisely in the
test.  I confused it with another place where tsc is used.  Thank you for pointing
this out.

>> Or is this change universally good for the real uses of TSC?
> 
> What I understood from the Intel SDM, and also from additional experiments
> which Jim kindly made despite me being annoying as usual, is that 'read
> memory barrier' AKA LFENCE there is used for its secondary implementation
> effects, not for load/load barrier as you might assume.
> 
> According to SDM, LFENCE fully drains execution pipeline (but comparing
> with MFENCE, does not drain write buffers). The result is that RDTSC is
> not started before previous instructions are finished.

Yes, I am fully aware of this.

> For tsc test, this means that after the change RDTSC executions are not
> reordered on the single core among themselves. As I understand, CPU has
> no dependency noted between two reads of tsc by RDTSC, which allows
> later read to give lower value of counter. This is fixed by Intel by
> introduction of RDTSCP instruction, which is defined to be serialization
> point, and use of which (instead of LFENCE; RDTSC sequence) also fixes
> test, as confirmed by Jim.

Yes.  I think that previously Intel recommended to precede rdtsc with cpuid for
all the same reasons.  Not sure if there is any difference performance-wise
comparing to lfence.
Unfortunately, rdtscp is not available on all CPUs, so using it would require
extra work.

> In fact, I now think that we should also apply the following patch.
> Otherwise, consecutive calls to e.g. binuptime(9) could return decreased
> time stamps. Note that libc __vdso_gettc.c already has LFENCE nearby the
> tsc reads, which was done not for this reason, but apparently needed for
> the reason too.
> 
> diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c
> index 085c339..229b351 100644
> --- a/sys/x86/x86/tsc.c
> +++ b/sys/x86/x86/tsc.c
> @@ -594,6 +594,7 @@ static u_int
>  tsc_get_timecount(struct timecounter *tc __unused)
>  {
>  
> + rmb();
>   return (rdtsc32());
>  }
>  

This makes sense to me.  We probably want correctness over performance here.
[BTW, I originally thought that the change was here; brain malfunction]

> @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter *tc)
>  {
>   uint32_t rv;
>  
> + rmb();
>   __asm __volatile("rdtsc; shrd %%cl, %%edx, %0"
> - : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
> + : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
>   return (rv);
>  }
>  

It would be correct here too, but not sure if it would make any difference given
that some lower bits are discarded anyway.  Probably depends on exact CPU.


And, oh hmm, I read AMD Software Optimization Guide for AMD Family 10h
Processors and they suggest using cpuid (with a note that it may be
intercepted in virtualized environments) or _mfence_ in the discussed
role (Appendix F of the document).
Googling for 'rdtsc mfence lfence' yields some interesting results.

-- 
Andriy Gapon




svn commit: r238770 - head/sys/dev/e1000

2012-07-25 Thread Luigi Rizzo
Author: luigi
Date: Wed Jul 25 12:51:33 2012
New Revision: 238770
URL: http://svn.freebsd.org/changeset/base/238770

Log:
  remove some extra testing code that slipped into the previous commit
  
  Reported-by: Alexander Motin

Modified:
  head/sys/dev/e1000/if_lem.c

Modified: head/sys/dev/e1000/if_lem.c
==
--- head/sys/dev/e1000/if_lem.c Wed Jul 25 12:14:39 2012(r238769)
+++ head/sys/dev/e1000/if_lem.c Wed Jul 25 12:51:33 2012(r238770)
@@ -1550,13 +1550,6 @@ lem_xmit(struct adapter *adapter, struct
u32 txd_upper, txd_lower, txd_used, txd_saved;
int error, nsegs, i, j, first, last = 0;
 
-extern int netmap_drop;
-   if (netmap_drop == 95) {
-dropme:
-   m_freem(*m_headp);
-   *m_headp = NULL;
-   return (ENOBUFS);
-   }
m_head = *m_headp;
txd_upper = txd_lower = txd_used = txd_saved = 0;
 
@@ -1696,9 +1689,6 @@ dropme:
}
}
 
-   if (netmap_drop == 96)
-   goto dropme;
-
adapter->next_avail_tx_desc = i;
 
if (adapter->pcix_82544)
@@ -1726,16 +1716,6 @@ dropme:
  */
 ctxd->lower.data |=
htole32(E1000_TXD_CMD_EOP | E1000_TXD_CMD_RS);
-
-if (netmap_drop == 97) {
-   static int count=0;
-   if (count++ & 63 != 0)
-ctxd->lower.data &=
-~htole32(E1000_TXD_CMD_RS);
-   else
-   D("preserve RS");
-
-}
/*
 * Keep track in the first buffer which
 * descriptor will be written back
@@ -1754,12 +1734,6 @@ if (netmap_drop == 97) {
adapter->link_duplex == HALF_DUPLEX)
lem_82547_move_tail(adapter);
else {
-extern int netmap_repeat;
-   if (netmap_repeat) {
-   int x;
-   for (x = 0; x < netmap_repeat; x++)
-   E1000_WRITE_REG(&adapter->hw, E1000_TDT(0), i);
-   }
E1000_WRITE_REG(&adapter->hw, E1000_TDT(0), i);
if (adapter->hw.mac.type == e1000_82547)
lem_82547_update_fifo_head(adapter,
@@ -3013,13 +2987,6 @@ lem_txeof(struct adapter *adapter)
return;
}
 #endif /* DEV_NETMAP */
-{
-   static int drops = 0;
-   if (netmap_copy && drops++ < netmap_copy)
-   return;
-   drops = 0;
-}
-
 if (adapter->num_tx_desc_avail == adapter->num_tx_desc)
 return;
 


Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Konstantin Belousov
On Wed, Jul 25, 2012 at 03:29:34PM +0300, Andriy Gapon wrote:
> on 25/07/2012 13:21 Konstantin Belousov said the following:
> > On Wed, Jul 25, 2012 at 10:20:02AM +0300, Andriy Gapon wrote:
> >> on 25/07/2012 01:10 Jim Harris said the following:
> >>> Author: jimharris
> >>> Date: Tue Jul 24 22:10:11 2012
> >>> New Revision: 238755
> >>> URL: http://svn.freebsd.org/changeset/base/238755
> >>>
> >>> Log:
> >>>   Add rmb() to tsc_read_##x to enforce serialization of rdtsc captures.
> >>>   
> >>>   Intel Architecture Manual specifies that rdtsc instruction is not 
> >>> serialized,
> >>>   so without this change, TSC synchronization test would periodically 
> >>> fail,
> >>>   resulting in use of HPET timecounter instead of TSC-low.  This caused
> >>>   severe performance degradation (40-50%) when running high IO/s 
> >>> workloads due to
> >>>   HPET MMIO reads and GEOM stat collection.
> >>>   
> >>>   Tests on Xeon E5-2600 (Sandy Bridge) 8C systems were seeing TSC 
> >>> synchronization
> >>>   fail approximately 20% of the time.
> >>
> >> Should rather the synchronization test be fixed if it's the culprit?
> > Synchronization test for what ?
> 
> The synchronization test mentioned above.
> So, oops, very sorry - I missed the fact that the change was precisely in the
> test.  I confused it for another place where tsc is used.  Thank you for 
> pointing
> this out.
> 
> >> Or is this change universally good for the real uses of TSC?
> > 
> > What I understood from the Intel SDM, and also from additional experiments
> > which Jim kindly made despite me being annoying as usual, is that 'read
> > memory barrier' AKA LFENCE there is used for its secondary implementation
> > effects, not for load/load barrier as you might assume.
> > 
> > According to SDM, LFENCE fully drains execution pipeline (but comparing
> > with MFENCE, does not drain write buffers). The result is that RDTSC is
> > not started before previous instructions are finished.
> 
> Yes, I am fully aware of this.
> 
> > For tsc test, this means that after the change RDTSC executions are not
> > reordered on the single core among themselves. As I understand, CPU has
> > no dependency noted between two reads of tsc by RDTSC, which allows
> > later read to give lower value of counter. This is fixed by Intel by
> > introduction of RDTSCP instruction, which is defined to be serialization
> > point, and use of which (instead of LFENCE; RDTSC sequence) also fixes
> > test, as confirmed by Jim.
> 
> Yes.  I think that previously Intel recommended to precede rdtsc with cpuid 
> for
> all the same reasons.  Not sure if there is any difference performance-wise
> comparing to lfence.
> Unfortunately, rdtscp is not available on all CPUs, so using it would require
> extra work.
> 
> > In fact, I now think that we should also apply the following patch.
> > Otherwise, consecutive calls to e.g. binuptime(9) could return decreased
> > time stamps. Note that libc __vdso_gettc.c already has LFENCE nearby the
> > tsc reads, which was done not for this reason, but apparently needed for
> > the reason too.
> > 
> > diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c
> > index 085c339..229b351 100644
> > --- a/sys/x86/x86/tsc.c
> > +++ b/sys/x86/x86/tsc.c
> > @@ -594,6 +594,7 @@ static u_int
> >  tsc_get_timecount(struct timecounter *tc __unused)
> >  {
> >  
> > +   rmb();
> > return (rdtsc32());
> >  }
> >  
> 
> This makes sense to me.  We probably want correctness over performance here.
> [BTW, I originally thought that the change was here; brain malfunction]
> 
> > @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter *tc)
> >  {
> > uint32_t rv;
> >  
> > +   rmb();
> > __asm __volatile("rdtsc; shrd %%cl, %%edx, %0"
> > -   : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
> > +   : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
> > return (rv);
> >  }
> >  
> 
> It would be correct here too, but not sure if it would make any difference given
> that some lower bits are discarded anyway.  Probably depends on exact CPU.
> 
> 
> And, oh hmm, I read AMD Software Optimization Guide for AMD Family 10h
> Processors and they suggest using cpuid (with a note that it may be
> intercepted in virtualized environments) or _mfence_ in the discussed
> role (Appendix F of the document). Googling for 'rdtsc mfence lfence'
> yields some interesting results.
Yes, MFENCE for AMD.

Since I was infected with these Google results anyway, I looked at the
Linux code. Apparently, they use MFENCE on AMD and LFENCE on Intel.
They also use LFENCE on VIA, it seems. Intel documentation claims that
MFENCE does not serialize instruction execution, in contrast to the
LFENCE behaviour we rely on.

So we definitely want to add some barrier right before rdtsc. And we do
want LFENCE for Intels. The patch below ends up with the following code:

Dump of assembler code for function tsc_get_timecount_lfence:
   0x805563a0 <+0>: push   %rbp
   0x805563a1 <+1>

Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Bruce Evans

On Wed, 25 Jul 2012, Konstantin Belousov wrote:


> On Wed, Jul 25, 2012 at 10:20:02AM +0300, Andriy Gapon wrote:
>> on 25/07/2012 01:10 Jim Harris said the following:
>>> Author: jimharris
>>> Date: Tue Jul 24 22:10:11 2012
>>> New Revision: 238755
>>> URL: http://svn.freebsd.org/changeset/base/238755
>>>
>>> Log:
>>>   Add rmb() to tsc_read_##x to enforce serialization of rdtsc captures.
>>>
>>>   Intel Architecture Manual specifies that rdtsc instruction is not serialized,
>>>   so without this change, TSC synchronization test would periodically fail,
>>>   resulting in use of HPET timecounter instead of TSC-low.  This caused
>>>   severe performance degradation (40-50%) when running high IO/s workloads due to
>>>   HPET MMIO reads and GEOM stat collection.
>>>
>>>   Tests on Xeon E5-2600 (Sandy Bridge) 8C systems were seeing TSC synchronization
>>>   fail approximately 20% of the time.
>>
>> Should rather the synchronization test be fixed if it's the culprit?
> Synchronization test for what ?
>
>> Or is this change universally good for the real uses of TSC?


It's too slow for real uses.  But synchronization code, and some uses
that require serialization, may need it for, er, synchronization and
serialization.

It's hard to think of many uses that need serialization.  I often use
it for timing instructions.  For timing a large number of instructions,
serialization doesn't matter since errors of a few tens in a few billion
don't matter.  For timing a small number of instructions, I don't want
serialization, since the serialization invalidates the timing.

Most uses in FreeBSD are for timecounters.  Timecounters deliver the
current time.  This is unrelated to whatever instructions haven't
completed when the TSC is read.  Except possibly when the time needs
to be synchronized across CPUs, and when the uncompleted instruction
is a TSC read.


> For tsc test, this means that after the change RDTSC executions are not
> reordered on the single core among themselves. As I understand, CPU has
> no dependency noted between two reads of tsc by RDTSC, which allows
> later read to give lower value of counter.


Gak.  Even when they are in the same instruction sequence?  Even though
the TSC reads fixed registers and some other instructions in the sequence
between the TSC reads use these registers?  The CPU would have to do
significant register renaming to break this.


> This is fixed by Intel by
> introduction of RDTSCP instruction, which is defined to be serialization
> point, and use of which (instead of LFENCE; RDTSC sequence) also fixes
> test, as confirmed by Jim.


This is not a fix if it is full serialization.  It just gives slowness
using a single instruction instead of a couple.


> In fact, I now think that we should also apply the following patch.
> Otherwise, consecutive calls to e.g. binuptime(9) could return decreased
> time stamps. Note that libc __vdso_gettc.c already has LFENCE nearby the
> tsc reads, which was done not for this reason, but apparently needed for
> the reason too.
>
> diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c
> index 085c339..229b351 100644
> --- a/sys/x86/x86/tsc.c
> +++ b/sys/x86/x86/tsc.c
> @@ -594,6 +594,7 @@ static u_int
>  tsc_get_timecount(struct timecounter *tc __unused)
>  {
>
> +   rmb();
> return (rdtsc32());
>  }


Please don't pessimize this further.  The time for rdtsc went from 6.5
cycles on AthlonXP to 65 cycles on core2 (mainly for
P-state-invariance hardware synchronization I think).  Pretty soon it
will be as slow as an HPET and heading towards an i8254.  Adding rmb()
only makes it 12 cycles slower on core2, but 16 cycles (almost 3 times)
slower on AthlonXP.


> @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter *tc)
>  {
> uint32_t rv;
>
> +   rmb();
> __asm __volatile("rdtsc; shrd %%cl, %%edx, %0"
> -   : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
> +   : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
> return (rv);
>  }


The previous TSC-low/shrd pessimization adds only 2 cycles on AthlonXP
and core2.  I think it only "works" by backing off the TSC's resolution
so low that it usually can't see its own, or at least other TSCs', lack of
serialness.  The shift count is usually 7 or 8, so the resolution is
reduced from 1 cycle to 128 or 256.  Out of order times that fall in
the same block of 128 or 256 cycles would appear to be the same, but
out of order times like 129 and 127 would appear to be even more out
of order after a shift of 7 turns them into 128 and 0.
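
The shift arithmetic in the paragraph above can be checked with a small sketch. It assumes that "rdtsc; shrd %cl, %edx, %eax" leaves the low 32 bits of the shifted 64-bit counter in %eax, which holds for shift counts below 32; the function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Emulate what "rdtsc; shrd %cl, %edx, %eax" leaves in %eax:
 * the low 32 bits of the 64-bit counter shifted right by "shift".
 * Only meaningful for shift counts 0..31.
 */
uint32_t
tsc_low(uint64_t tsc, unsigned int shift)
{
	return ((uint32_t)(tsc >> shift));
}

/* Self-check of the cases discussed above, using a shift of 7. */
void
tsc_low_demo(void)
{
	/* Out-of-order reads inside one 128-cycle block look identical. */
	assert(tsc_low(200, 7) == tsc_low(130, 7));
	/* 129 then 127 straddle a block boundary: 1 tick, then 0 ticks. */
	assert(tsc_low(129, 7) == 1);
	assert(tsc_low(127, 7) == 0);
}
```

Scaled back to cycles, the two straddling reads become 128 and 0, so a true difference of 2 cycles shows up as a backward step of a full tick.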

Bruce


Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Bruce Evans

On Wed, 25 Jul 2012, Andriy Gapon wrote:


> on 25/07/2012 13:21 Konstantin Belousov said the following:

...
>> diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c
>> index 085c339..229b351 100644
>> --- a/sys/x86/x86/tsc.c
>> +++ b/sys/x86/x86/tsc.c
>> @@ -594,6 +594,7 @@ static u_int
>>  tsc_get_timecount(struct timecounter *tc __unused)
>>  {
>>
>> +   rmb();
>> return (rdtsc32());
>>  }
>
> This makes sense to me.  We probably want correctness over performance here.
> [BTW, I originally thought that the change was here; brain malfunction]


And I liked the original change because it wasn't here :-).


>> @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter *tc)
>>  {
>> uint32_t rv;
>>
>> +   rmb();
>> __asm __volatile("rdtsc; shrd %%cl, %%edx, %0"
>> -   : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
>> +   : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
>> return (rv);
>>  }
>
> It would be correct here too, but not sure if it would make any difference given
> that some lower bits are discarded anyway.  Probably depends on exact CPU.


It is needed to pessimize this too. :-)

As I have complained before, the loss of resolution from the shift is
easy to see by reading the time from userland, even with syscall overhead
taking 10-20 times longer than the read.  On core2 with TSC-low, a clock-
checking utility gives:

% min 481, max 12031, mean 530.589452, std 51.633626
% 1th: 550 (1296487 observations)
% 2th: 481 (448425 observations)
% 3th: 482 (142650 observations)
% 4th: 549 (61945 observations)
% 5th: 551 (47619 observations)

The numbers are differences in nanoseconds measured by clock_gettime().
The jump from 481 to 549 is 68.  From this I can tell that the clock
frequency is 1.86 GHz and the TSC is divided by 128 (shift count 7), or
the clock frequency is 3.72 GHz and the TSC is divided by 256 (shift
count 8).
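
The inference above is just tick length = 2^shift cycles divided by the clock rate. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the frequencies are the two candidates named in the text.

```c
#include <assert.h>

/* Length of one TSC-low tick in nanoseconds: 2^shift cycles at freq_ghz. */
double
tsc_low_tick_ns(unsigned int shift, double freq_ghz)
{
	/* GHz is cycles per nanosecond, so cycles / GHz gives nanoseconds. */
	return ((double)(1u << shift) / freq_ghz);
}

/* Both candidate configurations give a tick of roughly 68.8 ns. */
void
tsc_low_tick_demo(void)
{
	double a = tsc_low_tick_ns(7, 1.86);
	double b = tsc_low_tick_ns(8, 3.72);

	assert(a > 68.7 && a < 68.9);
	assert(b > 68.7 && b < 68.9);
}
```

Both combinations produce the same tick length, which is why the observed 68 ns jump alone cannot tell them apart.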

On AthlonXP with TSC:

% min 273, max 29075, mean 274.412811, std 80.425963
% 1th: 273 (853962 observations)
% 2th: 274 (745606 observations)
% 3th: 275 (400212 observations)
% 4th: 276 (20 observations)
% 5th: 280 (10 observations)

Now the numbers cluster about the mean.  Although syscalls take much longer
than the loss of resolution with TSC-low, and even the core2 TSC takes
almost as long to read as the loss, it is still possible to see things
happening at the limits of the resolution (~0.5 nsec).


> And, oh hmm, I read AMD Software Optimization Guide for AMD Family 10h
> Processors and they suggest using cpuid (with a note that it may be
> intercepted in virtualized environments) or _mfence_ in the discussed
> role (Appendix F of the document).
> Googling for 'rdtsc mfence lfence' yields some interesting results.


The second hit was for the shrd pessimization/loss of resolution and
a memory access hack in lkml in 2011.  I now seem to remember jkim
mentioning the memory access hack.  rmb() on i386 has a related memory
access hack, but now with a lock prefix that defeats the point of the
2011 hack (it wanted to save 5 nsec by removing fences).  rmb() on
amd64 uses lfence.

Some of the other hits are a bit old.  The 8th one was by me in the
thread about kib@ implementing gettimeofday() in userland.

Bruce
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Jim Harris
On Wed, Jul 25, 2012 at 6:37 AM, Konstantin Belousov wrote:
> On Wed, Jul 25, 2012 at 03:29:34PM +0300, Andriy Gapon wrote:
>> on 25/07/2012 13:21 Konstantin Belousov said the following:
>> > On Wed, Jul 25, 2012 at 10:20:02AM +0300, Andriy Gapon wrote:
>> >> on 25/07/2012 01:10 Jim Harris said the following:
>> >>> Author: jimharris
>> >>> Date: Tue Jul 24 22:10:11 2012
>> >>> New Revision: 238755
>> >>> URL: http://svn.freebsd.org/changeset/base/238755
>> >>>
>> >>> Log:
>> >>>   Add rmb() to tsc_read_##x to enforce serialization of rdtsc captures.
>> >>>
> >> >>>   Intel Architecture Manual specifies that rdtsc instruction is not
> >> >>>   serialized, so without this change, TSC synchronization test would
> >> >>>   periodically fail, resulting in use of HPET timecounter instead of
> >> >>>   TSC-low.  This caused severe performance degradation (40-50%) when
> >> >>>   running high IO/s workloads due to HPET MMIO reads and GEOM stat
> >> >>>   collection.
> >> >>>
> >> >>>   Tests on Xeon E5-2600 (Sandy Bridge) 8C systems were seeing TSC
> >> >>>   synchronization fail approximately 20% of the time.
>> >>
>> >> Should rather the synchronization test be fixed if it's the culprit?
>> > Synchronization test for what ?
>>
>> The synchronization test mentioned above.
> >> So, oops, very sorry - I missed the fact that the change was precisely in the
> >> test.  I confused it for another place where tsc is used.  Thank you for
> >> pointing this out.
>>
>> >> Or is this change universally good for the real uses of TSC?
>> >
>> > What I understood from the Intel SDM, and also from additional experiments
>> > which Jim kindly made despite me being annoying as usual, is that 'read
>> > memory barrier' AKA LFENCE there is used for its secondary implementation
>> > effects, not for load/load barrier as you might assume.
>> >
>> > According to SDM, LFENCE fully drains execution pipeline (but comparing
>> > with MFENCE, does not drain write buffers). The result is that RDTSC is
>> > not started before previous instructions are finished.
>>
>> Yes, I am fully aware of this.
>>
>> > For tsc test, this means that after the change RDTSC executions are not
>> > reordered on the single core among themself. As I understand, CPU has
>> > no dependency noted between two reads of tsc by RDTSC, which allows
>> > later read to give lower value of counter. This is fixed by Intel by
>> > introduction of RDTSCP instruction, which is defined to be serialization
>> > point, and use of which (instead of LFENCE; RDTSC sequence) also fixes
>> > test, as confirmed by Jim.
>>
> >> Yes.  I think that previously Intel recommended to precede rdtsc with cpuid
> >> for all the same reasons.  Not sure if there is any difference
> >> performance-wise comparing to lfence.
> >> Unfortunately, rdtscp is not available on all CPUs, so using it would require
> >> extra work.
>>
>> > In fact, I now think that we should also apply the following patch.
> >> > Otherwise, consecutive calls to e.g. binuptime(9) could return decreased
>> > time stamps. Note that libc __vdso_gettc.c already has LFENCE nearby the
>> > tsc reads, which was done not for this reason, but apparently needed for
>> > the reason too.
>> >
>> > diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c
>> > index 085c339..229b351 100644
>> > --- a/sys/x86/x86/tsc.c
>> > +++ b/sys/x86/x86/tsc.c
>> > @@ -594,6 +594,7 @@ static u_int
>> >  tsc_get_timecount(struct timecounter *tc __unused)
>> >  {
>> >
>> > +   rmb();
>> > return (rdtsc32());
>> >  }
>> >
>>
>> This makes sense to me.  We probably want correctness over performance here.
>> [BTW, I originally thought that the change was here; brain malfunction]
>>
>> > @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter *tc)
>> >  {
>> > uint32_t rv;
>> >
>> > +   rmb();
>> > __asm __volatile("rdtsc; shrd %%cl, %%edx, %0"
>> > -   : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
>> > +   : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
>> > return (rv);
>> >  }
>> >
>>
> >> It would be correct here too, but not sure if it would make any difference
> >> given that some lower bits are discarded anyway.  Probably depends on exact
> >> CPU.
>>
>>
>> And, oh hmm, I read AMD Software Optimization Guide for AMD Family 10h
>> Processors and they suggest using cpuid (with a note that it may be
>> intercepted in virtualized environments) or _mfence_ in the discussed
>> role (Appendix F of the document). Googling for 'rdtsc mfence lfence'
>> yields some interesting results.
> Yes, MFENCE for AMD.
>
> Since I was infected with these Google results anyway, I looked at the
> Linux code. Apparently, they use MFENCE on amd, and LFENCE on Intel.
> They also use LFENCE on VIA, it seems. Intel documentation claims that
> MFENCE does not serialize instruction execution, which is contrary to
> used LFENCE behaviour.
>
> So we definitely want to add some barrier right before rdtsc. And we do
> want LFENCE for Intels. Patch below ends 

Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Konstantin Belousov
On Wed, Jul 25, 2012 at 08:29:57AM -0700, Jim Harris wrote:
> On Wed, Jul 25, 2012 at 6:37 AM, Konstantin Belousov
>  wrote:
> > -/* rmb is required here because rdtsc is not a serializing instruction. */
> > +/*
> > + * RDTSC is not a serializing instruction, so we need to drain
> > + * instruction stream before executing it. It could be fixed by use of
> > + * RDTSCP, except the instruction is not available everywhere.
> > + *
> > + * Insert both MFENCE for AMD CPUs, and LFENCE for others (Intel and
> > + * VIA), and assume that SMP test is only performed on CPUs that have
> > + * SSE2 anyway.
> > + */
> >  #define	TSC_READ(x) \
> >  static void\
> >  tsc_read_##x(void *arg)\
> > @@ -337,6 +361,7 @@ tsc_read_##x(void *arg) \
> > u_int cpu = PCPU_GET(cpuid);\
> > \
> > rmb();  \
> > +   mb();   \
> > tsc[cpu * 3 + x] = rdtsc32();   \
> 
> I've seen bde@'s comments, so perhaps this patch will not move
> forward, but I'm wondering if it would make sense here to just call
> the new tsc_get_timecount_mfence() function rather than explicitly
> call mb() and then rdtsc32().

I think that this should in fact call cpuid() instead of rmb()/mb().
Genuine Pentium, Pentium Pro and Pentium II/III CPUs can be used in SMP
configurations but definitely lack LFENCE.

Regarding the patch, either it or some close relative of it should be
implemented, since otherwise we are simply incorrect, as you demonstrated.




svn commit: r238774 - head/share/man/man4

2012-07-25 Thread Gavin Atkinson
Author: gavin
Date: Wed Jul 25 17:25:44 2012
New Revision: 238774
URL: http://svn.freebsd.org/changeset/base/238774

Log:
  Update supported hardware list after r238766.
  
  MFC after:1 week

Modified:
  head/share/man/man4/uplcom.4

Modified: head/share/man/man4/uplcom.4
==
--- head/share/man/man4/uplcom.4	Wed Jul 25 17:15:52 2012	(r238773)
+++ head/share/man/man4/uplcom.4	Wed Jul 25 17:25:44 2012	(r238774)
@@ -29,7 +29,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 20, 2011
+.Dd July 25, 2012
 .Dt UPLCOM 4
 .Os
 .Sh NAME
@@ -118,6 +118,8 @@ Microsoft Palm 700WX
 .It
 Mobile Action MA-620 Infrared Adapter
 .It
+Motorola Cables
+.It
 Nokia CA-42 Cable
 .It
 OTI DKU-5 cable


Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Jung-uk Kim
On 2012-07-25 10:44:04 -0400, Bruce Evans wrote:
> On Wed, 25 Jul 2012, Andriy Gapon wrote:
> 
>> on 25/07/2012 13:21 Konstantin Belousov said the following:
>>> ...
>>> diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c
>>> index 085c339..229b351 100644
>>> --- a/sys/x86/x86/tsc.c
>>> +++ b/sys/x86/x86/tsc.c
>>> @@ -594,6 +594,7 @@ static u_int
>>>  tsc_get_timecount(struct timecounter *tc __unused)
>>>  {
>>>
>>> +	rmb();
>>> 	return (rdtsc32());
>>>  }
>> 
>> This makes sense to me.  We probably want correctness over
>> performance here. [BTW, I originally thought that the change was
>> here; brain malfunction]
> 
> And I liked the original change because it wasn't here :-).
> 
>>> @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter *tc)
>>>  {
>>> 	uint32_t rv;
>>>
>>> +	rmb();
>>> 	__asm __volatile("rdtsc; shrd %%cl, %%edx, %0"
>>> -	: "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
>>> +	: "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
>>> 	return (rv);
>>>  }
>>> 
>> 
>> It would correct here too, but not sure if it would make any 
>> difference given that some lower bits are discarded anyway.
>> Probably depends on exact CPU.
> 
> It is needed to pessimize this too. :-)
> 
> As I have complained before, the loss of resolution from the shift
> is easy to see by reading the time from userland, even with syscall
> overhead taking 10-20 times longer than the read.  On core2 with
> TSC-low, a clock-checking utility gives:
> 
> % min 481, max 12031, mean 530.589452, std 51.633626
> % 1th: 550 (1296487 observations)
> % 2th: 481 (448425 observations)
> % 3th: 482 (142650 observations)
> % 4th: 549 (61945 observations)
> % 5th: 551 (47619 observations)
> 
> The numbers are differences in nanoseconds measured by
> clock_gettime(). The jump from 481 to 549 is 68.  From this I can
> tell that the clock frequency is 1.86 GHz and the shift is 128, or
> the clock frequency is 3.72 GHz and the shift is 256.
> 
> On AthlonXP with TSC:
> 
> % min 273, max 29075, mean 274.412811, std 80.425963
> % 1th: 273 (853962 observations)
> % 2th: 274 (745606 observations)
> % 3th: 275 (400212 observations)
> % 4th: 276 (20 observations)
> % 5th: 280 (10 observations)
> 
> Now the numbers cluster about the mean.  Although syscalls take
> much longer than the loss of resolution with TSC-low, and even the
> core2 TSC takes almost as long to read as the loss, it is still
> possible to see things happening at the limits of the resolution
> (~0.5 nsec).
> 
>> And, oh hmm, I read AMD Software Optimization Guide for AMD
>> Family 10h Processors and they suggest using cpuid (with a note
>> that it may be intercepted in virtualized environments) or
>> _mfence_ in the discussed role (Appendix F of the document). 
>> Googling for 'rdtsc mfence lfence' yields some interesting
>> results.
> 
> The second hit was for the shrd pessimization/loss of resolution
> and a memory access hack in lkml in 2011.  I now seem to remember
> jkim mentioning the memory access hack.  rmb() on i386 has a
> related memory access hack, but now with a lock prefix that defeats
> the point of the 2011 hack (it wanted to save 5 nsec by removing
> fences).  rmb() on amd64 uses lfence.

I believe I mentioned this thread at the time:

https://patchwork.kernel.org/patch/691712/

FYI, r238755 is essentially this commit for Linux:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=93ce99e849433ede4ce8b410b749dc0cad1100b2

> Some of the other hits are a bit old.  The 8th one was by me in
> the thread about kib@ implementing gettimeofday() in userland.

Since we have gettimeofday() in userland, the above Linux thread is
more relevant now, I guess.

Jung-uk Kim


Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Konstantin Belousov
On Thu, Jul 26, 2012 at 12:15:54AM +1000, Bruce Evans wrote:
> On Wed, 25 Jul 2012, Konstantin Belousov wrote:
> 
> >On Wed, Jul 25, 2012 at 10:20:02AM +0300, Andriy Gapon wrote:
> >>on 25/07/2012 01:10 Jim Harris said the following:
> >>>Author: jimharris
> >>>Date: Tue Jul 24 22:10:11 2012
> >>>New Revision: 238755
> >>>URL: http://svn.freebsd.org/changeset/base/238755
> >>>
> >>>Log:
> >>>  Add rmb() to tsc_read_##x to enforce serialization of rdtsc captures.
> >>>
> >>>  Intel Architecture Manual specifies that rdtsc instruction is not 
> >>>  serialized,
> >>>  so without this change, TSC synchronization test would periodically 
> >>>  fail,
> >>>  resulting in use of HPET timecounter instead of TSC-low.  This caused
> >>>  severe performance degradation (40-50%) when running high IO/s 
> >>>  workloads due to
> >>>  HPET MMIO reads and GEOM stat collection.
> >>>
> >>>  Tests on Xeon E5-2600 (Sandy Bridge) 8C systems were seeing TSC 
> >>>  synchronization
> >>>  fail approximately 20% of the time.
> >>
> >>Should rather the synchronization test be fixed if it's the culprit?
> >Synchronization test for what ?
> >
> >>Or is this change universally good for the real uses of TSC?
> 
> It's too slow for real uses.  But synchronization code, and some uses
> that require serialization may need it for, er, synchronization and
> serialization.
> 
> It's hard to think of many uses that need serialization.  I often use
> it for timing instructions.  For timing a large number of instructions,
> serialization doesn't matter since errors of a few tens in a few billion
> don't matter.  For timing a small number of instructions, I don't want
> serialization, since the serialization invalidates the timing.
> 
> Most uses in FreeBSD are for timecounters.  Timecounters deliver the
> current time.  This is unrelated to whatever instructions haven't
> completed when the TSC is read.  Except possibly when the time needs
> to be synchronized across CPUs, and when the uncompleted instruction
> is a TSC read.
> 
> >For tsc test, this means that after the change RDTSC executions are not
> >reordered on the single core among themself. As I understand, CPU has
> >no dependency noted between two reads of tsc by RDTSC, which allows
> >later read to give lower value of counter.
> 
> Gak.  Even when they are in the same instruction sequence?  Even though
> the TSC reads fixed registers and some other instructions in the sequence
> between the TSC use these registers?  The CPU would have to do significant
> register renaming to break this.
I can only speculate, but I believe that any modern CPU executes RDTSC
as at least two separate steps: one reads the internal counter, and the
second updates the registers. It seems that the first kind of action
is not serialized. I have no other explanation for Jim's findings.

I also asked Jim to test whether the cause of the TSC sync test failure
is the lack of synchronization between gathering the data and testing it,
but it appeared that the reason is a genuine timecounter value going
backward.

So the bug seems real, and I cannot imagine we will live with a known
defect in timecounters which can step back.
> 
> >This is fixed by Intel by
> >introduction of RDTSCP instruction, which is defined to be serialization
> >point, and use of which (instead of LFENCE; RDTSC sequence) also fixes
> >test, as confirmed by Jim.
> 
> This is not a fix if it is full serialization.  It just gives slowness
> using a single instruction instead of a couple.
> 
> >In fact, I now think that we should also apply the following patch.
> >Otherwise, consecutive calls to e.g. binuptime(9) could return decreased
> >time stamps. Note that libc __vdso_gettc.c already has LFENCE nearby the
> >tsc reads, which was done not for this reason, but apparently needed for
> >the reason too.
> >
> >diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c
> >index 085c339..229b351 100644
> >--- a/sys/x86/x86/tsc.c
> >+++ b/sys/x86/x86/tsc.c
> >@@ -594,6 +594,7 @@ static u_int
> >tsc_get_timecount(struct timecounter *tc __unused)
> >{
> >
> >+rmb();
> > return (rdtsc32());
> >}
> 
> Please don't pessimize this further.  The time for rdtsc went from 6.5
> cycles on AthlonXP to 65 cycles on core2 (mainly for
> P-state-invariance hardware synchronization I think).  Pretty soon it
> will be as slow as an HPET and heading towards an i8254.  Adding rmb()
> only makes it 12 cycles slower on core2, but 16 cycles (almost 3 times)
> slower on AthlonXP.
AthlonXP does not look like an interesting target for optimization. From
what I can find, it is a PIII-era CPU.

> 
> >@@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter *tc)
> >{
> > uint32_t rv;
> >
> >+rmb();
> > __asm __volatile("rdtsc; shrd %%cl, %%edx, %0"
> >-: "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
> >+: "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
> > return (rv);
> >}
> 
> The previous TSC-low/shrd pessimization adds only 2 

svn commit: r238776 - head/cddl/contrib/opensolaris/cmd/dtrace

2012-07-25 Thread George V. Neville-Neil
Author: gnn
Date: Wed Jul 25 17:49:01 2012
New Revision: 238776
URL: http://svn.freebsd.org/changeset/base/238776

Log:
  Revert previous commit.  The bug was actually caused by an issue
  in pre 1.8.5 versions of sudo which were sending too many
  SIGINTs to processes when the user hit Ctrl-C.
  
  Pointed out by:   avg@, rpaulo@, sbruno@

Modified:
  head/cddl/contrib/opensolaris/cmd/dtrace/dtrace.c

Modified: head/cddl/contrib/opensolaris/cmd/dtrace/dtrace.c
==
--- head/cddl/contrib/opensolaris/cmd/dtrace/dtrace.c	Wed Jul 25 17:42:57 2012	(r238775)
+++ head/cddl/contrib/opensolaris/cmd/dtrace/dtrace.c	Wed Jul 25 17:49:01 2012	(r238776)
@@ -70,8 +70,6 @@ typedef struct dtrace_cmd {
 #define	E_ERROR	1
 #define	E_USAGE	2
 
-#define	IMPATIENT_LIMIT	2
-
 static const char DTRACE_OPTSTR[] =
 	"3:6:aAb:Bc:CD:ef:FGhHi:I:lL:m:n:o:p:P:qs:SU:vVwx:X:Z";
 
@@ -1204,7 +1202,7 @@ intr(int signo)
 	if (!g_intr)
 		g_newline = 1;
 
-	if (g_intr++ > IMPATIENT_LIMIT)
+	if (g_intr++)
 		g_impatient = 1;
 }
 


Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Jim Harris
On Wed, Jul 25, 2012 at 10:32 AM, Konstantin Belousov wrote:
> On Thu, Jul 26, 2012 at 12:15:54AM +1000, Bruce Evans wrote:
>> On Wed, 25 Jul 2012, Konstantin Belousov wrote:
>>
>> >On Wed, Jul 25, 2012 at 10:20:02AM +0300, Andriy Gapon wrote:
>> >>on 25/07/2012 01:10 Jim Harris said the following:
>> >>>Author: jimharris
>> >>>Date: Tue Jul 24 22:10:11 2012
>> >>>New Revision: 238755
>> >>>URL: http://svn.freebsd.org/changeset/base/238755
>> >>>
>> >>>Log:
>> >>>  Add rmb() to tsc_read_##x to enforce serialization of rdtsc captures.
>> >>>
>> >>>  Intel Architecture Manual specifies that rdtsc instruction is not
>> >>>  serialized,
>> >>>  so without this change, TSC synchronization test would periodically
>> >>>  fail,
>> >>>  resulting in use of HPET timecounter instead of TSC-low.  This caused
>> >>>  severe performance degradation (40-50%) when running high IO/s
>> >>>  workloads due to
>> >>>  HPET MMIO reads and GEOM stat collection.
>> >>>
>> >>>  Tests on Xeon E5-2600 (Sandy Bridge) 8C systems were seeing TSC
>> >>>  synchronization
>> >>>  fail approximately 20% of the time.
>> >>
>> >>Should rather the synchronization test be fixed if it's the culprit?
>> >Synchronization test for what ?
>> >
>> >>Or is this change universally good for the real uses of TSC?
>>
>> It's too slow for real uses.  But synchronization code, and some uses
>> that require serialization may need it for, er, synchronization and
>> serialization.
>>
>> It's hard to think of many uses that need serialization.  I often use
>> it for timing instructions.  For timing a large number of instructions,
>> serialization doesn't matter since errors of a few tens in a few billion
>> don't matter.  For timing a small number of instructions, I don't want
>> serialization, since the serialization invalidates the timing.
>>
>> Most uses in FreeBSD are for timecounters.  Timecounters deliver the
>> current time.  This is unrelated to whatever instructions haven't
>> completed when the TSC is read.  Except possibly when the time needs
>> to be synchronized across CPUs, and when the uncompleted instruction
>> is a TSC read.
>>
>> >For tsc test, this means that after the change RDTSC executions are not
>> >reordered on the single core among themself. As I understand, CPU has
>> >no dependency noted between two reads of tsc by RDTSC, which allows
>> >later read to give lower value of counter.
>>
>> Gak.  Even when they are in the same instruction sequence?  Even though
>> the TSC reads fixed registers and some other instructions in the sequence
>> between the TSC use these registers?  The CPU would have to do significant
>> register renaming to break this.
> I can only speculate, but I believe that any modern CPU executes RDTSC
> as at least two separate steps: one reads the internal counter, and the
> second updates the registers. It seems that the first kind of action
> is not serialized. I have no other explanation for Jim's findings.
>
> I also asked Jim to test whether the cause of the TSC sync test failure
> is the lack of synchronization between gathering the data and testing it,
> but it appeared that the reason is a genuine timecounter value going
> backward.

I wonder if, instead of the timecounter going backward, the TSC test
fails because the CPU speculatively performs the rdtsc instruction relative
to the waiter checks in smp_rendezvous_action.  Or maybe we are saying
the same thing.

>
> So the bug seems real, and I cannot imagine we will live with a known
> defect in timecounters which can step back.
>>
>> >This is fixed by Intel by
>> >introduction of RDTSCP instruction, which is defined to be serialization
>> >point, and use of which (instead of LFENCE; RDTSC sequence) also fixes
>> >test, as confirmed by Jim.
>>
>> This is not a fix if it is full serialization.  It just gives slowness
>> using a single instruction instead of a couple.
>>
>> >In fact, I now think that we should also apply the following patch.
>> >Otherwise, consecutive calls to e.g. binuptime(9) could return decreased
>> >time stamps. Note that libc __vdso_gettc.c already has LFENCE nearby the
>> >tsc reads, which was done not for this reason, but apparently needed for
>> >the reason too.
>> >
>> >diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c
>> >index 085c339..229b351 100644
>> >--- a/sys/x86/x86/tsc.c
>> >+++ b/sys/x86/x86/tsc.c
>> >@@ -594,6 +594,7 @@ static u_int
>> >tsc_get_timecount(struct timecounter *tc __unused)
>> >{
>> >
>> >+rmb();
>> > return (rdtsc32());
>> >}
>>
>> Please don't pessimize this further.  The time for rdtsc went from 6.5
>> cycles on AthlonXP to 65 cycles on core2 (mainly for
>> P-state-invariance hardware synchronization I think).  Pretty soon it
>> will be as slow as an HPET and heading towards an i8254.  Adding rmb()
>> only makes it 12 cycles slower on core2, but 16 cycles (almost 3 times)
>> slower on AthlonXP.
> AthlonXP does not look like an interesting target for optimization. From
> what I can find, it is a PIII-era CPU.
>
>>

Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Konstantin Belousov
On Wed, Jul 25, 2012 at 01:27:48PM -0400, Jung-uk Kim wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On 2012-07-25 10:44:04 -0400, Bruce Evans wrote:
> > On Wed, 25 Jul 2012, Andriy Gapon wrote:
> > 
> >> on 25/07/2012 13:21 Konstantin Belousov said the following:
> >>> ... diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c index
> >>> 085c339..229b351 100644 --- a/sys/x86/x86/tsc.c +++
> >>> b/sys/x86/x86/tsc.c @@ -594,6 +594,7 @@ static u_int 
> >>> tsc_get_timecount(struct timecounter *tc __unused) {
> >>> 
> >>> +rmb(); return (rdtsc32()); }
> >> 
> >> This makes sense to me.  We probably want correctness over
> >> performance here. [BTW, I originally thought that the change was
> >> here; brain malfunction]
> > 
> > And I liked the original change because it wasn't here :-).
> > 
> >>> @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter
> >>> *tc) { uint32_t rv;
> >>> 
> >>> +rmb(); __asm __volatile("rdtsc; shrd %%cl, %%edx, %0" -
> >>> : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); +
> >>> : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); return
> >>> (rv); }
> >>> 
> >> 
> >> It would correct here too, but not sure if it would make any 
> >> difference given that some lower bits are discarded anyway.
> >> Probably depends on exact CPU.
> > 
> > It is needed to pessimize this too. :-)
> > 
> > As I have complained before, the loss of resolution from the shift
> > is easy to see by reading the time from userland, even with syscall
> > overhead taking 10-20 times longer than the read.  On core2 with
> > TSC-low, a clock- checking utility gives:
> > 
> > % min 481, max 12031, mean 530.589452, std 51.633626 % 1th: 550
> > (1296487 observations) % 2th: 481 (448425 observations) % 3th: 482
> > (142650 observations) % 4th: 549 (61945 observations) % 5th: 551
> > (47619 observations)
> > 
> > The numbers are diffences in nanoseconds measured by
> > clock_gettime(). The jump from 481 to 549 is 68.  From this I can
> > tell that the clock frequency is 1.86 Ghz and the shift is 128, or
> > the clock frequency is 3.72 Ghz and the shift is 256.
> > 
> > On AthlonXP with TSC:
> > 
> > % min 273, max 29075, mean 274.412811, std 80.425963 % 1th: 273
> > (853962 observations) % 2th: 274 (745606 observations) % 3th: 275
> > (400212 observations) % 4th: 276 (20 observations) % 5th: 280 (10
> > observations)
> > 
> > Now the numbers cluster about the mean.  Although syscalls take
> > much longer than the loss of resolution with TSC-low, and even the
> > core2 TSC takes almost as long to read as the loss, it is still
> > possible to see things happening at the limits of the resolution
> > (~0.5 nsec).
> > 
> >> And, oh hmm, I read AMD Software Optimization Guide for AMD
> >> Family 10h Processors and they suggest using cpuid (with a note
> >> that it may be intercepted in virtualized environments) or
> >> _mfence_ in the discussed role (Appendix F of the document). 
> >> Googling for 'rdtsc mfence lfence' yields some interesting
> >> results.
> > 
> > The second hit was for the shrd pessimization/loss of resolution
> > and a memory access hack in lkml in 2011.  I now seem to remember
> > jkim mentioning the memory access hack.  rmb() on i386 has a
> > related memory access hack, but now with a lock prefix that defeats
> > the point of the 2011 hack (it wanted to save 5 nsec by removing
> > fences).  rmb() on amd64 uses lfence.
> 
> I believe I mentioned this thread at the time:
> 
> https://patchwork.kernel.org/patch/691712/
> 
> FYI, r238755 is essentially this commit for Linux:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=93ce99e849433ede4ce8b410b749dc0cad1100b2
> 
> > Some of the other hits are a bit old.  The 8th one was by me in
> > the thread about kib@ implementing gettimeofday() in userland.
> 
> Since we have gettimeofday() in userland, the above Linux thread is
> more relevant now, I guess.

For some unrelated reasons, we do have an lfence;rdtsc sequence in
userland already. Well, it is not exactly such a sequence, there are
some instructions in between, but the main fact is that two consecutive
invocations of gettimeofday(2) (*) or clock_gettime(2) are interleaved
with lfence on Intels, guaranteeing that a backstep of the counter is
impossible.

* - it is not a syscall anymore.

As I said, using the recommended mfence;rdtsc sequence for AMDs would require
some work, but let's handle the kernel and userspace issues separately.

And, I really failed to find what the patch from the thread you referenced
tried to fix. Was it really committed into Linux?

I see an actual problem of us allowing timecounters to go back, and a
solution that exactly follows the words of both Intel and AMD documentation.
This is a good step forward IMHO.




Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Jung-uk Kim
On 2012-07-25 14:05:37 -0400, Konstantin Belousov wrote:
> On Wed, Jul 25, 2012 at 01:27:48PM -0400, Jung-uk Kim wrote:
>> -BEGIN PGP SIGNED MESSAGE- Hash: SHA1
>> 
>> On 2012-07-25 10:44:04 -0400, Bruce Evans wrote:
>>> On Wed, 25 Jul 2012, Andriy Gapon wrote:
>>> 
>>>> on 25/07/2012 13:21 Konstantin Belousov said the following:
>>>>> ...
>>>>> diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c
>>>>> index 085c339..229b351 100644
>>>>> --- a/sys/x86/x86/tsc.c
>>>>> +++ b/sys/x86/x86/tsc.c
>>>>> @@ -594,6 +594,7 @@ static u_int
>>>>>  tsc_get_timecount(struct timecounter *tc __unused)
>>>>>  {
>>>>>
>>>>> +	rmb();
>>>>> 	return (rdtsc32());
>>>>>  }
>>>>
>>>> This makes sense to me.  We probably want correctness over
>>>> performance here. [BTW, I originally thought that the change
>>>> was here; brain malfunction]
>>> 
>>> And I liked the original change because it wasn't here :-).
>>> 
>>>>> @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter *tc)
>>>>>  {
>>>>> 	uint32_t rv;
>>>>>
>>>>> +	rmb();
>>>>> 	__asm __volatile("rdtsc; shrd %%cl, %%edx, %0"
>>>>> -	: "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
>>>>> +	: "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx");
>>>>> 	return (rv);
>>>>>  }
>>>>>
>>>>
>>>> It would correct here too, but not sure if it would make any
>>>> difference given that some lower bits are discarded anyway.
>>>> Probably depends on exact CPU.
>>> 
>>> It is needed to pessimize this too. :-)
>>> 
>>> As I have complained before, the loss of resolution from the
>>> shift is easy to see by reading the time from userland, even
>>> with syscall overhead taking 10-20 times longer than the read.
>>> On core2 with TSC-low, a clock- checking utility gives:
>>> 
>>> % min 481, max 12031, mean 530.589452, std 51.633626 % 1th:
>>> 550 (1296487 observations) % 2th: 481 (448425 observations) %
>>> 3th: 482 (142650 observations) % 4th: 549 (61945 observations)
>>> % 5th: 551 (47619 observations)
>>> 
>>> The numbers are diffences in nanoseconds measured by 
>>> clock_gettime(). The jump from 481 to 549 is 68.  From this I
>>> can tell that the clock frequency is 1.86 Ghz and the shift is
>>> 128, or the clock frequency is 3.72 Ghz and the shift is 256.
>>> 
>>> On AthlonXP with TSC:
>>> 
>>> % min 273, max 29075, mean 274.412811, std 80.425963 % 1th:
>>> 273 (853962 observations) % 2th: 274 (745606 observations) %
>>> 3th: 275 (400212 observations) % 4th: 276 (20 observations) %
>>> 5th: 280 (10 observations)
>>> 
>>> Now the numbers cluster about the mean.  Although syscalls
>>> take much longer than the loss of resolution with TSC-low, and
>>> even the core2 TSC takes almost as long to read as the loss, it
>>> is still possible to see things happening at the limits of the
>>> resolution (~0.5 nsec).
>>> 
>>>> And, oh hmm, I read AMD Software Optimization Guide for AMD
>>>> Family 10h Processors and they suggest using cpuid (with a
>>>> note that it may be intercepted in virtualized environments)
>>>> or _mfence_ in the discussed role (Appendix F of the
>>>> document). Googling for 'rdtsc mfence lfence' yields some
>>>> interesting results.
>>> 
>>> The second hit was for the shrd pessimization/loss of
>>> resolution and a memory access hack in lkml in 2011.  I now
>>> seem to remember jkim mentioning the memory access hack.  rmb()
>>> on i386 has a related memory access hack, but now with a lock
>>> prefix that defeats the point of the 2011 hack (it wanted to
>>> save 5 nsec by removing fences).  rmb() on amd64 uses lfence.
>> 
>> I believe I mentioned this thread at the time:
>> 
>> https://patchwork.kernel.org/patch/691712/
>> 
>> FYI, r238755 is essentially this commit for Linux:
>> 
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=93ce99e849433ede4ce8b410b749dc0cad1100b2
>> 
>>> Some of the other hits are a bit old.  The 8th one was by me in
>>> the thread about kib@ implementing gettimeofday() in userland.
>> 
>> Since we have gettimeofday() in userland, the above Linux thread
>> is more relevant now, I guess.
> 
> For some unrelated reasons, we do have lfence;rdtsc sequence in
> the userland already. Well, it is not exactly such sequence, there
> are some instructions between, but the main fact is that two
> consecutive invocations of gettimeofday(2) (*) or clock_gettime(2)
> are interleaved with lfence on Intels, guaranteeing that backstep
> of the counter is impossible.
> 
> * - it is not a syscall anymore.
> 
> As I said, using recommended mfence;rdtsc sequence for AMDs would
> require some work, but let's handle the kernel and userspace issues
> separately.

Agreed.

> And, I really failed to find what the patch from the thread you
> referenced tried to fix.

The patch was supposed to reduce a barrier, i.e., vsyscall
optimization.  Please note I brought it up at the time, not because it
fixed any problem but because we completely lack necessary serialization.

> Was it really committed into Linux ?

Yes, it was committed in 

Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Konstantin Belousov
On Wed, Jul 25, 2012 at 11:00:41AM -0700, Jim Harris wrote:
> On Wed, Jul 25, 2012 at 10:32 AM, Konstantin Belousov
>  wrote:
> > I also asked Jim to test whether the cause of the TSC sync test failure
> > is the lack of synchronization between gathering data and testing it,
> > but it appeared that the reason is genuine timecounter value going
> > backward.
> 
> I wonder if instead of timecounter going backward, that TSC test
> fails because CPU speculatively performs rdtsc instruction in relation
> to waiter checks in smp_rendezvous_action.  Or maybe we are saying
> the same thing.

Ok, the definition of the 'timecounter goes back', as I understand it:

you have two events A and B in two threads, provably ordered, say, A is
a lock release and B is the same lock acquisition. Assume that you take
rdtsc values tA and tB under the scope of the lock right before A and
right after B. Then it should be impossible to have tA > tB.

I do not think that we can ever observe tA > tB if both threads are
executing on the same CPU.
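
A minimal sketch of the tA/tB check described above, assuming an lfence;rdtsc read on x86-64. The helper names fenced_counter_read() and counter_went_back() are hypothetical and are not taken from tsc.c; the non-x86 branch exists only so the sketch compiles elsewhere.

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: sample a counter with a preceding load fence,
 * so the read is not executed ahead of earlier loads.
 */
static inline uint64_t
fenced_counter_read(void)
{
#if defined(__x86_64__)
	uint32_t lo, hi;

	__asm__ __volatile__("lfence; rdtsc"
	    : "=a" (lo), "=d" (hi) : : "memory");
	return (((uint64_t)hi << 32) | lo);
#else
	/* Fallback for other architectures: standard C processor time. */
	return ((uint64_t)clock());
#endif
}

/*
 * tA is sampled right before event A (lock release), tB right after
 * event B (acquisition of the same lock).  A nonzero result is the
 * "timecounter goes back" case.
 */
static int
counter_went_back(uint64_t tA, uint64_t tB)
{
	return (tA > tB);
}
```

Two back-to-back calls to fenced_counter_read() on one CPU correspond to the same-CPU case, where counter_went_back() is expected to return 0.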




svn commit: r238778 - head/sys/dev/usb/serial

2012-07-25 Thread Gavin Atkinson
Author: gavin
Date: Wed Jul 25 20:46:22 2012
New Revision: 238778
URL: http://svn.freebsd.org/changeset/base/238778

Log:
  The baud rate on CP2101/2/3 devices can be set in one of two ways:
   - The USLCOM_SET_BAUD_DIV command (0x01)
   - The USLCOM_SET_BAUD_RATE command (0x1e)
  
  Devices based on the CP2104 will only accept the latter command, and ignore
  the former.  As the latter command works on all chips that this driver
  supports, switch to always using it.
  
  A slight confusion here is that the previously used command was incorrectly
  named USLCOM_BAUD_RATE - even though we no longer use it, rename it to
  USLCOM_SET_BAUD_DIV to closer match the name used in the datasheet.
  
  This change reflects a similar change made in the Linux driver, which was
  submitted by preston.fick at silabs.com, and has been tested on all of the
  uslcom(4) devices I have to hand.
  
  MFC after:2 weeks

Modified:
  head/sys/dev/usb/serial/uslcom.c

Modified: head/sys/dev/usb/serial/uslcom.c
==
--- head/sys/dev/usb/serial/uslcom.cWed Jul 25 19:18:28 2012
(r238777)
+++ head/sys/dev/usb/serial/uslcom.cWed Jul 25 20:46:22 2012
(r238778)
@@ -70,12 +70,13 @@ SYSCTL_INT(_hw_usb_uslcom, OID_AUTO, deb
 
 /* Request codes */
 #define	USLCOM_UART		0x00
-#define	USLCOM_BAUD_RATE	0x01
+#define	USLCOM_SET_BAUD_DIV	0x01
 #define	USLCOM_DATA		0x03
 #define	USLCOM_BREAK		0x05
 #define	USLCOM_CTRL		0x07
 #define	USLCOM_RCTRL		0x08
 #define	USLCOM_SET_FLOWCTRL	0x13
+#define	USLCOM_SET_BAUD_RATE	0x1e
 #define	USLCOM_VENDOR_SPECIFIC	0xff
 
 /* USLCOM_UART values */
@@ -92,8 +93,8 @@ SYSCTL_INT(_hw_usb_uslcom, OID_AUTO, deb
 #define	USLCOM_CTRL_RI		0x0040
 #define	USLCOM_CTRL_DCD		0x0080
 
-/* USLCOM_BAUD_RATE values */
-#define	USLCOM_BAUD_REF		0x384000
+/* USLCOM_SET_BAUD_DIV values */
+#define	USLCOM_BAUD_REF		3686400 /* 3.6864 MHz */
 
 /* USLCOM_DATA values */
 #define	USLCOM_STOP_BITS_1	0x00
@@ -511,19 +512,20 @@ uslcom_param(struct ucom_softc *ucom, st
 {
struct uslcom_softc *sc = ucom->sc_parent;
struct usb_device_request req;
-   uint32_t flowctrl[4];
+   uint32_t baudrate, flowctrl[4];
uint16_t data;
 
DPRINTF("\n");
 
+   baudrate = t->c_ospeed;
req.bmRequestType = USLCOM_WRITE;
-   req.bRequest = USLCOM_BAUD_RATE;
-   USETW(req.wValue, USLCOM_BAUD_REF / t->c_ospeed);
+   req.bRequest = USLCOM_SET_BAUD_RATE;
+   USETW(req.wValue, 0);
USETW(req.wIndex, USLCOM_PORT_NO);
-   USETW(req.wLength, 0);
+   USETW(req.wLength, sizeof(baudrate));
 
-if (ucom_cfg_do_request(sc->sc_udev, &sc->sc_ucom, 
-   &req, NULL, 0, 1000)) {
+   if (ucom_cfg_do_request(sc->sc_udev, &sc->sc_ucom, 
+   &req, &baudrate, 0, 1000)) {
DPRINTF("Set baudrate failed (ignored)\n");
}
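
For comparison, a small sketch of the two encodings this diff switches between: the old request carried a 16-bit divisor of the 3.6864 MHz reference in wValue, while the new one sends the rate itself as a 4-byte payload. The helper names are hypothetical, and the little-endian byte order of the payload is an assumption based on the usual USB convention, not something the diff states.

```c
#include <stdint.h>

#define	BAUD_REF	3686400u	/* same value as USLCOM_BAUD_REF */

/* Old scheme: divisor of the reference clock (0 for an invalid speed). */
static uint16_t
baud_to_divisor(uint32_t speed)
{
	if (speed == 0)
		return (0);
	return ((uint16_t)(BAUD_REF / speed));
}

/* New scheme: the rate as a 4-byte payload (assumed little-endian). */
static void
baud_to_payload(uint32_t speed, uint8_t out[4])
{
	out[0] = (uint8_t)(speed & 0xff);
	out[1] = (uint8_t)((speed >> 8) & 0xff);
	out[2] = (uint8_t)((speed >> 16) & 0xff);
	out[3] = (uint8_t)((speed >> 24) & 0xff);
}
```

With these helpers, 115200 baud maps to divisor 32 under the old scheme and to the bytes 00 c2 01 00 under the new one.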
 


svn commit: r238779 - head/sys/dev/usb

2012-07-25 Thread Gavin Atkinson
Author: gavin
Date: Wed Jul 25 21:32:55 2012
New Revision: 238779
URL: http://svn.freebsd.org/changeset/base/238779

Log:
  Add vendor.product for a mouse I have laying around

Modified:
  head/sys/dev/usb/usbdevs

Modified: head/sys/dev/usb/usbdevs
==
--- head/sys/dev/usb/usbdevsWed Jul 25 20:46:22 2012(r238778)
+++ head/sys/dev/usb/usbdevsWed Jul 25 21:32:55 2012(r238779)
@@ -672,6 +672,7 @@ vendor STELERA  0x1a8d  Stelera Wireless
 vendor MATRIXORBITAL	0x1b3d	Matrix Orbital
 vendor OVISLINK		0x1b75	OvisLink
 vendor TCTMOBILE	0x1bbb	TCT Mobile
+vendor SUNPLUS		0x1bcf	Sunplus Innovation Technology Inc.
 vendor WAGO		0x1be3	WAGO Kontakttechnik GmbH.
 vendor TELIT		0x1bc7	Telit
 vendor LONGCHEER	0x1c9e	Longcheer Holdings, Ltd.
@@ -3268,6 +3269,9 @@ product SUN KEYBOARD_TYPE_7   0x00a2  Type 
 product SUN MOUSE		0x0100	Type 6 USB mouse
 product SUN KBD_HUB		0x100e	Kbd Hub
 
+/* Sunplus Innovation Technology Inc. products */
+product SUNPLUS USBMOUSE	0x0007	USB Optical Mouse
+
 /* Super Top products */
 product SUPERTOP IDE		0x6600	USB-IDE
 


svn commit: r238780 - head/usr.bin/find

2012-07-25 Thread Jilles Tjoelker
Author: jilles
Date: Wed Jul 25 21:59:10 2012
New Revision: 238780
URL: http://svn.freebsd.org/changeset/base/238780

Log:
  find: Implement real -ignore_readdir_race.
  
  If -ignore_readdir_race is present, [ENOENT] errors caused by deleting a
  file after find has read its name from a directory are ignored.
  
  Formerly, -ignore_readdir_race did nothing.
  
  PR:   bin/169723
  Submitted by: Valery Khromov and Andrey Ignatov

Modified:
  head/usr.bin/find/extern.h
  head/usr.bin/find/find.1
  head/usr.bin/find/find.c
  head/usr.bin/find/function.c
  head/usr.bin/find/main.c
  head/usr.bin/find/option.c

Modified: head/usr.bin/find/extern.h
==
--- head/usr.bin/find/extern.h  Wed Jul 25 21:32:55 2012(r238779)
+++ head/usr.bin/find/extern.h  Wed Jul 25 21:59:10 2012(r238780)
@@ -58,6 +58,7 @@ creat_f   c_flags;
 creat_f	c_follow;
 creat_f	c_fstype;
 creat_f	c_group;
+creat_f	c_ignore_readdir_race;
 creat_f	c_inum;
 creat_f	c_links;
 creat_f	c_ls;
@@ -111,7 +112,8 @@ exec_f  f_size;
 exec_f f_type;
 exec_f f_user;
 
-extern int ftsoptions, isdeprecated, isdepth, isoutput, issort, isxargs;
+extern int ftsoptions, ignore_readdir_race, isdeprecated, isdepth, isoutput;
+extern int issort, isxargs;
 extern int mindepth, maxdepth;
 extern int regexp_flags;
 extern time_t now;

Modified: head/usr.bin/find/find.1
==
--- head/usr.bin/find/find.1Wed Jul 25 21:32:55 2012(r238779)
+++ head/usr.bin/find/find.1Wed Jul 25 21:59:10 2012(r238780)
@@ -31,7 +31,7 @@
 .\"@(#)find.1  8.7 (Berkeley) 5/9/95
 .\" $FreeBSD$
 .\"
-.Dd June 13, 2012
+.Dd July 25, 2012
 .Dt FIND 1
 .Os
 .Sh NAME
@@ -470,7 +470,9 @@ is numeric and there is no such group na
 .Ar gname
 is treated as a group ID.
 .It Ic -ignore_readdir_race
-This option is for GNU find compatibility and is ignored.
+Ignore errors because a file or a directory is deleted
+after reading the name from a directory.
+This option does not affect errors occurring on starting points.
 .It Ic -ilname Ar pattern
 Like
 .Ic -lname ,
@@ -618,7 +620,9 @@ is equivalent to
 .It Ic -nogroup
 True if the file belongs to an unknown group.
 .It Ic -noignore_readdir_race
-This option is for GNU find compatibility and is ignored.
+Turn off the effect of
+.Ic -ignore_readdir_race .
+This is default behaviour.
 .It Ic -noleaf
 This option is for GNU find compatibility.
 In GNU find it disables an optimization not relevant to

Modified: head/usr.bin/find/find.c
==
--- head/usr.bin/find/find.cWed Jul 25 21:32:55 2012(r238779)
+++ head/usr.bin/find/find.cWed Jul 25 21:59:10 2012(r238780)
@@ -197,8 +197,12 @@ find_execute(PLAN *plan, char *paths[])
continue;
break;
case FTS_DNR:
-   case FTS_ERR:
case FTS_NS:
+   if (ignore_readdir_race &&
+   entry->fts_errno == ENOENT && entry->fts_level > 0)
+   continue;
+   /* FALLTHROUGH */
+   case FTS_ERR:
(void)fflush(stdout);
warnx("%s: %s",
entry->fts_path, strerror(entry->fts_errno));
@@ -228,7 +232,7 @@ find_execute(PLAN *plan, char *paths[])
for (p = plan; p && (p->execute)(p, entry); p = p->next);
}
finish_execplus();
-   if (errno)
+   if (errno && (!ignore_readdir_race || errno != ENOENT))
err(1, "fts_read");
return (rval);
 }
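
The condition added to find_execute() above can be isolated as a small predicate. This is only a sketch: should_skip_entry() is a hypothetical name, and its plain int parameters stand in for the ignore_readdir_race flag and for the fts_errno and fts_level fields of the FTSENT.

```c
#include <errno.h>

/*
 * Return nonzero when an FTS_DNR/FTS_NS entry should be skipped
 * silently: the option is on, the error is ENOENT, and the entry is
 * below a starting point (level > 0), so errors on the starting
 * points themselves are still reported.
 */
static int
should_skip_entry(int ignore_race, int fts_errno, int fts_level)
{
	return (ignore_race && fts_errno == ENOENT && fts_level > 0);
}
```

A vanished file at level 1 is skipped when the option is set, while a missing starting point (level 0) or any other errno still reaches the warnx() path.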

Modified: head/usr.bin/find/function.c
==
--- head/usr.bin/find/function.cWed Jul 25 21:32:55 2012
(r238779)
+++ head/usr.bin/find/function.cWed Jul 25 21:59:10 2012
(r238780)
@@ -975,6 +975,25 @@ c_group(OPTION *option, char ***argvp)
 }
 
 /*
+ * -ignore_readdir_race functions --
+ *
+ * Always true. Ignore errors which occur if a file or a directory
+ * in a starting point gets deleted between reading the name and calling
+ * stat on it while find is traversing the starting point.
+ */
+
+PLAN *
+c_ignore_readdir_race(OPTION *option, char ***argvp __unused)
+{
+   if (strcmp(option->name, "-ignore_readdir_race") == 0)
+   ignore_readdir_race = 1;
+   else
+   ignore_readdir_race = 0;
+
+   return palloc(option);
+}
+
+/*
  * -inum n functions --
  *
  * True if the file has inode # n.

Modified: head/usr.bin/find/main.c
==
--- head/usr.bin/find/main.cWed Jul 25 21:32:55 2012(r23877

svn commit: r238781 - head/lib/libc/locale

2012-07-25 Thread Isabell Long
Author: issyl0 (doc committer)
Date: Wed Jul 25 22:17:44 2012
New Revision: 238781
URL: http://svn.freebsd.org/changeset/base/238781

Log:
  Add a new man page containing details of new locale-specific functions for
  wctype.h, iswalnum_l(3).  Add it and its functions to the Makefile.
  
  Reviewed by:  gavin, jilles
  Approved by:  theraven
  MFC after:5 days

Added:
  head/lib/libc/locale/iswalnum_l.3   (contents, props changed)
Modified:
  head/lib/libc/locale/Makefile.inc

Modified: head/lib/libc/locale/Makefile.inc
==
--- head/lib/libc/locale/Makefile.inc   Wed Jul 25 21:59:10 2012
(r238780)
+++ head/lib/libc/locale/Makefile.inc   Wed Jul 25 22:17:44 2012
(r238781)
@@ -30,7 +30,8 @@ MAN+= btowc.3 \
ctype.3 digittoint.3 isalnum.3 isalpha.3 isascii.3 isblank.3 iscntrl.3 \
isdigit.3 isgraph.3 isideogram.3 islower.3 isphonogram.3 isprint.3 \
ispunct.3 isrune.3 isspace.3 isspecial.3 \
-   isupper.3 iswalnum.3 isxdigit.3 localeconv.3 mblen.3 mbrlen.3 \
+   isupper.3 iswalnum.3 iswalnum_l.3 isxdigit.3 \
+   localeconv.3 mblen.3 mbrlen.3 \
mbrtowc.3 \
mbsinit.3 \
mbsrtowcs.3 mbstowcs.3 mbtowc.3 multibyte.3 \
@@ -53,6 +54,18 @@ MLINKS+=iswalnum.3 iswalpha.3 iswalnum.3
iswalnum.3 iswphonogram.3 iswalnum.3 iswprint.3 iswalnum.3 iswpunct.3 \
iswalnum.3 iswrune.3 iswalnum.3 iswspace.3 iswalnum.3 iswspecial.3 \
iswalnum.3 iswupper.3 iswalnum.3 iswxdigit.3
+MLINKS+=iswalnum_l.3 iswalpha_l.3 iswalnum_l.3 iswcntrl_l.3 \
+   iswalnum_l.3 iswctype_l.3 iswalnum_l.3 iswdigit_l.3 \
+   iswalnum_l.3 iswgraph_l.3 iswalnum_l.3 iswlower_l.3 \
+   iswalnum_l.3 iswprint_l.3 iswalnum_l.3 iswpunct_l.3 \
+   iswalnum_l.3 iswspace_l.3 iswalnum_l.3 iswupper_l.3 \
+   iswalnum_l.3 iswxdigit_l.3 iswalnum_l.3 towlower_l.3 \
+   iswalnum_l.3 towupper_l.3 iswalnum_l.3 wctype_l.3 \
+   iswalnum_l.3 iswblank_l.3 iswalnum_l.3 iswhexnumber_l.3 \
+   iswalnum_l.3 iswideogram_l.3 iswalnum_l.3 iswnumber_l.3 \
+   iswalnum_l.3 iswphonogram_l.3 iswalnum_l.3 iswrune_l.3 \
+   iswalnum_l.3 iswspecial_l.3 iswalnum_l.3 nextwctype_l.3 \
+   iswalnum_l.3 towctrans_l.3 iswalnum_l.3 wctrans_l.3
 MLINKS+=isxdigit.3 ishexnumber.3
 MLINKS+=mbsrtowcs.3 mbsnrtowcs.3
 MLINKS+=wcsrtombs.3 wcsnrtombs.3

Added: head/lib/libc/locale/iswalnum_l.3
==
--- /dev/null   00:00:00 1970   (empty, because file is newly added)
+++ head/lib/libc/locale/iswalnum_l.3   Wed Jul 25 22:17:44 2012
(r238781)
@@ -0,0 +1,168 @@
+.\" Copyright (c) 2012 Isabell Long 
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"notice, this list of conditions and the following disclaimer in the
+.\"documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dt ISWALNUM_L 3
+.Dd July 25, 2012
+.Os
+.Sh NAME
+.Nm iswalnum_l ,
+.Nm iswalpha_l ,
+.Nm iswcntrl_l ,
+.Nm iswctype_l ,
+.Nm iswdigit_l ,
+.Nm iswgraph_l ,
+.Nm iswlower_l ,
+.Nm iswprint_l ,
+.Nm iswpunct_l ,
+.Nm iswspace_l ,
+.Nm iswupper_l ,
+.Nm iswxdigit_l ,
+.Nm towlower_l ,
+.Nm towupper_l ,
+.Nm wctype_l ,
+.Nm iswblank_l ,
+.Nm iswhexnumber_l ,
+.Nm iswideogram_l ,
+.Nm iswnumber_l ,
+.Nm iswphonogram_l ,
+.Nm iswrune_l ,
+.Nm iswspecial_l ,
+.Nm nextwctype_l ,
+.Nm towctrans_l ,
+.Nm wctrans_l
+.Nd wide character classification utilities
+.Sh LIBRARY
+.Lb libc
+.Sh SYNOPSIS
+.In wctype.h
+.Ft int
+.Fn iswalnum_l "wint_t wc" "locale_t loc"
+.Ft int
+.Fn iswalpha_l "wint_t wc" "locale_t loc"
+.Ft int
+.Fn iswcntrl_l "wint_t wc" "locale_t loc"
+.Ft int
+.Fn iswctype_l "wint_t wc" "locale_t loc"
+.Ft int
+.Fn iswdigit_l "wint_t wc" "locale_t loc"
+.Ft int
+.F

Re: svn commit: r238741 - head/lib/libelf

2012-07-25 Thread Garrett Cooper
Sent from my iPhone

On Jul 24, 2012, at 9:03 AM, "Andrey A. Chernov"  wrote:

> Author: ache
> Date: Tue Jul 24 16:03:28 2012
> New Revision: 238741
> URL: http://svn.freebsd.org/changeset/base/238741
> 
> Log:
>  Don't ever build files depending on the directory where they are placed in.
>  It is obvious that its modification time will change with each such file
>  built.
>  This bug causes whole libelf to rebuild itself each second make run
>  (and relink those files on each first make run) in the loop.

A bunch of the sys/boot directories probably need this too.


svn commit: r238782 - head/lib/msun/src

2012-07-25 Thread Steve Kargl
Author: kargl
Date: Thu Jul 26 03:50:24 2012
New Revision: 238782
URL: http://svn.freebsd.org/changeset/base/238782

Log:
  Replace code that toggles between 53 and 64 bits on i386
  class hardware with the ENTERI and RETURNI macros, which
  are now available in math_private.h.
  
  Suggested by: bde
  Approved by: das (mentor)

Modified:
  head/lib/msun/src/s_cbrtl.c

Modified: head/lib/msun/src/s_cbrtl.c
==
--- head/lib/msun/src/s_cbrtl.c Wed Jul 25 22:17:44 2012(r238781)
+++ head/lib/msun/src/s_cbrtl.c Thu Jul 26 03:50:24 2012(r238782)
@@ -51,23 +51,12 @@ cbrtl(long double x)
if (k == BIAS + LDBL_MAX_EXP)
return (x + x);
 
-#ifdef __i386__
-   fp_prec_t oprec;
-
-   oprec = fpgetprec();
-   if (oprec != FP_PE)
-   fpsetprec(FP_PE);
-#endif
+   ENTERI();
 
if (k == 0) {
/* If x = +-0, then cbrt(x) = +-0. */
-   if ((u.bits.manh | u.bits.manl) == 0) {
-#ifdef __i386__
-   if (oprec != FP_PE)
-   fpsetprec(oprec);
-#endif
-   return (x);
-   }
+   if ((u.bits.manh | u.bits.manl) == 0)
+   RETURNI(x);
/* Adjust subnormal numbers. */
u.e *= 0x1.0p514;
k = u.bits.exp;
@@ -149,9 +138,5 @@ cbrtl(long double x)
t=t+t*r;/* error <= 0.5 + 0.5/3 + epsilon */
 
t *= v.e;
-#ifdef __i386__
-   if (oprec != FP_PE)
-   fpsetprec(oprec);
-#endif
-   return (t);
+   RETURNI(t);
 }
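
A rough sketch of the save/raise/restore pattern that the removed #ifdef blocks spelled out by hand and that an enter/return macro pair can hide. The macros below are illustrative stand-ins, not the math_private.h definitions, and get_prec()/set_prec() are hypothetical substitutes for fpgetprec()/fpsetprec() so the sketch builds on any platform.

```c
typedef enum { PREC_53, PREC_64 } prec_sketch_t;

static prec_sketch_t cur_prec = PREC_53;	/* simulated FPU precision */
static int prec_switches;			/* counts set_prec() calls */

static prec_sketch_t get_prec(void) { return (cur_prec); }
static void set_prec(prec_sketch_t p) { cur_prec = p; prec_switches++; }

/* Save the current precision and raise it to 64 bits if needed. */
#define	ENTER_EXT()						\
	prec_sketch_t saved_prec_ = get_prec();			\
	if (saved_prec_ != PREC_64)				\
		set_prec(PREC_64)

/* Evaluate x at raised precision, restore the old mode, then return. */
#define	RETURN_EXT(x) do {					\
	long double ret_ = (x);					\
	if (saved_prec_ != PREC_64)				\
		set_prec(saved_prec_);				\
	return (ret_);						\
} while (0)

static long double
twice_sketch(long double x)
{
	ENTER_EXT();

	if (x == 0)
		RETURN_EXT(x);	/* early exit restores precision too */
	RETURN_EXT(x + x);
}
```

The point of the pattern is that every exit path restores the caller's precision without repeating the #ifdef block at each return.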


svn commit: r238783 - in head/lib/msun: ld128 ld80

2012-07-25 Thread Steve Kargl
Author: kargl
Date: Thu Jul 26 03:59:33 2012
New Revision: 238783
URL: http://svn.freebsd.org/changeset/base/238783

Log:
  * ld80/expl.c:
. Remove a few #ifdefs that should have been removed in the initial
  commit.
. Sort fpmath.h to its rightful place.
  
  * ld128/s_expl.c:
. Replace EXPMASK with its actual value.
. Sort fpmath.h to its rightful place.
  
  Requested by: bde
  Approved by:  das (mentor)

Modified:
  head/lib/msun/ld128/s_expl.c
  head/lib/msun/ld80/s_expl.c

Modified: head/lib/msun/ld128/s_expl.c
==
--- head/lib/msun/ld128/s_expl.cThu Jul 26 03:50:24 2012
(r238782)
+++ head/lib/msun/ld128/s_expl.cThu Jul 26 03:59:33 2012
(r238783)
@@ -29,12 +29,11 @@ __FBSDID("$FreeBSD$");
 
 #include 
 
+#include "fpmath.h"
 #include "math.h"
 #include "math_private.h"
-#include "fpmath.h"
 
 #define	BIAS	(LDBL_MAX_EXP - 1)
-#define	EXPMASK	(BIAS + LDBL_MAX_EXP)
 
 static volatile const long double twom1 = 0x1p-1L, tiny = 0x1p-1L;
 
@@ -205,7 +204,7 @@ expl(long double x)
/* Filter out exceptional cases. */
u.e = x;
hx = u.xbits.expsign;
-   ix = hx & EXPMASK;
+   ix = hx &  0x7fff;
if (ix >= BIAS + 13) {  /* |x| >= 8192 or x is NaN */
if (ix == BIAS + LDBL_MAX_EXP) {
if (u.xbits.manh != 0

Modified: head/lib/msun/ld80/s_expl.c
==
--- head/lib/msun/ld80/s_expl.c Thu Jul 26 03:50:24 2012(r238782)
+++ head/lib/msun/ld80/s_expl.c Thu Jul 26 03:59:33 2012(r238783)
@@ -45,13 +45,9 @@ __FBSDID("$FreeBSD$");
 #include 
 #endif
 
+#include "fpmath.h"
 #include "math.h"
-#define	FPSETPREC
-#ifdef NO_FPSETPREC
-#undef FPSETPREC
-#endif
 #include "math_private.h"
-#include "fpmath.h"
 
 #define	BIAS	(LDBL_MAX_EXP - 1)
 


svn commit: r238784 - in head/lib/msun: ld128 ld80

2012-07-25 Thread Steve Kargl
Author: kargl
Date: Thu Jul 26 04:05:08 2012
New Revision: 238784
URL: http://svn.freebsd.org/changeset/base/238784

Log:
  Replace the macro name NUM with INTERVALS.  This change provides
  compatibility with the INTERVALS macro used in the soon-to-be-committed
  expm1l() and someday-to-be-committed log*l() functions.
  
  Add a comment into ld128/s_expl.c noting a gcc issue that was
  deleted when rewriting ld80/e_expl.c as ld128/s_expl.c.
  
  Requested by: bde
  Approved by:  das (mentor)

Modified:
  head/lib/msun/ld128/s_expl.c
  head/lib/msun/ld80/s_expl.c

Modified: head/lib/msun/ld128/s_expl.c
==
--- head/lib/msun/ld128/s_expl.cThu Jul 26 03:59:33 2012
(r238783)
+++ head/lib/msun/ld128/s_expl.cThu Jul 26 04:05:08 2012
(r238784)
@@ -35,6 +35,7 @@ __FBSDID("$FreeBSD$");
 
 #define	BIAS	(LDBL_MAX_EXP - 1)
 
+/* XXX Prevent gcc from erroneously constant folding this: */
 static volatile const long double twom1 = 0x1p-1L, tiny = 0x1p-1L;
 
 static const long double
@@ -57,12 +58,12 @@ P9 = 2.755731922401038678178761995444688
 P10 = 2.75573236172670046201884000197885520e-7L,
 P11 = 2.50517544183909126492878226167697856e-8L;
 
-#define	NUM		128
+#define	INTERVALS	128
 
 static const struct {
long double hi;
long double lo;
-} s[NUM] = {
+} s[INTERVALS] = {
0x1p0L, 0x0p0L,
0x1.0163da9fb33356d84a66aep0L, 0x3.36dcdfa4003ec04c360be2404078p-92L,
0x1.02c9a3e778060ee6f7cacap0L, 0x4.f7a29bde93d70a2cabc5cb89ba10p-92L,
@@ -226,8 +227,8 @@ expl(long double x)
 
fn = x * INV_L + 0x1.8p112 - 0x1.8p112;
n  = (int)fn;
-   n2 = (unsigned)n % NUM; /* Tang's j. */
-   k = (n - n2) / NUM;
+   n2 = (unsigned)n % INTERVALS;   /* Tang's j. */
+   k = (n - n2) / INTERVALS;
r1 = x - fn * L1;
r2 = -fn * L2;
 

Modified: head/lib/msun/ld80/s_expl.c
==
--- head/lib/msun/ld80/s_expl.c Thu Jul 26 03:59:33 2012(r238783)
+++ head/lib/msun/ld80/s_expl.c Thu Jul 26 04:05:08 2012(r238784)
@@ -36,7 +36,7 @@ __FBSDID("$FreeBSD$");
  *   in IEEE floating-point arithmetic," ACM Trans. Math. Soft., 15,
  *   144-157 (1989).
  *
- * where the 32 table entries have been expanded to NUM (see below).
+ * where the 32 table entries have been expanded to INTERVALS (see below).
  */
 
 #include 
@@ -65,9 +65,9 @@ u_threshold = LD80C(0xb21dfe7f09e2baa9, 
 
 static const double __aligned(64)
 /*
- * ln2/NUM = L1+L2 (hi+lo decomposition for multiplication).  L1 must have
- * at least 22 (= log2(|LDBL_MIN_EXP-extras|) + log2(NUM)) lowest bits zero
- * so that multiplication of it by n is exact.
+ * ln2/INTERVALS = L1+L2 (hi+lo decomposition for multiplication).  L1 must
+ * have at least 22 (= log2(|LDBL_MIN_EXP-extras|) + log2(INTERVALS)) lowest
+ * bits zero so that multiplication of it by n is exact.
  */
 L1 =  5.4152123484527692e-3,   /*  0x162e42ff00.0p-60 */
 L2 = -3.2819649005320973e-13,  /* -0x1718432a1b0e26.0p-94 */
@@ -75,7 +75,7 @@ INV_L = 1.8466496523378731e+2,/*  0x17
 /*
  * Domain [-0.002708, 0.002708], range ~[-5.7136e-24, 5.7110e-24]:
  * |exp(x) - p(x)| < 2**-77.2
- * (0.002708 is ln2/(2*NUM) rounded up a little).
+ * (0.002708 is ln2/(2*INTERVALS) rounded up a little).
  */
 P2 =  0.5,
 P3 =  1.6119e-1,   /*  0x155490.0p-55 */
@@ -84,16 +84,16 @@ P5 =  8.354987869413e-3,/*  0x
 P6 =  1.391738560272e-3;   /*  0x16c16c651633ae.0p-62 */
 
 /*
- * 2^(i/NUM) for i in [0,NUM] is represented by two values where the
- * first 47 (?!) bits of the significand is stored in hi and the next 53
+ * 2^(i/INTERVALS) for i in [0,INTERVALS] is represented by two values where
+ * the first 47 (?!) bits of the significand is stored in hi and the next 53
  * bits are in lo.
  */
-#define	NUM		128
+#define	INTERVALS	128
 
 static const struct {
double  hi;
double  lo;
-} s[NUM] __aligned(16) = {
+} s[INTERVALS] __aligned(16) = {
0x1p+0, 0x0p+0,
0x1.0163da9fb330p+0, 0x1.ab6c25335719bp-47,
0x1.02c9a3e77804p+0, 0x1.07737be56527cp-47,
@@ -265,8 +265,8 @@ expl(long double x)
 #else
n  = (int)fn;
 #endif
-   n2 = (unsigned)n % NUM; /* Tang's j. */
-   k = (n - n2) / NUM;
+   n2 = (unsigned)n % INTERVALS;   /* Tang's j. */
+   k = (n - n2) / INTERVALS;
r1 = x - fn * L1;
r2 = -fn * L2;
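
The index split in the hunk above can be tried in isolation. A minimal sketch, with split_index() as a hypothetical wrapper around the two statements from expl(): the unsigned cast keeps the table index in [0, INTERVALS) even for negative n, which works because INTERVALS is a power of two.

```c
#define	INTERVALS	128

/*
 * Split n into a table index n2 in [0, INTERVALS) and a power-of-two
 * exponent k such that n == k * INTERVALS + n2.
 */
static void
split_index(int n, int *n2, int *k)
{
	*n2 = (unsigned)n % INTERVALS;	/* Tang's j, never negative */
	*k = (n - *n2) / INTERVALS;	/* exact: n - n2 is a multiple */
}
```

For n = -1 this gives n2 = 127 and k = -1, whereas a plain signed n % INTERVALS would produce a negative index.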
 


svn commit: r238785 - head/sys/arm/conf

2012-07-25 Thread Warner Losh
Author: imp
Date: Thu Jul 26 05:35:10 2012
New Revision: 238785
URL: http://svn.freebsd.org/changeset/base/238785

Log:
  Update partitions to reflect "sam9 demo" defaults.
  Update i2c devices to just include the eeprom.
  Update dataflash chip select to be CS 1 (this doesn't work yet and
needs changes to at91_spi and the spibus infrastructure).
  Fix typo in comment.

Modified:
  head/sys/arm/conf/SAM9260EK
  head/sys/arm/conf/SAM9260EK.hints

Modified: head/sys/arm/conf/SAM9260EK
==
--- head/sys/arm/conf/SAM9260EK Thu Jul 26 04:05:08 2012(r238784)
+++ head/sys/arm/conf/SAM9260EK Thu Jul 26 05:35:10 2012(r238785)
@@ -17,12 +17,12 @@
 #
 # $FreeBSD$
 
-ident  ETHERNUT5
+ident  SAM9260EK
 
-include "../at91/std.ethernut5"
+include "../at91/std.sam9260ek"
 
 # To statically compile in device wiring instead of /boot/device.hints
-hints  "ETHERNUT5.hints"
+hints  "SAM9260EK.hints"
 
 #makeoptions   DEBUG=-g# Build kernel with gdb(1) debug symbols
 
@@ -103,13 +103,13 @@ devicebpf # Berkeley packet filter
 
 # Ethernet
 device mii # Minimal MII support
-device ate # Atmel AT91 Ethernet friver
+device ate # Atmel AT91 Ethernet driver
 
 # I2C
 device		at91_twi	# Atmel AT91 Two-wire Interface
 device		iic		# I2C generic I/O device driver
 device		iicbus		# I2C bus system
-device		pcf8563		# NXP PCF8563 clock/calendar
+device		icee		# I2C eeprom
 
 # MMC/SD
 device		at91_mci	# Atmel AT91 Multimedia Card Interface

Modified: head/sys/arm/conf/SAM9260EK.hints
==
--- head/sys/arm/conf/SAM9260EK.hints   Thu Jul 26 04:05:08 2012
(r238784)
+++ head/sys/arm/conf/SAM9260EK.hints   Thu Jul 26 05:35:10 2012
(r238785)
@@ -2,50 +2,48 @@
 
 # Atmel AT45DB21D
 hint.at45d.0.at="spibus0"
-hint.at45d.0.addr=0x00
-# user 132 kbytes
+hint.at45d.0.cs=1
+# Area 0:   to 41FF (RO) Bootstrap
+# Area 1:  4200 to 83FF  Environment
+# Area 2:  8400 to 00041FFF (RO) U-Boot
+# Area 3:  00042000 to 00251FFF  Kernel
+# Area 4:  00252000 to 0083  FS
+# bootstrap
 hint.map.0.at="flash/spi0"
 hint.map.0.start=0x
-hint.map.0.end=0x00020fff
-hint.map.0.name="user"
+hint.map.0.end=0x41ff
+hint.map.0.name="bootstrap"
 hint.map.0.readonly=1
-# setup 132 kbytes
+# uboot environment
 hint.map.1.at="flash/spi0"
-hint.map.1.start=0x00021000
-hint.map.1.end=0x00041fff
-hint.map.1.name="setup"
-hint.map.1.readonly=1
-# uboot 528 kbytes
+hint.map.1.start=0x4200
+hint.map.1.end=0x00083ff
+hint.map.1.name="uboot-env"
+#hint.map.1.readonly=1
+# uboot
 hint.map.2.at="flash/spi0"
-hint.map.2.start=0x00042000
-hint.map.2.end=0x000c5fff
+hint.map.2.start=0x8400
+hint.map.2.end=0x00041fff
 hint.map.2.name="uboot"
 hint.map.2.readonly=1
-# kernel 2640 kbytes
+# kernel
 hint.map.3.at="flash/spi0"
-hint.map.3.start=0x000c6000
-hint.map.3.end=0x00359fff
-hint.map.3.name="kernel"
+hint.map.3.start=0x00042000
+hint.map.3.end=0x00251fff
+hint.map.3.name="fs"
 #hint.map.3.readonly=1
-# nutos 528 kbytes
+# fs
 hint.map.4.at="flash/spi0"
-hint.map.4.start=0x0035a000
-hint.map.4.end=0x003ddfff
-hint.map.4.name="nutos"
-hint.map.4.readonly=1
-# env 132 kbytes
-hint.map.5.at="flash/spi0"
-hint.map.5.start=0x003de000
-hint.map.5.end=0x003fefff
-hint.map.5.name="env"
-hint.map.5.readonly=1
-# env 132 kbytes
-hint.map.6.at="flash/spi0"
-hint.map.6.start=0x003ff000
-hint.map.6.end=0x0041
-hint.map.6.name="nutoscfg"
-hint.map.6.readonly=1
+hint.map.4.start=0x00252000
+hint.map.4.end=0x0083
+hint.map.4.name="fs"
+#hint.map.4.readonly=1
+
+# EEPROM
+hint.icee.0.at="iicbus0"
+hint.icee.0.addr=0xa0
+hint.icee.0.type=16
+hint.icee.0.size=65536
+hint.icee.0.rd_sz=256
+hint.icee.0.wr_sz=256
 
-# NXP PCF8563 clock/calendar
-hint.pcf8563_rtc.0.at="iicbus0"
-hint.pcf8563_rtc.0.addr=0xa2


svn commit: r238786 - head/sys/arm/conf

2012-07-25 Thread Warner Losh
Author: imp
Date: Thu Jul 26 05:37:36 2012
New Revision: 238786
URL: http://svn.freebsd.org/changeset/base/238786

Log:
  Fix typo in comment.
  spibus uses cs= rather than addr=, so fix hints to use that (nop since
spibus cs defaults to 0, and at91_spi assumes 0).

Modified:
  head/sys/arm/conf/ETHERNUT5
  head/sys/arm/conf/ETHERNUT5.hints

Modified: head/sys/arm/conf/ETHERNUT5
==
--- head/sys/arm/conf/ETHERNUT5 Thu Jul 26 05:35:10 2012(r238785)
+++ head/sys/arm/conf/ETHERNUT5 Thu Jul 26 05:37:36 2012(r238786)
@@ -103,7 +103,7 @@ device  bpf # Berkeley packet filter
 
 # Ethernet
 device mii # Minimal MII support
-device ate # Atmel AT91 Ethernet friver
+device ate # Atmel AT91 Ethernet driver
 
 # I2C
 device at91_twi# Atmel AT91 Two-wire Interface

Modified: head/sys/arm/conf/ETHERNUT5.hints
==
--- head/sys/arm/conf/ETHERNUT5.hints   Thu Jul 26 05:35:10 2012
(r238785)
+++ head/sys/arm/conf/ETHERNUT5.hints   Thu Jul 26 05:37:36 2012
(r238786)
@@ -2,7 +2,7 @@
 
 # Atmel AT45DB21D
 hint.at45d.0.at="spibus0"
-hint.at45d.0.addr=0x00
+hint.at45d.0.cs=0
 # user 132 kbytes
 hint.map.0.at="flash/spi0"
 hint.map.0.start=0x


svn commit: r238787 - head/sys/arm/at91

2012-07-25 Thread Warner Losh
Author: imp
Date: Thu Jul 26 05:46:56 2012
New Revision: 238787
URL: http://svn.freebsd.org/changeset/base/238787

Log:
  Some models have 6 USARTS + DBGU.  Set a consistent name.

Modified:
  head/sys/arm/at91/uart_bus_at91usart.c

Modified: head/sys/arm/at91/uart_bus_at91usart.c
==
--- head/sys/arm/at91/uart_bus_at91usart.c  Thu Jul 26 05:37:36 2012
(r238786)
+++ head/sys/arm/at91/uart_bus_at91usart.c  Thu Jul 26 05:46:56 2012
(r238787)
@@ -95,6 +95,12 @@ usart_at91_probe(device_t dev)
case 4:
device_set_desc(dev, "USART3");
break;
+   case 5:
+   device_set_desc(dev, "USART4");
+   break;
+   case 6:
+   device_set_desc(dev, "USART5");
+   break;
}
sc->sc_class = &at91_usart_class;
if (sc->sc_class->uc_rclk == 0)


Re: svn commit: r238755 - head/sys/x86/x86

2012-07-25 Thread Bruce Evans

On Wed, 25 Jul 2012, Konstantin Belousov wrote:


> On Wed, Jul 25, 2012 at 11:00:41AM -0700, Jim Harris wrote:
>> On Wed, Jul 25, 2012 at 10:32 AM, Konstantin Belousov
>>  wrote:
>>> I also asked Jim to test whether the cause of the TSC sync test failure
>>> is the lack of synchronization between gathering data and testing it,
>>> but it appeared that the reason is genuine timecounter value going
>>> backward.
>>
>> I wonder if instead of timecounter going backward, that TSC test
>> fails because CPU speculatively performs rdtsc instruction in relation
>> to waiter checks in smp_rendezvous_action.  Or maybe we are saying
>> the same thing.
>
> Ok, the definition of the 'timecounter goes back', as I understand it:
>
> you have two events A and B in two threads, provably ordered, say, A is
> a lock release and B is the same lock acquisition. Assume that you take
> rdtsc values tA and tB under the scope of the lock right before A and
> right after B. Then it should be impossible to have tA > tB.


For the threaded case, there has to be something for the accesses to be
provably ordered.  It is hard to see how the something can be strong
enough unless it serializes all thread state in A and B.  The rdtsc
state is not part of the thread state as known to APIs, but it is hard
to see how threads can serialize themselves without also serializing
the TSC.

For most uses, the scope of the serialization and locking also needs
to extend across multiple timer reads.  Otherwise you can have situations
like:

    read the time
    interrupt or context switch
    read later time in other intr handler/thread
    save late time
    back to previous context
    save earlier time

It is unclear how to even prevent such situations.  You (at least, I)
don't want heavyweight locking/synchronization to prevent the context
switches.  And the kernel rarely if ever does such synchronization.
binuptime() has none internally.  It just spins if necessary until the
read becomes stable.  Most callers of binuptime() just call it.
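
The sequence above can be replayed deterministically as a rough sketch.
Everything here is hypothetical scaffolding rather than kernel code:
fake_clock stands in for the timer, and read_time(), save_time() and
interleaved_reads() are illustrative names only.  The point is that both
reads are individually correct, yet the saved order is inverted.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t fake_clock;	/* hypothetical counter that only goes up */
static uint32_t saved[2];	/* timestamps in the order they were saved */
static size_t nsaved;

/* Each read returns a strictly later value than the previous read. */
static uint32_t
read_time(void)
{
	return (++fake_clock);
}

static void
save_time(uint32_t t)
{
	saved[nsaved++] = t;
}

/* Replays the interleaving from the list above in a single thread. */
static void
interleaved_reads(void)
{
	uint32_t early, late;

	early = read_time();	/* read the time */
	/* interrupt or context switch would happen here */
	late = read_time();	/* later read in the other handler/thread */
	save_time(late);	/* the later time is saved first */
	/* back to the previous context */
	save_time(early);	/* the earlier time is saved second */
}
```

After one call to interleaved_reads(), saved[0] holds the later reading
and saved[1] the earlier one, although the clock itself never went
backwards.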


I do not think that we can ever observe tA > tB if both threads are
executing on the same CPU.


I thought that that was the problem, with a single thread and no context
switches seeing the TSC go backwards.  Even then, it would take
non-useful behaviour (except for calibration and benchmarks) like
spinning executing rdtsc to see it going backwards.  Normally there
are many instructions between rdtsc's and the non-serialization isn't
as deep as that.  Using syscalls, you just can't read the timecounter
without about 1000 cycles between reads.  When there is a context switch,
there is usually accidental serialization from locking.
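
As a rough illustration of the "spin executing rdtsc" style of check, the
sketch below counts how often back-to-back readings go backwards.  It is
only a sketch under stated assumptions: the reader is passed in as a
function pointer, and samples[] and next_sample() are hypothetical scripted
readings standing in for real rdtsc32() calls, so the result is
reproducible.  The signed 32-bit difference is one common way to tolerate
counter wrap; it is not taken from the thread.

```c
#include <stdint.h>

/* Hypothetical scripted readings in place of back-to-back rdtsc32() calls. */
static const uint32_t samples[] = { 10, 12, 11, 15, 15, 14 };
static unsigned sample_pos;

static uint32_t
next_sample(void)
{
	return (samples[sample_pos++ % (sizeof(samples) / sizeof(samples[0]))]);
}

/*
 * Take nreads consecutive readings and count how many are earlier than
 * the reading just before them.  An equal reading (a stopped counter)
 * is not counted as a regression.
 */
static unsigned
count_regressions(uint32_t (*read_raw)(void), unsigned nreads)
{
	unsigned i, regressions = 0;
	uint32_t prev, cur;

	if (nreads == 0)
		return (0);
	prev = read_raw();
	for (i = 1; i < nreads; i++) {
		cur = read_raw();
		if ((int32_t)(cur - prev) < 0)	/* wrap-tolerant "cur < prev" */
			regressions++;
		prev = cur;
	}
	return (regressions);
}
```

With the scripted samples, count_regressions(next_sample, 6) sees two
backward steps (12 to 11 and 15 to 14).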

I care about timestamps being ordered more than most people, and tried
to kill the get*time() APIs because they are weakly ordered relative
to the non-get variants (they return times in the past, and there is
no way to round down to get consistent times).  I tried to fix them
by adding locking and updating them to the latest time whenever a
non-get variant gives a later time (by being used).  This was too slow,
and breaks the design criteria that timecounter calls should not use
any explicit locking.  However, if you want slowness, then you can get
it similarly by fixing the monotonicity of rdtsc in software.  I think
I just figured out how to do this with the same slowness as serialization,
if a locked instruction serializes; maybe less otherwise:

spin:
    ptsc = prev_tsc;    /* memory -> local (intentionally !atomic) */
    tsc = rdtsc();      /* only 32 bits for timecounters */
    if (tsc <= ptsc) {  /* I forgot about wrap at first -- see below */
        /*
         * It went backwards, or stopped.  Could handle more
         * completely, starting with panic() to see if this
         * happens at all.
         */
        return (ptsc);  /* stopped is better than backwards */
    }
    /* Usual case; update (32 bits). */
    if (atomic_cmpset_int(&prev_tsc, ptsc, tsc))
        return (tsc);
    goto spin;
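
For readers who want to try the loop above outside the kernel, here is a
minimal userland sketch in C11.  It is an approximation under stated
assumptions, not the committed code: atomic_compare_exchange_strong()
stands in for FreeBSD's atomic_cmpset_int(), a relaxed atomic load stands
in for the intentionally non-atomic read, and the raw reading comes from a
caller-supplied function.  monotonic_tsc(), script[] and scripted_read()
are hypothetical names.  Like the loop as posted, it does not handle
32-bit wrap.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Last value handed out to any caller; starts at 0. */
static _Atomic uint32_t prev_tsc;

/* Hypothetical scripted readings; the third one goes backwards. */
static const uint32_t script[] = { 100, 200, 150, 300 };
static unsigned script_pos;

static uint32_t
scripted_read(void)
{
	return (script[script_pos++ % (sizeof(script) / sizeof(script[0]))]);
}

/*
 * Return a reading that is never lower than one returned earlier.
 * Wrap of the 32-bit counter is NOT handled in this sketch.
 */
static uint32_t
monotonic_tsc(uint32_t (*read_raw)(void))
{
	uint32_t ptsc, tsc;

	for (;;) {
		/* Relaxed load: a stale value is tolerated by design. */
		ptsc = atomic_load_explicit(&prev_tsc, memory_order_relaxed);
		tsc = read_raw();
		if (tsc <= ptsc)
			return (ptsc);	/* stopped is better than backwards */
		/* Usual case: publish the new value if nobody beat us. */
		if (atomic_compare_exchange_strong(&prev_tsc, &ptsc, tsc))
			return (tsc);
		/* Lost the race with another updater; retry. */
	}
}
```

Calling monotonic_tsc(scripted_read) four times yields 100, 200, 200, 300:
the backward reading of 150 is clamped to the previously published 200.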

The 32-bitness of timecounters is important for the algorithm, and for
efficiency on i386.  We assume that the !atomic read gives coherent
bits.  The value read may be in the past.  When tsc <= ptsc, the value is
in the future, so it must be up to date, unless there is massive
non-seriality with another CPU having just written a value more up
to date than this CPU read.  We don't care about this, since losing
this race is no different from being preempted after we read.  When
tsc > ptsc, we want to write it as 32 bits to avoid the cmpxchg8b
slowness/unportability.  Again, the value may be out of date when
we try to update it, because we were preempted.  We don't care about
this either, as above, but detect some cases as a side effect of
checking that ptsc is up to date.  Normally ptsc was up to date when
it was read, but it could easily be out of date when it was che