Re: FILTER_SCHEDULE_THREAD is not a bit-value

2012-01-31 Thread Max Khon
Hello,

On Tue, Jan 31, 2012 at 12:44 AM, Ian Lepore
 wrote:

>> sys/bus.h documents the following semantics for FILTER_SCHEDULE_THREAD:
>>
>> /**
>>  * @brief Driver interrupt filter return values
>>  *
>>  * If a driver provides an interrupt filter routine it must return an
>>  * integer consisting of oring together zero or more of the following
>>                                  ^^^
>>  * flags:
>>  *
>>  *      FILTER_STRAY    - this device did not trigger the interrupt
>>  *      FILTER_HANDLED  - the interrupt has been fully handled and can be EOId
>>  *      FILTER_SCHEDULE_THREAD - the threaded interrupt handler should be
>>  *                        scheduled to execute
>>  *
>>  * If the driver does not provide a filter, then the interrupt code will
>>  * act is if the filter had returned FILTER_SCHEDULE_THREAD.  Note that it
>>  * is illegal to specify any other flag with FILTER_STRAY and that it is
>>  * illegal to not specify either of FILTER_HANDLED or FILTER_SCHEDULE_THREAD
>>  * if FILTER_STRAY is not specified.
>>  */
>> #define FILTER_STRAY            0x01
>> #define FILTER_HANDLED          0x02
>> #define FILTER_SCHEDULE_THREAD  0x04
>>
>> But actually FILTER_SCHEDULE_THREAD is not used as a bit-value (see
>> kern/kern_intr.c):
>>
>>                 if (!thread) {
>>                         if (ret == FILTER_SCHEDULE_THREAD)
>>                                 thread = 1;
>>                 }
>>
>> There is at least one in-tree driver that could be broken because of
>> this (asmc(8), but I found the problem with some other out-of-tree
>> driver).
>> This should be "if (ret & FILTER_SCHEDULE_THREAD)" instead. Attached
>> patch fixes the problem.
>>
>> What do you think?
>>
>> Max
>
> I think returning (FILTER_HANDLED | FILTER_SCHEDULE_THREAD) makes no
> sense given the definition "the interrupt has been fully handled and can
> be EOId".  If you EOI in the primary interrupt context and then schedule
> a threaded handler to run as well you're likely to need complex locking
> between the primary and threaded interrupt handlers and I was under the
> impression that's just the sort of thing the filter/threaded scheme was
> designed to avoid.

I don't see the logic here.
1) You would have to implement locking anyway to protect against concurrent
access from the ithread/filter and other driver methods (character device
or network device callbacks).

2) The ithread and filter can already run simultaneously even when only
FILTER_SCHEDULE_THREAD is returned: while the ithread is scheduled to
execute, the device can raise a new interrupt, and the ithread will be
preempted by the filter.

> In other words, the part about ORing together values seems to be staking
> out room for future growth, because the current set of flags and the
> words about how to use them imply that only one of the current set of
> values should be returned at once.

No, the text does not imply that only one of the values is supposed to
be returned (where did you see that?). See also the KASSERT checks in
intr_event_handle() -- they clearly show that the intention was to
allow FILTER_HANDLED and FILTER_SCHEDULE_THREAD to be returned
simultaneously.
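The disagreement comes down to how the return value is tested. A minimal user-space sketch: the flag values are copied from the bus.h excerpt quoted above, while the helper names `wants_thread_equality()` and `wants_thread_bitmask()` are hypothetical and model only the single check quoted from kern_intr.c, not the rest of intr_event_handle():

```c
#include <stdbool.h>

/* Flag values as quoted from sys/bus.h above. */
#define FILTER_STRAY            0x01
#define FILTER_HANDLED          0x02
#define FILTER_SCHEDULE_THREAD  0x04

/* Hypothetical helper: the equality test from the kern_intr.c excerpt. */
static bool
wants_thread_equality(int ret)
{
	return (ret == FILTER_SCHEDULE_THREAD);
}

/* Hypothetical helper: the bit test proposed in the patch. */
static bool
wants_thread_bitmask(int ret)
{
	return ((ret & FILTER_SCHEDULE_THREAD) != 0);
}
```

For a filter returning `FILTER_HANDLED | FILTER_SCHEDULE_THREAD` (0x06), the equality test reports that no thread is wanted while the bit test reports that one is; for `FILTER_SCHEDULE_THREAD` alone, both agree.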

Max
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: FreeBSD 10-CURRENT/amd64: revision 230789: [...]

2012-01-31 Thread O. Hartmann
On 01/31/12 00:14, Jack Vogel wrote:
> Yes, it was. Now if I can just figure out what's going on with sparc
> 
> Jack
> 
> 
> On Mon, Jan 30, 2012 at 3:11 PM, Glen Barber wrote:
> 
> On Mon, Jan 30, 2012 at 11:55:48PM +0100, O. Hartmann wrote:
> > The following error occurs when trying to compile a kernel (make
> > buildworld works fine):
> >
> > objcopy --strip-debug if_ixgb.ko
> > ===> ixgbe (all)
> > clang -O2 -pipe -O2 -fno-strict-aliasing -pipe -pipe -O3
> > -fno-strict-aliasing -march=native -DSMP -DIXGBE_FDIR -D_KERNEL
> > -DKLD_MODULE -nostdinc  -I/usr/src/sys/modules/ixgbe/../../dev/ixgbe
> > -DHAVE_KERNEL_OPTION_HEADERS -include
> > /usr/obj/usr/src/sys/THOR/opt_global.h -I. -I@ -I@/contrib/altq
> > -fno-common  -fno-omit-frame-pointer -I/usr/obj/usr/src/sys/THOR
> > -mno-aes -mno-avx -mcmodel=kernel -mno-red-zone -mno-mmx -msoft-float
> > -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector
> > -std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls
> > -Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes
> > -Wpointer-arith -Winline -Wcast-qual  -Wundef -Wno-pointer-sign
> > -fformat-extensions  -Wmissing-include-dirs -fdiagnostics-show-option
> > -Wno-error-tautological-compare -Wno-error-empty-body
> > -Wno-error-parentheses-equality -c
> > /usr/src/sys/modules/ixgbe/../../dev/ixgbe/ixgbe.c
> > In file included from /usr/src/sys/modules/ixgbe/../../dev/ixgbe/ixgbe.c:40:
> > In file included from /usr/src/sys/modules/ixgbe/../../dev/ixgbe/ixgbe.h:96:
> > In file included from /usr/src/sys/modules/ixgbe/../../dev/ixgbe/ixgbe_api.h:38:
> > In file included from /usr/src/sys/modules/ixgbe/../../dev/ixgbe/ixgbe_type.h:38:
> > /usr/src/sys/modules/ixgbe/../../dev/ixgbe/ixgbe_osdep.h:109:19: error: typedef redefinition with different types ('boolean_t' (aka 'int') vs '_Bool')
> > typedef boolean_t   bool;
> > ^
> > @/sys/types.h:271:15: note: previous definition is here
> > typedef _Bool   bool;
> > ^
> > 1 error generated.
> > *** Error code 1
> >
> > Stop in /usr/src/sys/modules/ixgbe.
> > *** Error code 1
> >
> > Stop in /usr/src/sys/modules.
> > *** Error code 1
> >
> > Stop in /usr/obj/usr/src/sys/THOR.
> > *** Error code 1
> >
> > Stop in /usr/src.
> > *** Error code 1
> >
> > Stop in /usr/src.
> >
> 
> I believe this was just fixed:
> 
> http://svn.freebsd.org/changeset/base/230790
> 
> Glen


Thanks. Works fine again ;-)

Oliver

-- 
O. Hartmann
Freie Universität Berlin
FB Geologische Wissenschaften
FR Planetologie und Fernerkundung
Campus Lankwitz
Malteser-Str. 74 - 100/Haus D

12249 Berlin

Tel.: +49 (0) 30 838 70 508
FAX:  +49 (0) 30 838 70 837





Re: FILTER_SCHEDULE_THREAD is not a bit-value

2012-01-31 Thread John Baldwin
On Tuesday, January 31, 2012 3:57:26 am Max Khon wrote:
> No, the text does not imply that only one of the values is supposed to
> be returned (where did you see that?). See also the KASSERT checks in
> intr_event_handle() -- they clearly show that the intention was to
> allow FILTER_HANDLED and FILTER_SCHEDULE_THREAD to be returned
> simultaneously.

That was the original plan, but I now plan to no longer allow that.  I think I 
posted a thread to that effect on arch@ several months ago.  However, in some 
patches I have to rework ithreads, I remove the bitmask bits and make the 
return value a simple enum of distinct values.

Also, I don't think your patch is really correct: note that that code is
only in the !INTR_FILTER case, and we don't allow you to OR together those
two values in the INTR_FILTER case.

-- 
John Baldwin


Re: [patch] nextboot(8) arbitrary kernel environment

2012-01-31 Thread John Baldwin
On Monday, January 30, 2012 2:57:22 pm Ed Maste wrote:
> I have a patch to allow nextboot(8) to set arbitrary kernel environment
> variables (not just the kernel dir and kernel_options).  The usage becomes:
> 
> Usage: nextboot [-e variable=value] [-f] [-k kernel] [-o options]
>        nextboot -D
> 
> and the new option is documented as:
> 
>  -e variable=value
>      This option adds the provided variable and value to the kernel
>      environment.  The value is quoted when written to the nextboot
>      configuration.
> 
> The patch also makes -k optional (no longer mandatory).  The patch is at
> http://people.freebsd.org/~emaste/nextboot.diff .  I'll commit it in a few
> days if no concerns are raised by review or my testing.

Nice!  I like both of these features.

-- 
John Baldwin


Re: Does anyone try kib's Sandy Bridge PCID patch (pcid.2.patch)?

2012-01-31 Thread Konstantin Belousov
On Tue, Jan 31, 2012 at 09:23:50AM +0800, Paul Ambrose wrote:
> On January 31, 2012 at 12:43, Kostik Belousov wrote:
> > On Mon, Jan 30, 2012 at 07:08:13PM +0800, Paul Ambrose wrote:
> >> On January 30, 2012 at 2:36, Kostik Belousov wrote:
> >> > On Mon, Jan 30, 2012 at 10:15:51AM +0800, Paul Ambrose wrote:
> >> >> I have two boxes. One is an AMD Athlon 610e 2.4G with FreeBSD-current
> >> >> patched with pcid.2.patch; it works well without other issues, and it
> >> >> seems the pcid patch does not affect other parts of the kernel. The
> >> >> other one is a Sandy
> >> > Athlons do not have PCID and probably will never implement it. They use
> >> > other tricks to get similar optimizations, transparently to the OS.
> >> >
> >> Just curious: is this AMD's similar optimization, the Address Space
> >> Number (ASN) and Global flag (US Patent 6,604,187)?
> >> http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html
> > This and the same-important next item 'The TLB Flush Filter' is what
> > I referred to.
> >
> >> I did not find anything about ASN in the AMD manual.
> > It is a transparent optimization, which does not require any OS support.
> > Intel PCID is completely different; it must be explicitly handled by the OS.
> > It is some consequence of the nested pages support, AFAIU.
> >
> >>
> >> >> Bridge i5-2300 with FreeBSD 9-RELEASE patched with pcid.1.patch (the
> >> >> pcid.2.patch seems to depend on AVX and XSAVE support, which is
> >> >> available in -current). But it hangs within a few minutes. I suspect
> >> >> the nvidia driver, which was not recompiled against the patched
> >> >> kernel, is the root cause; I will check this out later, but has anyone
> >> >> seen a similar problem?
> >> > There are too many variations compared to the config I tested.
> >> > I do not see anything obvious in the changes between HEAD and stable/9
> >> > which could be blamed. The Nvidia driver might be a bigger suspect, but
> >> > again, I am not aware of anything wrong with it.
> >> >
> >> >>
> >> >> I have two questions about the pcid.2.patch
> >> >
> >> > Item 2 is clear; I fixed it.
> >> >
> >> > For item 1, I was only able to decipher the proposal to optimize
> >> > the global shootdown handler to restore %cr3 with bit 63 set so as not
> >> > to invalidate the current PCID. Are there any more changes?
> >> >
> >> Yes, that is what I meant. I was wondering about another approach, in
> >> which each process has a different PCID on each active processor, just
> >> as FreeBSD's MIPS and PowerPC ports do. But obviously this way is
> >> friendlier to non-PCID x86 processors.
> > Each vmspace (or pmap) has a unique PCID with the patch, at least until
> > the PCID space (12 bits) is exhausted. To really exhaust it, you need 4095
> > processes, so it is an unlikely but possible event with the current settings.
> >
> Thank you for your explanation. I disabled the nvidia driver (did not
> load it) and used "buildworld buildkernel" to test pcid.1.patch with
> 9-RELEASE; the box seems to reset before completing the buildkernel.
> The attachment is my kernel config; would you mind trying it on
> 9-RELEASE with pcid.1.patch? I will give 10-current a try to see if
> there is something wrong with my hardware.

I just did checkout + buildworld + buildkernel with -j 10 on UFS with
PCID turned on; everything finished fine. It is an up-to-date HEAD.

sandy% sysctl vm.stats.sys.v_swtch vm.pmap.pcid_save_cnt
vm.stats.sys.v_swtch: 13743519
vm.pmap.pcid_save_cnt: 7853519
I.e., the TLB was not flushed on more than half (about 57%) of the context
switches.

Trying HEAD with the patch is probably the easiest way forward.
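For readers unfamiliar with PCID, here is a minimal sketch of how a %cr3 value is composed when CR4.PCIDE is set, following the Intel SDM: bits 11:0 carry the PCID (the 12-bit, 4096-value space mentioned above), the page-aligned PML4 physical address sits above them, and setting bit 63 in the value written to %cr3 asks the CPU not to invalidate the TLB entries tagged with that PCID. The macro and function names are hypothetical and are not taken from FreeBSD's pmap or from the patch; the sketch only computes the value, it does not load it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; layout per the Intel SDM with CR4.PCIDE = 1. */
#define CR3_PCID_MASK     0x0000000000000FFFULL /* bits 11:0: PCID */
#define CR3_ADDR_MASK     0x000FFFFFFFFFF000ULL /* bits 51:12: PML4 address
                                                   (assumes 52-bit MAXPHYADDR) */
#define CR3_PCID_NOFLUSH  (1ULL << 63)          /* MOV to CR3: keep TLB entries */
#define PCID_COUNT        (CR3_PCID_MASK + 1)   /* 12 bits -> 4096 PCIDs */

/*
 * Build the value a context switch would load into %cr3: the page-aligned
 * PML4 physical address, the address space's PCID, and optionally bit 63
 * so that TLB entries tagged with this PCID are not invalidated.
 */
static uint64_t
make_cr3(uint64_t pml4_pa, uint16_t pcid, bool noflush)
{
	uint64_t cr3;

	cr3 = (pml4_pa & CR3_ADDR_MASK) | ((uint64_t)pcid & CR3_PCID_MASK);
	if (noflush)
		cr3 |= CR3_PCID_NOFLUSH;
	return (cr3);
}
```

Deciding when setting bit 63 is safe (that is, whether this CPU's TLB entries for the PCID are still valid) is the hard part and is not modeled here.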




Re: FILTER_SCHEDULE_THREAD is not a bit-value

2012-01-31 Thread Ian Lepore
On Tue, 2012-01-31 at 15:57 +0700, Max Khon wrote: 
> Hello,
> 
> On Tue, Jan 31, 2012 at 12:44 AM, Ian Lepore
>  wrote:
> 
> >> sys/bus.h documents the following semantics for FILTER_SCHEDULE_THREAD:
> >>
> >> /**
> >>  * @brief Driver interrupt filter return values
> >>  *
> >>  * If a driver provides an interrupt filter routine it must return an
> >>  * integer consisting of oring together zero or more of the following
> >>                                  ^^^
> >>  * flags:
> >>  *
> >>  *      FILTER_STRAY    - this device did not trigger the interrupt
> >>  *      FILTER_HANDLED  - the interrupt has been fully handled and can be EOId
> >>  *      FILTER_SCHEDULE_THREAD - the threaded interrupt handler should be
> >>  *                        scheduled to execute
> >>  *
> >>  * If the driver does not provide a filter, then the interrupt code will
> >>  * act is if the filter had returned FILTER_SCHEDULE_THREAD.  Note that it
> >>  * is illegal to specify any other flag with FILTER_STRAY and that it is
> >>  * illegal to not specify either of FILTER_HANDLED or FILTER_SCHEDULE_THREAD
> >>  * if FILTER_STRAY is not specified.
> >>  */
> >> #define FILTER_STRAY            0x01
> >> #define FILTER_HANDLED          0x02
> >> #define FILTER_SCHEDULE_THREAD  0x04
> >>
> >> But actually FILTER_SCHEDULE_THREAD is not used as a bit-value (see
> >> kern/kern_intr.c):
> >>
> >>                 if (!thread) {
> >>                         if (ret == FILTER_SCHEDULE_THREAD)
> >>                                 thread = 1;
> >>                 }
> >>
> >> There is at least one in-tree driver that could be broken because of
> >> this (asmc(8), but I found the problem with some other out-of-tree
> >> driver).
> >> This should be "if (ret & FILTER_SCHEDULE_THREAD)" instead. Attached
> >> patch fixes the problem.
> >>
> >> What do you think?
> >>
> >> Max
> >
> > I think returning (FILTER_HANDLED | FILTER_SCHEDULE_THREAD) makes no
> > sense given the definition "the interrupt has been fully handled and can
> > be EOId".  If you EOI in the primary interrupt context and then schedule
> > a threaded handler to run as well you're likely to need complex locking
> > between the primary and threaded interrupt handlers and I was under the
> > impression that's just the sort of thing the filter/threaded scheme was
> > designed to avoid.
> 
> I don't see the logic here.
> 1) You would have to implement locking anyway to protect against concurrent
> access from the ithread/filter and other driver methods (character device
> or network device callbacks).
> 

That is often, but not always, the case.  Depending on the hardware and
the needs of the driver, the guaranteed temporal separation between
primary and threaded interrupt context can reduce the need for locking.
In one case I managed to avoid the need to do any locking at all in the
primary context (in a pps driver to replace the stock one that lost the
ability to handle interrupts in a primary context at all).

> 2) The ithread and filter can already run simultaneously even when only
> FILTER_SCHEDULE_THREAD is returned: while the ithread is scheduled to
> execute, the device can raise a new interrupt, and the ithread will be
> preempted by the filter.
> 

No, if the primary-context handler does not return FILTER_HANDLED and
the interrupt dispatcher code does not EOI the interrupt until after the
threaded handler has run, then another hardware interrupt from that
source cannot interrupt the threaded handler.  This amounts to implicit
temporal synchronization between primary and threaded interrupt contexts
that eliminates the need for explicit synchronization using locks.

> > In other words, the part about ORing together values seems to be staking
> > out room for future growth, because the current set of flags and the
> > words about how to use them imply that only one of the current set of
> > values should be returned at once.
> 
> No, the text does not imply that only one of the values is supposed to
> be returned (where did you see that?). See also the KASSERT checks in
> intr_event_handle() -- they clearly show that the intention was to
> allow FILTER_HANDLED and FILTER_SCHEDULE_THREAD to be returned
> simultaneously.
> 
> Max

I have to admit that the text doesn't specifically forbid returning both
values ORed together, but it seems to me that doing so is nonsensical.
The reason is the corollary to the above point:  if you return
FILTER_HANDLED and allow the interrupt to be EOI'd from the primary
context, then you have to deal with the possibility of being
re-interrupted in the threaded handler.  You have to cope with the
possibility of getting a new interrupt before having fully handled the
prior one.  

Hmmm, but I guess I could imagine a situation where the primary context
could capture and enqueue information on a list and then return
FILTER_HANDLED specifically to request the EOI so that having multiple
hardware interrupts occur between invocations of the threaded handler
would work out okay.  I 

gptboot rewrite, bootonce, etc.

2012-01-31 Thread Andrey Fesenko
Does this work if ZFS is used?
My issue: "Root on ZFS & GPT and boot to ufs partition"
http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013514.html

I tested:
# gpart show
=>       34  625142381  ada0  GPT  (298G)
         34        128     1  freebsd-boot  (64k)
        162   26621952     2  freebsd-ufs  [bootonce,bootme]  (12G)
   26622114    8388608     3  freebsd-swap  (4.0G)
   35010722  590131693     4  freebsd-zfs  (281G)

The system on ada0p2 does not boot.


em0 failure on 10-current w/ clang

2012-01-31 Thread Philip M. Gollucci
sudo mv kernel kernel.clang
sudo mv kernel.old kernel

has me back in action for now on 9.0-RELEASE.

What info can I provide to be of help?


Jan 31 16:05:31 <4.5> frieza login: ROOT LOGIN (root) ON ttyv0
Jan 31 16:06:03 <18.5> frieza sudo: root : TTY=ttyv0 ; PWD=/root ; USER=root ; COMMAND=/etc/rc.d/netif restart
Jan 31 16:06:03 <0.7> frieza kernel: ifa_del_loopback_route: deletion failed: 3
Jan 31 16:06:03 <0.7> frieza kernel: ifa_add_loopback_route: insertion failed: 17
Jan 31 16:06:05 <0.7> frieza kernel: ifa_del_loopback_route: deletion failed: 3
Jan 31 16:06:05 <0.7> frieza kernel: ifa_add_loopback_route: insertion failed: 17
Jan 31 16:06:05 <0.5> frieza kernel: em0: link state changed to DOWN
Jan 31 16:06:09 <3.3> frieza dhclient[2315]: em0: not found
Jan 31 16:06:09 <0.5> frieza kernel: em0: link state changed to UP
Jan 31 16:06:09 <3.2> frieza dhclient[2315]: exiting.
Jan 31 16:06:09 <3.3> frieza dhclient[2316]: connection closed
Jan 31 16:06:09 <3.2> frieza dhclient[2316]: exiting.
Jan 31 16:06:22 <0.7> frieza kernel: ifa_del_loopback_route: deletion failed: 3
Jan 31 16:06:22 <0.7> frieza kernel: ifa_add_loopback_route: insertion failed: 17
Jan 31 16:06:22 <0.5> frieza kernel: em0: link state changed to DOWN
Jan 31 16:06:24 <3.3> frieza dhclient[2345]: em0: not found
Jan 31 16:06:24 <0.5> frieza kernel: em0: link state changed to UP
Jan 31 16:06:24 <3.2> frieza dhclient[2345]: exiting.
Jan 31 16:06:24 <3.3> frieza dhclient[2346]: connection closed
Jan 31 16:06:24 <3.2> frieza dhclient[2346]: exiting.
Jan 31 16:06:39 <0.7> frieza kernel: ifa_del_loopback_route: deletion failed: 3
Jan 31 16:06:39 <0.7> frieza kernel: ifa_add_loopback_route: insertion failed: 17
Jan 31 16:06:41 <0.7> frieza kernel: ifa_del_loopback_route: deletion failed: 3
Jan 31 16:06:41 <0.7> frieza kernel: ifa_add_loopback_route: insertion failed: 17
Jan 31 16:06:41 <0.5> frieza kernel: em0: link state changed to DOWN
Jan 31 16:06:48 <3.3> frieza dhclient[2811]: em0: not found
Jan 31 16:06:48 <0.5> frieza kernel: em0: link state changed to UP
Jan 31 16:06:48 <3.2> frieza dhclient[2811]: exiting.
Jan 31 16:06:48 <3.3> frieza dhclient[2812]: connection closed
Jan 31 16:06:48 <3.2> frieza dhclient[2812]: exiting.


-- 

1024D/DB9B8C1C B90B FBC3 A3A1 C71A 8E70  3F8C 75B8 8FFB DB9B 8C1C
Philip M. Gollucci (pgollu...@p6m7g8.com) c: 703.336.9354
Member,   Apache Software Foundation
Committer,FreeBSD Foundation
Consultant,   P6M7G8 Inc.
Director Operations,  Ridecharge Inc.

Work like you don't need the money,
love like you'll never get hurt,
and dance like nobody's watching.





Race between cron and crontab

2012-01-31 Thread John Baldwin
A co-worker ran into a race between updating a crontab via crontab(8) and
cron(8) yesterday.  Specifically, cron(8) failed to notice that a crontab was 
updated.  The problem is that 1) by default our filesystems only use second 
granularity for timestamps and 2) cron only caches the seconds portion of a 
file's timestamp when checking for changes anyway.  This means that cron can 
miss updates to a spool directory if multiple updates to the directory are 
performed within a single second and cron wakes up to scan the spool directory 
within the same second and scans it before all of the updates are complete.

Specifically, when replacing a crontab, crontab(8) first creates a temporary 
file in /var/cron/tabs and then uses a rename to install it followed by 
touching the spool directory to update its modification time.  However, the 
creation of the temporary file already changes the modification time of the 
directory, and cron may "miss" the rename if it scans the directory in between 
the creation of the temporary file and the rename.

The "fix" I am planning to use locally is to simply force crontab(8) to sleep
for a second before it touches the spool directory, thus ensuring that the
touch of the spool directory will use a later modification time than the 
creation of the temporary file.

Note that crontab -r is not affected by this race as it only does one atomic 
update to the directory (unlink()).
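The race can be reproduced with a toy model that assumes only what is described above: spool-directory mtimes with one-second granularity, and a daemon that reloads only when the mtime differs from the value recorded at its previous load. `struct toy_cron`, `toy_scan()` and `toy_replace()` are hypothetical and are not cron's actual code.

```c
#include <stdbool.h>
#include <time.h>

/* Toy state: what the daemon remembers between scans. */
struct toy_cron {
	time_t	db_mtime;	/* directory mtime seen at last load (0 = never) */
	bool	has_new_tab;	/* did the last load see the new crontab? */
};

/* One scan of the spool directory; 'installed' is true once rename() is done. */
static void
toy_scan(struct toy_cron *c, time_t dir_mtime, bool installed)
{
	if (c->db_mtime == dir_mtime)
		return;			/* mtime unchanged: no reload */
	c->db_mtime = dir_mtime;
	c->has_new_tab = installed;
}

/*
 * Replay the unlucky interleaving: the temp file is created in second
 * 'start', the daemon scans in that same second before the rename, the
 * rename also lands in 'start', and the final touch of the directory
 * happens 'touch_delay' whole seconds later.  Returns whether the next
 * scan picks up the new crontab.
 */
static bool
toy_replace(time_t start, time_t touch_delay)
{
	struct toy_cron c = { 0, false };

	/* Temp file created (mtime = start); daemon scans before rename(). */
	toy_scan(&c, start, false);

	/* rename() leaves mtime at 'start'; the touch then moves it to
	 * start + touch_delay.  This is the next scheduled scan. */
	toy_scan(&c, start + touch_delay, true);
	return (c.has_new_tab);
}
```

With a `touch_delay` of 0 the second scan sees an unchanged mtime and keeps the stale database; with 1, which is what the `sleep(1)` achieves, the mtime differs and the daemon reloads.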

Index: crontab.c
===
--- crontab.c   (revision 225431)
+++ crontab.c   (working copy)
@@ -604,6 +604,15 @@ replace_cmd() {
 
 	log_it(RealUser, Pid, "REPLACE", User);
 
+	/*
+	 * Creating the 'tn' temp file has already updated the
+	 * modification time of the spool directory.  Sleep for a
+	 * second to ensure that poke_daemon() sets a later
+	 * modification time.  Otherwise, this can race with the cron
+	 * daemon scanning for updated crontabs.
+	 */
+	sleep(1);
+
 	poke_daemon();
 
 	return (0);


-- 
John Baldwin


Re: posix_fadvise noreuse disables file caching

2012-01-31 Thread Ulrich Spörlein
On Mon, 2012-01-30 at 09:36:45 -0500, John Baldwin wrote:
> On Sunday, January 29, 2012 10:08:10 am Tijl Coosemans wrote:
> > On Wednesday 25 January 2012 17:29:22 John Baldwin wrote:
> > > On Friday, January 20, 2012 2:12:13 pm John Baldwin wrote:
> > >> On Thursday, January 19, 2012 11:39:42 am Tijl Coosemans wrote:
> > >>> I recently noticed that multimedia/vlc generates a lot of disk IO when
> > >>> playing media files. For instance, when playing a 320kbps mp3 gstat
> > >>> reports about 1250kBps (=10000kbps). That's quite a lot of overhead.
> > >>> 
> > >>> It turns out that vlc sets POSIX_FADV_NOREUSE on the entire file and
> > >>> reads in chunks of 1028 bytes. FreeBSD implements NOREUSE as if
> > >>> O_DIRECT was specified during open(2), i.e. it disables all caching.
> > >>> That means every 1028 byte read turns into a 32KiB read (new default
> > >>> block size in 9.0) which explains the above numbers.
> > >>> 
> > >>> I've copied the relevant vlc code below (modules/access/file.c:Open()).
> > >>> It's interesting to see that on OSX it sets F_NOCACHE which disables
> > >>> caching too, but combined with F_RDAHEAD there's still read-ahead
> > >>> caching.
> > >>> 
> > >>> I don't think POSIX intended for NOREUSE to mean O_DIRECT. It should
> > >>> still cache data (and even do read-ahead if F_RDAHEAD is specified),
> > >>> and once data is fetched from the cache, it can be marked WONTNEED.
> > >> 
> > >> POSIX doesn't specify O_DIRECT, so it's not clear what it asks for.
> > >> 
> > >>> Is it possible to implement it this way, or if not to just ignore
> > >>> the NOREUSE hint for now?
> > >> 
> > >> I think it would be good to improve NOREUSE, though I had sort of
> > >> assumed that applications using NOREUSE would do their own buffering
> > >> and read full blocks.  We could perhaps reimplement NOREUSE by doing
> > >> the equivalent of POSIX_FADV_DONTNEED after each read to free buffers
> > >> and pages after the data is copied out to userland.  I also have an
> > >> XXX about whether or not NOREUSE should still allow read-ahead as it
> > >> isn't very clear what the right thing to do there is.  HP-UX (IIRC)
> > >> has an fadvise() that lets you specify multiple policies, so you
> > >> could specify both NOREUSE and SEQUENTIAL for a single region to
> > >> get read-ahead but still release memory once the data is read once.
> > >
> > > So I've come up with this untested patch.  It uses
> > > VOP_ADVISE(FADV_DONTNEED) after read(2) calls to a NOREUSE region, and
> > > leaves read-ahead caching enabled for NOREUSE.  FADV_DONTNEED doesn't
> > > do any good really for writes (it only flushes clean buffers), so I've
> > > left write(2) operations as using IO_DIRECT still.  Does this sound
> > > reasonable?  I've not yet tested this at all:
> > 
> > The patch drastically improves vlc, but there's still a tiny overhead.
> > Without NOREUSE the disk is read in chunks of 128KiB (F_RDAHEAD buffer
> > size). With NOREUSE there's an extra transfer of 32KiB (block size).
> 
> This is probably because vlc is not reading on block boundaries, so the
> noreuse is throwing away partial blocks at the end of a read that then have
> to be re-read.  We could maybe fix this by making FADV_DONTNEED only throw
> away completely-contained blocks rather than completely-contained pages.
> However, this will probably result in NOREUSE not actually throwing away
> anything at all if an app always reads sub-blocksize chunks.
> 
> We could maybe make the case of vlc work ok in this case though by allowing
> an extension where you can do 'posix_fadvise(SEQUENTIAL | NOREUSE)', and
> in this case we could make the VOP_ADVISE(DONTNEED) in read() use an offset
> of 0 rather than the start of the read request.
> 
> However, posix_fadvise() really is going to work best if the userland 
> application reads aligned FS blocks.
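The rounding involved in discarding only completely-contained blocks can be sketched as follows. `contained_blocks()` is a hypothetical helper, not the VOP_ADVISE() implementation; it only shows which whole blocks fall inside a byte range.

```c
#include <stdint.h>

/*
 * Hypothetical helper: for a read of 'len' bytes at offset 'off', find the
 * blocks of size 'bs' that lie entirely inside [off, off + len).  The
 * half-open block index range is returned in [*first, *last) and the
 * function returns how many such blocks there are (possibly 0).
 */
static uint64_t
contained_blocks(uint64_t off, uint64_t len, uint64_t bs,
    uint64_t *first, uint64_t *last)
{
	uint64_t start = (off + bs - 1) / bs;	/* round the start up */
	uint64_t end = (off + len) / bs;	/* round the end down */

	if (end < start)
		end = start;			/* no whole block inside */
	*first = start;
	*last = end;
	return (end - start);
}
```

With vlc's 1028-byte reads and a 32 KiB block size, a single read at any offset contains no whole block, so nothing would be discarded. Measuring from offset 0 instead, as in the SEQUENTIAL | NOREUSE idea, covers two whole blocks once 64 such reads (65792 bytes) have completed.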

I find it questionable in general that an application can tell the
system what to do wrt. caching. Perhaps I'm running 100s of VLC players
all on the same file and actually *do* want reads to be cached?

What happens if I seek back in the file? It has to do a potentially
high-latency read again. The system has a better overview of blocks that
are frequently being requested than any individual application.

I fully understand the intention, and in 99.99% of the cases, this data
*is* just being read once so there's no need to cache any reads for
actually requested data. But as the example shows, requested data is not
necessarily the data that lower layers have to fetch from the disk.
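The gap between requested and fetched data can be checked against the figures at the top of the thread. This back-of-the-envelope sketch assumes a constant 320 kbit/s stream and that every uncached 1028-byte read fetches one whole 32 KiB block from disk. The helper name is hypothetical.

```c
/*
 * Estimate the disk throughput caused by uncached small reads: every
 * application read of 'read_size' bytes is assumed to fetch one whole
 * filesystem block of 'block_size' bytes from disk.
 */
static double
uncached_disk_bytes_per_sec(double stream_bits_per_sec, double read_size,
    double block_size)
{
	double reads_per_sec = (stream_bits_per_sec / 8.0) / read_size;

	return (reads_per_sec * block_size);
}
```

For those inputs it gives about 1,275,000 bytes/s (roughly 1245 KiB/s), close to the ~1250 kBps gstat reported. In other words, each read pulls in about 31.9 times the bytes the application asked for.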

Perhaps talking to the VLC people about why they think this is useful and
where it actually, measurably helped them would be interesting.

Sorry if this is all perfectly obvious
Uli


Re: gptboot rewrite, bootonce, etc.

2012-01-31 Thread Pawel Jakub Dawidek
On Tue, Jan 31, 2012 at 06:49:51PM +0300, Andrey Fesenko wrote:
> Does this work if ZFS is used?
> My issue: "Root on ZFS & GPT and boot to ufs partition"
> http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013514.html
> 
> I tested:
> # gpart show
> =>       34  625142381  ada0  GPT  (298G)
>          34        128     1  freebsd-boot  (64k)
>         162   26621952     2  freebsd-ufs  [bootonce,bootme]  (12G)
>    26622114    8388608     3  freebsd-swap  (4.0G)
>    35010722  590131693     4  freebsd-zfs  (281G)
> 
> The system on ada0p2 does not boot.

This functionality only works with UFS (gptboot).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: Race between cron and crontab

2012-01-31 Thread Ian Lepore
On Tue, 2012-01-31 at 11:49 -0500, John Baldwin wrote:
> A co-worker ran into a race between updating a crontab via crontab(8) and
> cron(8) yesterday.  Specifically, cron(8) failed to notice that a crontab was 
> updated.  The problem is that 1) by default our filesystems only use second 
> granularity for timestamps and 2) cron only caches the seconds portion of a 
> file's timestamp when checking for changes anyway.  This means that cron can 
> miss updates to a spool directory if multiple updates to the directory are 
> performed within a single second and cron wakes up to scan the spool directory
> within the same second and scans it before all of the updates are complete.
> 
> Specifically, when replacing a crontab, crontab(8) first creates a temporary 
> file in /var/cron/tabs and then uses a rename to install it followed by 
> touching the spool directory to update its modification time.  However, the 
> creation of the temporary file already changes the modification time of the 
> directory, and cron may "miss" the rename if it scans the directory in between
> the creation of the temporary file and the rename.
> 
> The "fix" I am planning to use locally is to simply force crontab(8) to sleep
> for a second before it touches the spool directory, thus ensuring that the
> touch of the spool directory will use a later modification time than the 
> creation of the temporary file.
> 
> Note that crontab -r is not affected by this race as it only does one atomic 
> update to the directory (unlink()).
> 
> Index: crontab.c
> ===
> --- crontab.c (revision 225431)
> +++ crontab.c (working copy)
> @@ -604,6 +604,15 @@ replace_cmd() {
>  
>   log_it(RealUser, Pid, "REPLACE", User);
>  
> + /*
> +  * Creating the 'tn' temp file has already updated the
> +  * modification time of the spool directory.  Sleep for a
> +  * second to ensure that poke_daemon() sets a later
> +  * modification time.  Otherwise, this can race with the cron
> +  * daemon scanning for updated crontabs.
> +  */
> + sleep(1);
> +
>   poke_daemon();
>  
>   return (0);

Maybe this is overly pedantic, but that solution still allows the
possibility of the same sort of race if a user updates their crontab in
the same second as an admin saves a new /etc/crontab, because cron takes
the max timestamp of /etc/crontab and /var/cron/tabs and compares it
against the database-rebuild timestamp.

A possible solution on the daemon side of things might be something like
the attached, but I should state (nay, shout) that I haven't looked
beyond these few lines to see if there are any unintended side effects
to such a change.

-- Ian

diff -r eb5f4971de86 usr.sbin/cron/cron/database.c
--- usr.sbin/cron/cron/database.c	Fri Jan 20 16:12:15 2012 -0700
+++ usr.sbin/cron/cron/database.c	Tue Jan 31 10:48:32 2012 -0700
@@ -72,7 +72,7 @@ load_database(old_db)
 	 * so is guaranteed to be different than the stat() mtime the first
 	 * time this function is called.
 	 */
-	if (old_db->mtime == TMAX(statbuf.st_mtime, syscron_stat.st_mtime)) {
+	if (old_db->mtime > TMAX(statbuf.st_mtime, syscron_stat.st_mtime)) {
 		Debug(DLOAD, ("[%d] spool dir mtime unch, no load needed.\n",
 			  getpid()))
 		return;
@@ -83,7 +83,7 @@ load_database(old_db)
 	 * actually changed.  Whatever is left in the old database when
 	 * we're done is chaff -- crontabs that disappeared.
 	 */
-	new_db.mtime = TMAX(statbuf.st_mtime, syscron_stat.st_mtime);
+	new_db.mtime = 1 + TMAX(statbuf.st_mtime, syscron_stat.st_mtime);
 	new_db.head = new_db.tail = NULL;
 
 	if (syscron_stat.st_mtime) {
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: em0 failure on 10-current w/ clang

2012-01-31 Thread Dimitry Andric

On 2012-01-31 17:21, Philip M. Gollucci wrote:

> sudo mv kernel kernel.clang
> sudo mv kernel.old kernel
>
> has me back in action for now on 9.0-RELEASE.
>
> What info can I provide to be of help?
>
>
> Jan 31 16:05:31<4.5>  frieza login: ROOT LOGIN (root) ON ttyv0
> Jan 31 16:06:03<18.5>  frieza sudo: root : TTY=ttyv0 ; PWD=/root ; USER=root ; COMMAND=/etc/rc.d/netif restart
> Jan 31 16:06:03<0.7>  frieza kernel: ifa_del_loopback_route: deletion failed: 3
> Jan 31 16:06:03<0.7>  frieza kernel: ifa_add_loopback_route: insertion failed: 17
> Jan 31 16:06:05<0.7>  frieza kernel: ifa_del_loopback_route: deletion failed: 3
> Jan 31 16:06:05<0.7>  frieza kernel: ifa_add_loopback_route: insertion failed: 17
> Jan 31 16:06:05<0.5>  frieza kernel: em0: link state changed to DOWN
> Jan 31 16:06:09<3.3>  frieza dhclient[2315]: em0: not found


I don't think it has anything to do with clang per se; there were some
incompatibilities introduced in ifconfig's ioctls, so you must upgrade
world before being able to use any networking.  This is very annoying if
you want to install off NFS... :(


Re: posix_fadvise noreuse disables file caching

2012-01-31 Thread John Baldwin
On Tuesday, January 31, 2012 12:21:07 pm Ulrich Spörlein wrote:
> On Mon, 2012-01-30 at 09:36:45 -0500, John Baldwin wrote:
> > On Sunday, January 29, 2012 10:08:10 am Tijl Coosemans wrote:
> > > On Wednesday 25 January 2012 17:29:22 John Baldwin wrote:
> > > > On Friday, January 20, 2012 2:12:13 pm John Baldwin wrote:
> > > >> On Thursday, January 19, 2012 11:39:42 am Tijl Coosemans wrote:
> > > >>> I recently noticed that multimedia/vlc generates a lot of disk IO when
> > > >>> playing media files. For instance, when playing a 320kbps mp3 gstat
> > > > >>> reports about 1250kBps (=10000kbps). That's quite a lot of overhead.
> > > >>> 
> > > >>> It turns out that vlc sets POSIX_FADV_NOREUSE on the entire file and
> > > >>> reads in chunks of 1028 bytes. FreeBSD implements NOREUSE as if
> > > >>> O_DIRECT was specified during open(2), i.e. it disables all caching.
> > > >>> That means every 1028 byte read turns into a 32KiB read (new default
> > > >>> block size in 9.0) which explains the above numbers.
> > > >>> 
> > > >>> I've copied the relevant vlc code below 
> > > >>> (modules/access/file.c:Open()).
> > > >>> It's interesting to see that on OSX it sets F_NOCACHE which disables
> > > >>> caching too, but combined with F_RDAHEAD there's still read-ahead
> > > >>> caching.
> > > >>> 
> > > >>> I don't think POSIX intended for NOREUSE to mean O_DIRECT. It should
> > > >>> still cache data (and even do read-ahead if F_RDAHEAD is specified),
> > > >>> and once data is fetched from the cache, it can be marked WONTNEED.
> > > >> 
> > > >> POSIX doesn't specify O_DIRECT, so it's not clear what it asks for.
> > > >> 
> > > >>> Is it possible to implement it this way, or if not to just ignore
> > > >>> the NOREUSE hint for now?
> > > >> 
> > > >> I think it would be good to improve NOREUSE, though I had sort of
> > > >> assumed that applications using NOREUSE would do their own buffering
> > > >> and read full blocks.  We could perhaps reimplement NOREUSE by doing
> > > >> the equivalent of POSIX_FADV_DONTNEED after each read to free buffers
> > > >> and pages after the data is copied out to userland.  I also have an
> > > >> XXX about whether or not NOREUSE should still allow read-ahead as it
> > > >> isn't very clear what the right thing to do there is.  HP-UX (IIRC)
> > > >> has an fadvise() that lets you specify multiple policies, so you
> > > >> could specify both NOREUSE and SEQUENTIAL for a single region to
> > > >> get read-ahead but still release memory once the data is read once.
> > > >
> > > > So I've come up with this untested patch.  It uses
> > > > VOP_ADVISE(FADV_DONTNEED) after read(2) calls to a NOREUSE region, and
> > > > leaves read-ahead caching enabled for NOREUSE.  FADV_DONTNEED doesn't
> > > > do any good really for writes (it only flushes clean buffers), so I've
> > > > left write(2) operations as using IO_DIRECT still.  Does this sound
> > > > reasonable?  I've not yet tested this at all:
> > > 
> > > The patch drastically improves vlc, but there's still a tiny overhead.
> > > Without NOREUSE the disk is read in chunks of 128KiB (F_RDAHEAD buffer
> > > size). With NOREUSE there's an extra transfer of 32KiB (block size).
> > 
> > This is probably because vlc is not reading on block boundaries, so the
> > noreuse is throwing away partial blocks at the end of a read that then
> > have to be re-read.  We could maybe fix this by making FADV_DONTNEED only
> > throw away completely-contained blocks rather than completely-contained
> > pages.
> > However, this will probably result in NOREUSE not actually throwing away
> > anything at all if an app always reads sub-blocksize chunks.
> > 
> > We could maybe make the case of vlc work ok in this case though by allowing
> > an extension where you can do 'posix_fadvise(SEQUENTIAL | NOREUSE)', and
> > in this case we could make the VOP_ADVISE(DONTNEED) in read() use an offset
> > of 0 rather than the start of the read request.
> > 
> > However, posix_fadvise() really is going to work best if the userland 
> > application reads aligned FS blocks.
> 
> I find it questionable in general that an application can tell the
> system what to do wrt. caching. Perhaps I'm running 100s of VLC players
> all on the same file and actually *do* want reads to be cached?
> 
> What happens if I seek back in the file? It has to do a potentially
> high-latency read again. The system has a better overview of blocks that
> are frequently being requested than any individual application.
> 
> I fully understand the intention, and in 99.99% of the cases, this data
> *is* just being read once so there's no need to cache any reads for
> actually requested data. But as the example shows, requested data is not
> necessarily the data that lower layers have to fetch from the disk.
> 
> Perhaps talking to VLC people about why they think this is useful and where
> it actually, measurably helped them would be interesting.
> 
> Sorry if this is all perfectly obvious

There are certainly c

Re: Race between cron and crontab

2012-01-31 Thread John Baldwin
On Tuesday, January 31, 2012 12:57:50 pm Ian Lepore wrote:
> On Tue, 2012-01-31 at 11:49 -0500, John Baldwin wrote:
> > A co-worker ran into a race between updating a cron tab via crontab(8) and
> > cron(8) yesterday.  Specifically, cron(8) failed to notice that a crontab
> > was updated.  The problem is that 1) by default our filesystems only use
> > second granularity for timestamps and 2) cron only caches the seconds
> > portion of a file's timestamp when checking for changes anyway.  This
> > means that cron can miss updates to a spool directory if multiple updates
> > to the directory are performed within a single second and cron wakes up
> > to scan the spool directory within the same second and scans it before
> > all of the updates are complete.
> > 
> > Specifically, when replacing a crontab, crontab(8) first creates a
> > temporary file in /var/cron/tabs and then uses a rename to install it
> > followed by touching the spool directory to update its modification
> > time.  However, the creation of the temporary file already changes the
> > modification time of the directory, and cron may "miss" the rename if it
> > scans the directory in between the creation of the temporary file and
> > the rename.
> > 
> > The "fix" I am planning to use locally is to simply force crontab(8) to
> > sleep for a second before it touches the spool directory, thus ensuring
> > that the touch of the spool directory will use a later modification time
> > than the creation of the temporary file.
> > 
> > Note that crontab -r is not affected by this race as it only does one
> > atomic update to the directory (unlink()).
> > 
> > Index: crontab.c
> > ===
> > --- crontab.c (revision 225431)
> > +++ crontab.c (working copy)
> > @@ -604,6 +604,15 @@ replace_cmd() {
> >  
> >   log_it(RealUser, Pid, "REPLACE", User);
> >  
> > + /*
> > +  * Creating the 'tn' temp file has already updated the
> > +  * modification time of the spool directory.  Sleep for a
> > +  * second to ensure that poke_daemon() sets a later
> > +  * modification time.  Otherwise, this can race with the cron
> > +  * daemon scanning for updated crontabs.
> > +  */
> > + sleep(1);
> > +
> >   poke_daemon();
> >  
> >   return (0);
> 
> Maybe this is overly pedantic, but that solution still allows the
> possibility of the same sort of race if a user updates their crontab in
> the same second as an admin saves a new /etc/crontab, because cron takes
> the max timestamp of /etc/crontab and /var/cron/tabs and compares it
> against the database-rebuild timestamp.

Hmm, I'm not sure I see the race in that case.  If the /etc/crontab file
matches the timestamp of the spool directory before the utimes() call
after the one-second sleep, then it will still rescan it on the next
check when it notices a newer timestamp on the spool directory.  If
it is the same timestamp as the second timestamp on the spool directory,
then the scan is guaranteed to have not started before that second began,
meaning that the crontab(8) process editing the user's crontab must have
passed the rename, so the scan will see the user's new crontab.

> A possible solution on the daemon side of things might be something like
> the attached, but I should state (nay, shout) that I haven't looked
> beyond these few lines to see if there are any unintended side effects
> to such a change.

I think this patch doesn't change anything at all actually.  It is 
certainly subject to the original race I described if you do not use
the patch in crontab(8) itself.

-- 
John Baldwin


Re: Race between cron and crontab

2012-01-31 Thread Ian Lepore
On Tue, 2012-01-31 at 13:30 -0500, John Baldwin wrote:
> On Tuesday, January 31, 2012 12:57:50 pm Ian Lepore wrote:
> > On Tue, 2012-01-31 at 11:49 -0500, John Baldwin wrote:
> > > A co-worker ran into a race between updating a cron tab via crontab(8)
> > > and cron(8) yesterday.  Specifically, cron(8) failed to notice that a
> > > crontab was updated.  The problem is that 1) by default our filesystems
> > > only use second granularity for timestamps and 2) cron only caches the
> > > seconds portion of a file's timestamp when checking for changes anyway.
> > > This means that cron can miss updates to a spool directory if multiple
> > > updates to the directory are performed within a single second and cron
> > > wakes up to scan the spool directory within the same second and scans
> > > it before all of the updates are complete.
> > > 
> > > Specifically, when replacing a crontab, crontab(8) first creates a
> > > temporary file in /var/cron/tabs and then uses a rename to install it
> > > followed by touching the spool directory to update its modification
> > > time.  However, the creation of the temporary file already changes the
> > > modification time of the directory, and cron may "miss" the rename if
> > > it scans the directory in between the creation of the temporary file
> > > and the rename.
> > > 
> > > The "fix" I am planning to use locally is to simply force crontab(8) to
> > > sleep for a second before it touches the spool directory, thus ensuring
> > > that the touch of the spool directory will use a later modification
> > > time than the creation of the temporary file.
> > > 
> > > Note that crontab -r is not affected by this race as it only does one
> > > atomic update to the directory (unlink()).
> > > 
> > > Index: crontab.c
> > > ===
> > > --- crontab.c (revision 225431)
> > > +++ crontab.c (working copy)
> > > @@ -604,6 +604,15 @@ replace_cmd() {
> > >  
> > >   log_it(RealUser, Pid, "REPLACE", User);
> > >  
> > > + /*
> > > +  * Creating the 'tn' temp file has already updated the
> > > +  * modification time of the spool directory.  Sleep for a
> > > +  * second to ensure that poke_daemon() sets a later
> > > +  * modification time.  Otherwise, this can race with the cron
> > > +  * daemon scanning for updated crontabs.
> > > +  */
> > > + sleep(1);
> > > +
> > >   poke_daemon();
> > >  
> > >   return (0);
> > 
> > Maybe this is overly pedantic, but that solution still allows the
> > possibility of the same sort of race if a user updates their crontab in
> > the same second as an admin saves a new /etc/crontab, because cron takes
> > the max timestamp of /etc/crontab and /var/cron/tabs and compares it
> > against the database-rebuild timestamp.
> 
> Hmm, I'm not sure I see the race in that case.  If the /etc/crontab file
> matches the timestamp of the spool directory before the utimes() call
> after the one-second sleep, then it will still rescan it on the next
> check when it notices a newer timestamp on the spool directory.  If
> it is the same timestamp as the second timestamp on the spool directory,
> then the scan is guaranteed to have not started before that second began,
> meaning that the crontab(8) process editing the user's crontab must have
> passed the rename, so the scan will see the user's new crontab.
> 
> > A possible solution on the daemon side of things might be something like
> > the attached, but I should state (nay, shout) that I haven't looked
> > beyond these few lines to see if there are any unintended side effects
> > to such a change.
> 
> I think this patch doesn't change anything at all actually.  It is 
> certainly subject to the original race I described if you do not use
> the patch in crontab(8) itself.
> 

You're right about my patch not fixing anything; I didn't think hard
enough before I started typing.  

But I think the problem I was trying to get at with /etc/crontab still
exists with your patch; it would be triggered if a user updated their
crontab and after the 1 second sleep the directory timestamp gets
updated and cron rebuilds the database, and then right after that but
still in the same second /etc/crontab gets written.  (That's why I was
trying, however feebly, to move the solution into the daemon.)

Maybe the simple answer is for admins to be sure not to save changes
to /etc/crontab during the xx:xx:00 second, or with your patch, :01.
(I'm kidding, of course.  The fact that cron likes to wake up at the top
of the minute is no g'tee that it actually does so every time.)

Once you start thinking along the "no g'tee" lines you realize that two
users can be updating their tabs concurrently, and one arrives in
poke_daemon() and grabs the current time (let's say it's noon) then gets
preempted for a long while before calling utimes(), the other user runs
through poke_da

Re: Race between cron and crontab

2012-01-31 Thread John Baldwin
On Tuesday, January 31, 2012 3:13:34 pm Ian Lepore wrote:
> On Tue, 2012-01-31 at 13:30 -0500, John Baldwin wrote:
> > On Tuesday, January 31, 2012 12:57:50 pm Ian Lepore wrote:
> > > On Tue, 2012-01-31 at 11:49 -0500, John Baldwin wrote:
> > > > A co-worker ran into a race between updating a cron tab via crontab(8)
> > > > and cron(8) yesterday.  Specifically, cron(8) failed to notice that a
> > > > crontab was updated.  The problem is that 1) by default our
> > > > filesystems only use second granularity for timestamps and 2) cron
> > > > only caches the seconds portion of a file's timestamp when checking
> > > > for changes anyway.  This means that cron can miss updates to a spool
> > > > directory if multiple updates to the directory are performed within a
> > > > single second and cron wakes up to scan the spool directory within the
> > > > same second and scans it before all of the updates are complete.
> > > > 
> > > > Specifically, when replacing a crontab, crontab(8) first creates a
> > > > temporary file in /var/cron/tabs and then uses a rename to install it
> > > > followed by touching the spool directory to update its modification
> > > > time.  However, the creation of the temporary file already changes
> > > > the modification time of the directory, and cron may "miss" the
> > > > rename if it scans the directory in between the creation of the
> > > > temporary file and the rename.
> > > > 
> > > > The "fix" I am planning to use locally is to simply force crontab(8)
> > > > to sleep for a second before it touches the spool directory, thus
> > > > ensuring that the touch of the spool directory will use a later
> > > > modification time than the creation of the temporary file.
> > > > 
> > > > Note that crontab -r is not affected by this race as it only does
> > > > one atomic update to the directory (unlink()).
> > > > 
> > > > Index: crontab.c
> > > > ===
> > > > --- crontab.c (revision 225431)
> > > > +++ crontab.c (working copy)
> > > > @@ -604,6 +604,15 @@ replace_cmd() {
> > > >  
> > > >   log_it(RealUser, Pid, "REPLACE", User);
> > > >  
> > > > + /*
> > > > +  * Creating the 'tn' temp file has already updated the
> > > > +  * modification time of the spool directory.  Sleep for a
> > > > +  * second to ensure that poke_daemon() sets a later
> > > > +  * modification time.  Otherwise, this can race with the cron
> > > > +  * daemon scanning for updated crontabs.
> > > > +  */
> > > > + sleep(1);
> > > > +
> > > >   poke_daemon();
> > > >  
> > > >   return (0);
> > > 
> > > Maybe this is overly pedantic, but that solution still allows the
> > > possibility of the same sort of race if a user updates their crontab in
> > > the same second as an admin saves a new /etc/crontab, because cron takes
> > > the max timestamp of /etc/crontab and /var/cron/tabs and compares it
> > > against the database-rebuild timestamp.
> > 
> > Hmm, I'm not sure I see the race in that case.  If the /etc/crontab file
> > matches the timestamp of the spool directory before the utimes() call
> > after the one-second sleep, then it will still rescan it on the next
> > check when it notices a newer timestamp on the spool directory.  If
> > it is the same timestamp as the second timestamp on the spool directory,
> > then the scan is guaranteed to have not started before that second began,
> > meaning that the crontab(8) process editing the user's crontab must have
> > passed the rename, so the scan will see the user's new crontab.
> > 
> > > A possible solution on the daemon side of things might be something like
> > > the attached, but I should state (nay, shout) that I haven't looked
> > > beyond these few lines to see if there are any unintended side effects
> > > to such a change.
> > 
> > I think this patch doesn't change anything at all actually.  It is 
> > certainly subject to the original race I described if you do not use
> > the patch in crontab(8) itself.
> > 
> 
> You're right about my patch not fixing anything; I didn't think hard
> enough before I started typing.  
> 
> But I think the problem I was trying to get at with /etc/crontab still
> exists with your patch; it would be triggered if a user updated their
> crontab and after the 1 second sleep the directory timestamp gets
> updated and cron rebuilds the database, and then right after that but
> still in the same second /etc/crontab gets written.  (That's why I was
> trying, however feebly, to move the solution into the daemon.)

What I would do for this case is change the daemon to remove all the TMAX
crap and just cache both timestamps and rebuild the database any time
either one changes.

> Maybe the simple answer is for admins to be sure not to save changes
> to /etc/crontab during the xx:xx:00 second, or with your patch, :01.
> (I'm kidding, of course.  The fact that cron likes to wake up at 

using nscd (ldap) makes passwd/group disappearing while installing ports

2012-01-31 Thread O. Hartmann
On a couple of servers I'm using the name service cache daemon nscd to
cache "group", "passwd" and "sudoers". The backend is LDAP, but local
files should be searched first, then LDAP; the cache is searched first
of all, even before files.

Well, I'd expect that if a group such as "cups" or "dhcp" is present in
the local files (/etc/group or /etc/passwd), it is cached.

Installing net/isc-dhcp42-server fails with this error:


gmake[1]: Leaving directory
`/usr/ports/net/isc-dhcp42-server/work/dhcp-4.2.3-P2/server'
gmake[1]: Entering directory
`/usr/ports/net/isc-dhcp42-server/work/dhcp-4.2.3-P2'
gmake[1]: Nothing to be done for `all-am'.
gmake[1]: Leaving directory
`/usr/ports/net/isc-dhcp42-server/work/dhcp-4.2.3-P2'
===>  Installing for isc-dhcp42-server-4.2.3_2
===>   Generating temporary packing list
===> Creating users and/or groups.
Creating group `dhcpd' with gid `136'.
pw: group disappeared during update
*** Error code 70

Stop in /usr/ports/net/isc-dhcp42-server.
*** Error code 1

Stop in /usr/ports/net/isc-dhcp42-server.



I also hit this error very often when rebuilding, updating or even
installing cups while nscd is enabled. A simple restart of nscd helps
in most cases, but most of the time I end up disabling the "cache" tag
in /etc/nsswitch.conf, and then everything runs smoothly.

This behaviour has been occurring sporadically for a couple of years
now. I have seen it in FreeBSD 7, 8 and 9, and I see it in 10. What is
causing it?

I like the cache facility, since in domains with a lot of users
searching LDAP takes some time, and caching helps keep traffic and
latency low. But the name service caching mechanism seems to be
unreliable. What is going on there?





Re: using nscd (ldap) makes passwd/group disappearing while installing ports

2012-01-31 Thread Benjamin Lee
On 01/31/2012 03:03 PM, O. Hartmann wrote:
> On a couple of servers I'm using the name service cache daemon nscd to
> cache "group", "passwd" and "sudoers". The backend is LDAP, but local
> files should be searched first, then LDAP; the cache is searched first
> of all, even before files.
> 
> Well, I'd expect that if a group such as "cups" or "dhcp" is present in
> the local files (/etc/group or /etc/passwd), it is cached.
> 
> Installing net/isc-dhcp42-server fails with this error:
> 
> 
> gmake[1]: Leaving directory
> `/usr/ports/net/isc-dhcp42-server/work/dhcp-4.2.3-P2/server'
> gmake[1]: Entering directory
> `/usr/ports/net/isc-dhcp42-server/work/dhcp-4.2.3-P2'
> gmake[1]: Nothing to be done for `all-am'.
> gmake[1]: Leaving directory
> `/usr/ports/net/isc-dhcp42-server/work/dhcp-4.2.3-P2'
> ===>  Installing for isc-dhcp42-server-4.2.3_2
> ===>   Generating temporary packing list
> ===> Creating users and/or groups.
> Creating group `dhcpd' with gid `136'.
> pw: group disappeared during update
> *** Error code 70
> 
> Stop in /usr/ports/net/isc-dhcp42-server.
> *** Error code 1
> 
> Stop in /usr/ports/net/isc-dhcp42-server.

What's going on is:

1) The port checks if the group exists
2) nscd caches that the group does not exist in its negative cache
3) pw(8) creates the group then checks if it exists
4) nscd returns the negative cache entry (group does not exist)

This causes pw(8) to error since it expects the group that it just
created to exist.

> I also hit this error very often when rebuilding, updating or even
> installing cups while nscd is enabled. A simple restart of nscd helps
> in most cases, but most of the time I end up disabling the "cache" tag
> in /etc/nsswitch.conf, and then everything runs smoothly.
> 
> This behaviour has been occurring sporadically for a couple of years
> now. I have seen it in FreeBSD 7, 8 and 9, and I see it in 10. What is
> causing it?
> 
> I like the cache facility, since in domains with a lot of users
> searching LDAP takes some time, and caching helps keep traffic and
> latency low. But the name service caching mechanism seems to be
> unreliable. What is going on there?

You should put "files" before "cache" in /etc/nsswitch.conf, e.g.:

group: files cache ldap
passwd: files cache ldap

The problem is that tools that modify the passwd and group files, like
pw(8), don't invalidate nscd's negative cache entries when making
changes.


-- 
Benjamin Lee
http://www.b1c1l1.com/





Re: posix_fadvise noreuse disables file caching

2012-01-31 Thread Rick Macklem
John Baldwin wrote:
> On Tuesday, January 31, 2012 12:21:07 pm Ulrich Spörlein wrote:
> > On Mon, 2012-01-30 at 09:36:45 -0500, John Baldwin wrote:
> > > On Sunday, January 29, 2012 10:08:10 am Tijl Coosemans wrote:
> > > > On Wednesday 25 January 2012 17:29:22 John Baldwin wrote:
> > > > > On Friday, January 20, 2012 2:12:13 pm John Baldwin wrote:
> > > > >> On Thursday, January 19, 2012 11:39:42 am Tijl Coosemans wrote:
> > > > >>> I recently noticed that multimedia/vlc generates a lot of disk IO
> > > > >>> when playing media files. For instance, when playing a 320kbps mp3
> > > > >>> gstat reports about 1250kBps (=10000kbps). That's quite a lot of
> > > > >>> overhead.
> > > > >>>
> > > > >>> It turns out that vlc sets POSIX_FADV_NOREUSE on the entire file
> > > > >>> and reads in chunks of 1028 bytes. FreeBSD implements NOREUSE as
> > > > >>> if O_DIRECT was specified during open(2), i.e. it disables all
> > > > >>> caching. That means every 1028 byte read turns into a 32KiB read
> > > > >>> (new default block size in 9.0) which explains the above numbers.
> > > > >>>
> > > > >>> I've copied the relevant vlc code below
> > > > >>> (modules/access/file.c:Open()).
> > > > >>> It's interesting to see that on OSX it sets F_NOCACHE which
> > > > >>> disables caching too, but combined with F_RDAHEAD there's still
> > > > >>> read-ahead caching.
> > > > >>>
> > > > >>> I don't think POSIX intended for NOREUSE to mean O_DIRECT. It
> > > > >>> should still cache data (and even do read-ahead if F_RDAHEAD is
> > > > >>> specified), and once data is fetched from the cache, it can be
> > > > >>> marked WONTNEED.
> > > > >>
> > > > >> POSIX doesn't specify O_DIRECT, so it's not clear what it asks for.
> > > > >>
> > > > >>> Is it possible to implement it this way, or if not to just
> > > > >>> ignore the NOREUSE hint for now?
> > > > >>
> > > > >> I think it would be good to improve NOREUSE, though I had sort of
> > > > >> assumed that applications using NOREUSE would do their own
> > > > >> buffering and read full blocks. We could perhaps reimplement
> > > > >> NOREUSE by doing the equivalent of POSIX_FADV_DONTNEED after each
> > > > >> read to free buffers and pages after the data is copied out to
> > > > >> userland. I also have an XXX about whether or not NOREUSE should
> > > > >> still allow read-ahead as it isn't very clear what the right thing
> > > > >> to do there is. HP-UX (IIRC) has an fadvise() that lets you
> > > > >> specify multiple policies, so you could specify both NOREUSE and
> > > > >> SEQUENTIAL for a single region to get read-ahead but still release
> > > > >> memory once the data is read once.
> > > > >
> > > > > So I've come up with this untested patch. It uses
> > > > > VOP_ADVISE(FADV_DONTNEED) after read(2) calls to a NOREUSE region,
> > > > > and leaves read-ahead caching enabled for NOREUSE. FADV_DONTNEED
> > > > > doesn't do any good really for writes (it only flushes clean
> > > > > buffers), so I've left write(2) operations as using IO_DIRECT
> > > > > still. Does this sound reasonable? I've not yet tested this at all:
> > > >
> > > > The patch drastically improves vlc, but there's still a tiny
> > > > overhead. Without NOREUSE the disk is read in chunks of 128KiB
> > > > (F_RDAHEAD buffer size). With NOREUSE there's an extra transfer of
> > > > 32KiB (block size).
> > >
> > > This is probably because vlc is not reading on block boundaries, so
> > > the noreuse is throwing away partial blocks at the end of a read that
> > > then have to be re-read. We could maybe fix this by making
> > > FADV_DONTNEED only throw away completely-contained blocks rather than
> > > completely-contained pages. However, this will probably result in
> > > NOREUSE not actually throwing away anything at all if an app always
> > > reads sub-blocksize chunks.
> > >
> > > We could maybe make the case of vlc work ok in this case though by
> > > allowing an extension where you can do 'posix_fadvise(SEQUENTIAL |
> > > NOREUSE)', and in this case we could make the VOP_ADVISE(DONTNEED) in
> > > read() use an offset of 0 rather than the start of the read request.
> > >
> > > However, posix_fadvise() really is going to work best if the userland
> > > application reads aligned FS blocks.
> >
> > I find it questionable in general that an application can tell the
> > system what to do wrt. caching. Perhaps I'm running 100s of VLC players
> > all on the same file and actually *do* want reads to be cached?
> >
> > What happens if I seek back in the file? It has to do a potentially
> > high-latency read again. The system has a better overview of blocks
> > that are 

Re: Race between cron and crontab

2012-01-31 Thread Doug Barton
On 01/31/2012 08:49, John Baldwin wrote:
> A co-worker ran into a race between updating a cron tab via crontab(8) and 
> cron(8) yesterday.  Specifically, cron(8) failed to notice that a crontab was 
> updated.  The problem is that 1) by default our filesystems only use second 
> granularity for timestamps and 2) cron only caches the seconds portion of a 
> file's timestamp when checking for changes anyway.  This means that cron can 
> miss updates to a spool directory if multiple updates to the directory are 
> performed within a single second and cron wakes up to scan the spool
> directory within the same second and scans it before all of the updates
> are complete.
> 
> Specifically, when replacing a crontab, crontab(8) first creates a temporary 
> file in /var/cron/tabs and then uses a rename to install it followed by 
> touching the spool directory to update its modification time.  However, the 
> creation of the temporary file already changes the modification time of the 
> directory, and cron may "miss" the rename if it scans the directory in
> between the creation of the temporary file and the rename.
> 
> The "fix" I am planning to use locally is to simply force crontab(8) to sleep 
> for a second before it touches the spool directory, thus ensuring that the
> touch of the spool directory will use a later modification time than the 
> creation of the temporary file.

If you really want cron to have sub-second granularity I don't see how
you could do it without using flags.

crontab open    sets flag that it is editing a file
crontab close   clears "editing" flag, sets "something changed" flag
                (if something actually changed of course)

cron            checks existence of "something changed" flag, pulls the
                update if there is no "editing" flag, clears "changed" flag


Doug

-- 

It's always a long day; 86400 doesn't fit into a short.

Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price.  :)  http://SupersetSolutions.com/



Re: gptboot rewrite, bootonce, etc.

2012-01-31 Thread Andrey V. Elsukov
On 31.01.2012 19:49, Andrey Fesenko wrote:
> Does this work when ZFS is used?
> My issue: "Root on ZFS & GPT and boot to ufs partition"
> http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013514.html
> 
> I tested:
> # gpart show
> =>       34  625142381  ada0  GPT  (298G)
>          34        128     1  freebsd-boot  (64k)
>         162   26621952     2  freebsd-ufs  [bootonce,bootme]  (12G)
>    26622114    8388608     3  freebsd-swap  (4.0G)
>    35010722  590131693     4  freebsd-zfs  (281G)
> 
> The system on ada0p2 does not boot.

Hi, Andrey

If you want or plan to rewrite the boot code, I think it is better to
write an EFI loader with simple multiboot functionality.

-- 
WBR, Andrey V. Elsukov





Re: [HEADSUP][CFT] pkgng beta1 is out

2012-01-31 Thread Baptiste Daroussin
On Wed, Feb 01, 2012 at 09:23:35AM +0400, Andrey Zonov wrote:
> On 30.01.2012 16:39, Baptiste Daroussin wrote:
> > Hi,
> >
> > pkgng has just reached the beta phase, and has now found its way to the
> > ports tree (disabled by default).
> >
> > 1/ Why pkgng?
> > 
> Hi,
> 
> What about pkgng support in tinderbox?
> 

beat and I are working on it; there are just some typos left to figure
out, so it should be there pretty soon.

regards,
Bapt

