Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
Mike Frysinger wrote:
> On Saturday 02 June 2012 00:11:19 Brian Harring wrote:
>> On Fri, Jun 01, 2012 at 06:41:22PM -0400, Mike Frysinger wrote:
>>> makeopts_jobs() {
>>
>> This function belongs in eutils, or somewhere similar- pretty sure
>> we've got variants of this in multiple spots. I'd prefer a single
>> point to change if/when we add a way to pass parallelism down into the
>> env via EAPI.

We do have variants in several places in ebuilds/eclasses (scons-utils,
waf...), and some have failed at some point; see [1].

> it's already in eutils. but i'm moving it out of that and into this since it
> makes more sense in this eclass imo, and avoids this eclass from inheriting
> eutils.

Neat, thanks for adding it. A lot of build-related eclasses would need it
if we want to factor that code out. We'll have to give maintainers an
incentive to migrate their code :-)

[1] https://bugs.gentoo.org/show_bug.cgi?id=337831
--
Fulax
Gentoo Lisp Contributor
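A minimal usage sketch of the consolidation being discussed, assuming
makeopts_jobs ends up in multiprocessing.eclass (escons is scons-utils'
existing wrapper; the call site is illustrative):

inherit multiprocessing

src_compile() {
	# scons takes its own jobs flag rather than honoring MAKEOPTS itself
	escons -j"$(makeopts_jobs)" || die
}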
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
Mike Frysinger wrote:
> exec {mj_control_fd}<>${mj_control_pipe}

I'll have to remember that feature, but unfortunately it's new in bash
4.1, so unless we're giving up 3.2 as the minimum for the tree...

> : $(( ++mj_num_jobs ))

Any reason not to do just

    (( ++mj_num_jobs ))

?

> : $(( --mj_num_jobs ))
> : $(( ret |= $? ))

Same.
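An aside on the two arithmetic forms, beyond style: (( expr )) returns a
nonzero exit status whenever the expression evaluates to 0, which matters
for a decrement that reaches zero; the `:` form always succeeds. A quick
sketch of the difference:

n=1
(( --n ))      # n is now 0; the command's exit status is 1
echo $?        # prints 1

n=1
: $(( --n ))   # same decrement, but ':' always exits 0
echo $?        # prints 0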
Re: [gentoo-dev] Re: Portage Git migration - clean cut or git-cvsserver
On Sat, Jun 2, 2012 at 12:04 AM, Robin H. Johnson wrote:
> please look up git-bundle before suggesting things like tarballs of
> repos/checkouts.

Looks useful. I wasn't aware that a bundle was something other than a
tarball.

We'll probably need to spell out the preferred process in the docs, and
reference it frequently in communications. Otherwise you'll get quite a
few clones instead.

It appears that devs will have to add the remote for the live repository
after they've cloned the bundle - otherwise they'll just keep pulling
from the bundle, which isn't all that convenient.

Rich
Re: [gentoo-dev] Re: Portage Git migration - clean cut or git-cvsserver
On Sat, Jun 2, 2012 at 12:59 PM, Rich Freeman wrote:
> Looks useful. I wasn't aware that a bundle was something other than a
> tarball.
>
> We'll probably need to spell out the preferred process in the docs,
> and reference it frequently in communications. Otherwise you'll get
> quite a few clones instead.
>
> It appears that devs will have to add the remote for the live
> repository after they've cloned the bundle - otherwise they'll just
> keep pulling from the bundle, which isn't all that convenient.

I think you still misunderstand. As I understand Robin, we wouldn't even
offer up a clone of the full-history bundle; it would only be offered as
a normal download. The default workflow is cloning from the shallow
version, which will obviously give you the desired remote.

Cheers,

Dirkjan
Re: [gentoo-dev] Re: Portage Git migration - clean cut or git-cvsserver
On Sat, Jun 2, 2012 at 8:03 AM, Dirkjan Ochtman wrote:
> On Sat, Jun 2, 2012 at 12:59 PM, Rich Freeman wrote:
>> It appears that devs will have to add the remote for the live
>> repository after they've cloned the bundle - otherwise they'll just
>> keep pulling from the bundle which isn't all that convenient.
>
> I think you still misunderstand. As I understand Robin, we wouldn't
> even offer up a clone of the full-history bundle, it would only be
> offered as a normal download. The default workflow is cloning from the
> shallow version, which will obviously give you the desired remote.

I wasn't talking about full-history. I was talking about the fact that
we're distributing a bundle. If you clone a bundle, you won't have a
remote for the live repository. You would need to add it, unless you
plan to never push a commit back to the gentoo repository, and you plan
to manually download bundles anytime you want to update your local
repository.

I'm not sure how exactly Robin was planning on making the full history
available, but it sounded like it would also be distributed as a bundle.
That means that you can certainly clone it - just type
git clone path-to-locally-saved-bundle-file . If it is in some other
format like a pack file, then you would import it into a repository via
a different command.

Rich
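A hedged sketch of that workflow (the bundle filename and the live-repo
URL here are illustrative, not Gentoo's actual infrastructure):

# clone straight from the locally saved bundle file
git clone gentoo-x86.bundle gentoo-x86
cd gentoo-x86
# the clone's origin points at the bundle file, so repoint it at the live repo
git remote set-url origin git://git.example.org/gentoo-x86.git
git pull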
Re: [gentoo-dev] Re: enhancement for doicon/newicon in eutils.eclass
On 06/02/2012 06:48 AM, Mike Frysinger wrote:
> On Friday 01 June 2012 22:50:10 hasufell wrote:
>> On 06/02/2012 12:49 AM, Mike Frysinger wrote:
>>> On Wednesday 23 May 2012 21:04:42 hasufell wrote:
>>>> # @FUNCTION: _iconins
>>>> # @DESCRIPTION:
>>>> # function for use in doicon and newicon
>>>
>>> mark it @INTERNAL
>
> what i meant here was:
>
> # @FUNCTION: _iconins
> # @INTERNAL
> # @DESCRIPTION:
> # function for use in doicon and newicon
>
> you can run
> /usr/portage/app-portage/eclass-manpages/files/eclass-to-manpage.sh
> and the eclass to see if the style is valid
> -mike

K, fixed it. Manpage seems ok with this version.

# @FUNCTION: _iconins
# @INTERNAL
# @DESCRIPTION:
# function for use in doicon and newicon
_iconins() {
	(
	# wrap the env here so that the 'insinto' call
	# doesn't corrupt the env of the caller
	local funcname=$1; shift
	local size dir
	local context=apps
	local theme=hicolor

	while [[ $# -gt 0 ]] ; do
		case $1 in
		-s|--size)
			if [[ ${2%%x*}x${2%%x*} == "$2" ]] ; then
				size=${2%%x*}
			else
				size=${2}
			fi
			case ${size} in
			16|22|24|32|36|48|64|72|96|128|192|256)
				size=${size}x${size};;
			scalable)
				;;
			*)
				eerror "${size} is an unsupported icon size!"
				exit 1;;
			esac
			shift 2;;
		-t|--theme)
			theme=${2}
			shift 2;;
		-c|--context)
			context=${2}
			shift 2;;
		*)
			if [[ -z ${size} ]] ; then
				insinto /usr/share/pixmaps
			else
				insinto /usr/share/icons/${theme}/${size}/${context}
			fi

			if [[ ${funcname} == doicon ]] ; then
				if [[ -f $1 ]] ; then
					doins "${1}"
				elif [[ -d $1 ]] ; then
					shopt -s nullglob
					doins "${1}"/*.{png,svg}
					shopt -u nullglob
				else
					eerror "${1} is not a valid file/directory!"
					exit 1
				fi
			else
				break
			fi
			shift 1;;
		esac
	done
	if [[ ${funcname} == newicon ]] ; then
		newins "$@"
	fi
	) || die
}

# @FUNCTION: doicon
# @USAGE: [options] <icons>
# @DESCRIPTION:
# Install icon into the icon directory /usr/share/icons or into
# /usr/share/pixmaps if "--size" is not set.
# This is useful in conjunction with creating desktop/menu files.
#
# @CODE
# options:
#  -s, --size
#    !!! must specify to install into /usr/share/icons/... !!!
#    size of the icon, like 48 or 48x48
#    supported icon sizes are:
#    16 22 24 32 36 48 64 72 96 128 192 256 scalable
#  -c, --context
#    defaults to "apps"
#  -t, --theme
#    defaults to "hicolor"
#
# icons: list of icons
#
# example 1: doicon foobar.png fuqbar.svg
# results in: insinto /usr/share/pixmaps
#             doins foobar.png fuqbar.svg
#
# example 2: doicon -s 48 foobar.png fuqbar.png
# results in: insinto /usr/share/icons/hicolor/48x48/apps
#             doins foobar.png fuqbar.png
# @CODE
doicon() {
	_iconins ${FUNCNAME} "$@"
}

# @FUNCTION: newicon
# @USAGE: [options] <icon> <newname>
# @DESCRIPTION:
# Like doicon, install the specified icon as newname.
#
# @CODE
# example 1: newicon foobar.png NEWNAME.png
# results in: insinto /usr/share/pixmaps
#             newins foobar.png NEWNAME.png
#
# example 2: newicon -s 48 foobar.png NEWNAME.png
# results in: insinto /usr/share/icons/hicolor/48x48/apps
#             newins foobar.png NEWNAME.png
# @CODE
newicon() {
	_iconins ${FUNCNAME} "$@"
}
[gentoo-dev] remote-id cpan-module
* Corentin Chary:
> On Thu, May 17, 2012 at 2:02 AM, Kent Fredric wrote:
>> On 13 May 2012 07:43, Torsten Veller wrote:
>>> It doesn't even list "Moose" for Moose?
>>
>> Its probably falling outside the initial 10 results, I forgot it did that.
>>
>>> 02packages.details.txt.gz lists 72 package names for Moose-2.0602.
>>
>> Need to bolt on a { "size": 100 } to the query to expand how many
>> results it will return.
>
> Updated remotesid.py to use that; it correctly adds Moose in the diff now!

metadata.dtd was updated per bug #406287 and it contains the cpan-module
remote-id.

The current patch for dev-perl/* is roughly 800k big:
http://dev.gentoo.org/~tove/files/devperlremoteids.patch

I am going to update the files in the next few days. Now would be a good
time to voice your concerns.
--
Thanks
Torsten
Re: [gentoo-dev] RFC: Add new remote-id types in metadata.dtd
Is there any way to verify the data? What programs/scripts use these
fields, btw?

I could imagine a test like: given a remote-id value $a, does
http://www.fs.net/$a exist?

Michael
--
Gentoo Dev
http://xmw.de/
[gentoo-dev] Re: [gentoo-commits] gentoo-x86 commit in profiles/default/linux: package.use.mask
I think you meant base/package.use.mask, and how about using ChangeLog?

On 06/02/2012 07:58 PM, Michael Weber (xmw) wrote:
> xmw         12/06/02 16:58:09
>
>   Modified: package.use.mask
>   Log:
>   dev-db/firebird client
>
>   Revision  Changes  Path
>   1.30      profiles/default/linux/package.use.mask
>
>   file:  http://sources.gentoo.org/viewvc.cgi/gentoo-x86/profiles/default/linux/package.use.mask?rev=1.30&view=markup
>   plain: http://sources.gentoo.org/viewvc.cgi/gentoo-x86/profiles/default/linux/package.use.mask?rev=1.30&content-type=text/plain
>   diff:  http://sources.gentoo.org/viewvc.cgi/gentoo-x86/profiles/default/linux/package.use.mask?r1=1.29&r2=1.30
>
> Index: package.use.mask
> ===================================================================
> RCS file: /var/cvsroot/gentoo-x86/profiles/default/linux/package.use.mask,v
> retrieving revision 1.29
> retrieving revision 1.30
> diff -u -r1.29 -r1.30
> --- package.use.mask	30 Apr 2012 19:13:25 -0000	1.29
> +++ package.use.mask	2 Jun 2012 16:58:09 -0000	1.30
> @@ -1,6 +1,10 @@
>  # Copyright 1999-2012 Gentoo Foundation
>  # Distributed under the terms of the GNU General Public License v2
> -# $Header: /var/cvsroot/gentoo-x86/profiles/default/linux/package.use.mask,v 1.29 2012/04/30 19:13:25 ssuominen Exp $
> +# $Header: /var/cvsroot/gentoo-x86/profiles/default/linux/package.use.mask,v 1.30 2012/06/02 16:58:09 xmw Exp $
> +
> +# Michael Weber (02 Jun 2012)
> +# Not fit for production (bug 404403, comment #5)
> +dev-db/firebird client
>
>  sys-devel/gcc hardened
>  sys-libs/glibc hardened
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
On Saturday 02 June 2012 05:52:01 David Leverton wrote:
> Mike Frysinger wrote:
>> exec {mj_control_fd}<>${mj_control_pipe}
>
> I'll have to remember that feature, but unfortunately it's new in bash
> 4.1, so unless we're giving up 3.2 as the minimum for the tree

lame

>> : $(( ++mj_num_jobs ))
>
> Any reason not to do just
>
>     (( ++mj_num_jobs ))

i prefer the portable form
-mike
[gentoo-dev] Re: [gentoo-commits] gentoo-x86 commit in profiles/default/linux: package.use.mask
On 06/02/2012 07:47 PM, Samuli Suominen wrote:
> I think you meant base/package.use.mask

correct, fixed.

> and how about using ChangeLog?

I did that in gentoo-x86/base (there's a ChangeLog file there), but I
didn't see any ChangeLog in gentoo-x86/profiles/default/linux:

michael@x linux 127 % echangelog "dev-db/firebird client: moved to base/package.use.mask"
This should be run in a directory with ebuilds...
michael@x linux 255 % pwd
/usr/portage/profiles/default/linux

Sorry, and thanks for your pair of eyes.

Michael
--
Gentoo Dev
http://xmw.de/
[gentoo-dev] Re: [gentoo-commits] gentoo-x86 commit in profiles/default/linux: package.use.mask
On 06/02/2012 10:28 PM, Michael Weber wrote:
> On 06/02/2012 07:47 PM, Samuli Suominen wrote:
>> I think you meant base/package.use.mask
>
> correct, fixed.
>
>> and how about using ChangeLog?
>
> I did that in gentoo-x86/base (there's a ChangeLog file there), but I
> didn't see any ChangeLog in gentoo-x86/profiles/default/linux.

If there isn't one in the directory, then you "cd .." until you find a
ChangeLog file, and that's the one you should be using.

To clarify: echangelog is to be used *for every* profiles/ directory,
but not *in every* directory. :-)
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
v2
-mike

# Copyright 1999-2012 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: $

# @ECLASS: multiprocessing.eclass
# @MAINTAINER:
# base-sys...@gentoo.org
# @AUTHOR:
# Brian Harring
# Mike Frysinger
# @BLURB: parallelization with bash (wtf?)
# @DESCRIPTION:
# The multiprocessing eclass contains a suite of functions that allow ebuilds
# to quickly run things in parallel using shell code.
# @EXAMPLE:
#
# @CODE
# # First initialize things:
# multijob_init
#
# # Then hash a bunch of files in parallel:
# for n in {0..20} ; do
# 	multijob_child_init md5sum data.${n} > data.${n}
# done
#
# # Then wait for all the children to finish:
# multijob_finish
# @CODE

if [[ ${___ECLASS_ONCE_MULTIPROCESSING} != "recur -_+^+_- spank" ]] ; then
___ECLASS_ONCE_MULTIPROCESSING="recur -_+^+_- spank"

# @FUNCTION: makeopts_jobs
# @USAGE: [${MAKEOPTS}]
# @DESCRIPTION:
# Searches the arguments (defaults to ${MAKEOPTS}) and extracts the jobs number
# specified therein. Useful for running non-make tools in parallel too.
# i.e. if the user has MAKEOPTS=-j9, this will echo "9" -- we can't return the
# number as bash normalizes it to [0, 255]. If the flags haven't specified a
# -j flag, then "1" is shown as that is the default `make` uses. Since there's
# no way to represent infinity, we return 999 if the user has -j without a number.
makeopts_jobs() {
	[[ $# -eq 0 ]] && set -- ${MAKEOPTS}
	# This assumes the first .* will be more greedy than the second .*
	# since POSIX doesn't specify a non-greedy match (i.e. ".*?").
	local jobs=$(echo " $* " | sed -r -n \
		-e 's:.*[[:space:]](-j|--jobs[=[:space:]])[[:space:]]*([0-9]+).*:\2:p' \
		-e 's:.*[[:space:]](-j|--jobs)[[:space:]].*:999:p')
	echo ${jobs:-1}
}

# @FUNCTION: redirect_alloc_fd
# @USAGE: <var> <file> [redirection]
# @DESCRIPTION:
# Find a free fd and redirect the specified file via it. Store the new
# fd in the specified variable. Useful for the cases where we don't care
# about the exact fd #.
redirect_alloc_fd() {
	local var=$1 file=$2 redir=${3:-"<>"}

	if [[ $(( (BASH_VERSINFO[0] << 8) + BASH_VERSINFO[1] )) -ge $(( (4 << 8) + 1 )) ]] ; then
		# Newer bash provides this functionality.
		eval "exec {${var}}${redir}'${file}'"
	else
		# Need to provide the functionality ourselves.
		local fd=10
		while :; do
			if [[ ! -L /dev/fd/${fd} ]] ; then
				eval "exec ${fd}${redir}'${file}'" && break
			fi
			[[ ${fd} -gt 1024 ]] && return 1 # sanity
			: $(( ++fd ))
		done
		: $(( ${var} = fd ))
	fi
}

# @FUNCTION: multijob_init
# @USAGE: [${MAKEOPTS}]
# @DESCRIPTION:
# Setup the environment for executing code in parallel.
# You must call this before any other multijob function.
multijob_init() {
	# When something goes wrong, try to wait for all the children so we
	# don't leave any zombies around.
	has wait ${EBUILD_DEATH_HOOKS} || EBUILD_DEATH_HOOKS+=" wait"

	# Setup a pipe for children to write their pids to when they finish.
	mj_control_pipe="${T}/multijob.pipe"
	mkfifo "${mj_control_pipe}"
	redirect_alloc_fd mj_control_fd "${mj_control_pipe}"
	rm -f "${mj_control_pipe}"

	# See how many children we can fork based on the user's settings.
	mj_max_jobs=$(makeopts_jobs "$@")
	mj_num_jobs=0
}

# @FUNCTION: multijob_child_init
# @USAGE: [command to run in background]
# @DESCRIPTION:
# This function has two forms. You can use it to execute a simple command
# in the background (and it takes care of everything else), or you must
# call this first thing in your forked child process.
#
# @CODE
# # 1st form: pass the command line as arguments:
# multijob_child_init ls /dev
#
# # 2nd form: execute multiple stuff in the background:
# (
# multijob_child_init
# out=`ls`
# if echo "${out}" | grep foo ; then
# 	echo "YEAH"
# fi
# ) &
# multijob_post_fork
# @CODE
multijob_child_init() {
	if [[ $# -eq 0 ]] ; then
		trap 'echo ${BASHPID} $? >&'${mj_control_fd} EXIT
		trap 'exit 1' INT TERM
	else
		( multijob_child_init ; "$@" ) &
		multijob_post_fork
	fi
}

# @FUNCTION: multijob_post_fork
# @DESCRIPTION:
# You must call this in the parent process after forking a child process.
# If the parallel limit has been hit, it will wait for one child to finish
# and return its exit status.
multijob_post_fork() {
	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"

	: $(( ++mj_num_jobs ))
	if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
		multijob_finish_one
	fi
	return $?
}

# @FUNCTION: multijob_finish_one
# @DESCRIPTION:
# Wait for a single proc
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
On 06/02/2012 12:54 PM, Mike Frysinger wrote:
> # @FUNCTION: redirect_alloc_fd
> # @USAGE: <var> <file> [redirection]
> # @DESCRIPTION:
> # Find a free fd and redirect the specified file via it. Store the new
> # fd in the specified variable. Useful for the cases where we don't care
> # about the exact fd #.
> redirect_alloc_fd() {
> 	local var=$1 file=$2 redir=${3:-"<>"}
>
> 	if [[ $(( (BASH_VERSINFO[0] << 8) + BASH_VERSINFO[1] )) -ge $(( (4 << 8) + 1 )) ]] ; then
> 		# Newer bash provides this functionality.
> 		eval "exec {${var}}${redir}'${file}'"
> 	else
> 		# Need to provide the functionality ourselves.
> 		local fd=10
> 		while :; do
> 			if [[ ! -L /dev/fd/${fd} ]] ; then
> 				eval "exec ${fd}${redir}'${file}'" && break
> 			fi
> 			[[ ${fd} -gt 1024 ]] && return 1 # sanity
> 			: $(( ++fd ))
> 		done
> 		: $(( ${var} = fd ))
> 	fi
> }

I launched up a GhostBSD livedvd to see what /dev/fd/ looks like on
FreeBSD, and it seems to contain plain character devices instead of
symlinks to character devices:

[ghostbsd@livecd ~]$ uname -a
FreeBSD livecd 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Sun Jan 15 17:17:43 AST 2012 r...@ericbsd.ghostbsd.org:/usr/obj/i386.i386/usr/src/sys/GHOSTBSD i386
[ghostbsd@livecd ~]$ ls -l /dev/fd/
total 0
crw-rw-rw-  1 root  wheel    0,  19 Jun  2 20:15 0
crw-rw-rw-  1 root  wheel    0,  21 Jun  2 20:15 1
crw-rw-rw-  1 root  wheel    0,  23 Jun  2 20:15 2
--
Thanks,
Zac
[gentoo-dev] Last rites dev-perl/Cflow, dev-lang/eleven and net-im/silc-client
Masked until fixed or removed:

# Install perl modules in the site branch (#280728)
# - dev-perl/Cflow (#391391)
# - dev-lang/eleven (#295118)
# - net-im/silc-client (#294854)
dev-perl/Cflow
dev-lang/eleven
net-im/silc-client
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
On Saturday 02 June 2012 16:39:16 Zac Medico wrote:
> On 06/02/2012 12:54 PM, Mike Frysinger wrote:
>> if [[ ! -L /dev/fd/${fd} ]] ; then
>> 	eval "exec ${fd}${redir}'${file}'" && break
>> fi
>
> I launched up a GhostBSD livedvd to see what /dev/fd/ looks like on
> FreeBSD, and it seems to contain plain character devices instead of
> symlinks to character devices:

i didn't want to use [ -e ] because of broken links, but it seems that
Linux has diff semantics with /proc and broken symlinks. `test -e` will
return true.
-mike
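For what it's worth, a probe that sidesteps /dev/fd entirely is possible;
a hedged sketch (my own illustration, not what the eclass does): bash
reports "Bad file descriptor" when asked to duplicate a closed fd, and
that behaves the same on Linux and FreeBSD:

# an fd is in use if we can dup input from it; the error message from a
# closed fd is silenced, leaving just the exit status
fd_in_use() {
	{ : <&"$1" ; } 2>/dev/null
}

fd_in_use 1  && echo "stdout is open"   # prints: stdout is open
fd_in_use 42 || echo "fd 42 is free"    # prints: fd 42 is free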
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
On Sat, 2 Jun 2012 15:54:03 -0400
Mike Frysinger wrote:

> # @FUNCTION: redirect_alloc_fd
> # @USAGE: <var> <file> [redirection]
> # @DESCRIPTION:

(...and a lot of code)

I may be wrong, but wouldn't it be simpler to just stick with a named
pipe here? Well, at first glance you wouldn't be able to read exactly
one result at a time, but is it actually useful?

--
Best regards,
Michał Górny
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
On 06/02/2012 02:31 PM, Michał Górny wrote:
> On Sat, 2 Jun 2012 15:54:03 -0400
> Mike Frysinger wrote:
>
>> # @FUNCTION: redirect_alloc_fd
>> # @USAGE: <var> <file> [redirection]
>> # @DESCRIPTION:
>
> (...and a lot of code)
>
> I may be wrong but wouldn't it be simpler to just stick with a named
> pipe here? Well, at first glance you wouldn't be able to read exactly
> one result at a time but is it actually useful?

I'm pretty sure that the pipe has to remain constantly open in read mode
(which can only be done by assigning it a file descriptor). Otherwise,
there's a race condition that can occur, where a write is lost because
it's written just before the reader closes the pipe.
--
Thanks,
Zac
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
On 06/02/2012 02:12 PM, Mike Frysinger wrote:
> On Saturday 02 June 2012 16:39:16 Zac Medico wrote:
>> On 06/02/2012 12:54 PM, Mike Frysinger wrote:
>>> if [[ ! -L /dev/fd/${fd} ]] ; then
>>> 	eval "exec ${fd}${redir}'${file}'" && break
>>> fi
>>
>> I launched up a GhostBSD livedvd to see what /dev/fd/ looks like on
>> FreeBSD, and it seems to contain plain character devices instead of
>> symlinks to character devices:
>
> i didn't want to use [ -e ] because of broken links, but it seems that
> Linux has diff semantics with /proc and broken symlinks. `test -e` will
> return true.

How about if we just create a fallback mode for older bash, where no
pipes are involved, and multijob_post_fork just uses `wait` to check
status and effectively causes only one job to execute at a time?
--
Thanks,
Zac
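A hypothetical sketch of that fallback (illustrative only, not code from
the thread):

# On bash < 4.1, skip the fifo bookkeeping and reap each child right
# after forking it: jobs run one at a time, but call sites stay unchanged.
multijob_post_fork() {
	wait $!    # $! is the job just backgrounded; propagate its status
}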
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
On Sat, Jun 02, 2012 at 03:50:06PM -0700, Zac Medico wrote:
> On 06/02/2012 02:31 PM, Michał Górny wrote:
>> On Sat, 2 Jun 2012 15:54:03 -0400
>> Mike Frysinger wrote:
>>
>>> # @FUNCTION: redirect_alloc_fd
>>> # @USAGE: <var> <file> [redirection]
>>> # @DESCRIPTION:
>>
>> (...and a lot of code)
>>
>> I may be wrong but wouldn't it be simpler to just stick with a named
>> pipe here? Well, at first glance you wouldn't be able to read exactly
>> one result at a time but is it actually useful?
>
> I'm pretty sure that the pipe has to remain constantly open in read mode
> (which can only be done by assigning it a file descriptor). Otherwise,
> there's a race condition that can occur, where a write is lost because
> it's written just before the reader closes the pipe.

There isn't a race; write side, it'll block once it exceeds pipe buf
size; read side, bash's read functionality is explicitly byte-by-byte
reads to avoid consuming data it doesn't need.

That said, mgorny's suggestion ignores that the code already is pointed
at a fifo. Presume he's suggesting "just open it every time you need to
fuck with it"... which, sure, 'cept that complicates the read side
(either having to find a free fd, open to it, then close it), or abuse
cat or $(<) to pull the results and make the reclaim code handle
multiple results in a single shot.

Frankly, I don't see the point in doing that. The code isn't that
complex, and we *need* the overhead of this to be minimal- the hand
off/reclaim is effectively the bottleneck for scaling. If the jobs
you've backgrounded are a second apiece, it matters less; if they're
quick little bursts of activity, the scaling *will* be limited by how
fast we can blast off/reclaim jobs. Keep in mind that the main process
has to go find more work to queue up between the reclaims, thus this
matters more than you'd think.

Either way, that limit varies depending on the time required for each
job vs # of cores; that said, run code like this on a 48-core box and
you'll see it start becoming an actual bottleneck (which is why I came
up with this hacky bash semaphore).
~harring
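Both properties Brian cites can be spot-checked; a hedged sketch (POSIX
makes writes of up to PIPE_BUF bytes atomic, so short status lines from
concurrent children don't interleave, and bash's read consumes exactly
one line per call):

mkfifo /tmp/ctrl
exec 8<>/tmp/ctrl
rm -f /tmp/ctrl
for i in {1..50} ; do
	( echo "${BASHPID} 0" >&8 ) &    # each line is far below PIPE_BUF
done
for i in {1..50} ; do
	read -u 8 pid status             # one message per read, none lost
done
wait
echo "all 50 messages arrived intact"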
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
On Saturday 02 June 2012 19:29:29 Zac Medico wrote:
> On 06/02/2012 02:12 PM, Mike Frysinger wrote:
>> i didn't want to use [ -e ] because of broken links, but it seems that
>> Linux has diff semantics with /proc and broken symlinks. `test -e` will
>> return true.
>
> How about if we just create a fallback mode for older bash, where no
> pipes are involved, and multijob_post_fork just uses `wait` to check
> status and effectively causes only one job to execute at a time?

hmm, maybe, but i've already written the code to support older versions :)
-mike
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
On Fri, Jun 01, 2012 at 06:41:22PM -0400, Mike Frysinger wrote:
> # @FUNCTION: multijob_post_fork
> # @DESCRIPTION:
> # You must call this in the parent process after forking a child process.
> # If the parallel limit has been hit, it will wait for one to finish and
> # return the child's exit status.
> multijob_post_fork() {
> 	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
>
> 	: $(( ++mj_num_jobs ))
> 	if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
> 		multijob_finish_one
> 	fi
> 	return $?
> }

Minor note: the design of this (fork, then check) means that when a job
finishes, we won't be ready with more work. Given a fast
job-identification step (the main thread) and slower job execution
(what's backgrounded), this implies we'll never breach #cores of
parallelism, but we won't quite sustain that level either (meaning
potentially some idle cycles left on the floor).

Realistically, the main thread (what invokes post_fork) is *likely* to
be doing only minor work- mostly just poking about figuring out what the
next task/arguments are to submit to the pool. That work isn't likely to
be a full core's worth, or the consumer is doing something badly wrong.

The original form of this was designed around the assumption that the
main thread was light and the backgrounded jobs weren't, thus it
basically did the equivalent of make with jobs = #cores + 1: it allowed
#cores background jobs to run while the main thread continued on and got
the next job ready; once it had that ready, it would block waiting for a
slot to open, then immediately submit the job once it had done a
reclaim.

On the surface of it, it's a minor difference, but having the next job
immediately ready to fire makes it easier to saturate cores.

Unfortunately, that also changes your API a bit; your call.
~harring
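To make the ordering difference concrete, a sketch of the two call
shapes (run_job and task are placeholders; the pre-fork name anticipates
the v3 API that appears later in the thread):

# post-fork ordering (the form quoted above): spawn, then reclaim a slot.
# When a child exits, the parent hasn't prepared the next job yet.
run_job "${task}" &
multijob_post_fork || die

# pre-fork ordering (what Brian describes): reclaim a slot first, so the
# freshly prepared job launches the instant a slot frees up.
multijob_pre_fork || die
run_job "${task}" &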
[gentoo-dev] metadata/md5-cache
What's up with md5-cache?

Every sync has to pull the entire md5-cache hierarchy over again, as if
some daemon re-creates every file every day, rather than only re-writing
those files which need updates and adding/removing those which need that.

Even if only a file's metadata changes, that still adds a significant
cost to an rsync.

It is important that md5-cache files which do not require change be left
alone. Not everyone has gobs of network bandwidth available.

-JimC
--
James Cloos  OpenPGP: 1024D/ED7DAEA6
Re: [gentoo-dev] metadata/md5-cache
On 06/02/2012 05:32 PM, James Cloos wrote:
> What's up with md5-cache?
>
> Every sync has to pull the entire md5-cache hierarchy over again, as if
> some daemon re-creates every file every day, rather than only re-writing
> those files which need updates and adding/removing those which need that.

We had a bug about that [1] when we first deployed md5-cache, but it's
supposed to have been fixed.

> Even if only a file's metadata changes, that still adds a significant
> cost to an rsync.
>
> It is important that md5-cache files which do not require change be left
> alone.

There's code in portage to avoid redundant cache writes [2]. Eclass
modifications can still trigger lots of cache changes though, especially
eutils.eclass (which most ebuilds inherit).

> Not everyone has gobs of network bandwidth available.

[1] https://bugs.gentoo.org/show_bug.cgi?id=410505
[2] http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=0e120da008c9d0d41c9372c81145c6e153028a6d
--
Thanks,
Zac
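The redundant-write avoidance in [2] amounts to compare-before-write; a
generic sketch of the idea (not portage's actual implementation):

# only touch the cache file when its content would actually change, so
# rsync's timestamp/size heuristics leave it alone
write_if_changed() {
	local dest=$1 content=$2
	[[ -f ${dest} && $(<"${dest}") == "${content}" ]] && return 0
	printf '%s\n' "${content}" > "${dest}"
}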
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
On 06/02/2012 04:47 PM, Brian Harring wrote:
> On Sat, Jun 02, 2012 at 03:50:06PM -0700, Zac Medico wrote:
>> On 06/02/2012 02:31 PM, Michał Górny wrote:
>>> I may be wrong but wouldn't it be simpler to just stick with a named
>>> pipe here? Well, at first glance you wouldn't be able to read exactly
>>> one result at a time but is it actually useful?
>>
>> I'm pretty sure that the pipe has to remain constantly open in read mode
>> (which can only be done by assigning it a file descriptor). Otherwise,
>> there's a race condition that can occur, where a write is lost because
>> it's written just before the reader closes the pipe.
>
> There isn't a race; write side, it'll block once it exceeds pipe buf
> size; read side, bash's read functionality is explicitly byte by byte
> reads to avoid consuming data it doesn't need.

I've created a little test case and it seems you're right that nothing
is lost.
--
Thanks,
Zac

[attachment: named_pipe_check_for_lost_write.sh]
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
On 06/02/2012 06:04 PM, Zac Medico wrote:
> On 06/02/2012 04:47 PM, Brian Harring wrote:
>> There isn't a race; write side, it'll block once it exceeds pipe buf
>> size; read side, bash's read functionality is explicitly byte by byte
>> reads to avoid consuming data it doesn't need.
>
> I've created a little test case and it seems you're right that nothing
> is lost.

Actually, I forgot the mkfifo call, so it was writing to a regular file.
With the fifo, the write appears to be lost, as I originally suspected.
--
Thanks,
Zac

[attachment: named_pipe_check_for_lost_write.sh]
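For anyone reproducing this, a minimal sketch of the trick the eclass
itself relies on: holding one read-write fd on the fifo for the whole
run means writers never race against a reader that has come and gone:

pipe=$(mktemp -u)
mkfifo "${pipe}"
exec 9<>"${pipe}"   # open read-write so neither end's open() blocks
rm -f "${pipe}"     # the fd keeps the pipe alive; the name is unneeded now

( sleep 0.1 ; echo "child 1 done" >&9 ) &
( sleep 0.2 ; echo "child 2 done" >&9 ) &

# read exactly one completion message at a time, as the eclass does
read -u 9 msg ; echo "got: ${msg}"
read -u 9 msg ; echo "got: ${msg}"
wait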
Re: [gentoo-dev] metadata/md5-cache
> "ZM" == Zac Medico writes: Thanks for the quick reply and the reference to the bz. ZM> We had a bug about that [1] when we first deployed md5-cache, but it's ZM> supposed to have been fixed. It is not fixed. The behavior has not changed in any way since md5-cache was added. ZM> [1] https://bugs.gentoo.org/show_bug.cgi?id=410505 I've added a please re-open note to that bug. Thanks for working on it. -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
On Saturday 02 June 2012 19:59:02 Brian Harring wrote:
> On Fri, Jun 01, 2012 at 06:41:22PM -0400, Mike Frysinger wrote:
>> # @FUNCTION: multijob_post_fork
>> # @DESCRIPTION:
>> # You must call this in the parent process after forking a child process.
>> # If the parallel limit has been hit, it will wait for one to finish and
>> # return the child's exit status.
>> multijob_post_fork() {
>> 	[[ $# -eq 0 ]] || die "${FUNCNAME} takes no arguments"
>>
>> 	: $(( ++mj_num_jobs ))
>> 	if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
>> 		multijob_finish_one
>> 	fi
>> 	return $?
>> }
>
> Minor note; the design of this (fork then check), means when a job
> finishes, we'll not be ready with more work. This implicitly means
> that given a fast job identification step (main thread), and a slower
> job execution (what's backgrounded), we'll not breach #core of
> parallelism, nor will we achieve that level either (meaning
> potentially some idle cycles left on the floor).
>
> The original form of this was designed around the assumption that the
> main thread was light, and the backgrounded jobs weren't, thus it
> basically did the equivalent of make -j+1, allowing #cores
> background jobs running, while allowing the main thread to continue on
> and get the next job ready, once it had that ready, it would block
> waiting for a slot to open, then immediately submit the job once it
> had done a reclaim.

the original code i designed this around had a heavier main thread,
because it had a series of parallel sections followed by serial followed
by parallel, where the serial regions didn't depend on the parallel parts
finishing right away. that, and doing things post meant it was easier to
pass up return values because i didn't have to save $? anywhere ;).

thinking a bit more, i don't think the two methods are mutually exclusive.
it's easy to have the code support both, but i'm not sure the extended
documentation helps.
-mike
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
v3
-mike

# Copyright 1999-2012 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: $

# @ECLASS: multiprocessing.eclass
# @MAINTAINER:
# base-sys...@gentoo.org
# @AUTHOR:
# Brian Harring
# Mike Frysinger
# @BLURB: parallelization with bash (wtf?)
# @DESCRIPTION:
# The multiprocessing eclass contains a suite of functions that allow ebuilds
# to quickly run things in parallel using shell code.
#
# It has two modes: pre-fork and post-fork. If you don't want to dive into any
# more nuts & bolts, just use the pre-fork mode. For main threads that mostly
# spawn children and then wait for them to finish, use the pre-fork mode. For
# main threads that do a bit of processing themselves, use the post-fork mode.
# You may mix & match them for longer computation loops.
# @EXAMPLE:
#
# @CODE
# # First initialize things:
# multijob_init
#
# # Then hash a bunch of files in parallel:
# for n in {0..20} ; do
# 	multijob_child_init md5sum data.${n} > data.${n}
# done
#
# # Then wait for all the children to finish:
# multijob_finish
# @CODE

if [[ ${___ECLASS_ONCE_MULTIPROCESSING} != "recur -_+^+_- spank" ]] ; then
___ECLASS_ONCE_MULTIPROCESSING="recur -_+^+_- spank"

# @FUNCTION: makeopts_jobs
# @USAGE: [${MAKEOPTS}]
# @DESCRIPTION:
# Searches the arguments (defaults to ${MAKEOPTS}) and extracts the jobs number
# specified therein. Useful for running non-make tools in parallel too.
# i.e. if the user has MAKEOPTS=-j9, this will echo "9" -- we can't return the
# number as bash normalizes it to [0, 255]. If the flags haven't specified a
# -j flag, then "1" is shown as that is the default `make` uses. Since there's
# no way to represent infinity, we return 999 if the user has -j without a number.
makeopts_jobs() {
	[[ $# -eq 0 ]] && set -- ${MAKEOPTS}
	# This assumes the first .* will be more greedy than the second .*
	# since POSIX doesn't specify a non-greedy match (i.e. ".*?").
	local jobs=$(echo " $* " | sed -r -n \
		-e 's:.*[[:space:]](-j|--jobs[=[:space:]])[[:space:]]*([0-9]+).*:\2:p' \
		-e 's:.*[[:space:]](-j|--jobs)[[:space:]].*:999:p')
	echo ${jobs:-1}
}

# @FUNCTION: multijob_init
# @USAGE: [${MAKEOPTS}]
# @DESCRIPTION:
# Setup the environment for executing code in parallel.
# You must call this before any other multijob function.
multijob_init() {
	# When something goes wrong, try to wait for all the children so we
	# don't leave any zombies around.
	has wait ${EBUILD_DEATH_HOOKS} || EBUILD_DEATH_HOOKS+=" wait"

	# Setup a pipe for children to write their pids to when they finish.
	local pipe="${T}/multijob.pipe"
	mkfifo "${pipe}"
	redirect_alloc_fd mj_control_fd "${pipe}"
	rm -f "${pipe}"

	# See how many children we can fork based on the user's settings.
	mj_max_jobs=$(makeopts_jobs "$@")
	mj_num_jobs=0
}

# @FUNCTION: multijob_child_init
# @USAGE: [--pre|--post] [command to run in background]
# @DESCRIPTION:
# This function has two forms. You can use it to execute a simple command
# in the background (and it takes care of everything else), or you must
# call this first thing in your forked child process.
#
# The --pre/--post options allow you to select the child generation mode.
#
# @CODE
# # 1st form: pass the command line as arguments:
# multijob_child_init ls /dev
# # Or if you want to use pre/post fork modes:
# multijob_child_init --pre ls /dev
# multijob_child_init --post ls /dev
#
# # 2nd form: execute multiple stuff in the background (post fork):
# (
# multijob_child_init
# out=`ls`
# if echo "${out}" | grep foo ; then
# 	echo "YEAH"
# fi
# ) &
# multijob_post_fork
#
# # 2nd form: execute multiple stuff in the background (pre fork):
# multijob_pre_fork
# (
# multijob_child_init
# out=`ls`
# if echo "${out}" | grep foo ; then
# 	echo "YEAH"
# fi
# ) &
# @CODE
multijob_child_init() {
	local mode="pre"
	case $1 in
	--pre)  mode="pre" ; shift ;;
	--post) mode="post"; shift ;;
	esac
	if [[ $# -eq 0 ]] ; then
		trap 'echo ${BASHPID} $? >&'${mj_control_fd} EXIT
		trap 'exit 1' INT TERM
	else
		local ret
		[[ ${mode} == "pre" ]] && { multijob_pre_fork; ret=$?; }
		( multijob_child_init ; "$@" ) &
		[[ ${mode} == "post" ]] && { multijob_post_fork; ret=$?; }
		return ${ret}
	fi
}

# @FUNCTION: _multijob_fork
# @INTERNAL
# @DESCRIPTION:
# Do the actual book keeping.
_multijob_fork() {
	[[ $# -eq 1 ]] || die "incorrect number of arguments"

	local ret=0
	[[ $1 == "pre" ]] && : $(( ++mj_num_jobs ))
	if [[ ${mj_num_jobs} -ge ${mj_max_jobs} ]] ; then
		multijob_finish_one
		ret=$?
	fi
	[[ $1 == "post" ]] && : $(( ++mj_num_jobs ))
	return ${ret}
}

# @FUNCTION: multijob_pre_fork
# @DES
Re: [gentoo-dev] metadata/md5-cache
On 06/02/2012 08:52 PM, James Cloos wrote:
>> "ZM" == Zac Medico writes:
>
> Thanks for the quick reply and the reference to the bug.
>
> ZM> We had a bug about that [1] when we first deployed md5-cache, but it's
> ZM> supposed to have been fixed.
>
> It is not fixed. The behavior has not changed in any way since
> md5-cache was added.

As I've noted on the bug, a simple mtime check on the cache entries
seems to indicate that it's working properly.

> ZM> [1] https://bugs.gentoo.org/show_bug.cgi?id=410505
>
> I've added a please-reopen note to that bug.
>
> Thanks for working on it.

One way that we can reduce the amount of cache regeneration is to add
support for elibs:

http://www.gentoo.org/proj/en/glep/glep-0033.htm

Since elibs aren't allowed to modify the ebuild metadata, the metadata
cache doesn't need to be regenerated when elibs are modified. For
example, if eutils was an elib, we would avoid a lot of cache
regeneration each time it was modified.
--
Thanks,
Zac
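The mtime check Zac mentions could be as simple as this (the path
assumes the default rsync tree location):

# count cache entries touched in the last day; a daemon that rewrites
# everything daily would show nearly every file here
find /usr/portage/metadata/md5-cache -type f -mtime -1 | wc -l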
Re: [gentoo-dev] multiprocessing.eclass: doing parallel work in bash
On 06/02/2012 10:05 PM, Mike Frysinger wrote:
> On Saturday 02 June 2012 19:59:02 Brian Harring wrote:
>> Minor note; the design of this (fork then check), means when a job
>> finishes, we'll not be ready with more work. ...
>>
>> The original form of this was designed around the assumption that the
>> main thread was light, and the backgrounded jobs weren't, thus it
>> basically did the equivalent of make -j+1, allowing #cores
>> background jobs running, while allowing the main thread to continue on
>> and get the next job ready, once it had that ready, it would block
>> waiting for a slot to open, then immediately submit the job once it
>> had done a reclaim.
>
> the original code i designed this around had a heavier main thread because it
> had series of parallel sections followed by serial followed by parallel where
> the serial regions didn't depend on the parallel finishing right away. that
> and doing things post meant it was easier to pass up return values because i
> didn't have to save $? anywhere ;).
>
> thinking a bit more, i don't think the two methods are mutually exclusive.
> it's easy to have the code support both, but i'm not sure the extended
> documentation helps.

Can't you just add a multijob_pre_fork function and do your waiting in
there instead of in the multijob_post_fork function?
--
Thanks,
Zac