On 05/31/10 10:45 AM, Dr. David Kirkby wrote:
On 05/31/10 01:58 AM, MartinX wrote:

PS I meant to type i5-750 processor, not 750i, in the original post.

I've never used such a processor myself, but I'm not surprised there are
no tuning parameters, as that processor is quite new. Fortunately, ATLAS
does not have to tune on my 3.333 GHz Intel Xeon W3580, though I suspect
no parameters have actually been optimised specifically for my Xeon;
there is probably just some generic Xeon code.

Dave

You could try making a package using the latest (unstable) ATLAS snapshot. The ChangeLog shows support for many new processors added since the ATLAS release in Sage.

I've posted below the part of the ChangeLog showing changes since the ATLAS release in Sage. Note, however, that these are all in the 'unstable' series. Also, creating a new package will not be easy, as there are tons of Sage-specific fixes to ATLAS.

Nothing in here unfortunately looks as though it will resolve my Solaris issues, but it might just resolve issues on your processor.

Dave

ATLAS 3.9.24 released 04/21/10, changes from 3.9.23:
   * Should see a roughly doubling in performance of L2's SYR2/HER2
   * Addition of new BLAS2.5-like kernel, GER2 (rank-2 update)
     - A = alpha*x*y^T + beta*w*z^T + A
   * Native ATLAS support for xGELS and all subsidiary routines, including
     C and Fortran interfaces for GELS
     - Internal routs not yet exposed in C/F77 iface include:
       + ORM[QL,QR,LQ,RQ]  -> UNM* called ORM for complex
       + GE[QL,QR,LQ,RQ]2 (unblocked QR)
       + GE[QL,QR,LQ,RQ]R (recursive QR)
       + LADIV, LAPY2, LAPY3
       + LARFB, LARFT (F77 ifaces, but no C ifaces)
       + LARF, LARFG, LARFP
       + LASCL (not supported for banded matrices)
     - Of these, should definitely expose UNM/ORM at iface level
   * Addition of [D,S]LAMCH for both C & F77 interfaces
   * Fixed slvtst (LU & QR) to use norm of original A, not factored matrix
     in computing solve residual
   * Chad fixed a bug in the SSE generator in type casting for stores
   * Changed it so unknown LAPACK routs are given ATLAS's NB for NB,
     rather than 1
   * Fixed bug in r1hgen.c where Level 1 & 2 blocking were hugely inflated
     (leading to no effective blocking)
   * Updated archinfo_linux to recognize "PPC970MP" as a G5
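To make the GER2 entry concrete: a rank-2 update adds two outer products to A. The C sketch below is a naive column-major reference under that reading; the name `ger2_ref` and its argument order are hypothetical and are not ATLAS's actual GER2 interface or kernel.

```c
#include <stddef.h>

/* Naive reference rank-2 update on a column-major M x N matrix A with
 * leading dimension lda:  A <- alpha*x*y^T + beta*w*z^T + A.
 * Hypothetical helper for illustration; not ATLAS's GER2 kernel. */
void ger2_ref(size_t M, size_t N, double alpha, const double *x,
              const double *y, double beta, const double *w,
              const double *z, double *A, size_t lda)
{
    for (size_t j = 0; j < N; j++) {
        /* Per-column scale factors for the two outer products. */
        const double ay = alpha * y[j], bz = beta * z[j];
        double *Aj = A + j * lda;
        /* Both updates are applied while column j is in hand, so A is
         * streamed through memory once rather than twice. */
        for (size_t i = 0; i < M; i++)
            Aj[i] += x[i] * ay + w[i] * bz;
    }
}
```

Fusing the two rank-1 updates means A is read and written once instead of twice, which is presumably why a combined BLAS2.5-style kernel pays off for these memory-bound operations.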
ATLAS 3.9.23 released 02/07/10, changes from 3.9.22:
   * Fixed dependency error in ATLAS/makes/Make.mmtune
   * Improved mmflagsearch, so we now have O(N) greedy search as default
     -> if you pass -f gcc, will gen most opt-related gcc flags in gccflags.txt
   * Improved flags used on PowerPC G4 & G5
   * Updated some architectural defaults:
     - Corei764SSE3, PPCG564AltiVec, PPCG4AltiVec, MIPSICE964
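One plausible reading of the "O(N) greedy search" in mmflagsearch is a coordinate-wise search: pick the best option for one group of flags while holding the others fixed, then move on to the next group. The sketch below assumes that reading; `greedy_flag_search`, `cost_fn` and the toy cost model (standing in for compiling and timing a kernel) are all hypothetical and not taken from ATLAS.

```c
#include <stddef.h>

/* Hypothetical cost callback standing in for "compile the kernel with these
 * flag choices and time it"; lower is better. */
typedef double (*cost_fn)(const int *choice, size_t nslots);

/* Greedy coordinate search: visit each flag slot once, try every option for
 * that slot while holding the others fixed, and keep the best. This costs
 * about sum(nopts[s] + 1) evaluations, linear in the number of slots,
 * instead of the product of the option counts for an exhaustive search. */
void greedy_flag_search(size_t nslots, const int *nopts, int *choice,
                        cost_fn cost)
{
    for (size_t s = 0; s < nslots; s++) {
        int best = choice[s];
        double bestc = cost(choice, nslots);
        for (int o = 0; o < nopts[s]; o++) {
            choice[s] = o;
            double c = cost(choice, nslots);
            if (c < bestc) { bestc = c; best = o; }
        }
        choice[s] = best;
    }
}

/* Hypothetical toy cost: separable, with the best option for slot s being
 * toy_target[s]. Real timings are noisy and flags interact. */
static const int toy_target[3] = {2, 1, 3};
double toy_cost(const int *choice, size_t nslots)
{
    double c = 0.0;
    for (size_t s = 0; s < nslots && s < 3; s++) {
        double d = (double)(choice[s] - toy_target[s]);
        c += d * d;
    }
    return c;
}
```

Greedy search can miss combinations where two flags only help together; that is the price of avoiding the exhaustive product of all option counts.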
ATLAS 3.9.22 released 02/05/10, changes from 3.9.21
   * Fixed long-standing bug in cleanup code generation -- this bug has been
     in package since we've generated cleanup, and it causes malformed ifs
     that select cleanup code; most commonly it creates uncompilable code,
     but it could also result in using a suboptimal cleanup kernel.
   * Fixed another long-standing bug in cleanup code generation, this
     one involving not building enough fixed=1 clean cases if there are
     higher imult cleanup cases in the Q.  This resulted in errors in
     cleanup answers.
   * Complete rewrite of search for finding best generated kernel to use
     new test/time infrastructure.  See ATLAS/tune/blas/gemm/gmmsearch for
     new search.  Cleanup and no-copy still uses old search, which is renamed
     ATLAS/tune/blas/gemm/mmcuncpsearch.c.  New search driver is mmsearch.c
   * Chad fixed several bugs in the SSE generator relating to type casting
   * Fixed genparse's DupString to handle NULL pointers
   * Fixed erroneous include of atlas_misc.h in clapack.h
   * Added a compiler flag search to ease job of finding good flags.
     - ATLAS/tune/blas/gemm/mmflagsearch.c
   * Arch def changes:
     - Updated G4 defs -- reduced perf due to gcc PPC performance bug
     - Corei7464SSE3: negated ?MMRES.sum mflop values
     - AMD64K10h64SSE3 : updated to new style
     - Core264SSE3 : updated to new style
   * Some PowerPC-specific fixes:
     - Fixed it so configure can autodetect clock speed on G4/Linux
     - Fixed it so ATLAS always assumes gnu gcc altivec handling on PowerPC
     - Renamed vector registers to numbers just like GPRs (fixes Linux/PPC
       assembly, and related altivec probe)
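The cleanup bugs above concern the code that handles leftover pieces when a dimension is not a multiple of the blocking or unrolling factor. A minimal illustration of the idea, assuming a simple unroll-by-4 loop rather than ATLAS's generated GEMM kernels (the name `axpy_unroll4` is hypothetical):

```c
#include <stddef.h>

/* y <- alpha*x + y with the main loop unrolled by 4 and a scalar cleanup
 * loop for the N % 4 leftover elements. Applied to GEMM's blocking
 * factors, the same idea is why a tuned library needs separate cleanup
 * kernels for edge pieces that don't fill a whole block. */
void axpy_unroll4(size_t N, double alpha, const double *x, double *y)
{
    size_t i = 0;
    const size_t Nmain = N - N % 4;   /* largest multiple of 4 <= N */
    for (; i < Nmain; i += 4) {
        y[i]     += alpha * x[i];
        y[i + 1] += alpha * x[i + 1];
        y[i + 2] += alpha * x[i + 2];
        y[i + 3] += alpha * x[i + 3];
    }
    for (; i < N; i++)                /* cleanup: at most 3 iterations */
        y[i] += alpha * x[i];
}
```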
ATLAS 3.9.21 released 01/11/10, changes from 3.9.20
   * Fixed error in threaded SYMM, where recursion had bad pointer
   * Created ability to tune threaded/serial crossover points, see
        ATLAS/tune/blas/gemm/txover.c
   * Improved CacheEdge detection
   * Fixed bug in configure for --shared on archs w/o f77 compiler
   * Updated lanbtst to work with new QR naming scheme, and to compile
     correctly for lanbtime (was not using lapack's ILAENV in this case)
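Tuning a threaded/serial crossover point amounts to finding the problem size above which the threaded code is reliably faster. The helper below is a hypothetical sketch of that selection step, given timings that have already been measured; it is not how ATLAS's txover.c is implemented, and the name `find_crossover` is made up.

```c
#include <stddef.h>

/* Given timings of the serial and threaded versions for increasing problem
 * sizes sizes[0] < sizes[1] < ..., return the smallest size at which the
 * threaded version is faster for that size and every larger size tested.
 * Returns 0 if threading does not win at the largest size. Problems smaller
 * than the returned size would then be sent to the serial code. */
size_t find_crossover(size_t n, const size_t *sizes,
                      const double *t_serial, const double *t_threaded)
{
    size_t xover = 0;
    /* Walk down from the largest size while threading keeps winning. */
    for (size_t k = n; k > 0; k--) {
        if (t_threaded[k - 1] < t_serial[k - 1])
            xover = sizes[k - 1];
        else
            break;
    }
    return xover;
}
```

Requiring threading to win at every larger tested size, rather than at the first size where it happens to win, makes the choice less sensitive to a single noisy timing.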
ATLAS 3.9.20 released 12/21/09, changes from 3.9.19
   * Fixed bug in call to memcpy by casting all MulBySize to size_t
   * Fixed several ilaenv-related errors, including QR always using serial parms
   * Made it so ORMQR and UNMQR variants use QR's tuned NB
   * Fixed error in complex gemoveT & gemoveC (src/auxil)
   * Made gemoveT & C TLB-aware
   * Added src/auxil/ATL_sqtrans to do TLB-aware in-place square transpose
   * If M==N, then RQ & LQ (row-major) do in-place transpose and call
     QL or QR (column-major).  This gives ~10% performance improvement.
   * Added F77 interface for xLARFT and xLARFB
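As a rough sketch of a blocked in-place square transpose (the general idea behind a TLB-aware routine such as ATL_sqtrans, though not its actual code), assuming column-major storage and a caller-chosen tile size `NB`; the function name `sqtrans_blocked` is hypothetical:

```c
#include <stddef.h>

/* In-place transpose of an N x N column-major matrix A (leading dimension
 * lda), processed in NB x NB tiles. For each tile pair (I,J)/(J,I) in the
 * lower triangle of tiles, elements are swapped across the diagonal;
 * working tile by tile keeps the set of touched memory pages small, which
 * is the general idea behind a TLB-aware transpose. */
void sqtrans_blocked(size_t N, double *A, size_t lda, size_t NB)
{
    if (NB == 0) NB = 1;
    for (size_t jb = 0; jb < N; jb += NB) {
        const size_t jend = (jb + NB < N) ? jb + NB : N;
        for (size_t ib = jb; ib < N; ib += NB) {
            const size_t iend = (ib + NB < N) ? ib + NB : N;
            for (size_t j = jb; j < jend; j++) {
                /* On a diagonal tile, only visit i > j so each pair is
                 * exchanged exactly once. */
                const size_t istart = (ib == jb) ? j + 1 : ib;
                for (size_t i = istart; i < iend; i++) {
                    double t = A[i + j * lda];
                    A[i + j * lda] = A[j + i * lda];
                    A[j + i * lda] = t;
                }
            }
        }
    }
}
```

With a row-major matrix viewed as column-major, one such transpose converts between the two layouts, which is what the RQ/LQ entry above relies on for the M==N case.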
ATLAS 3.9.19 released 12/08/09, changes from 3.9.18
   * Got rid of files in C2F now being provided natively by ATLAS:
     - larft, geqrf, geqlf, gerqf, gelqf
   * Fixed duplication of unmqr_wrk symbols
   * Removed use of SAFMIN global variable in larfb/larfg
ATLAS 3.9.18 released 12/05/09, changes from 3.9.17
   * Found & fixed error in threaded GEMM
   * Fixed bug where lanbtst_pt didn't set NB
   * Modified mmksearch_sse.c to try gcc & sse flags if native compiler
     can't handle the generated files.
   * Rewrote LAPACK/QR NB tuning
     - now uses ATLAS/tune/lapack/lanbsrch rather than bin/lanbtst (faster)
     - Now done by default
   * Numerous errors fixed involving architecture default timing (all levels)
   * Modified atlas_install to keep track of times for every part of install,
     so we can see where time is spent
   * Architectural default related changes:
     - Fixed ArchNew target in building arch defs to negate .sum files
     - Core264SSE & AMD64K10h64SSE needed negative values in .sum files
     - Updated Core264SSE, AMD64K10h64SSE, HAMMER64SSE3 to get new threaded
       lapack, and full .sum support
ATLAS 3.9.17 released 11/15/09, changes from 3.9.16
   * Chad's SSE GEMM generator now works for CGEMM
     - Provides faster (CGEMM) arch defs for Core264SSE3
   * Addition of Householder factorizations (mostly written by Siju Samuel):
     - F77 & C interface, C supports row/col- major
     - GEQRF GEQLF GERQF GELQF
     - tester is qrtst.c in ATLAS/bin/
     - Retuned LAPACK's QR NB arch defs for AMD64K10h64SSE3 & Core264SSE3
   * Fixed seg fault in ummsearch caused by mmksearch_sse failure
   * Rewrote Write[MM,MV,R1]File to get around gcc bug
   * Fixed bugs in ATLAS/src/auxil/[ge,tr]collapse
   * Fixed bug in ATLAS/tune/blas/ger/CASES/ATL_zgerk_1x4_sse3.c
   * Renamed xatlas_install -> xatlas_build, to get around Windows 7
     "security-through-stupidity" misfeature
ATLAS 3.9.16 released 10/17/09 (bugfix release), changes from 3.9.15
   * Fixed bugs in mmksearch_sse.c for machines w/o SSE3
   * Fixed errors in C2F preventing full lapack install
   * Fixed error in atlas_install trying to open wrong filename in latune
   * Fixed error in mmsearch's FindNoCopyNB where latency computed incorrectly
   * Numerous errors related to new architectural default handling
   * New architectural defaults for:
     - AMD64K10h64SSE3
     - Core264SSE3
     - Corei764SSE3
ATLAS 3.9.15 released 10/10/09, changes from 3.9.14
   * Addition of Chad Zalkin's SSE GEMM generator to ATLAS
   * Support for external searches and use of standard matmul search routs in:
     - include/atlas_mmparse.h
     - include/atlas_mmtesttime.h
   * Numerous search changes to incorporate above in ATLAS matmul install
     - Changed matmul install to be much quieter
ATLAS 3.9.14 released 08/19/09 (bugfix release), changes from 3.9.13
   * Fixed complex indexing errors in ATL_ger.c & ATL_zgerk_1x4_sse3.c
   * Fixed error in config.c where using LAPACK caused OpenMP to be built
   * Made it so C2F LAPACK interface only built if F77 LAPACK is provided
   * Basic --shared install now works (tested Linux build only)
ATLAS 3.9.13 released 08/17/09 (bugfix release), changes from 3.9.12
   * Fixed ATL_smm14x1x84_sseCU.c so it won't be used when NB > 84
     - fixed AMD64 arch def not to use it
   * Fixed 1-character memory overwrite in atlas_genparse.h's DupString
   * Added prototype to r1ktest.c
   * 3.9.12 showed version of 3.9.11; this version shows correct 3.9.13
ATLAS 3.9.12 released 08/06/09, changes from 3.9.11
   * Complete rewrite of GER, SYR/HER and SYR2/HER2:
     - New tuning mechanism tunes GER for in-L1, in-L2, and out-of-cache
       * Call ATL_<pre>ger_L1 if data known to be in L1 cache
       * Call ATL_<pre>ger_L2 if data known to be in L2 cache
     - Most architectures now lack GER arch defs
       * Provided GER arch defs for 64-bit K10h and Core2
     - atlas_devel not yet updated
   * Relatively untested standard timing/tester code available for all
     tuned kernels (GER fairly well tested)
     - atlas_[mv,r1,mm]parse.h reads standard input/output files
     - atlas_[mv,r1,mm]testtime.h provides tester/timer calls for kernels
   * Can compile both lapack 3.2 and 3.1 with --with-netlib-lapack-tarfile
     - Removed support for other ways of building lapack
     - atlas_install mostly updated
   * Bug fixes
     - Fixed BETA=0 SCAL NaN-propagation bug (no more call to ATL_set)
     - Fixed C/Z GEMM JITcp bug where C was read when BETA=0
     - Fixed threaded LAPACK calling serial ilaenv  (QR speedup)
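The BETA=0 fixes follow the reference BLAS convention that C need not be set on input when beta is zero, so the result must not depend on whatever C contains. A minimal sketch of a scaling step that respects this (the name `scale_c` is hypothetical, and ATLAS's actual fix may be structured differently):

```c
#include <stddef.h>

/* Scale an M x N column-major matrix C by beta, as the first step of
 * C <- alpha*A*B + beta*C. When beta == 0, C must be overwritten with zeros
 * rather than multiplied: 0.0 * NaN is NaN, so multiplying would leak
 * garbage from an uninitialised C into the result. */
void scale_c(size_t M, size_t N, double beta, double *C, size_t ldc)
{
    for (size_t j = 0; j < N; j++) {
        double *Cj = C + j * ldc;
        if (beta == 0.0) {
            for (size_t i = 0; i < M; i++)
                Cj[i] = 0.0;          /* never read the old C */
        } else if (beta != 1.0) {
            for (size_t i = 0; i < M; i++)
                Cj[i] *= beta;
        }                             /* beta == 1: leave C untouched */
    }
}
```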
ATLAS 3.9.11 released 04/07/09, changes from 3.9.10
   * Added flags -Si [omp,antthr] 0/1/2 to allow ease of building ATLAS
     with alternative threading implementations
   * Fixed prototypes in atlas_f77wrap.h so that all thread interfaces
     are properly prototyped when they are selected by the above flags
   * Fixed missing TRMM prototype in atlas_tlvl3.h that caused STRSM
     to fail tests in xsl3blastst_pt
ATLAS 3.9.10 released 03/11/09, changes from 3.9.9
   * Rewrote tgemm's combine routine to work on arbitrary partitionings
     combined in arbitrary orders (necessary for non-power-of-2 processors)
     - Restricted fix for SYRK (not general, as it isn't needed yet)
   * Fixed bug in EnforceNonPwr2LO caused by failure to rename moved
     structure in the Cinfp array
   * Fixed makefile problem that caused ATLAS to re-archive the L3BLAS for
     every tester compile
   * On windows, added -lkernel32 to LIBS macro to enable shared lib build
ATLAS 3.9.9 released 02/26/09, changes from 3.9.8
   * Fixed bug in Xtsyrk's ATL_tsyrkdecomp_K, affecting both when the algorithm
     is used and correctness when K is not large enough to give all
     processors NB of work.
   * Fixed bug in lanbtst, where single precision (S/C) used double values
     rather than single values when determining workspace requirements
   * Changed atlas_install to have a final library build phase
     - Was not rebuilding lib after post-build tuning
       -> Caused lapack and possibly other files to be untuned unless user rebuilds
          by invoking tester/timer for each subpiece
       -> Caused dynamic libs to be built from badly tuned libs
   * Added missing lapack arch defs for Corei764 and MIPSICE9
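The tsyrk bug above involves splitting K across processors when K is too small to give every processor NB of work. A hypothetical sketch of such a decomposition, assuming each participating thread gets a whole number of NB-sized chunks and the last one absorbs the remainder (this is not ATLAS's ATL_tsyrkdecomp_K, and `split_k` is a made-up name):

```c
#include <stddef.h>

/* Split the K dimension of a rank-K update across at most P threads so
 * that each used thread gets at least NB of K (NB must be > 0). Only
 * floor(K/NB) threads can be used, but always at least one. Thread t gets
 * the range [kstart[t], kstart[t] + klen[t]); kstart and klen must have
 * room for P entries. Returns the number of threads actually used. */
size_t split_k(size_t K, size_t NB, size_t P, size_t *kstart, size_t *klen)
{
    const size_t nblks = K / NB;           /* full NB-sized chunks of K */
    size_t np = (nblks < P) ? nblks : P;
    if (np == 0) np = 1;                   /* K < NB: one thread does it all */
    const size_t per = nblks / np, extra = nblks % np;
    size_t k = 0;
    for (size_t t = 0; t < np; t++) {
        const size_t blks = per + (t < extra ? 1 : 0);
        kstart[t] = k;
        klen[t] = blks * NB;
        k += klen[t];
    }
    klen[np - 1] += K - k;                 /* leftover K % NB (or all of K) */
    return np;
}
```

For example, K = 10, NB = 3 and P = 4 yields three threads with K ranges of length 3, 3 and 4.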
ATLAS 3.9.8 released 02/23/09, changes from 3.9.7
   * Fixed bug in ATL_Xtgemm where ATL_thrdecompMM failed to return the
     number of processors on non-power-of-2 processor systems
   * Fixed bug in ATL_tsyrk where I was calling the K-splitting routine
     when the required workspace was large, rather than when it was small.
   * Fixed a problem in ATL_tsyrk analogous to the one 3.9.7 fixed in ATL_tgemm;
     however, the tsyrk bug could not have been exercised by the current decomposition.
   * Introduced some fixes & workarounds for SiCortex/MIPSICE9:
     - Changed default MIPSICE9 compiler back to gcc, since pathcc produces
       bad ATL_tsyrk when optimization is above -O1 (confirmed compiler error)
   * Added dependence on atlas_ptalias3.h in cblas interface Makefile.
ATLAS 3.9.7 released 02/20/09, changes from 3.9.6:
   * Fixed bug in ATL_tgemm that caused seg faults for some small-M tGEMMs
   * Added architectural defaults for K7323DNow (Athlon "classic")
ATLAS 3.9.6 released 02/01/09, Changes from 3.9.5:
   * Made it so LAPACK is tuned specifically for threading as well as for serial
     - Added threaded lapack arch defs for:
       + Core264SSE3, P4E64SSE3, Corei764SSE3
   * Made it so LAPACK NB-tuning is mu/nu aware
   * MIPSICE9 (sicortex) improvements:
     - added pathcc arch defs
     - updated gcc arch defs to better values
     --> Still getting errors on this platform
   * Some bug fixes:
     - Detect model 29 as Core2
     - Rewrote ptFlushAreasByCL to use new thread framework
     - Fixed handling of non-power-of-2 number of threads
     - Better dependencies for building ilaenv
ATLAS 3.9.5 released 12/11/08, Changes from 3.9.4:
   * Complete rewrite of ATLAS threading system:
     - Now supports native windows threads in addition to pthreads
     - Use of master-last and affinity increases threaded performance, with
       an advantage that grows with P (almost no advantage for P=2, but for
       instance LU is more than 60% faster asymptotically on a P=8 Core2)
       + OS X and FreeBSD don't support processor affinity, and so their
         performance is still bad
     - Cacheedge specifically tuned for threading (another 5%)
   * Changed emit_buildinfo so that it replaces all control characters with
     spaces (prevents errors under windows).
   * Added dependency info for ATL_ilaenv so that it is recompiled once
     lapack tuning is complete
   * Fixed error in configure where it issues commands in wrong directory
     when the user builds lapack directly from a tarfile
   * Fixed typos in config.c where I used 'comp' rather than 'comps'.
   * Added mmtime_pt.c, which can allow us to find kernels that do well
     in parallel operation.
   * Various small configure fixes for windows
ATLAS 3.9.4 released 09/06/08, Changes from 3.9.3:
   * Improved Windows/cygwin configure with addition of archinfo_win.c
   * Added basic support for Windows/interix
     - Did not pursue much due to widespread seg faults in gcc, hundreds of
       hard-to-get "hot fixes", and ancient gnu tools that can't assemble SSE3
   * Removed special "no-need-to-copy" cases from ATLmm_JIK/IJK.c, since they
     occasionally seem to cause large performance drops.
   * Changed it so JIK matmul always called for rank-K update, in order to
     reduce access costs on C.
   * Fixed several errors in ATLAS's ILAENV.
   * Fixed several errors in configure
   * Fixed error when -Ss lasrc is given as relative rather than absolute path
   * Added BETA support for auto-building shared/dynamic libraries when the
     user passes --shared to configure (no need to explicitly set compiler
     flags [eg., -fPIC] for any of the known compilers):
     - Not fully tested, but appears to work for Windows, OS X and Linux
     - Now referenced in make install, but present process is crude
     - with --nof77, get clapack rather than lapack; eventually probably want
       a logical link of lapack

--
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org
