Wednesday, March 26, 2008

WRFV221 on dual quad Mac Pro

Here are my recent trials of WRF v2.2.1 on a dual quad-core Mac Pro with 8 GB of memory, using the 64-bit versions of Intel Fortran (10.1.012) and gcc. The trials include OpenMP (OMP) and MPI builds, the latter based on mpich2. Note that the netcdf commands below use sh/bash syntax (export), while the jasper and mpich2 commands use csh/tcsh syntax (setenv).
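The two builds are launched differently. As a minimal sketch, assuming the usual WRF run directory containing wrf.exe and the mpiexec launcher installed by mpich2, the hypothetical helper below prints the launch command for each build: OMP_NUM_THREADS sets the OpenMP thread count, and mpiexec -n sets the number of MPI tasks.

```shell
# Hypothetical helper: print (not run) the launch command for each build,
# so it can be checked without a WRF build present.
wrf_cmd() {   # usage: wrf_cmd omp|mpi N
    case "$1" in
        omp) echo "env OMP_NUM_THREADS=$2 ./wrf.exe" ;;  # N OpenMP threads
        mpi) echo "mpiexec -n $2 ./wrf.exe" ;;           # N MPI tasks (mpich2)
        *)   echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Example, run from the WRF run directory:
# eval "$(wrf_cmd omp 8)"
```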

--------------------------------------------------------------
netcdf compilation for 64-bit (based on netcdf-3.6.0-p1)
--------------------------------------------------------------

export CC=/usr/bin/gcc
export CPPFLAGS="-O -DNDEBUG -DpgiFortran"
export CFLAGS="-O -m64"
export CXX=/usr/bin/c++
export CXXFLAGS="-O -m64"

export FC=ifort
export F77=ifort
export F90=ifort
export FFLAGS="-O3 -mp"
export F90FLAGS="-O3 -mp"


./configure --prefix=/usr/local/netcdf
make
make test
sudo mkdir /usr/local/netcdf
sudo make install
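WRF's configure step finds netcdf through the NETCDF environment variable. A minimal sketch in sh/bash syntax that first checks the install prefix used above; check_netcdf_prefix is a hypothetical helper, and the file list assumes the library plus the C and Fortran 77 include files that a netcdf-3.6 "make install" places under the prefix.

```shell
# Hypothetical helper: verify that a netcdf-3.6 prefix holds the files
# the WRF build needs before exporting NETCDF.
check_netcdf_prefix() {
    local prefix="$1"
    local f
    for f in lib/libnetcdf.a include/netcdf.h include/netcdf.inc; do
        if [ ! -f "$prefix/$f" ]; then
            echo "missing: $prefix/$f" >&2
            return 1
        fi
    done
    return 0
}

# Usage (commented out so the sketch runs anywhere):
# check_netcdf_prefix /usr/local/netcdf && export NETCDF=/usr/local/netcdf
```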


--------------------------------------------------------------
jasper compilation for 64-bit (based on jasper-1.701.0)
--------------------------------------------------------------

setenv CC /usr/bin/gcc
setenv CFLAGS "-O -m64"
setenv CXX /usr/bin/c++
setenv CXXFLAGS "-O -m64"

./configure --prefix=/usr/local/jasper
make
sudo make install
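After installation the build still has to be told where JasPer lives. A minimal sketch, assuming the WRF/WPS configure scripts look for it through the JASPERLIB and JASPERINC environment variables (as WPS does for GRIB2 support); the prefix matches the --prefix used above.

```shell
# Assumption: WRF/WPS read JASPERLIB and JASPERINC to locate JasPer.
JASPER_PREFIX=/usr/local/jasper          # same as --prefix above
export JASPERLIB="$JASPER_PREFIX/lib"
export JASPERINC="$JASPER_PREFIX/include"

# csh/tcsh equivalent, matching the setenv style used for jasper:
#   setenv JASPERLIB /usr/local/jasper/lib
#   setenv JASPERINC /usr/local/jasper/include
```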


--------------------------------------------------------------
mpich2 compilation for 64-bit (based on mpich2-1.0.5)
--------------------------------------------------------------

setenv FC ifort
setenv F90 ifort
setenv CC "gcc -m64"
setenv RSHCOMMAND "/usr/bin/ssh"
setenv CXX "/usr/bin/c++ -m64"
setenv FFLAGS "-xP -vec- -fp-model precise"
setenv F90FLAGS "-xP -vec- -fp-model precise"
./configure --with-comm=shared
make
sudo make install
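mpich2-1.0.x uses the mpd process manager by default, and mpd will not start unless ~/.mpd.conf exists, contains a secret word, and is readable only by its owner. A minimal sketch in sh/bash syntax, assuming that default setup; write_mpd_conf is a hypothetical helper and the secret word is a placeholder.

```shell
# Hypothetical helper: create <home>/.mpd.conf with mode 600.
write_mpd_conf() {   # usage: write_mpd_conf HOME_DIR SECRET
    local conf="$1/.mpd.conf"
    : > "$conf"                     # create or truncate the file
    chmod 600 "$conf"               # owner read/write only, before writing
    printf 'secretword=%s\n' "$2" > "$conf"
}

# Typical use before an MPI run:
# write_mpd_conf "$HOME" "change-me"
# mpd &                             # start the local mpd daemon
# mpiexec -n 8 ./wrf.exe            # one MPI task per core
```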


--------------------------------------------------------------
test run
--------------------------------------------------------------

The test run is a short simulation with three telescoping, two-way nested domains (142x100, 100x100 and 100x100 grid points, each with 35 vertical levels). Compiler flags were chosen so that the output files reproduce the checksums (cksum) of a completely unoptimized build that requests strict arithmetic. The OMP version occasionally produces different results, apparently at random.
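The reproducibility check described above amounts to comparing cksum output for the same output files from two runs. A minimal sketch in sh/bash syntax; same_cksums is a hypothetical helper, and the wrfout_* names passed to it are whatever files the runs wrote.

```shell
# Hypothetical helper: compare cksum (CRC and byte count) of the named
# files in a reference run directory and a test run directory.
same_cksums() {   # usage: same_cksums REF_DIR TEST_DIR FILE...
    local ref_dir="$1" test_dir="$2"
    shift 2
    local f ref new status=0
    for f in "$@"; do
        if [ ! -f "$ref_dir/$f" ] || [ ! -f "$test_dir/$f" ]; then
            echo "missing: $f" >&2
            status=1
            continue
        fi
        # Reading from stdin makes cksum print only "CRC SIZE".
        ref=$(cksum < "$ref_dir/$f")
        new=$(cksum < "$test_dir/$f")
        if [ "$ref" != "$new" ]; then
            echo "differs: $f" >&2
            status=1
        fi
    done
    return $status
}

# Example: cd run_noopt && same_cksums . ../run_omp wrfout_d0*
```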

[Timing plot]


The plot below adds results from a single 2.4 GHz quad-core machine running Mandriva Linux, for OMP runs built with the same configuration file linked below, with the Mac-specific references removed. For the four-thread run, the 2.4 GHz machine is 33% slower, although the clock speed difference is only 17%. Checksums were the same for all the runs.
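Taking the figures above at face value, a 33% longer run time against a 17% clock difference leaves a factor of about 1.33 / 1.17 ≈ 1.14 that clock speed alone does not explain. The 17% figure also implies that the Mac Pro cores run at roughly 2.4 × 1.17 ≈ 2.8 GHz; that clock speed is inferred here, not stated in the post. A minimal sketch of the arithmetic, with hypothetical helper names:

```shell
# Hypothetical helpers for the percentage arithmetic, using awk for
# floating point.
pct_slower() {       # usage: pct_slower FAST_TIME SLOW_TIME
    awk -v f="$1" -v s="$2" 'BEGIN { printf "%.0f\n", (s / f - 1) * 100 }'
}

# Ratio of the run-time gap to the clock gap:
# (1 + time% / 100) / (1 + clock% / 100)
residual_ratio() {   # usage: residual_ratio TIME_PCT CLOCK_PCT
    awk -v t="$1" -v c="$2" 'BEGIN { printf "%.2f\n", (1 + t / 100) / (1 + c / 100) }'
}

# residual_ratio 33 17   prints 1.14
```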



Configuration files are here: OMP version, MPI version.

[edited 1 April 2008 to add the compilation steps for netcdf, jasper and mpich2, and to clarify that this uses 64-bit compilers.]
