Monday, June 16, 2008

WRFV300 update: two items

(1) In WRFV300 on the Mac, using the Intel Fortran 10.1 compiler, the compiler flag -switch fe_use_rtl_copy_arg_inout causes problems that weren't apparent with earlier WRF versions. Specifically, lateral boundary tendencies for water vapor are zeroed, resulting in problems in this field that grow inward from the boundaries. In the wrfbdy_d01 file, fields like QVAPOR_BXE and related entries contain only zeros. This file is, of course, created by real.exe.

I added this switch owing to this page. Removing it does no harm. I tried several minor versions of ifort 10.1, explored 32 and 64 bit, etc.; nothing helped fix this issue except removing this switch.
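A minimal sketch of removing the switch without hand-editing, assuming it lives in a plain-text build configuration file such as configure.wrf (the post does not say which file carried it). The helper name strip_rtl_copy_flag is hypothetical, not part of WRF.

```shell
#!/bin/sh
# strip_rtl_copy_flag is a hypothetical helper, not part of WRF.
# It removes "-switch fe_use_rtl_copy_arg_inout" from every line of the
# given file and keeps the original as <file>.bak.
# Usage: strip_rtl_copy_flag <file>
strip_rtl_copy_flag() {
    cfg="$1"
    cp "$cfg" "$cfg.bak" || return 1
    # Delete the flag together with the whitespace in front of it.
    # The result is written from the backup copy because in-place sed
    # options differ between macOS and GNU sed.
    sed 's/[[:space:]]*-switch[[:space:]][[:space:]]*fe_use_rtl_copy_arg_inout//g' \
        "$cfg.bak" > "$cfg"
}
```

Calling `strip_rtl_copy_flag configure.wrf` from the WRF top-level directory would rewrite that file and leave the original in configure.wrf.bak. The executables, including real.exe, presumably need a clean rebuild afterwards.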

(2) In WRFV300, the Lin microphysics scheme (mp_physics = 2) has issues -- noise issues. After only a few tens of minutes, the pressure field starts displaying small scale noise. Decreasing the time step, implementing 6th order smoothing, and upping the small time step smoothing (epssm) do not help. For the simulations I am examining, the noise does not appear with any other microphysics scheme. With Lin, the noise appears in higher but not in lower resolution simulations.
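When comparing runs like these, it helps to print the relevant namelist.input entries side by side. The sketch below does this under some assumptions: that "time step" maps to time_step, and that "6th order smoothing" maps to diff_6th_opt and diff_6th_factor (mp_physics and epssm are named above). The function name show_noise_settings is hypothetical.

```shell
#!/bin/sh
# show_noise_settings is a hypothetical helper, not part of WRF.
# It prints the namelist lines that set the options discussed above.
# The mapping of "time step" to time_step and of "6th order smoothing"
# to diff_6th_opt/diff_6th_factor is an assumption.
# Usage: show_noise_settings namelist.input
show_noise_settings() {
    grep -E '^[[:space:]]*(time_step|mp_physics|diff_6th_opt|diff_6th_factor|epssm)[[:space:]]*=' "$1"
}
```

The pattern requires an "=" right after the variable name, so related entries such as time_step_fract_num are not listed.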

The Lin code was altered between WRFV221 and WRFV300. I have not isolated the code changes that cause this problem.

Thursday, May 1, 2008

Playing with WRF3 (WRFV300, WPSV300) - part II

Some updates: In the post Playing with WRF3 (WRFV300, WPSV300) - part I below, I mentioned that WRFV300's OpenMP was broken on Mac and Linux, using ifort and pgf. This problem is solved. OpenMP doesn't work in the stock configuration because $(OMP) was not included in FCFLAGS. It was included in LDFLAGS, so the model seemed to go through the motions of spawning the requested number of threads. But its absence from FCFLAGS meant that OpenMP support was not actually compiled into the code.

The remedy for this is to edit arch/postamble_new and insert $(OMP) on the FCFLAGS line.
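The edit can be scripted. This is a minimal sketch, assuming the FCFLAGS assignment in arch/postamble_new sits on a single line with no trailing backslash continuation; the helper name add_omp_to_fcflags is hypothetical.

```shell
#!/bin/sh
# add_omp_to_fcflags is a hypothetical helper, not part of WRF.
# It appends $(OMP) to the FCFLAGS assignment of a makefile fragment
# unless $(OMP) is already present on that line.
# Assumption: the assignment is a single line without a trailing "\".
# Usage: add_omp_to_fcflags arch/postamble_new
add_omp_to_fcflags() {
    mk="$1"
    awk '
        # Only touch the FCFLAGS line, and only if $(OMP) is missing.
        /^[ \t]*FCFLAGS[ \t]*=/ && index($0, "$(OMP)") == 0 { $0 = $0 " $(OMP)" }
        { print }
    ' "$mk" > "$mk.tmp" && mv "$mk.tmp" "$mk"
}
```

Running it twice is harmless, since the second pass finds $(OMP) already on the line. After the change, configure.wrf would presumably need to be regenerated with configure and the model rebuilt from clean.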

I also have WPSV300 working on my Intel Mac, because I did manage to get a 64 bit version of NCAR Graphics compiled and running. I will update this post in the future to describe how this was done.

[EDIT 10 June 2008: Some additional information follows.]
* My 64 bit NCAR Graphics binaries built on my MacPro are here. This is an 18MB gzipped tar file. It does not include all of NCL, just NCAR Graphics.

* For this build of NCAR Graphics, you need to include the -ncarbd flag with ncargf77, ncargf90 or ncargex. Or, when compiling with ifort, you need to link in /usr/local/ncarg64/lib/ncarg/robj/libncarbd.o

* With this NCAR Graphics, I had no trouble building WPSV300 and RIP4 64bit on the MacPro.

* Here are my build notes for NCAR Graphics 64 bit on the MacPro, using ifort. Of course, you need to change locations, etc.

setenv NCARG /case/ncl_ncarg-5.0.0
setenv NCARG_ROOT /case/ncl_ncarg-5.0.0
setenv NCARG_CONFIG /case/ncl_ncarg-5.0.0/config

cd config
./ymake
./Configure
..then, as root
mkdir /usr/local/ncarg64


edited config/Darwin_Intel and ran Configure again

ulimit -s 65530
compile as root, so source ifortvars.sh

I had been getting segfaults with fontcap. This was resolved by editing /case/ncl_ncarg-5.0.0/common/src/libncarg_c/yMakefile so that the Darwin section matches the Linux x86_64 section. That is,

#elif defined(Darwin) && defined(i386)
EXCSRCS = bcopyswap.c logic32.c
EXFSRCS = gbytes.f sbytes.f
EXOBJS = sbytes.o gbytes.o bcopyswap.o logic32.o

Wednesday, April 16, 2008

WRFV300 on Mac G5 (XLF compiler)

Here are the alterations I had to make to the stock WRF version 3.0.0 release to get it working on Mac/G5 equipment. This was tested on Macs using Tiger. Everything is 32 bit. The diffwrf program does not compile, but can probably be made to work fairly easily.

(1) Here are replacement files for the arch directory: configure_new.defaults, noopt_exceptions_f, and postamble_new

(2) Replacement file for the external/RSL_LITE directory: makefile

(3) Replacement file for the frame directory: frame.Makefile (rename as Makefile)

(4) Replacement file for the dyn_em directory: module_first_rk_step_part1.F

The file module_first_rk_step_part1.F is altered to remove unnecessary whitespace in the subroutine call to surface_driver. As it is, the number of characters in that call exceeds the allowable maximum for xlf on the Mac.

This image shows performance for the two-domain June 2001 test case on a G5 cluster (mpich1, no fancy interconnects whatsoever), a 12 hour run on the number of cpus indicated. The real times are the relevant numbers.


MM5 on Mac Intel (OpenMP)

Although this is my WRF on Mac blog, I'm also still using MM5 for some projects, and need to run it on my Macs. Recently, I tried to use it on my 8 core Mac Pro, using Intel Fortran 10 and OpenMP. All this is done on an HFSX (case-sensitive) volume. So far, it appears to be running well, with some caveats. One is that the build is 32 bit. My attempts at 64 bit builds, even after invoking flags like -DDEC_ALPHA and -DSGI_IA64, found referenced on the web, compiled successfully but segfaulted immediately on execution.

Another is that the model actually takes or requires one more thread than is requested using the OMP_NUM_THREADS specification. Thus, if I request 6 threads, it actually spawns and apparently uses 7. Additionally, despite having 8 cores, I cannot request more than 6 and have the model run successfully. It should not need more than 7 in that instance, and thus should run. I'm not sure why it does not.

Getting MM5 compiled on the Mac Pro was easy once a change was made to the way suffixes are handled in configure.user. Intel Fortran complained about #define type statements in the code that also had comments attached beyond column 72. The workaround was to force each Fortran program file through the CPP preprocessor. That was done by removing the rule at the bottom of configure.user that looks like this:

.F.o:
	$(RM) $@
	$(FC) -c $(FCFLAGS) $*.F

This rule compiles files ending with .F without involving the CPP preprocessor. Removing it causes other rules already in place to first push .F files through the preprocessor, and then compile the .f files that result from that operation. Remember, this has to be done on a case-sensitive volume.
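Removing the rule can also be scripted. This is a minimal sketch, assuming the rule header starts at column one and its recipe lines are tab-indented, as make requires; the helper name drop_F_o_rule is hypothetical.

```shell
#!/bin/sh
# drop_F_o_rule is a hypothetical helper, not part of MM5.
# It deletes the ".F.o:" suffix rule and its tab-indented recipe lines
# from the given file, keeping the original as <file>.bak.
# Usage: drop_F_o_rule configure.user
drop_F_o_rule() {
    cu="$1"
    cp "$cu" "$cu.bak" || return 1
    awk '
        /^\.F\.o:/        { skipping = 1; next }  # rule header: drop it
        skipping && /^\t/ { next }                # recipe line of that rule: drop it
                          { skipping = 0; print } # any other line ends the rule
    ' "$cu.bak" > "$cu"
}
```

The match is case-sensitive, so a lowercase .f.o rule in the same file is left alone. That rule still has to compile the preprocessed files.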

I also had to manually add #include to Util/parseconfig.c to avoid a compilation error. The portion of configure.user that concerns compilation looks like this:


RUNTIME_SYSTEM = "macintel"
FC = ifort
FCFLAGS = -I$(LIBINCLUDE) -pc32 -O3 -convert big_endian -fp-model precise -openmp -fpp -allow fpp-comments -auto -traceback #-DDEC_ALPHA
CPP = /usr/bin/cpp
CFLAGS = -O #-DSGI_IA64
CPPFLAGS = -I$(LIBINCLUDE) -I. -C -P -traditional -xassembler-with-cpp
LDOPTIONS = $(FCFLAGS) -Wl,-stack_size -Wl,0x20000000 -Wl,-stack_addr -Wl,0xd0000000


Here is a sample timing plot for a triply nested 24 hour simulation, revealing results I am not unhappy with:

Saturday, April 5, 2008

Playing with WRF3 (WRFV300, WPSV300) - part I

WRF v.3.0.0 was released yesterday, and I decided to see if it would be any easier to get it to run well on my Intel-based Mac Pro, with the latest ifort 10 compiler. I also tested it on two Linux machines, one with the same chip type and ifort version, the other with older 32 bit hardware and software (ifort 9). I also built WRF with PGI Fortran on the older machine, for a total of four different combinations of hardware and software (one Mac, three Linux; three ifort, one PGI).

The good news is the configuration system is a lot more slick. The bad news for the Mac Pro is the executables would not build with the stock configure.wrf obtained for ifort and OMP. A few alterations were necessary on my Mac:

(1) I'm using 64 bit compilers, so I needed to add "-m64" to CFLAGS_LOCAL.

(2) To get rid of missing symbols (that didn't draw complaints on either Intel Linux build), I had to add "$(WRF_SRC_ROOT_DIR)/frame/module_domain_type.o $(WRF_SRC_ROOT_DIR)/external/io_grib2/grib2tbls_types.o" to LDFLAGS_LOCAL.

Then, the weirdness started. On the two Core Quad machines (Mac and Linux), the OMP ifort build doesn't spawn more than one thread or capture more than 100% according to top and Activity Monitor, despite the setting for OMP_NUM_THREADS. It seems OMP is broken, and it's not platform-specific.

WRF3 is not backward compatible with respect to input files, and this makes upgrading WPS a must. But the new version of WPS' geogrid program isn't working anywhere. It spits out a segfault (at the same place) even when trying to build the example domain. Same experience with ifort and PGI, on the two Linux machines. I verified that the WPS_GEOG data have not changed, so it's not that.

No Mac WPS yet since I don't have, and have been unable as of yet to successfully build, NCL for 64 bit on the Mac. I tried to create parallel 64 and 32 bit versions of WRF and WPS (and all other necessary software) to circumvent this, but decided not to continue since the new WPS is not working yet on the Linux boxes.

Wednesday, March 26, 2008

WRFV221 on dual quad Mac Pro

Here are my recent trials of WRF v.2.2.1 on a dual-quad Mac Pro with Intel Fortran (10.1.012) and gcc. The 64 bit versions of the compilers are used. The machine has 8 GB of memory. Trials include OMP and MPI versions, the latter based on mpich2.

--------------------------------------------------------------
netcdf compilation for 64 bit (based on netcdf-3.6.0-p1)
--------------------------------------------------------------

export CC=/usr/bin/gcc
export CPPFLAGS="-O -DNDEBUG -DpgiFortran"
export CFLAGS="-O -m64"
export CXX=/usr/bin/c++
export CXXFLAGS="-O -m64"

export FC=ifort
export F77=ifort
export F90=ifort
export FFLAGS="-O3 -mp"
export F90FLAGS="-O3 -mp"


./configure --prefix=/usr/local/netcdf
make
make test
sudo mkdir /usr/local/netcdf
sudo make install


--------------------------------------------------------------
jasper compilation for 64 bit (based on jasper-1.701.0)
--------------------------------------------------------------

setenv CC /usr/bin/gcc
setenv CFLAGS "-O -m64"
setenv CXX /usr/bin/c++
setenv CXXFLAGS "-O -m64"

./configure --prefix=/usr/local/jasper
make
sudo make install


--------------------------------------------------------------
mpich2 compilation for 64 bit (based on mpich2-1.0.5)
--------------------------------------------------------------

setenv FC ifort
setenv F90 ifort
setenv CC "gcc -m64"
setenv RSHCOMMAND "/usr/bin/ssh"
setenv CXX "/usr/bin/c++ -m64"
setenv FFLAGS "-xP -vec- -fp-model precise"
setenv F90FLAGS "-xP -vec- -fp-model precise"
./configure --with-comm=shared


--------------------------------------------------------------
test run
--------------------------------------------------------------

The test run is a short simulation with three telescoping, two-way domains (142x100, 100x100 and 100x100, with 35 vertical levels). Flags were chosen to reproduce output file cksums of a completely unoptimized simulation requesting strict arithmetic. The OMP version occasionally produces different results, apparently randomly.
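The checksum comparison described above can be sketched as a small helper. The function name same_cksum and the file names in the usage note are hypothetical; cksum itself is the standard POSIX utility.

```shell
#!/bin/sh
# same_cksum is a hypothetical helper for comparing two output files.
# Returns 0 if both files have the same cksum (CRC and byte count),
# 1 if they differ, and 2 if either file cannot be read.
# Usage: same_cksum <file_a> <file_b>
same_cksum() {
    # Reading from stdin keeps the file name out of the cksum output,
    # so only the CRC and size are compared.
    a=$(cksum < "$1") || return 2
    b=$(cksum < "$2") || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_cksum reference/wrfout_d01 omp4/wrfout_d01 && echo identical` would report whether an optimized run reproduced the reference output, with both paths standing in for wherever the runs were written.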

Timing plot:


The plot below adds results from a single 2.4GHz quad-core machine, running Mandriva Linux, for OMP runs built using the same configuration file as linked below (with the Mac-specific references removed). For the four thread run, the 2.4 GHz run is 33% slower, though the clock speed difference is only 17%. Checksums were the same for all the runs.



Configuration files are here: OMP version, MPI version.

[edited 1 April 2008 to include the compilation for netcdf, jasper and mpich2 and to clarify this is for 64 bit compilers.]