Wednesday, April 16, 2008

WRFV300 on Mac G5 (XLF compiler)

Here are the alterations to the stock WRF version 3.0.0 release that I had to make to get it working on Mac/G5 equipment. This was tested on Macs running Tiger. Everything is 32 bit. The diffwrf program does not compile, but can probably be made to work fairly easily.

(1) Here are replacement files for the arch directory: configure_new.defaults, noopt_exceptions_f, and postamble_new

(2) Replacement file for the external/RSL_LITE directory: makefile

(3) Replacement file for the frame directory: frame.Makefile (rename as Makefile)

(4) Replacement file for the dyn_em directory: module_first_rk_step_part1.F

The file module_first_rk_step_part1.F is altered to remove unnecessary whitespace in the subroutine call to surface_driver. As it is, the number of characters in that call exceeds the allowable maximum for xlf on the Mac.
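For anyone who would rather script that cleanup than do it by hand, here is a minimal Python sketch of the idea. It is not the procedure I used to produce the replacement file. The function names (squeeze_line, squeeze_call) are made up for illustration, the sample call is an invented fragment rather than the real argument list, and the code assumes free-form source with no quoted strings or "!" comments inside the call.

```python
import re

def squeeze_line(line: str) -> str:
    """Collapse redundant blanks in one line of a Fortran argument list.

    Assumptions: free-form source, no quoted strings, no '!' comments.
    Preprocessor lines (starting with '#') are returned unchanged.
    """
    if line.lstrip().startswith("#"):
        return line
    code = re.sub(r"[ \t]+", " ", line.strip())
    # Blanks around ',' and '=' carry no meaning in an argument list.
    return re.sub(r" ?([,=]) ?", r"\1", code)

def squeeze_call(lines):
    """Apply squeeze_line to every line of a continued CALL statement."""
    return [squeeze_line(line) for line in lines]

# Invented fragment in the style of the surface_driver call.
before = [
    "      CALL surface_driver(                                   &",
    "     &           ACSNOM = grid%acsnom  ,  ACSNOW = grid%acsnow   &",
    "     &         , AKHS   = grid%akhs    ,  AKMS   = grid%akms    )",
]
after = squeeze_call(before)
print(sum(map(len, before)), "->", sum(map(len, after)), "characters")
```

The continuation markers are kept, so the statement still spans the same number of lines. Only the padding that counts against the statement length goes away.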

This image shows performance for the two-domain June 2001 test case on a G5 cluster (mpich1, no fancy interconnects whatsoever), for a 12 hour run on the number of CPUs indicated. The real (wall-clock) times are the relevant numbers.


MM5 on Mac Intel (OpenMP)

Although this is my WRF on Mac blog, I'm also still using MM5 for some projects, and need to run it on my Macs. Recently, I tried it on my 8 core Mac Pro, using Intel Fortran 10 and OpenMP. All this is done on an HFSX (case-sensitive) volume. So far, it appears to be running well, with some caveats. One is that the build is 32 bit. My attempts at 64 bit builds, even after invoking flags like -DDEC_ALPHA and -DSGI_IA64 that I found referenced on the web, compiled successfully, but the resulting executables segfault immediately on execution.

Another is that the model actually takes or requires one more thread than is requested using the OMP_NUM_THREADS specification. Thus, if I request 6 threads, it actually spawns and apparently uses 7. Additionally, despite having 8 cores, I cannot request more than 6 and have the model run successfully. It should not need more than 7 in that instance, and thus should run. I'm not sure why it does not.

Getting MM5 compiled on the Mac Pro was easy once a change was made to the way suffixes are handled in configure.user. Intel Fortran complained about #define type statements in the code that also had comments attached beyond column 72. The workaround was to force each Fortran program file through the CPP preprocessor. That was done by removing the rule at the bottom of configure.user that looks like this:

.F.o:
	$(RM) $@
	$(FC) -c $(FCFLAGS) $*.F

This rule compiles files ending with .F without involving the CPP preprocessor. Removing it causes other rules already in place to first push .F files through the preprocessor, and then compile the .f files that result from that operation. Remember, this has to be done on a case-sensitive volume.
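I made that edit by hand, but the removal can also be scripted. Below is a minimal Python sketch, assuming the recipe lines of the rule are tab-indented as make requires. The function name drop_suffix_rule is made up, and the sample text (including the .F.f rule in it) is invented for the demonstration rather than copied from a real configure.user.

```python
def drop_suffix_rule(makefile_text: str, target: str = ".F.o:") -> str:
    """Return makefile_text without the rule whose target line starts with `target`.

    Assumption: the rule's recipe lines are the tab-indented lines that
    immediately follow the target line.
    """
    kept = []
    skipping = False
    for line in makefile_text.splitlines(keepends=True):
        if line.startswith(target):
            skipping = True          # drop the target line itself
            continue
        if skipping and line.startswith("\t"):
            continue                 # drop the recipe lines of that rule
        skipping = False
        kept.append(line)
    return "".join(kept)

# Invented sample; only the .F.o rule matches the one quoted above.
sample = (
    ".F.f:\n"
    "\t$(CPP) $(CPPFLAGS) $*.F > $@\n"
    "\n"
    ".F.o:\n"
    "\t$(RM) $@\n"
    "\t$(FC) -c $(FCFLAGS) $*.F\n"
)
print(drop_suffix_rule(sample))
```

To use it on a real file, read configure.user into a string, pass it through the function, and write the result back after keeping a copy of the original.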

I also had to manually add #include to Util/parseconfig.c to avoid a compilation error. The portion of configure.user that concerns compilation looks like this:


RUNTIME_SYSTEM = "macintel"
FC = ifort
FCFLAGS = -I$(LIBINCLUDE) -pc32 -O3 -convert big_endian -fp-model precise -openmp -fpp -allow fpp-comments -auto -traceback #-DDEC_ALPHA
CPP = /usr/bin/cpp
CFLAGS = -O #-DSGI_IA64
CPPFLAGS = -I$(LIBINCLUDE) -I. -C -P -traditional -xassembler-with-cpp
LDOPTIONS = $(FCFLAGS) -Wl,-stack_size -Wl,0x20000000 -Wl,-stack_addr -Wl,0xd0000000


Here is a sample timing plot for a triply nested 24 hour simulation, revealing results I am not unhappy with:

Saturday, April 5, 2008

Playing with WRF3 (WRFV300, WPSV300) - part I

WRF v.3.0.0 was released yesterday, and I decided to see if it would be any easier to get it to run well on my Intel-based Mac Pro, with the latest ifort 10 compiler. I also tested it on two Linux machines, one with the same chip type and ifort version, the other with older 32 bit hardware and software (ifort 9). I also built WRF with PGI Fortran on the older machine, for a total of four different combinations of hardware and software (one Mac, three Linux; three ifort, one PGI).

The good news is that the configuration system is a lot slicker. The bad news for the Mac Pro is that the executables would not build with the stock configure.wrf generated for ifort and OMP. A few alterations were necessary on my Mac:

(1) I'm using 64 bit compilers, so I needed to add "-m64" to CFLAGS_LOCAL.

(2) To get rid of missing symbols (that didn't draw complaints on either Intel Linux build), I had to add "$(WRF_SRC_ROOT_DIR)/frame/module_domain_type.o $(WRF_SRC_ROOT_DIR)/external/io_grib2/grib2tbls_types.o" to LDFLAGS_LOCAL.
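Both changes amount to appending text to a variable in configure.wrf, which has to be redone every time configure is rerun. Here is a minimal Python sketch of how that could be automated. It is not part of WRF: the function name append_to_var is made up, the starting values in the sample are invented, and it assumes each variable is assigned on a single line with no backslash continuation.

```python
import re

def append_to_var(config_text: str, name: str, extra: str) -> str:
    """Append `extra` to the first single-line assignment of `name`.

    Assumption: the assignment looks like 'NAME = value' on one line.
    Raises KeyError if no such assignment is found.
    """
    pattern = re.compile(
        rf"^({re.escape(name)}[ \t]*=.*?)[ \t]*$", re.MULTILINE
    )
    # A function replacement avoids any escape processing of `extra`.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)} {extra}", config_text, count=1
    )
    if count == 0:
        raise KeyError(name)
    return new_text

# Invented starting values; only the appended text comes from the steps above.
sample = "CFLAGS_LOCAL = -w -O3\nLDFLAGS_LOCAL =\n"
patched = append_to_var(sample, "CFLAGS_LOCAL", "-m64")
patched = append_to_var(
    patched,
    "LDFLAGS_LOCAL",
    "$(WRF_SRC_ROOT_DIR)/frame/module_domain_type.o "
    "$(WRF_SRC_ROOT_DIR)/external/io_grib2/grib2tbls_types.o",
)
print(patched)
```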

Then, the weirdness started. On the two Core Quad machines (Mac and Linux), the OMP ifort build doesn't spawn more than one thread or use more than 100% CPU according to top and Activity Monitor, regardless of the OMP_NUM_THREADS setting. It seems OMP is broken, and it's not platform-specific.

WRF3 is not backward compatible with respect to input files, and this makes upgrading WPS a must. But the new version of WPS' geogrid program isn't working anywhere. It spits out a segfault (at the same place) even when trying to build the example domain. Same experience with ifort and PGI, on the two Linux machines. I verified that the WPS_GEOG data have not changed, so it's not that.

There is no Mac WPS yet, since I don't have a 64 bit NCL on the Mac and have so far been unable to build one. I tried to create parallel 64 and 32 bit versions of WRF and WPS (and all other necessary software) to circumvent this, but decided not to continue since the new WPS is not working yet on the Linux boxes.