Wednesday, April 16, 2008

MM5 on Mac Intel (OpenMP)

Although this is my WRF on Mac blog, I still use MM5 for some projects and need to run it on my Macs. Recently, I tried it on my 8-core Mac Pro, using Intel Fortran 10 and OpenMP. All of this was done on an HFSX (case-sensitive) volume. So far, it appears to be running well, with some caveats. One is that the build is 32-bit. My attempts at 64-bit builds, even with flags like -DDEC_ALPHA and -DSGI_IA64 that I found referenced on the web, compiled successfully but segfaulted immediately on execution.

Another is that the model uses one more thread than is requested through the OMP_NUM_THREADS environment variable. Thus, if I request 6 threads, it actually spawns and apparently uses 7. Additionally, despite having 8 cores, I cannot request more than 6 and have the model run successfully. It should not need more than 7 in that instance, and thus should run. I'm not sure why it does not.
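For reference, the thread request described above is made in the environment before the model is launched. This is a minimal sketch, assuming a Bourne-style shell and an executable named mm5.exe in the current directory (the name and location are assumptions; adjust them to your own run setup):

```shell
# Request 6 OpenMP threads. On the Mac Pro described here, the model
# was then observed to spawn 7 threads in total.
export OMP_NUM_THREADS=6

# Assumed executable name and location; skipped if it is not present.
if [ -x ./mm5.exe ]; then
  ./mm5.exe
fi
```

The variable only needs to be set in the shell (or deck script) that starts the model.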

Getting MM5 compiled on the Mac Pro was easy once I changed the way suffixes are handled in configure.user. Intel Fortran complained about #define-type statements in the code that also had comments attached beyond column 72. The workaround was to force each Fortran source file through the C preprocessor (cpp). That was done by removing the rule at the bottom of configure.user that looks like this:

.F.o:
	$(RM) $@
	$(FC) -c $(FCFLAGS) $*.F

This rule compiles files ending in .F directly, without involving the C preprocessor. Removing it causes other rules already in place to first push each .F file through the preprocessor and then compile the resulting .f file. Remember, this has to be done on a case-sensitive volume: on a case-insensitive one, the generated .f file would have the same name as the .F source and overwrite it.
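To make the two-step path concrete, here is a rough sketch of what happens to a single file once the direct .F.o rule is gone, done by hand in a scratch directory. The file demo.F and its contents are hypothetical stand-ins, and the flags are only a subset of those in configure.user; the actual rules in your copy of configure.user may differ.

```shell
# Work in a scratch directory so nothing in the MM5 tree is touched.
workdir=$(mktemp -d)
cd "$workdir"

# A tiny stand-in for an MM5 source file (hypothetical content).
cat > demo.F <<'EOF'
#define NLEV 23
      PROGRAM DEMO
      PRINT *, NLEV
      END
EOF

src=demo.F
out="${src%.F}.f"   # demo.f; distinct from demo.F only on a case-sensitive volume

# Step 1: preprocess the .F file into a .f file.
# -xassembler-with-cpp keeps cpp from treating the input as C or Fortran.
if command -v cpp >/dev/null 2>&1; then
  cpp -P -traditional -xassembler-with-cpp "$src" > "$out"
fi

# Step 2: compile the preprocessed .f file (skipped if ifort is not installed).
if command -v ifort >/dev/null 2>&1 && [ -s "$out" ]; then
  ifort -c "$out"
fi
```

After step 1, demo.f should contain the program with NLEV already replaced by 23, so the compiler never sees the #define line.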

I also had to manually add #include to Util/parseconfig.c to avoid a compilation error. The portion of configure.user that concerns compilation looks like this:


RUNTIME_SYSTEM = "macintel"
FC = ifort
FCFLAGS = -I$(LIBINCLUDE) -pc32 -O3 -convert big_endian -fp-model precise -openmp -fpp -allow fpp-comments -auto -traceback #-DDEC_ALPHA
CPP = /usr/bin/cpp
CFLAGS = -O #-DSGI_IA64
CPPFLAGS = -I$(LIBINCLUDE) -I. -C -P -traditional -xassembler-with-cpp
LDOPTIONS = $(FCFLAGS) -Wl,-stack_size -Wl,0x20000000 -Wl,-stack_addr -Wl,0xd0000000


Here is a sample timing plot for a triply nested 24 hour simulation, revealing results I am not unhappy with:
