Sunday, June 27, 2010

Intel 11.1, OS X 10.6.4, Xcode 3.2.2, and WRF seg faults

This post was kindly provided by Dr. Lou Wicker of the National Severe Storms Laboratory. Though I have updated to WRFv3.2, I'm still on ifort 11.0, Xcode 3.0 and Mac OS X 10.5.8.

===================================

When compiling WRF 3.2.1 [3.2 ?] using Intel 11.1 (v088) on Mac OS X 10.6.4, we recently kept getting segmentation faults at runtime. I tracked the problem down to memory allocated in the Thompson microphysics, where some arrays were allocated (successfully; I checked the status flag), and then a bad memory address for the first element of an array inside a nested do-loop caused the segmentation fault. I "fixed" the Thompson code with static allocations of these arrays for a test (they were small), and then the segmentation fault occurred in the solver_em routine instead.

I then compiled WRF with a beta version of Intel v12 compiler (which I am testing), and this behavior does NOT occur.

I posted to Intel Premier, which I have to say has been pretty responsive on the 2-3 issues I have ever posted. They got back to me within a few hours. They pointed me to this 9 May 2010 note:

http://software.intel.com/en-us/articles/intel-fortran-for-mac-os-x-incompatible-with-xcode-322/

The note basically says that Apple, in version 3.2.2 of Xcode, changed the loader in some way that is incompatible with Intel v11.1, and this causes these segmentation faults. Their simple answer is to remain at Xcode v3.2.1, although they say the compiler flag "-use-asm" might also fix the problem. In any event, this might save people a lot of hours and frustration. BTW, the WRF code ran 10-20 time steps using WSM5 with no problem using the "bad" Xcode version.

Hope that is helpful to all the WRF'ers.

Lou
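Before chasing segfaults, it may help to confirm which Xcode is installed. The sketch below is only an illustration: the function name xcode_is_322 is made up here, and it assumes the first line printed by xcodebuild -version has the form "Xcode 3.2.2".

```shell
# Hypothetical helper (not from Intel's note): succeed only when the Xcode
# version line names the 3.2.2 release.
# Assumption: the first line of `xcodebuild -version` looks like "Xcode 3.2.2".
xcode_is_322() {
    # $1: optional version line; defaults to what xcodebuild reports
    line=${1:-$(xcodebuild -version 2>/dev/null | head -n 1)}
    case "$line" in
        "Xcode 3.2.2"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, xcode_is_322 && echo "Xcode 3.2.2 found: see the Intel note" prints the reminder only on an affected machine.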

Monday, December 21, 2009

WRFV311 on Snow Leopard

I tried compiling WRF on a Snow Leopard (SL) Mac, and found I had to make a few adjustments to my configure.wrf files. First of all, ifort version 11 appears to be needed. I also needed to revert to the (now) older version of the gcc compiler. Change "gcc" or "cc" on the SCC and CC_TOOLS lines to "gcc-4.0". I'm not sure if this is ifort's problem or mine, but at least it appears to be resolved.

The issue this resolved was a segfault in executing tools/registry for the creation of the module_state_description.F file.
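The compiler change can also be scripted. This is a minimal sketch, assuming the relevant lines in configure.wrf have the plain form "SCC = gcc" or "CC_TOOLS = cc"; the function name use_gcc40 is hypothetical, and lines that refer to another variable (such as $(SCC)) are left untouched.

```shell
# Hypothetical helper: print configure.wrf with "gcc" or "cc" replaced by
# "gcc-4.0" on the SCC and CC_TOOLS lines only.
# Assumption: those lines look like "SCC = gcc" or "CC_TOOLS = cc [flags]".
use_gcc40() {
    sed -E '/^(SCC|CC_TOOLS)[[:space:]]*=/ s/(=[[:space:]]*)(gcc|cc)([[:space:]]|$)/\1gcc-4.0\3/' "$1"
}
```

Running use_gcc40 configure.wrf > configure.wrf.new writes the edited copy to a new file, so the original can be kept for comparison. Lines already set to gcc-4.0 are not changed.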

Monday, June 16, 2008

WRFV300 update: two items

(1) In WRFV300 on the Mac, using the Intel Fortran 10.1 compiler, the compiler flag -switch fe_use_rtl_copy_arg_inout causes problems that weren't apparent with earlier WRF versions. Specifically, lateral boundary tendencies for water vapor are zeroed, resulting in problems in this field that grow inward from the boundaries. In the wrfbdy_d01 file, fields like QVAPOR_BXE and related entries possess only zeroes. This file is, of course, created by real.exe.

I added this switch owing to this page. Removing it does no harm. I tried several minor versions of ifort 10.1, explored 32 and 64 bit, etc.; nothing helped fix this issue except removing this switch.

(2) In WRFV300, the Lin microphysics scheme (mp_physics = 2) has issues -- noise issues. After only a few tens of minutes, the pressure field starts displaying small scale noise. Decreasing the time step, implementing 6th order smoothing, and upping the small time step smoothing (epssm) do not help. For the simulations I am examining, the noise does not appear with any other microphysics scheme. With Lin, the noise appears in higher but not in lower resolution simulations.

The Lin code was altered between WRFV221 and WRFV300. I have not isolated the code changes that cause this problem.
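For item (1), the switch can be stripped from configure.wrf with a one-line filter. This is a sketch only: the function name strip_rtl_switch is hypothetical, and it assumes the flag appears exactly as "-switch fe_use_rtl_copy_arg_inout" somewhere on a flags line.

```shell
# Hypothetical helper: print the file with every occurrence of
# "-switch fe_use_rtl_copy_arg_inout" (and the whitespace before it) removed.
strip_rtl_switch() {
    sed -E 's/[[:space:]]*-switch[[:space:]]+fe_use_rtl_copy_arg_inout//g' "$1"
}
```

As with any edit to configure.wrf, real.exe has to be rebuilt and rerun afterwards so that wrfbdy_d01 is regenerated.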

Thursday, May 1, 2008

Playing with WRF3 (WRFV300, WPSV300) - part II

Some updates: In the post Playing with WRF3 (WRFV300, WPSV300) - part I below, I mentioned that WRFV300's OpenMP was broken on Mac and Linux, using ifort and pgf. This problem is solved. OpenMP doesn't work in the stock configuration because $(OMP) is not included in FCFLAGS. It is included in LDFLAGS, so the model seemed to go through the motions of spawning the requested number of threads. But its absence from FCFLAGS means that OpenMP support was not actually compiled into the code.

The remedy for this is to edit arch/postamble_new and insert $(OMP) on the FCFLAGS line.
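A minimal sketch of that edit follows. It assumes the line in arch/postamble_new starts with "FCFLAGS" followed by "=", which may not match every release exactly; the function name add_omp_to_fcflags is hypothetical.

```shell
# Hypothetical helper: print the file with $(OMP) inserted at the start of the
# FCFLAGS value, unless that line already mentions $(OMP).
# Assumption: the line has the form "FCFLAGS = ..." with no leading whitespace.
add_omp_to_fcflags() {
    sed -E '/^FCFLAGS[[:space:]]*=/{
/\$\(OMP\)/!s/=[[:space:]]*/= $(OMP) /
}' "$1"
}
```

Running add_omp_to_fcflags arch/postamble_new > postamble_new.tmp and comparing the two files with diff shows the single changed line before anything is overwritten. WRF needs a clean rebuild afterwards for the flag to take effect.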

I also have WPSV300 working on my Intel Mac, because I did manage to get a 64 bit version of NCAR Graphics compiled and running. I will update this post to describe how this was done in the future.

[EDIT 10 June 2008: Some additional information..
* My 64 bit NCAR Graphics binaries built on my MacPro are here. This is an 18MB gzipped tar file. It does not include all of NCL, just NCAR Graphics.

* For this build of NCAR Graphics, you need to include the -ncarbd flag with ncargf77, ncargf90 or ncargex. Or, when compiling with ifort, you need to link in /usr/local/ncarg64/lib/ncarg/robj/libncarbd.o

* With this NCAR Graphics, I had no trouble building WPSV300 and RIP4 64bit on the MacPro.

* Here are my build notes for NCAR Graphics 64 bit on the MacPro, using ifort. Of course, you need to change locations, etc..

setenv NCARG /case/ncl_ncarg-5.0.0
setenv NCARG_ROOT /case/ncl_ncarg-5.0.0
setenv NCARG_CONFIG /case/ncl_ncarg-5.0.0/config

cd config
./ymake
./Configure
..then, as root
mkdir /usr/local/ncarg64


edited config/Darwin_Intel and ran Configure again

ulimit -s 65530
compile as root, so source ifortvars.sh

had been getting segfaults with fontcap. Resolved by editing yMakefile in /case/ncl_ncarg-5.0.0/common/src/libncarg_c/yMakefile for Darwin section to make it like Linux x86_64 section. That is,

#elif defined(Darwin) && defined(i386)
EXCSRCS = bcopyswap.c logic32.c
EXFSRCS = gbytes.f sbytes.f
EXOBJS = sbytes.o gbytes.o bcopyswap.o logic32.o

Wednesday, April 16, 2008

WRFV300 on Mac G5 (XLF compiler)

Here are the alterations to the stock WRF version 3.0.0 release that I had to make to get it working on Mac/G5 equipment. This was tested on Macs using Tiger. Everything is 32 bit. The diffwrf program does not compile, but can probably be made to work fairly easily.

(1) Here are replacement files for the arch directory: configure_new.defaults, noopt_exceptions_f, and postamble_new

(2) Replacement file for the external/RSL_LITE directory: makefile

(3) Replacement file for the frame directory: frame.Makefile (rename as Makefile)

(4) Replacement file for the dyn_em directory: module_first_rk_step_part1.F

The file module_first_rk_step_part1.F is altered to remove unnecessary whitespace in the subroutine call to surface_driver. As it is, the number of characters in that call exceeds the allowable maximum for xlf on the Mac.

This image shows performance for the two-domain June 2001 test case on a G5 cluster (mpich1, no fancy interconnects whatsoever), a 12 hour run on the number of cpus indicated. The real times are the relevant numbers.


MM5 on Mac Intel (OpenMP)

Although this is my WRF on Mac blog, I'm also still using MM5 for some projects, and need to run it on my Macs. Recently, I tried to use it on my 8 core Mac Pro, using Intel Fortran 10 and OpenMP. All this is done on a HFSX (case-sensitive) volume. So far, it appears to be running well, with some caveats. One is that the build is 32 bit. My attempts at 64 bit builds, even after invoking flags like -DDEC_ALPHA and -DSGI_IA64 that I found referenced on the web, compiled successfully but segfaulted immediately on execution.

Another is that the model actually takes or requires one more thread than is requested using the OMP_NUM_THREADS specification. Thus, if I request 6 threads, it actually spawns and apparently uses 7. Additionally, despite having 8 cores, I cannot request more than 6 and have the model run successfully. It should not need more than 7 in that instance, and thus should run. I'm not sure why it does not.

Getting MM5 compiled on the Mac Pro was easy once a change was made to the way suffixes are handled in configure.user. Intel Fortran complained about #define type statements in the code that also had comments attached beyond column 72. The workaround was to force each Fortran program file through the CPP preprocessor. That was done by removing the rule at the bottom of configure.user that looks like this:

.F.o:
	$(RM) $@
	$(FC) -c $(FCFLAGS) $*.F

This rule compiles files ending with .F without involving the CPP preprocessor. Removing it causes other rules already in place to first push .F files through the preprocessor, and then compile the .f files that result from that operation. Remember, this has to be done on a case-sensitive volume.

I also had to manually add #include to Util/parseconfig.c to avoid a compilation error. The portion of configure.user that concerns compilation looks like this:


RUNTIME_SYSTEM = "macintel"
FC = ifort
FCFLAGS = -I$(LIBINCLUDE) -pc32 -O3 -convert big_endian -fp-model precise -openmp -fpp -allow fpp-comments -auto -traceback #-DDEC_ALPHA
CPP = /usr/bin/cpp
CFLAGS = -O #-DSGI_IA64
CPPFLAGS = -I$(LIBINCLUDE) -I. -C -P -traditional -xassembler-with-cpp
LDOPTIONS = $(FCFLAGS) -Wl,-stack_size -Wl,0x20000000 -Wl,-stack_addr -Wl,0xd0000000
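
The -stack_size value on the LDOPTIONS line is given in hexadecimal bytes. As a quick check of what it amounts to, the small sketch below (the function name stack_mib is made up for illustration) converts it to MiB with shell arithmetic:

```shell
# Hypothetical helper: convert a byte count (hex or decimal) to MiB.
stack_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

stack_mib 0x20000000   # prints 512
```

So the link line above requests a 512 MiB main-thread stack.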


Here is a sample timing plot for a triply nested 24 hour simulation, revealing results I am not unhappy with:

Saturday, April 5, 2008

Playing with WRF3 (WRFV300, WPSV300) - part I

WRF v.3.0.0 was released yesterday, and I decided to see if it would be any easier to get it to run well on my Intel-based Mac Pro, with the latest ifort 10 compiler. I also tested it on two Linux machines, one with the same chip type and ifort version, the other with older 32 bit hardware and software (ifort 9). I also built WRF with PGI Fortran on the older machine, for a total of four different combinations of hardware and software (one Mac, three Linux; three ifort, one PGI).

The good news is the configuration system is a lot more slick. The bad news for the Mac Pro is the executables would not build with the stock configure.wrf obtained for ifort and OMP. A few alterations were necessary on my Mac:

(1) I'm using 64 bit compilers, so I needed to add "-m64" to CFLAGS_LOCAL.

(2) To get rid of missing symbols (that didn't draw complaints on either Intel Linux build), I had to add "$(WRF_SRC_ROOT_DIR)/frame/module_domain_type.o $(WRF_SRC_ROOT_DIR)/external/io_grib2/grib2tbls_types.o" to LDFLAGS_LOCAL.

Then, the weirdness started. On the two Core Quad machines (Mac and Linux), the OMP ifort build doesn't spawn more than one thread or capture more than 100% according to top and Activity Monitor, despite the setting for OMP_NUM_THREADS. It seems OMP is broken, and it's not platform-specific.

WRF3 is not backward compatible with respect to input files, and this makes upgrading WPS a must. But the new version of WPS' geogrid program isn't working anywhere. It spits out a segfault (at the same place) even when trying to build the example domain. Same experience with ifort and PGI, on the two Linux machines. I verified that the WPS_GEOG data have not changed, so it's not that.

No Mac WPS yet since I don't have, and have been unable as of yet to successfully build, NCL for 64 bit on the Mac. I tried to create parallel 64 and 32 bit versions of WRF and WPS (and all other necessary software) to circumvent this, but decided not to continue since the new WPS is not working yet on the Linux boxes.