- g95 on OS X/PPC (donar_g95)
- xlf on OS X/PPC (donar_xlf)
- g95 on Mandriva Linux/Athlon (cascade_g95)
- ifort on Mandriva Linux/Athlon (cascade_ifort)
The first thing I check is whether the model produces the same results, as verified by checksums and visual inspection of plotted fields, regardless of the number of CPUs, at least until the domain becomes too finely subdivided. So far in this experiment, only g95 on OS X/PPC passes that test: its checksums are identical for runs with 1 to 24 CPUs. (ifort will likely pass as well, but that portion of the experiment is not yet finished.) For xlf on OS X/PPC, runs with 1, 2, 12 and 18 CPUs produce one checksum, while runs with 4, 6, 8, 10 and 14 CPUs produce a second. Runs with 16 and 20 CPUs produce a third, and the 24-CPU run produces a fourth. The 12-hour forecast fields differ in some respects, with patterns that suggest roundoff errors.
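To make the reproducibility check concrete, here is a minimal Python sketch of how runs could be grouped by checksum. The function names are my own for illustration, and the labels "A" through "D" are placeholders standing in for the four distinct xlf checksums described above, not the actual values.

```python
from collections import defaultdict


def group_by_checksum(runs):
    """Map each distinct checksum to the sorted CPU counts that produced it."""
    groups = defaultdict(list)
    for ncpu, checksum in runs.items():
        groups[checksum].append(ncpu)
    return {checksum: sorted(ncpus) for checksum, ncpus in groups.items()}


def is_decomposition_independent(runs):
    """True when every CPU count yields the same checksum."""
    return len(set(runs.values())) == 1


# CPU count -> checksum label for the xlf runs on OS X/PPC.
# "A".."D" are placeholder labels, not real checksum values.
xlf_runs = {
    1: "A", 2: "A", 12: "A", 18: "A",
    4: "B", 6: "B", 8: "B", 10: "B", 14: "B",
    16: "C", 20: "C",
    24: "D",
}

if __name__ == "__main__":
    for checksum, ncpus in group_by_checksum(xlf_runs).items():
        print(checksum, ncpus)
    print("independent of CPU count:", is_decomposition_independent(xlf_runs))
```

A compiler passes the test only when the grouping collapses to a single entry, as it does for g95 on OS X/PPC.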
Unsurprisingly, g95 is slower than a commercial compiler on the same hardware. The plot below presents the timing results obtained thus far. There are gaps in the data, and the time function behaves somewhat inconsistently between OS X and Linux. I will redo these statistics using Brian Jewett's scripts for extracting timings from the rsl.out.0000 files. Also, for g95 on OS X/PPC, I/O is a bottleneck; running with nio_tasks_per_group > 0 appears to help a lot.
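For a sense of what extracting timings from rsl.out.0000 might involve, here is a rough Python sketch. It is not Brian Jewett's script. It assumes the per-step lines look like `Timing for main: time <timestamp> on domain   1:    <seconds> elapsed seconds`, which should be checked against the actual log before relying on it; the function names are hypothetical.

```python
import re

# Assumed format of a per-time-step line in rsl.out.0000.
TIMING_RE = re.compile(
    r"Timing for main: time (\S+) on domain\s+(\d+):\s+([\d.]+) elapsed seconds"
)


def extract_step_times(lines, domain=1):
    """Return the elapsed seconds of each model step for the given domain."""
    times = []
    for line in lines:
        match = TIMING_RE.search(line)
        if match and int(match.group(2)) == domain:
            times.append(float(match.group(3)))
    return times


def mean_step_time(lines, domain=1):
    """Average seconds per step, or None if no timing lines were found."""
    times = extract_step_times(lines, domain)
    return sum(times) / len(times) if times else None


# Example use (assumes rsl.out.0000 is in the current directory):
#     with open("rsl.out.0000") as log:
#         print(mean_step_time(log))
```

Reading timings from the model's own log sidesteps the differences in how the time function behaves on OS X and Linux, since both platforms write the same kind of lines.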
Here is the latest configure.wrf segment for my g95 runs on OS X/PPC.