glen herrmannsfeldt
2007-12-11 12:22:20 UTC
I'm working on parallelizing an existing FORTRAN
application. Since it involves a huge number of variables per
iteration, I decided to use file operations (read and write) to pass
the values between processes.
If I spawn these processes (say, 2 to 4 processes) on an SMP system
with 4 cores, the results are fine. But the problem occurs if I spawn
them on 2 to 4 different computers (on a cluster) with identical OS and
specification; the final results are wrong. Both systems (SMP and
cluster) use NFS for file storage. Are unformatted FORTRAN files
not portable between computers, even if they share an
identical specification and compiler?
open(1,file='var6.dat',form='unformatted',status='unknown')
write(1) var1,var2,var3
close(1)
open(1,file='var6.dat',form='unformatted',status='old')
read(1) var1,var2,var3
close(1)
Here is the program flow in 'human language':
Process1 writes var6.dat ->
MPI barrier ->
Process2 reads var6.dat
OS : Linux kernel 2.6.22
compiler : mpich + g77
Filesystem : NFS
I haven't worried about this one for a while, and added
comp.protocols.nfs in case anyone there knows.
NFS does some buffering, and MPI barrier may not be enough to
make sure that all buffers are written back.
The NFS option for synchronous writes is supposed to not return
to the writer until the data is on a physical storage device.
That may or may not stop any read buffering, though.
Also, be sure you use hard mounts.
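As a rough sketch of what those mount settings might look like on a Linux client, here is an /etc/fstab entry. The server name and both paths are made up for illustration. The noac option is not something mentioned above: it is an assumption on my part that turning off client attribute caching helps on the read side, and it costs performance.

```text
# /etc/fstab on each compute node (hypothetical server and paths)
# hard : retry indefinitely instead of returning an I/O error on timeout
# sync : writes do not return until the data has been sent to the server
# noac : do not cache file attributes on the client (assumption, see above)
nfsserver:/export/work  /mnt/work  nfs  hard,sync,noac  0  0
```

Whether these options are enough depends on the NFS version and on how the server exports the filesystem, so treat this as a starting point to test rather than a known fix.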
If it is NFS buffering you should either have old data or EOF.
Then again, there could always be bugs in the NFS implementation.
Anyway, I believe your problem is NFS, not Fortran.
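One way to test that theory is to have the reader check for exactly those two symptoms, old data or EOF. The sketch below is only an illustration: the iteration stamp written at the start of the record, the variable types, and the names iter, got and ios are all assumptions, not part of the original program. It writes and reads in a single process so that it stands alone; in the real application the two halves would sit on either side of the MPI barrier.

```fortran
      program stamp
      implicit none
      integer iter, got, ios
      double precision var1, var2, var3

c     Writer side: put the iteration number first in the record
c     (hypothetical stamp, used only to detect old data).
      iter = 7
      var1 = 1.0d0
      var2 = 2.0d0
      var3 = 3.0d0
      open(1, file='var6.dat', form='unformatted', status='unknown')
      write(1) iter, var1, var2, var3
      close(1)

c     In the real program the MPI barrier sits here, and the
c     code below runs in the other process.

c     Reader side: check iostat and the stamp before using the data.
      open(1, file='var6.dat', form='unformatted', status='old',
     &     iostat=ios)
      if (ios .ne. 0) then
         write(*,*) 'open failed, iostat =', ios
         stop
      end if
      read(1, iostat=ios) got, var1, var2, var3
      close(1)
      if (ios .lt. 0) then
         write(*,*) 'end of file: new data not visible yet'
      else if (ios .gt. 0) then
         write(*,*) 'read error, iostat =', ios
      else if (got .ne. iter) then
         write(*,*) 'old data: expected', iter, ' got', got
      else
         write(*,*) 'record is current'
      end if
      end
```

If the reader reports old data or end of file only on the cluster and never on the SMP machine, that points at NFS caching rather than at the unformatted file format.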
-- glen