Discussion:
Problems with unformatted files on mpich + nfs
glen herrmannsfeldt
2007-12-11 12:22:20 UTC
Permalink
I'm working on a task to parallelize an existing FORTRAN
application. Since it involves a huge number of variables per
iteration, I decided to use file operations (read and write) to pass
the values between processes.
If I spawn these processes (say 2 to 4 processes) on an SMP system
with 4 cores, the results are fine. But the problem occurs if I spawn
them on 2 to 4 different computers (on a cluster) with the exact same
OS and specification; the final results are wrong. Both systems (SMP
and cluster) use NFS as file storage. Are unformatted FORTRAN files
not portable between computers, even if they share the exact same
specification and compiler?
(snip)
! writer process:
open(1,file='var6.dat',form='unformatted',status='unknown')
write(1) var1,var2,var3
close(1)
! reader process (on another node):
open(1,file='var6.dat',form='unformatted',status='old')
read(1) var1,var2,var3
close(1)
Here is the program flow in 'human language':
Process1 write var6.dat ->
Mpi-barrier ->
Process2 read var6.dat
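Put together as a minimal sketch of that flow (assuming two MPI ranks;
the variable names, values, and unit number are illustrative, not taken
from the original program):

```fortran
c     Sketch of the flow above: rank 0 writes, a barrier, rank 1 reads.
c     Whether rank 1 actually sees the new data across NFS is the
c     question this thread is about.
      program passfile
      implicit none
      include 'mpif.h'
      integer rank, ierr
      double precision var1, var2, var3

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

      if (rank .eq. 0) then
         var1 = 1.0d0
         var2 = 2.0d0
         var3 = 3.0d0
         open(1, file='var6.dat', form='unformatted',
     &        status='unknown')
         write(1) var1, var2, var3
c        CLOSE empties the Fortran runtime's buffer to the OS
         close(1)
      end if

c     rank 1 waits here until rank 0 has closed the file
      call MPI_BARRIER(MPI_COMM_WORLD, ierr)

      if (rank .eq. 1) then
         open(1, file='var6.dat', form='unformatted', status='old')
         read(1) var1, var2, var3
         close(1)
      end if

      call MPI_FINALIZE(ierr)
      end
```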
OS : Linux kernel 2.6.22
compiler : mpich + g77
Filesystem : NFS
I haven't worried about this one for a while, and added
comp.protocols.nfs in case anyone there knows.

NFS does some buffering, and MPI barrier may not be enough to
make sure that all buffers are written back.

The NFS option for synchronous writes is supposed to not return
to the writer until the data is on a physical storage device.
That may or may not stop any read buffering, though.
Also, be sure you use hard mounts.
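For reference, those two options might be requested on the client like
this (the server name and mount point are placeholders; `sync` can also
be set server-side in /etc/exports):

```shell
# Hard mount with synchronous writes and the attribute cache disabled.
# 'server:/export' and '/mnt/shared' are placeholders.
mount -t nfs -o hard,sync,noac server:/export /mnt/shared
```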

If it is NFS buffering you should either have old data or EOF.
Then again, there could always be bugs in the NFS implementation.

Anyway, I believe your problem is NFS, not Fortran.

-- glen
Nick Maclaren
2007-12-11 12:30:13 UTC
Permalink
In article <***@comcast.com>,
glen herrmannsfeldt <***@ugcs.caltech.edu> writes:
|> antok.tm wrote:
|>
|> > I'm working on a task to parallelize an existing FORTRAN
|> > application. Since it involves a huge number of variables per
|> > iteration, I decided to use file operations (read and write) to pass
|> > the values between processes.
|>
|> > If I spawn these processes (say 2 to 4 processes) on an SMP system
|> > with 4 cores, the results are fine. But the problem occurs if I spawn
|> > them on 2 to 4 different computers (on a cluster) with the exact same
|> > OS and specification; the final results are wrong. Both systems (SMP
|> > and cluster) use NFS as file storage. Are unformatted FORTRAN files
|> > not portable between computers, even if they share the exact same
|> > specification and compiler?
|>
|> I haven't worried about this one for a while, and added
|> comp.protocols.nfs in case anyone there knows.
|>
|> NFS does some buffering, and MPI barrier may not be enough to
|> make sure that all buffers are written back.

It isn't. You need the Fortran 2003 FLUSH statement, a CLOSE of the
file, or a system-dependent call to a FLUSH subroutine. And that
IS a Fortran issue.
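A sketch of the Fortran 2003 form (note that g77 predates Fortran 2003,
so with g77 the system-dependent route, e.g. a vendor FLUSH subroutine,
would be needed instead; the variable names and unit number are
illustrative):

```fortran
      program flushdemo
      implicit none
      double precision var1, var2, var3
      var1 = 1.0d0
      var2 = 2.0d0
      var3 = 3.0d0
      open(1, file='var6.dat', form='unformatted',
     &     status='unknown')
      write(1) var1, var2, var3
c     F2003 FLUSH: hands the runtime's buffered data to the OS.
c     Note it does not, by itself, defeat NFS client-side caching;
c     that is the separate NFS issue discussed in this thread.
      flush(1)
      close(1)
      end
```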

|> The NFS option for synchronous writes is supposed to not return
|> to the writer until the data is on a physical storage device.
|> That may or may not stop any read buffering, though.
|> Also, be sure you use hard mounts.

Also, in NFS 3 and earlier, there are race conditions between the
data transfer and the Inode update, which can cause bizarre effects.

This is not helped by the POSIX specification of fsync and Synchronized
I/O File Integrity Completion - have YOU spotted what it doesn't
require and there is no POSIX mechanism to force? :-)

|> If it is NFS buffering you should either have old data or EOF.
|> Then again, there could always be bugs in the NFS implementation.

There always are - virtually every NFS implementation adds buffering
of forms that are forbidden by NFS, because the performance is
catastrophic if you don't.


Regards,
Nick Maclaren.
glen herrmannsfeldt
2007-12-11 16:55:41 UTC
Permalink
(I wrote)
|> > I'm working on a task to parallelize an existing FORTRAN
|> > application. Since it involves a huge number of variables per
|> > iteration, I decided to use file operations (read and write) to pass
|> > the values between processes.
|> NFS does some buffering, and MPI barrier may not be enough to
|> make sure that all buffers are written back.
It isn't. You need the Fortran 2003 FLUSH statement, a CLOSE of the
file, or a system-dependent call to a FLUSH subroutine. And that
IS a Fortran issue.
The OP's example had a CLOSE, but I snipped it out. If you
write on one remote machine, and read on another remote machine,
does CLOSE guarantee the changes are seen on the second machine?

I think I remember creating files on one machine and then it
taking many seconds before I would see them on another machine.
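The many-second delay glen describes is consistent with the NFS client's
attribute cache (on Linux, acregmin defaults to 3 seconds and acregmax
to 60 seconds, per nfs(5)). A sketch of shrinking that window on the
client, with placeholder server and mount point:

```shell
# Force the attribute cache timeouts to zero so the client revalidates
# file attributes with the server on every access. Slower, but it makes
# new files and sizes show up promptly on other machines.
mount -t nfs -o remount,actimeo=0 server:/export /mnt/shared
```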
|> The NFS option for synchronous writes is supposed to not return
|> to the writer until the data is on a physical storage device.
|> That may or may not stop any read buffering, though.
|> Also, be sure you use hard mounts.
Also, in NFS 3 and earlier, there are race conditions between the
data transfer and the Inode update, which can cause bizarre effects.
This is not helped by the POSIX specification of fsync and Synchronized
I/O File Integrity Completion - have YOU spotted what it doesn't
require and there is no POSIX mechanism to force? :-)
I gave up on locking a long time ago, when I had a system spending
all its time in a lockd loop waiting for something that was never
going to happen.
|> If it is NFS buffering you should either have old data or EOF.
|> Then again, there could always be bugs in the NFS implementation.
There always are - virtually every NFS implementation adds buffering
of forms that are forbidden by NFS, because the performance is
catastrophic if you don't.
So, my guess is that the OP is getting a previous version of the
file from the buffer, as the system hadn't noticed the changes.

-- glen
