Discussion:
random unmounts on the NFS client
Eman On
2007-07-12 11:39:37 UTC
Hello,

I am running an NFS server on Solaris and an NFS client on AIX. I am
currently using hard mounts, without autofs or the automounter.
From time to time, the client loses NFS connectivity with the server
and I keep getting RPC timeout errors. I am not sure why this happens.
The only fix I have found is to force an unmount on the client and
restart NFS on the server, after which everything works again.

Any thoughts? Ideas?
TIA
George N. White III
2007-07-14 12:33:02 UTC
Post by Eman On
I am running an NFS server on Solaris and an NFS client on AIX. I am
currently using hard mounts, without autofs or the automounter.
From time to time, the client loses NFS connectivity with the server
and I keep getting RPC timeout errors. I am not sure why this happens.
The only fix I have found is to force an unmount on the client and
restart NFS on the server, after which everything works again.
Any thoughts? Ideas?
Learn to provide proper bug reports.

There is not a lot of information here: what type of network is it
(switched?), are you seeing other network problems, and are the failures
random or reproducible? Version numbers would also be useful.

Where I work we have SGI clients mounting Solaris exports of a
hierarchical storage system using switched ethernet. There have been bugs
in SGI's NFS implementation for files that were staged to tape
(reproducible with difficulty, since trying to access a staged file
migrates it back to disk), duplex mismatch problems between the SGI
machines and the switches, and problems from the switches dropping
UDP traffic when busy.

If you are using duplex negotiation, you might consider reconfiguring
for a fixed setting.

If you are using UDP on switched ethernet you should try TCP. There are
some web pages with information on NFS over switched ethernet.
--
George N. White III <***@chebucto.ns.ca>
Eman On
2007-07-15 14:47:11 UTC
Post by George N. White III
Learn to provide proper bug reports.
There is not a lot of information here: what type of network is it
(switched?), are you seeing other network problems, and are the failures
random or reproducible? Version numbers would also be useful.
Where I work we have SGI clients mounting Solaris exports of a
hierarchical storage system using switched ethernet. There have been bugs
in SGI's NFS implementation for files that were staged to tape
(reproducible with difficulty, since trying to access a staged file
migrates it back to disk), duplex mismatch problems between the SGI
machines and the switches, and problems from the switches dropping
UDP traffic when busy.
If you are using duplex negotiation, you might consider reconfiguring
for a fixed setting.
If you are using UDP on switched ethernet you should try TCP. There are
some web pages with information on NFS over switched ethernet.
--
Yes, this is a switched 10/100 network. Both the client and server are
on the same subnet. I am not seeing any network problems, and the
problem is totally random (as far as I know): we can't seem to
reproduce it. We are using NFS version 3 over TCP, according to
nfsstat -m.
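
For anyone wanting to check the same thing from a script: nfsstat -m reports each mount's options as a comma-separated flags string, and the protocol, version, and hard/soft setting can be read from it. The sketch below only parses such a string. The sample flags are made up for illustration and are not from the poster's system, and the exact nfsstat -m layout differs between platforms, so extracting the flags line is left out.

```python
def parse_mount_flags(flags):
    """Turn 'vers=3,proto=tcp,hard' into {'vers': '3', 'proto': 'tcp', 'hard': True}."""
    options = {}
    for item in flags.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Options without '=' (hard, intr, ...) are treated as boolean switches.
        options[key] = value if sep else True
    return options


def describe_mount(flags):
    """Summarise the options that matter for this thread: version, transport, hard/soft."""
    opts = parse_mount_flags(flags)
    version = opts.get("vers", "unknown")
    proto = opts.get("proto", "unknown")
    if opts.get("hard"):
        kind = "hard"
    elif opts.get("soft"):
        kind = "soft"
    else:
        kind = "unspecified"
    return f"NFSv{version} over {proto}, {kind} mount"


if __name__ == "__main__":
    # Hypothetical flags string, for illustration only.
    sample = "vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768"
    print(describe_mount(sample))
```

A mount that reports proto=udp here would be a candidate for the switch to TCP suggested earlier in the thread.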

The media speed and duplex on both server and client are set to 10/100
Full Duplex (no auto-negotiation).

We haven't had the problem again yet. But once it occurs, I am not really
sure how to troubleshoot it or find the "root cause". I will try running
tcpdump on the client to see what's going on. As I was saying, on the
client side the mounts go bad and we get an RPC timeout error.
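
Because the failure is intermittent, it can help to have a timestamped record of when the server stopped answering, to line up against a tcpdump capture or the switch logs. Below is a minimal sketch of such a probe. It assumes the server listens for NFS on TCP port 2049 (the standard port, but not confirmed for this setup), and the names probe and watch are made up for this example. A successful TCP connect only shows that the port accepts connections; it does not prove that RPC requests are being answered.

```python
import socket
import time
from datetime import datetime, timezone

NFS_PORT = 2049  # standard NFS port; an assumption about this particular server


def probe(host, port=NFS_PORT, timeout=5.0):
    """Attempt one TCP connection and return (reachable, detail)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.monotonic() - start) * 1000
            return True, f"connected in {elapsed_ms:.1f} ms"
    except OSError as exc:
        # Covers refused connections, timeouts and name-resolution failures.
        return False, f"{type(exc).__name__}: {exc}"


def watch(host, port=NFS_PORT, interval=30.0, rounds=None):
    """Print one timestamped line per probe; rounds=None runs until interrupted."""
    count = 0
    while rounds is None or count < rounds:
        ok, detail = probe(host, port)
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        status = "OK" if ok else "FAIL"
        print(f"{stamp} {host}:{port} {status} {detail}", flush=True)
        count += 1
        if rounds is None or count < rounds:
            time.sleep(interval)
```

Calling watch() with the server's hostname from the AIX client (or from a third machine on the same switch) and redirecting the output to a file gives a log whose first FAIL line narrows down when to look in the packet capture.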
George N. White III
2007-07-16 21:15:35 UTC
Post by Eman On
Yes, this is a switched 10/100 network. Both the client and server are
on the same subnet. I am not seeing any network problems, and the
problem is totally random (as far as I know): we can't seem to
reproduce it. We are using NFS version 3 over TCP, according to
nfsstat -m.
Based on my experience, you should look carefully at the switch
(especially if you don't have hierarchical storage or some other
problem causing latency on the server).

There are some general documents about NFS and switched ethernets on the
web. You may be able to make it fall over using a tool to generate heavy
RPC traffic. Have a look at
<http://www.eecs.harvard.edu/sos/papers/ellard_freenix03/bench.htm>
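
As a rough stand-in for a proper benchmark tool, a few parallel workers doing small file operations on the mounted directory will also generate a steady stream of NFS requests. The sketch below is only an illustration under that assumption: the function names and default sizes are invented here, it is not one of the tools described at the link above, and client-side caching may absorb some of the reads. On a hard mount, a stalled server will make these calls block rather than fail, which is itself the symptom being hunted.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def worker(directory, worker_id, iterations, size):
    """Write, fsync, stat, read back and delete one file per iteration.

    Returns the duration in seconds of the slowest iteration.
    """
    payload = os.urandom(size)
    slowest = 0.0
    for i in range(iterations):
        path = os.path.join(directory, f"stress_{worker_id}_{i}.tmp")
        start = time.monotonic()
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # ask for the data to be committed, not just cached
        os.stat(path)
        with open(path, "rb") as fh:
            data = fh.read()
        os.remove(path)
        if data != payload:
            raise IOError(f"read back different data from {path}")
        slowest = max(slowest, time.monotonic() - start)
    return slowest


def stress(directory, workers=8, iterations=100, size=64 * 1024):
    """Run the workers in parallel threads; return the slowest single iteration."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(worker, directory, w, iterations, size)
            for w in range(workers)
        ]
        return max(f.result() for f in futures)
```

Running stress() against a scratch directory on the NFS mount while watching the client and the switch may show whether load alone is enough to trigger the timeouts.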
Post by Eman On
The media speed and duplex on both server and client are set to 10/100
Full Duplex (no auto-negotiation).
Does 10/100 mean one end is 10 and the other 100? We set the ports on
the switch to 100-FD; on the SGI workstations we have to build a custom
kernel to turn off autonegotiation.
Post by Eman On
We haven't had the problem again yet. But once it occurs, I am not really
sure how to troubleshoot it or find the "root cause". I will try running
tcpdump on the client to see what's going on. As I was saying, on the
client side the mounts go bad and we get an RPC timeout error.
At one time we had problems with the switches cutting off systems that
didn't fit the vendor's idea of "normal Windows PC client traffic
patterns". The switch had a setting for "server" ports. Can you peek at
the logs for the switch?
--
George N. White III <***@chebucto.ns.ca>