'ssh node "ls -lR"': consumes all memory on the client!

Zoli

2007-09-13 15:37:44 UTC

This issue is with a volume exported from the server, when accessed
indirectly via SSH from the server, i.e. not logging into the node:

ssh node "ls -lR /files/export > /dev/null"

The command takes up all available memory (nearly 4 GB) on the node,
while generating hundreds of thousands of slabs (nfs_inode_cache and
dentry_cache), as seen with slabtop. The system is a fairly plain
Ubuntu install with the 2.6.15-26-amd64-k8 kernel.
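
For readers who want to rank slab caches without watching slabtop, here is a minimal sketch in Python. It assumes the "slabinfo - version: 2.x" layout of /proc/slabinfo and 4 KiB pages. The function names are made up for illustration, and the SAMPLE text is not a real capture: it is built from the slabtop figures quoted later in this thread, with the object sizes and tunables guessed.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages, as on x86-64


def parse_slabinfo(text):
    """Return {cache_name: size_in_KiB} from /proc/slabinfo-style text."""
    sizes = {}
    for line in text.splitlines():
        # Skip blank lines, the version line and the '# name ...' header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        stats, _, rest = line.partition(":")
        fields = stats.split()
        slabdata = rest.rpartition("slabdata")[2].split()
        if len(fields) < 6 or len(slabdata) < 2:
            continue  # not a line in the expected layout
        pages_per_slab = int(fields[5])
        num_slabs = int(slabdata[1])
        sizes[fields[0]] = num_slabs * pages_per_slab * PAGE_KIB
    return sizes


def top_caches(text, n=5):
    """Return the n largest caches as (name, KiB) pairs, biggest first."""
    sizes = parse_slabinfo(text)
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical sample: slab counts taken from the slabtop output in this
# thread; object sizes (1048, 224) and tunables are illustrative guesses.
SAMPLE = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nfs_inode_cache 2306141 2306142 1048 3 1 : tunables 24 12 8 : slabdata 768714 768714 0
dentry_cache 1519124 1520616 224 17 1 : tunables 120 60 8 : slabdata 89448 89448 0
"""

for name, kib in top_caches(SAMPLE):
    print(f"{name:20s} {kib:>10d}K")
```

On a live system the text would come from reading /proc/slabinfo (usually as root) instead of SAMPLE.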

/etc/exports on the server:
/files/export 10.68.0.0/255.255.0.0(sync,no_wdelay,rw,no_root_squash)

/etc/fstab entry on the client node:
home:/files/export /files/export nfs rw,nosuid,rsize=32768,wsize=32768,tcp 0 0

I'd very much appreciate any pointers on how to treat this problem.

Zoli

Bill Marcum

2007-09-13 17:04:59 UTC

On Thu, 13 Sep 2007 15:37:44 -0000, Zoli

Is /dev/null correct?
crw-rw-rw- 1 root root 1, 3 2006-05-22 10:25 /dev/null

Maybe the sorting causes ls to take up memory. Try ls -lRU
Or maybe it's the recursion. Do you have directories containing
thousands of files or thousands of subdirectories?
Or maybe you have circular hard links. fsck should detect that.
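
One way to answer the question about huge directories is to count entries per directory. The following is only a rough sketch in Python; largest_directories is a made-up helper name, not an existing tool. Note that walking the tree touches every entry, so it will presumably fill the dentry and inode caches in much the same way as 'ls -R' does.

```python
import heapq
import os


def largest_directories(root, n=10):
    """Return up to n (entry_count, path) pairs, largest directory first.

    Symbolic links to directories are not followed, so link loops
    cannot cause endless recursion.
    """
    counts = (
        (len(dirnames) + len(filenames), dirpath)
        for dirpath, dirnames, filenames in os.walk(root, followlinks=False)
    )
    # nlargest keeps only n items in memory while consuming the generator.
    return heapq.nlargest(n, counts)


# Example call (the path is the one from this thread):
#   for count, path in largest_directories("/files/export"):
#       print(count, path)
```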

--
Blutarsky's Axiom:
Nothing is impossible for the man who will not listen to reason.

Zoli

2007-09-14 07:51:09 UTC

[...] Is /dev/null correct?
crw-rw-rw- 1 root root 1, 3 2006-05-22 10:25 /dev/null

Yes, it is.

Maybe the sorting causes ls to take up memory. Try ls -lRU

I just have, with very interesting results: when I counted the lines
with 'ls -lRU /files/export/ |wc >export.ls-lRU_wc' while logged into
the client, it provoked the same memory consumption that I had
previously seen only via the SSH call! The total came to 5179046
lines, and slabtop says:

   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
2306142 2306141  99%    1.02K 768714        3   3074856K nfs_inode_cache
1520616 1519124  99%    0.22K  89448       17    357792K dentry_cache

And 'free' gives:

             total       used       free     shared    buffers     cached
Mem:       4053584    3796608     256976          0      68220     240680
-/+ buffers/cache:    3487708     565876
Swap:      4000144         92    4000052

After unmounting and remounting the volume on the client, memory usage
returns to normal:

             total       used       free     shared    buffers     cached
Mem:       4053584     131988    3921596          0      68248       8908
-/+ buffers/cache:      54832    3998752
Swap:      4000144         92    4000052
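
The figures above can be cross-checked with a little arithmetic. The sketch below (Python; the function name buffers_cache_line is made up for illustration) recomputes the '-/+ buffers/cache' line of 'free' as used - buffers - cached and free + buffers + cached, and compares it with the two slab caches reported by slabtop. It assumes 4 KiB pages with one page per slab, and that this version of 'free' counts slab memory as "used" rather than as buffers or cache.

```python
def buffers_cache_line(used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' line of free (values in KiB)."""
    return used - buffers - cached, free + buffers + cached


# Values copied from the two 'free' outputs in this post (KiB).
full = buffers_cache_line(used=3796608, free=256976,
                          buffers=68220, cached=240680)
remounted = buffers_cache_line(used=131988, free=3921596,
                               buffers=68248, cached=8908)

# Slab cache sizes from slabtop: number of slabs times 4 KiB per slab.
nfs_inode_kib = 768714 * 4
dentry_kib = 89448 * 4
slab_kib = nfs_inode_kib + dentry_kib

print("used excluding buffers/cache:", full[0], "KiB")
print("nfs_inode_cache + dentry_cache:", slab_kib, "KiB")
print("share held by the two caches: %.0f%%" % (100 * slab_kib / full[0]))
print("after remount:", remounted[0], "KiB used")
```

If those assumptions hold, almost all of the "used" memory is in these two slab caches rather than in any process, which would fit with the memory coming back after a remount.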

Now trying a test with the output discarded,
'ls -lRU /files/export/ >/dev/null', launched from the client: it
results in essentially the same memory loss!
Then I repeated it with 'ls -lR /files/export/ >/dev/null' (which had
not caused the problem before, when logged into the client), and this
time it gave the same problem... So it seems we can at least take the
SSH call out of the picture, but I'm as baffled as before! (The same
happens after a clean reboot, too.)

Or maybe it's the recursion. Do you have directories containing
thousands of files or thousands of subdirectories?

Recursion certainly has something to do with it (plain 'ls -l' is not
a problem, at least at the top level, where I tried it) - but why? And
yes, it's a big RAID disk with lots of stuff, including some huge
directories...

Or maybe you have circular hard links. fsck should detect that.

Although I'm fairly certain there are no circular links (actually I
think there are only soft links on that volume), I'll give fsck a try
when I have a chance.

Thanks again - Zoli

David Schwartz

2007-09-14 15:15:19 UTC

What's the problem? Is the memory not freed when it's needed for
something else?

The most efficient thing the system can do is keep the data it read
from the disk in memory just in case it's needed later. There is no
reason to throw the data away by freeing the memory just in case the
memory is needed later -- if it's needed later, it can be freed later.

If you don't want your memory used, put it on the shelf.

DS

Zoli

2007-09-14 15:34:47 UTC

Post by David Schwartz

Post by Zoli
ssh node "ls -lR /files/export > /dev/null"
The command takes up all available memory (near 4Gb) on the node,

What's the problem?
Is the memory not freed when it's needed for something else?

Exactly.

Alas, in the meantime I found that it's probably not related to NFS
as such (so I reset followup-to): the same occurs on the server with
the host volume...

-- Zoli

Nico

2007-09-15 00:20:17 UTC

Post by Zoli
This issue is with a volume exported from the server, when accessed
ssh node "ls -lR /files/export > /dev/null"
The command takes up all available memory (near 4Gb) on the node,
while generating hundreds of thousand of slabs (nfs_inode_cache and
2.6.15-26-amd64-k8 kernel.
/files/export 10.68.0.0/255.255.0.0(sync,no_wdelay,rw,no_root_squash)
home:/files/export /files/export nfs rw,nosuid,rsize=32768,wsize=32768,tcp 0 0
I'd very much appreciate any pointers on how to treat this problem.
Zoli

What is *in* /files/exports?

Zoli

2007-09-17 08:32:58 UTC

Post by Nico
What is *in* /files/exports?

It's a 450 GB RAID volume: as I noted in my detailed report
<***@o80g2000hse.googlegroups.com>, the full file
listing is some 5M lines.

Also, in the meantime I found that the trouble is probably not related
to NFS as such (so I reset followup-to): the same occurs on the server
with the host volume, too...

-- Zoli