Stopping the NFS mystery
In my most recent post over at "On Being Open", I talk about the suitability of Fedora 7 as a home Linux system. One of the things I close with there is the idea that Fedora 7 is actually a pretty good OS for the data center or the professional desktop. It is very bleeding edge, and it chases out problems well in advance of some other OSes. One problem we found with it, and with Fedora Core 6 running later kernels, was that they did not work at all well with our Tru64 TruCluster NAS.
I am not one to be mysterious most of the time, but apparently I was not quite detailed enough in my March 1st, 2007 post about the cause of the problem and the short-term solution we are using. Why I say "mysterious" should become clear in a moment.
Names in this Blog
I wanted to take a quick detour to mention my policy about publishing names in this or my other weblog. When someone writes me offline (i.e., using the "Contact Me" button rather than the comments button), I assume that they do not want their name published. Most of the time I do not publish anything they have written, either.
Today's post is an exception because the email contains a great deal of technical detail, and the problem under discussion, while not common unless you are using Tru64 as a NAS server, is going to start happening for anyone who does. NFS V4 will create this issue, and recent Linux clients will therefore create it too. Other NAS servers that behave like Tru64 in this respect may create it as well.
Today's post, then, is largely Google and other search engine bait, to help anyone else who hits this problem find a quick solution. Like the OS X / 64-bit NFS server problem before it (covered in this weblog a while back), there will be people hitting this, and it is not yet well documented as a problem out there.
The NAS readdirplus Problem in a Nutshell
The author of this email does a far better job than I do of peeling back the covers on this, so let's dive into their letter:
I've googled my way to your blog - seeking answers to a problem we've just encountered here (at -deleted for privacy-) with Fedora Core 6 NFS (via autofs) dealing with a Tru64 NFS server. Excuse the length of this mail, and my presumption in seeking your advice, but I'm at the limits of my knowledge/understanding here.
I just want to stop here for a second and say that this blog is all about knowledge sharing. I write this blog after hours (and for many, many hours at a time) because this is a subject I care about, and I truly enjoy the interactions I have with others around the world. I hate to jinx this, but in all the years I have been doing this, I have only once gotten anything like hate mail. Everyone else has been kind, decent, concerned and knowledgeable (as this writer is), and that makes it worthwhile.
So, there is no presumption and no imposition. It's why I do this. In fact, thank you for writing me about this, so that hopefully everyone can benefit.
Back to the problem at hand:
Everything's been OK up till now. We have Linux NFS clients going back years (to RedHat 7.3 even), plus Suns (running Solaris) and SGI systems (running SLES). Now, on a newly installed and yum-updated Fedora Core 6 system, we're seeing "Not a directory" messages when trying to cd down a filesystem that is automounted from our Tru64 server.
Breaking in again to note here that the default Fedora Core 6 works *fine* out of the box. It is when the 2.6.20 kernel is loaded that the new NFS client code comes along and the readdirplus problem begins to occur. It happens all day long on Fedora 7.
Accessing the same automounted filesystem from a Fedora Core 5 system (or earlier) is fine. The only difference between the two cases is on the client side. The Tru64 server is unchanged and the nfs mount is from a NIS auto.direct map. Some examples by way of illustration: First, a working FC5 system (automount version 4.1.4-29):
% cd /home_a/user/d1
% cd d2
% ls -l
drwxrwxr-x 3 user user 8192 Jul 3 14:22 d2
% cd d3
% cd d4
% cd d5
% cd d6
% pwd
/home_a/user/d1/d2/d3/d4/d5/d6
% grep home_a /proc/mounts
automount(pid2155) /home_a autofs rw,fd=4,pgrp=2155,timeout=600,minproto=2,maxproto=4,indirect 0 0
tru64:/export/fs1/user /home_a/user nfs rw,vers=3,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600,retrans=2,addr=tru64 0 0
Now on a "broken" FC6 system (automount version 5.0.1-0.rc3.31):
% cd /home_a/user/d1
% ls -l
drwxrwxr-x 3 user user 8192 Jul 3 14:22 d2/
% cd d2
d2: Not a directory.
% grep home_a /proc/mounts
auto.home_a /home_a autofs rw,fd=19,pgrp=6020,timeout=600,minproto=5,maxproto=5,indirect 0 0
tru64:/export/fs1/user /home_a/user nfs rw,vers=3,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=tru64 0 0
tru64:/export/fs1/user/d1/d2 /home_a/user/d1/d2 nfs rw,vers=3,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=tru64 0 0
Notice the additional line purportedly showing a mount of /home_a/user/d1/d2.
The behaviour is interesting in some aspects, as it seems that it's
possible to cd directly way down an automounted filesystem provided you
don't attempt any long directory listings. Then one can cd back up the
tree OK. It's doing the long directory listing that seems to cause the
additional entry to be made in /proc/mounts and to result in the subsequent
"Not a directory" error when attempting to cd further down.
In your blog of March 01, you've written one or two tantalizing paragraphs
that say that "the NFS server is where the real problem is" and that you'll have to work around the problem. The question is, how?
Is the "Not a directory" error that I've noted above the sort of thing
that you have seen?
There it is in a nutshell. Well. It was making us nuts anyway.
It was and is very weird. If you unmounted and remounted, you could navigate directly to places you knew existed and all would be OK with the world. Do anything that causes 'readdirplus' (like 'ls') to be issued, and it is game over. "Not a Directory" from there on out.
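If you want to check a client for this symptom, a minimal sketch like the following may help. It is not from the original investigation: the helper names (`parse_mounts`, `find_nested_nfs_mounts`) are hypothetical, and it assumes the standard /proc/mounts layout of device, mount point, filesystem type, options, dump and pass fields. It flags any NFS mount that sits inside another NFS mount from the same server, like the spurious /home_a/user/d1/d2 entry in the reader's FC6 output. It does not decode the octal escapes (such as \040 for a space) that /proc/mounts uses in paths.

```python
"""Hypothetical helper: spot NFS mounts nested inside another NFS mount.

Input is text in /proc/mounts format, one mount per line:
    device mountpoint fstype options dump pass
"""


def parse_mounts(text):
    """Return a list of (device, mountpoint, fstype, options) tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        mounts.append((device, mountpoint, fstype, options))
    return mounts


def find_nested_nfs_mounts(text):
    """Return (child, parent) mountpoint pairs where one NFS mount sits
    below another NFS mount served by the same host."""
    nfs = [m for m in parse_mounts(text) if m[2] in ("nfs", "nfs4")]
    nested = []
    for child_dev, child_mp, _, _ in nfs:
        for parent_dev, parent_mp, _, _ in nfs:
            # "server:/path" -> compare only the server part
            same_server = child_dev.split(":", 1)[0] == parent_dev.split(":", 1)[0]
            if same_server and child_mp.startswith(parent_mp.rstrip("/") + "/"):
                nested.append((child_mp, parent_mp))
    return nested
```

On a Linux client you could call `find_nested_nfs_mounts(open("/proc/mounts").read())` before and after an `ls -l` to see whether the extra entry appears.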
And, again in the tantalizing category, the man page for NFS(5) on my FC6
system documents this new option ...
nordirplus
Disables NFSv3 READDIRPLUS RPCs. Use this options when mounting
servers that don't support or have broken READDIRPLUS implementations.
This option doesn't appear in the man page on my FC5 system. Is
READDIRPLUS what happens when an ls -l is issued on an NFS client? I wonder
if the nordirplus option was implemented as a result of the sort of things
that you've blogged about. Regardless, I've tried to use nordirplus as a
mount option on FC6. It's accepted on the command line, but it is not listed
in the /proc/mounts output and it doesn't seem to alter the "Not a
directory" problem behaviour either :-(
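Checking whether an option like nordirplus actually took effect comes down to reading the options column of /proc/mounts for that mount point. A minimal sketch follows, with hypothetical helper names. Whether a given kernel echoes nordirplus in that column at all depends on the kernel version, so a missing entry is a hint rather than proof that the option was ignored.

```python
"""Hypothetical check: is a mount option listed for a mount point?

Works on text in /proc/mounts format (device mountpoint fstype options ...).
"""


def mount_options(text, mountpoint):
    """Return the set of options for the last entry mounted at mountpoint,
    or None if the mountpoint is not listed."""
    options = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            # Later lines win: a newer mount on the same path hides older ones.
            options = set(fields[3].split(","))
    return options


def option_in_effect(text, mountpoint, option):
    """True if option (e.g. "nordirplus") is listed for mountpoint."""
    options = mount_options(text, mountpoint)
    if options is None:
        raise KeyError(f"{mountpoint} is not mounted")
    return option in options
```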
In our email conversation I never answered this part of the query, because I was already sure that this was the exact problem we were having. Our Twister of All Things Network Storage mentioned to me in passing one day that he had noticed that the options did not work the way he thought they should... and I think this was the option he mentioned. But I won't swear to it.
It does seem like buggy behavior, or it may be that the man page documents the feature before it is actually implemented. I don't know. Sigh. I guess we'll have another update when someone somewhere knows the right answer on this.
Letter 2 (with a portion of my response)
>You have hit the exact problem I was referencing.
>Simple fix is to make the export version 2.
It's nice to have that confirmation, and forcing the mount to be NFS v2
does indeed work. I've set that in the auto.master file on the Fedora Core
6 client only:
# For details of the format look at autofs(8)
#
/- yp:auto.direct --ghost nfsvers=2
/home_a yp:auto.home_a nfsvers=2
Now, I just need to work out how to do some more sophisticated
configuration of the automounter maps so that the nfsvers=2 option is only
applied to automounts from the Tru64 server and not to everything in the
direct map (as happens with the setup above).
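One way to scope the workaround, sketched here under assumptions: instead of putting nfsvers=2 on the auto.master lines, put it on the individual map entries that point at the Tru64 server. autofs map entries take the form `key [-options] location`, so a small generator can add options only for selected hosts. The host table, the extra `proto=tcp` option and the function names are illustrative, not taken from the reader's setup.

```python
"""Hypothetical generator for autofs map lines with per-server options.

Each map entry has the form:  key  [-options]  server:/path
Only entries served by hosts listed in PER_HOST_OPTIONS get extra options,
so the NFS v2 workaround is not applied to every automount.
"""

# Assumed data: which servers need extra options (illustrative only).
PER_HOST_OPTIONS = {"tru64": ["nfsvers=2", "proto=tcp"]}


def map_line(key, host, path, per_host=PER_HOST_OPTIONS):
    """Build one autofs map entry: key [-options] host:/path."""
    options = per_host.get(host, [])
    location = f"{host}:{path}"
    if options:
        return f"{key}\t-{','.join(options)}\t{location}"
    return f"{key}\t{location}"


def build_map(entries, per_host=PER_HOST_OPTIONS):
    """entries: iterable of (key, host, path) tuples."""
    return "\n".join(map_line(key, host, path, per_host) for key, host, path in entries)
```

The generated lines would replace the corresponding entries in the NIS source for auto.home_a (or auto.direct, where the key is the full path), with nfsvers=2 then removed from the auto.master lines so that other servers keep their defaults.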
Then, I need to work on retiring the Tru64 server :-)
I'll also keep an eye on the nordirplus option that I mentioned in my
first email, as I'm curious to know why this didn't seem to work.
The NFS V4 client of later Fedora Core 6 kernels and all of Fedora 7 is going to try to use READDIRPLUS if it thinks it can, and it appears that if it sees a V3 export, it thinks it can. To date, this problem has only manifested for us with Tru64 as the server and Fedora Linux as the client.
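The following is a deliberately simplified model, not the Linux kernel source, of the choice described above. It encodes two facts from the protocol specs: NFSv2 (RFC 1094) has no READDIRPLUS at all, which is why forcing version 2 sidesteps the problem, and NFSv3 (RFC 1813) defines an NFS3ERR_NOTSUPP status a server can return for operations it does not implement. The model assumes the client drops back to READDIR after such a reply. A server that instead answers READDIRPLUS with a success status and bad data gives the client no reason to change course, which is one way to read the Tru64 behavior.

```python
"""Simplified, hypothetical model of an NFS client's directory-listing choice.

Not the Linux implementation; it only illustrates the protocol rules.
"""

NFS3_OK = 0
NFS3ERR_NOTSUPP = 10004  # RFC 1813: operation not supported


class ListingPolicy:
    def __init__(self, nfs_version, nordirplus=False):
        self.nfs_version = nfs_version
        # READDIRPLUS exists only in v3, and the admin can disable it.
        self.plus_allowed = nfs_version == 3 and not nordirplus

    def next_call(self):
        """Which RPC the client would use for the next directory listing."""
        return "READDIRPLUS" if self.plus_allowed else "READDIR"

    def on_reply(self, call, status):
        # A server that honestly says NOTSUPP gets plain READDIR from now on.
        # A server that "succeeds" with bad data gives no such signal.
        if call == "READDIRPLUS" and status == NFS3ERR_NOTSUPP:
            self.plus_allowed = False
```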
Based on the actual problem, I see no reason this will not get worse over time. It seems a certainty, in fact. How many other older-kerneled NAS servers are going to do the wrong thing when challenged with this new client behavior?
My Senior NAS Beater tells me that he worked this problem with network traces, looked at the client-side code (the wonders of Open Source), and talked to the Linux NFS client author (another wonder of the Open Source community). At the end of this investigation, he is satisfied that the client is playing 100% by the NFS rules. Even though we have only seen it to date with Fedora clients, this will change. Fedora is just farther down the NFS client code adoption bunny trail right now.
To verify this, the Chief Mugwump of NAS Destruction (yep: I steal from J.K. Rowling too...) modified the Fedora client code to always force READDIR rather than READDIRPLUS, and the problem stopped with the Tru64 NAS server.
For now, we use NFS V2 over TCP to any clients with the problem. In future, the very near future, we'll retire the formerly awesome Tru64 TruCluster from NAS duties. We'll be very sad on that day because we know that it happened not for technical reasons but because of parental neglect.
_____
got the same issue with a Tru64 & a RHEL5 client
same issue with VMware ESX 3 client and OS X 10.5 server
There appear to be two problems here:
1) tcpdump on the ESX service console will not monitor vmkernel traffic (and NFS traffic goes over this interface, if the mount was initiated by esxcfg-nas). I'm pretty sure there's a way to do this, but I haven't sorted out the details yet. However, the real problem is:
2) OS X NFS server (as of 10.5 Leopard) apparently does not grok readdirplus() - after comparing tcpdump outputs on the server side between ls(1) attempts on the share mounted via mount(8) and esxcfg-nas(8), I finally noticed that the former was issuing readdir(), while the latter was using readdirplus().
As of ESX-3.5 (requirement introduced in ESX-3.0 I believe), nfsv3 over TCP is the only supported transport mechanism, so I can't just drop back to nfsv2 to get around the issue. I'm now looking into whether there's a secret knob in OS X that will make it parse readdirplus() properly, but I'm not optimistic. Perhaps the upcoming 10.5.3 update will fix the problem.
cheers!
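The readdir() versus readdirplus() comparison in the comment above can also be done on procedure numbers rather than by eye. A minimal sketch, assuming you have already extracted (NFS version, procedure number) pairs for NFS RPC calls (program 100003) from a capture; the extraction step and the function name are hypothetical. The numbers come from RFC 1094 (v2) and RFC 1813 (v3): READDIR is procedure 16 in both, READDIRPLUS is procedure 17 in v3 only, and procedure 17 in v2 is STATFS.

```python
"""Hypothetical tally of directory-listing calls seen in a packet capture.

Input: (nfs_version, procedure_number) pairs from NFS RPC calls.
"""
from collections import Counter

LISTING_PROCS = {
    (2, 16): "READDIR",      # NFSv2 has no READDIRPLUS; v2 proc 17 is STATFS
    (3, 16): "READDIR",
    (3, 17): "READDIRPLUS",
}


def tally_listing_calls(calls):
    """Count READDIR vs READDIRPLUS calls, ignoring every other procedure."""
    counts = Counter()
    for version, proc in calls:
        name = LISTING_PROCS.get((version, proc))
        if name:
            counts[name] += 1
    return counts
```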

THX