Okay... I'm getting to the point where I want to release my local caching
patches again and have NFS work with them. This means making NFS mounts share
or not share appropriately - something that's engendered a fair bit of
argument.

So I'd like to solicit advice on how best to deal with this problem.

Let me explain the problem in more detail.


================
CURRENT PRACTICE
================

As the kernel currently stands, coherency is ignored for mounts that have
slightly different combinations of parameters, even if these parameters just
affect the properties of the network "connection" used or just mark a
superblock as being read-only.

Consider the case of a file remotely available by NFS. Imagine the client sees
three different views of this file (they could be by three overlapping mounts,
or by three hardlinks or some combination thereof).

This is how NFS currently operates without any superblock sharing:
                               +---------+
        Object on server --->  |         |
                               |  inode  |
                               |         |
                               +---------+
                                   /|\
                                  / | \
                                 /  |  \
                                /   |   \
                               /    |    \
                              /     |     \
                             /      |      \
                            /       |       \
                           /        |        \
                          /         |         \
                         /          |          \
                        |           |           |
                        |           |           |
:::::::::::::NFS::::::::|:::::::::::|:::::::::::|:::::::::::::::::::::::::::::
                        |           |           |
                        |           |           |
                        |           |           |
  +---------+      +---------+      |           |
  |         |      |         |      |           |
  | mount 1 |----->| super 1 |      |           |
  |         |      |         |      |           |
  +---------+      +---------+      |           |
                                    |           |
                                    |           |
  +---------+                  +---------+      |
  |         |                  |         |      |
  | mount 2 |----------------->| super 2 |      |
  |         |                  |         |      |
  +---------+                  +---------+      |
                                                |
                                                |
  +---------+                              +---------+
  |         |                              |         |
  | mount 3 |----------------------------->| super 3 |
  |         |                              |         |
  +---------+                              +---------+

...I don't have figures on that, but I do know people have complained about it

My point was meant to be that the presence and coverage of a cache is more
likely to reflect the client machine than would the NIS map for the NFS
automounts. You wouldn't necessarily want to store this table in NIS.

David
I don't see how persistent local caching means we can no longer ignore (a)
and (b) above. Can you amplify this a bit? Nothing you say in the rest of
your proposal convinces me that having multiple caches for the same export
is really more than a theoretical issue.

Frankly, the reason why admins mount exports multiple times is precisely
because they want different applications to access the files in different
ways. Admins *want* one mount point to be available ro, and another rw. They
*want* one mount point to use 'noac' and another not to. They *want*
multiple sockets, more RPC slots, and unique caches for different
applications. No one would go to the trouble of mounting an export again,
using different options, unless that's precisely the behavior that they
wanted.

This is actually a feature of NFS. It's used as a standard part of
production environments, for example, when running Oracle databases on NFS.
One mount point is rw and is used by the database engine. Another mount
point is ro and is used for back-up utilities, like RMAN.

Another example is local software distribution. One mount point is ro, and
is accessed by normal users. Another mount point accesses the same export
rw, and is used by administrators who provide updates for the software.

As useful as the feature is, one can also argue that mounting the same
export multiple times is infrequent in most normal use cases. Practically
speaking, why do we really need to worry about it?

The real problem here is that the NFS protocol itself does not support
strong cache coherence. I don't see why the Linux kernel must fix that
problem.

The only real problem with the first scenario is that you may have more than
one copy of a file in the persistent cache. How often will that be the case?
Since the local persistence cache is probably disk-based and thus large
relative to memory, what's the problem with using a little extra space?

The problems ...
How about I put it like this. There are two principal problems to be dealt
with:

 (1) Reconnection.

     Imagine that the administrator requests a mount that uses part of a
     cache. The client machine is at some time later rebooted and the
     administrator requests the same mount again.

     Since the cache is meant to be persistent, the administrator is at
     liberty to expect that the second mount immediately begins to use the
     data that the first mount left in the cache.

     For this to occur, the second mount has to be able to determine which
     part of the cache the first mount was using and request to use the same
     piece of cache.

     To aid with this, FS-Cache has the concept of a 'key'. Each object in
     the cache is addressed by a unique key. NFS currently builds a key to
     the cache object for a file from: "NFS", the server IP address, port and
     NFS version and the file handle for that file.

 (2) Cache coherency.

     Imagine that the administrator requests a mount that uses part of a
     cache. The administrator then makes a second mount that overlaps the
     first, maybe because it's a different part of the same server export or
     maybe it uses the same part, but with different parameters.

     Imagine further that a particular server file is accessible through both
     mountpoints. This means that the kernel, and therefore the user, has two
     views of the one file.

     If the kernel maintains these two views of the files as totally separate
     copies, then coherency is mostly not a kernel problem, it's an
     application problem - as it is now.

     However, if these two views are shared at any level - such as if they
     share an FS-Cache cache object - then coherency can be a problem.

     The two simplest solutions to the coherency problem are (a) to enforce
     sharing at all levels (superblocks, inodes, cache objects), (b) to
     enforce non-sharing. In-between states are possible, but ...

Hi David-

Why not use the fsid as well? The NFS client already uses the fsid to detect
when it is crossing a server-side mount point. Fsids are supposed to be
stable over server reboots (although sometimes they aren't, it could be made
a condition of supporting FS-cache on clients).

I also note the inclusion of server IP address in the key. For multi-homed
servers, you have the same unavoidable cache aliasing issues if the client
mounts the same server and export via different server

Is it a problem because, if there are multiple copies of the same remote
file in its cache, then FS-cache doesn't know, upon reconnection, which item
to match against a particular remote file?

I think that's actually going to be a fairly typical situation -- you'll
have conditions where some cache items will become orphaned, for example, so
you're going to have to deal with that ambiguity as a part of normal
operation.

For example, if the FS-caching client is disconnected or powered off when a
remote rename occurs that replaces a file it has cached, the client will
have an orphaned item left over. Maybe this use case is

How do you propose to do that? First, clearly, FS-cache has to know that
it's the same object, so fsid and filehandle have to be the same (you refer
to that as the "reconnection problem", but it may generally be a "cache
aliasing problem"). I assume FS-cache has a record of the state of the
remote file when it was last connected -- mtime, ctime, size, change
attribute (I'll refer to this as the "reconciliation problem")?

Does it, for instance, checksum both the cache item and the remote file to
detect data differences? You have the same problem here as we have with file
system search tools such as Beagle. Reconciling file contents after a
reconnection event may be too expensive to consider for NFS, especially if a
file

Do you allow administrators to select whether the FS-cache is persistent?
Or is it ...
Why use the FSID at all? The file handles are supposed to be unique per

I'm aware of this, but unless there's:

 (a) a way to specify a logical server group to the kernel, and

 (b) a guarantee that the file handles of each member of the logical group
     are common across the group

there's nothing I can do about it.

AFS deals with these by making servers second class citizens, and defining
"file handles" to be a set within the cell space.

Besides, I can use the IP address of the server as a key. I just have to hope
that the IP address doesn't get transferred to a different server because, as

There are multiple copies of the same remote file that are described by the
same remote parameters. Same IP address, same port, same NFS version, same

Orphaned stuff in the cache is eventually culled by cachefilesd when there's
Rename isn't a problem provided the FH doesn't change. NFS effectively caches
inodes, not files. If the remote file is deleted, then either NFS will try
opening it, will fail and will tell the cache to evict it; or the remote file
will never be opened again and the garbage in the cache will be culled
eventually. It may even hang around for ever, but if the FH is re-used, the
cache object will be evicted based on mtime + ctime + filesize being
different.

If someone tries hard enough, they can probably muck up the cache, but there's

For NFS, check mtime + ctime + filesize upon opening. It's in the patch
already.
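
To make the lookup-and-check step concrete, here is a minimal user-space
sketch in Python. It is not the kernel patch and not the FS-Cache API: the
names CacheKey, Coherency and ToyCache are hypothetical, and an in-memory
dict stands in for the on-disk cache. It only assumes what is stated in this
thread: the key is built from "NFS", the server IP address, port, NFS version
and file handle, and the object is evicted on open if mtime, ctime or size
differ from what was recorded.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CacheKey:
    """Hypothetical key, using the components listed in the thread."""
    server_ip: str
    port: int
    nfs_version: int
    file_handle: bytes
    fs_type: str = "NFS"


@dataclass(frozen=True)
class Coherency:
    """Attributes recorded with the cached data and compared on open."""
    mtime: int
    ctime: int
    size: int


class ToyCache:
    """In-memory stand-in for a persistent cache (an assumption)."""

    def __init__(self) -> None:
        self._objects: Dict[CacheKey, Tuple[Coherency, bytes]] = {}

    def store(self, key: CacheKey, attrs: Coherency, data: bytes) -> None:
        self._objects[key] = (attrs, data)

    def open(self, key: CacheKey, current: Coherency) -> Optional[bytes]:
        """Return cached data, or None if absent or stale (and evict it)."""
        entry = self._objects.get(key)
        if entry is None:
            return None
        saved, data = entry
        if saved != current:
            # mtime, ctime or size changed: treat the object as stale.
            del self._objects[key]
            return None
        return data
```

Because the key contains nothing specific to a particular mount, a second
mount of the same server and file after a reboot would compute the same key
and find the same object, which is the reconnection behaviour described
earlier. The same property is what makes two overlapping mounts share one
cache object.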
mtime + ctime + size, yes. I should add the change attribute if it's present,

No. That would be horrendously inefficient. Besides, if we're going to
checksum the remote file each time, what's the point in having a persistent

Because NFS v2 and v3 don't support proper coherency, there's a limited amount
we can do without being silly about it. You just have to hope someone doesn't
wind back the clock on the server in order to fudge the ctime to give your
cache conniptions. But if someone's willing to go to such lengths, ...