Re: [Tux3] Two kinds of atomic commit

From: Daniel Phillips
Date: Monday, July 28, 2008 - 12:52 pm

On Monday 28 July 2008 09:58, Matthew Dillon wrote:

Then I need to rewrite the post so it seems as simple as it is :-)


In this case a log transaction is created containing all of the split
nodes as physical updates and possibly some merged nodes from the free
tree.  Btree splitting can always be committed atomically and
independently of any other activity on the filesystem.  It is nicely
bounded by the btree depth.  Allocations that do not require splitting
the free tree are logged logically, leaving the affected free tree
blocks dirty in cache.  So the allocation updates come "for free",
being tucked into the same commit block that governs the split btree
leaves.
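
A minimal sketch of such a log transaction, assuming a commit block that carries full block images for the split nodes plus a list of logical allocation records. The names Commit, log_split and log_alloc and the record layout are hypothetical, chosen only for illustration; they are not Tux3's on-disk format.

```python
from dataclasses import dataclass, field

@dataclass
class Commit:
    """Hypothetical commit block: physical block images plus logical edits."""
    physical: dict = field(default_factory=dict)  # block number -> full block image
    logical: list = field(default_factory=list)   # ("alloc", start, count) records

def log_split(commit, split_nodes):
    """Record every node touched by a btree split as a full-block update."""
    for blocknr, image in split_nodes.items():
        commit.physical[blocknr] = image

def log_alloc(commit, start, count):
    """Record an allocation logically; the free tree block itself is not logged."""
    commit.logical.append(("alloc", start, count))

# A leaf split needs two new blocks: the allocation rides along as a small
# logical record in the same commit that carries the split leaves.
commit = Commit()
log_alloc(commit, 1000, 2)
log_split(commit, {1000: b"left leaf", 1001: b"right leaf", 7: b"parent index"})
```

The number of physical entries here is bounded by how many nodes one split can touch, which is what keeps the transaction bounded by the btree depth.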


I now see how UNDO works, thank you for your patient explanations.  The
point I overlooked is that fsync (and friends) is indeed the only
barrier you have to worry about.

Still, I think it is good to try to get as much as possible of what was
going on in a big burst of activity with no fsync durably onto disk.
Redo will clearly leave the filesystem in a consistent state less far
back in time than undo.

So my general strategy is to log big changes like splits as "physical"
full block updates and small ones like allocating an extent as logical
edit records in the commit blocks of the physical updates.

The complexity you noted above is due to making sure that the on-disk
image of a block is always what the logical edit record expects it
to be at the time the logical edit is replayed.
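
A rough sketch of that replay constraint, assuming a toy model in which each free-map block on disk is a set of allocated extents and commits are replayed oldest first. The replay function, the record tuples and the sanity checks are assumptions made for illustration, not the actual Tux3 replay code.

```python
def replay(disk, commits):
    """Replay commits oldest first: physical images, then logical edits."""
    for commit in commits:
        # Physical updates overwrite the whole on-disk block image.
        for blocknr, image in commit["physical"].items():
            disk[blocknr] = set(image)
        # Logical edits modify whatever image is on disk at this point, so
        # that image must be the one the edit was recorded against.
        for op, blocknr, extent in commit["logical"]:
            block = disk[blocknr]
            if op == "alloc":
                if extent in block:
                    raise ValueError("image is newer than the edit expects")
                block.add(extent)
            elif op == "free":
                if extent not in block:
                    raise ValueError("image is older than the edit expects")
                block.remove(extent)
    return disk

# Block 5 is a free-map block; extents are (start, count) pairs.
disk = {5: {(0, 4)}}
commits = [
    {"physical": {}, "logical": [("alloc", 5, (4, 2))]},
    {"physical": {5: {(0, 4), (4, 2), (6, 1)}}, "logical": [("free", 5, (0, 4))]},
]
replay(disk, commits)
```

If block 5 had been flushed after the first commit without that being recorded anywhere, replaying the first alloc record would hit an image that already contains the extent, which is the kind of mismatch the logging rules have to rule out.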


Yes, allocate-in-free is usually nasty.  I need to ensure that there is
always a reserve of blocks available on disk that is more than the
maximum possible transaction that can be accepted.  This simple idea
can get really complex, I know.  One nice trick to simplify things a
little is to have a really generous limit on the maximum number of
dirty blocks that are allowed, when the filesystem has lots of free
space, and shrink that limit progressively as free space approaches
zero.
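
A minimal sketch of that sliding limit, assuming every dirty block needs at most a fixed number of free blocks to commit. The function name, the ceiling of 4096 and the worst-case cost of 2 are made-up illustration values, not Tux3 constants.

```python
def max_dirty_blocks(free_blocks, ceiling=4096, worst_case_cost=2):
    """Return how many dirty blocks may accumulate before a commit is forced.

    With lots of free space the generous ceiling applies.  As free space
    shrinks, the limit shrinks with it, so the worst-case commit always
    fits in the blocks that are still free.
    """
    return min(ceiling, free_blocks // worst_case_cost)

for free in (1_000_000, 5_000, 100, 1, 0):
    print(free, max_dirty_blocks(free))
```

The integer division makes the limit reach zero just before free space does, at which point no new dirty blocks can be accepted.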

I now see what you were driving at with your point about interlocking
namespace transactions, etc.  While each VFS transaction can indeed be
committed on its own, atomically and independently, to do so would be
death for throughput.  So it is necessary to batch up a lot of these
transactions, and be mindful of the interdependencies between the VFS
transactions and the underlying data structures.  The rule is,
underlying data changes required by any given VFS transaction must
never lie in more than one commit.  This can be accomplished without
actually tracking the physical representation interdependencies between
VFS transactions.  Instead, just count the dirty cache blocks and send
out an atomic commit for the entire set of transactions when some
threshold is passed.  Now the challenge is to figure out how to avoid
stalling during the commit.
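
The counting rule can be sketched as follows, assuming each VFS transaction reports the set of cache blocks it dirtied. CommitBatcher and its fields are hypothetical names, and the sketch deliberately ignores the stall problem mentioned above.

```python
class CommitBatcher:
    """Batch whole VFS transactions and commit when enough blocks are dirty."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.dirty = set()     # dirty cache blocks since the last commit
        self.pending = []      # transactions waiting for the next commit
        self.commits = []      # (transactions, blocks) per atomic commit

    def run(self, name, touched_blocks):
        # The threshold is only checked between transactions, so the changes
        # of one transaction can never end up split across two commits.
        self.pending.append(name)
        self.dirty.update(touched_blocks)
        if len(self.dirty) >= self.threshold:
            self.commit()

    def commit(self):
        if self.pending:
            self.commits.append((tuple(self.pending), frozenset(self.dirty)))
            self.pending.clear()
            self.dirty.clear()

batcher = CommitBatcher(threshold=4)
batcher.run("create a", {1, 2})
batcher.run("write a", {2, 3})
batcher.run("rename a b", {1, 4})   # fourth dirty block: triggers a commit
batcher.run("unlink b", {1})        # waits for the next commit
```

No dependency graph between transactions is kept here; interdependent transactions simply land in the same commit or in a later one, never partially in an earlier one.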

It is thanks to you, your searching questions and the example of Hammer
that I was forced to understand these issues clearly. :-)

Note!  In Linux, VFS means "Virtual Filesystem Switch", that is, the
security, synchronization and abstraction layer that exists above
filesystems in Linux.  I think it means "Virtual File System" to you,
which is just "filesystem" to Linux hackers.

Regards,

Daniel

_______________________________________________
Tux3 mailing list
Tux3@tux3.org
http://tux3.org/cgi-bin/mailman/listinfo/tux3