Linux: Keeping separate BitKeeper repositories synced

Submitted by nimrod
on July 24, 2002 - 12:21am

Andreas Schuldei asked on lkml whether there was an easy way to use BitKeeper to backport changes from the 2.5 repository to the 2.4 repository. Like so many discussions, this one rapidly veered off: from keeping separate BK repositories in sync to the limitations of BitKeeper.

Much of the resulting discussion follows.


From: Andreas Schuldei
To: linux-kernel
Subject: using bitkeeper to backport subsystems?
Date: 2002-07-21 23:34:10

I want to use/track the linuxconsole project (especially for its
multi-desktop operation), which tracks 2.5, on the stable 2.4
tree.

Is BitKeeper the easiest way to go? I imagine the patch sets to
be like transformations, which can be superimposed, so I would
clone Marcelo's and Linus's trees, generate a linuxconsole patch set
against Linus's tree and backport it to Marcelo's tree. (There are
older backports, which should make my life easier.)

I imagine that I would then have two 'transforms': first the linuxconsole
transform, which changes over time as the project (and the
kernel) moves on, and the backport transform, which I hope will
remain more static. Can I superimpose these transforms? Is this
how it works?

Has anyone done this before? Is there a HOWTO, or could someone
outline the BitKeeper steps needed? Any catches?


From: Val Henson
To: linux-kernel
Subject: Re: using bitkeeper to backport subsystems?
Date: 2002-07-22 7:15:10

On Mon, Jul 22, 2002 at 01:34:10AM +0200, Andreas Schuldei wrote:
> I want to use/track the linuxconsole project (especially for its
> multi-desktop operation), which tracks 2.5, on the stable 2.4
> tree.
>
> Is BitKeeper the easiest way to go? I imagine the patch sets to
> be like transformations, which can be superimposed, so I would
> clone Marcelo's and Linus's trees, generate a linuxconsole patch set
> against Linus's tree and backport it to Marcelo's tree. (There are
> older backports, which should make my life easier.)
>
> I imagine that I would then have two 'transforms': first the linuxconsole
> transform, which changes over time as the project (and the
> kernel) moves on, and the backport transform, which I hope will
> remain more static. Can I superimpose these transforms? Is this
> how it works?
>
> Has anyone done this before? Is there a HOWTO, or could someone
> outline the BitKeeper steps needed? Any catches?

Sigh. I hate this question: "How will BitKeeper make it easier to
port something between 2.4 and 2.5?" Answer: "Bk won't help - at
least not as much as it would help if 2.5 had been cloned from 2.4."

As far as bk is concerned, 2.4 and 2.5 are two completely unrelated
repositories, so you can't push or pull changes between them. You can
still use bk to export and import patches, and to help you understand
what a change was attempting to do, so it's not completely useless.

If I were you, I would:

1. Grab the linux-2.4, linux-2.5, and linuxconsole trees.
2. Use "bk changes -L " to get a list of
all the changes in the linuxconsole tree but not in the mainline.
3. Export those changes as a GNU patch, something like:

    for i in $(bk changes -L ../linux-2.5 -k | bk key2rev ChangeSet); do
        bk export -tpatch -r"$i" >> ../console_patches
    done

    Note: This won't collapse overlapping patches. There is probably a
    smarter way to do this.

4. Attempt to apply that patch to the linux-2.4 tree:

    cd ../linux-2.4
    bk import -tpatch ../console_patches

5. Clean up the resulting mess. I suggest bringing up revtool in the
linuxconsole tree and reading the comments and generally browsing
the related changesets for each file in order to figure out what
rejected bits of patches were supposed to do.

For documentation, try the following:

Jeff Garzik's BK Kernel Hacking HOWTO:

http://www.uwsg.indiana.edu/hypermail/linux/kernel/0202.2/1060.html

[Warning: Blatant personal plug] BitKeeper for Kernel Developers:

http://www.nmt.edu/~val/ols/bk.ps.gz

I also highly recommend the BitKeeper test drive even for people who
have been using bk for a while:

http://www.bitkeeper.com/Test.html

After using bk for over a year, I still learned something new (and
very useful) when I took the test drive.

-VAL


From: Christoph Hellwig
To: linux-kernel
Subject: Re: using bitkeeper to backport subsystems?
Date: 2002-07-22 8:29:30

On Mon, Jul 22, 2002 at 01:15:10AM -0600, Val Henson wrote:
> Sigh. I hate this question: "How will BitKeeper make it easier to
> port something between 2.4 and 2.5?" Answer: "Bk won't help - at
> least not as much as it would help if 2.5 had been cloned from 2.4."

2.5 _is_ cloned from 2.4..


From: Val Henson
To: linux-kernel
Subject: Re: using bitkeeper to backport subsystems?
Date: 2002-07-22 17:52:31

On Mon, Jul 22, 2002 at 10:29:30AM +0200, Christoph Hellwig wrote:
> 2.5 _is_ cloned from 2.4..

Really? Cool, I wonder where I got the idea it wasn't...

Even so, I can't figure out how to backport from 2.5 to 2.4 without
using patches (but Larry's smarter than I am, he might know how).
Cherry picking would only solve part of the problem; the independent
creation of what is logically the same file is a bigger problem.
Instead of making "stable" changes to the 2.4 tree and pulling them
into the dev tree, they've been independently applied to both trees.
Development on 2.4 and 2.5 would have to be more coordinated with each
other than it is right now to really take advantage of the ability to
push/pull between 2.4 and 2.5.


From: Andreas Schuldei
To: linux-kernel
Subject: Re: using bitkeeper to backport subsystems?
Date: 2002-07-22 10:27:05

* Christoph Hellwig (hch@lst.de) [020722 10:29]:
> 2.5 _is_ cloned from 2.4..

can one make use of that somehow?


From: Christoph Hellwig
To: linux-kernel
Subject: Re: using bitkeeper to backport subsystems?
Date: 2002-07-22 10:29:05

On Mon, Jul 22, 2002 at 12:27:05PM +0200, Andreas Schuldei wrote:
> * Christoph Hellwig (hch@lst.de) [020722 10:29]:
> > 2.5 _is_ cloned from 2.4..
> can one make use of that somehow?

/me ain't no bk guru.

but I'd be interested in that, too.


From: Tom Rini
To: linux-kernel
Subject: Re: using bitkeeper to backport subsystems?
Date: 2002-07-22 15:20:31

On Mon, Jul 22, 2002 at 12:27:05PM +0200, Andreas Schuldei wrote:
> * Christoph Hellwig (hch@lst.de) [020722 10:29]:
> > 2.5 _is_ cloned from 2.4..
>
> can one make use of that somehow?

Possibly, once BitKeeper allows ChangeSets to depend only on what they
actually need, not every previous ChangeSet in the repository. IIRC,
this was one of the things Linus asked for, so hopefully it will happen.


From: Larry McVoy
To: linux-kernel
Subject: Re: using bitkeeper to backport subsystems?
Date: 2002-07-22 15:25:52

On Mon, Jul 22, 2002 at 12:29:05PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 22, 2002 at 12:27:05PM +0200, Andreas Schuldei wrote:
> > * Christoph Hellwig (hch@lst.de) [020722 10:29]:
> > > 2.5 _is_ cloned from 2.4..
> >
> > can one make use of that somehow?
>
> /me ain't no bk guru.
>
> but I'd be interested in that, too.

I'll try and write up how to do the backport thing later today (after
I have some coffee) but I wanted to answer this one.

In theory, the fact that the 2.4 and 2.5 trees are clones of each other
means that you could just do a bk pull of the 2.5 tree into the 2.4 tree
and you'd be all set. In practice, it's not going to work very well;
the problem is that a lot of files, the same files, were added to
both the 2.4 and the 2.5 trees. As far as BK is concerned, these are
different files; they have different "inode numbers". Today, when you
do the pull, you'll be forced to move one of the files out of the way,
typically deleting it and using the other one. That's not what you want.
You really want the two "inodes" to be merged into one in such a way that
synchronizing with either a 2.4 or a 2.5 tree would take any updates to
either inode and apply them to the merged inode.

Unless BK is taught to handle that case, I think a 2.4 / 2.5 merge
using BK is hopeless; I tried it about a month after the trees
split and there were piles of file conflicts.
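To make the "inode" point concrete, here is a minimal Python sketch under stated assumptions; it is not BitKeeper's actual data model. It assumes each file carries a persistent identity (the hypothetical `file_id`, standing in for BK's per-file "inode number") that is separate from its path. Files with the same identity merge cleanly, but when both trees independently added a file at the same path, two identities collide on that path, which is the kind of conflict described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class File:
    # Hypothetical model: file_id stands in for BK's per-file "inode number",
    # which survives renames; path is where the file currently lives.
    file_id: str
    path: str


def path_conflicts(local, remote):
    """Return paths both trees use for files with *different* identities.

    Same file_id means "same file" to the tool and merges cleanly;
    two independently created files at one path do not.
    """
    local_ids_by_path = {f.path: f.file_id for f in local}
    conflicts = set()
    for f in remote:
        other_id = local_ids_by_path.get(f.path)
        if other_id is not None and other_id != f.file_id:
            conflicts.add(f.path)
    return sorted(conflicts)


# Made-up trees: Makefile is shared history from before the split, while
# drivers/foo.c was added separately to each tree afterwards.
tree_24 = [File("id-1", "Makefile"), File("id-24", "drivers/foo.c")]
tree_25 = [File("id-1", "Makefile"), File("id-25", "drivers/foo.c"),
           File("id-26", "drivers/bar.c")]

if __name__ == "__main__":
    print(path_conflicts(tree_24, tree_25))
```

Resolving such a conflict by deleting one file discards that file's history; the merge Larry wants would instead unify the two identities so later changes to either one flow into the merged file.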
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm


From: Roger Gammans
To: linux-kernel
Subject: Re: using bitkeeper to backport subsystems?
Date: 2002-07-22 22:29:41

On Mon, Jul 22, 2002 at 08:20:31AM -0700, Tom Rini wrote:
> Possibly, once BitKeeper allows ChangeSets to depend only on what they
> actually need, not every previous ChangeSet in the repository. IIRC,
> this was one of the things Linus asked for, so hopefully it will happen.

That would be great.

With all due respect to Larry and the bk team, I think you'll
find determining 'needed changesets' in this case is a _hard_ problem.

How is bk supposed to find that a change depends on a previously
redefined API declared in a set of files otherwise untouched by the
changeset being exported?
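Roger's objection can be illustrated with a short Python sketch of the obvious heuristic. The heuristic is an assumption for illustration, not anything BitKeeper implements: treat an earlier changeset as a dependency if it touched any file the target changeset (or one of its already-found dependencies) touched. The hypothetical history below shows the blind spot: a changeset that uses a redefined API from `drivers/char/vt.c` gets no dependency on the changeset that changed the header.

```python
from typing import FrozenSet, List, NamedTuple, Set


class ChangeSet(NamedTuple):
    name: str
    files: FrozenSet[str]


def file_overlap_deps(history: List[ChangeSet], target: str) -> Set[str]:
    """Guess which changesets `target` needs, using file overlap only.

    Walks backwards through a linear history (oldest first); an earlier
    changeset that touched any file in the touched set is pulled in and
    its files are added to the set, so the closure is transitive.
    """
    names = [cs.name for cs in history]
    idx = names.index(target)
    needed = {target}
    touched = set(history[idx].files)
    for cs in reversed(history[:idx]):
        if touched & cs.files:
            needed.add(cs.name)
            touched |= cs.files
    return needed


# Hypothetical history, oldest first.
history = [
    ChangeSet("api-change", frozenset({"include/linux/console.h"})),
    ChangeSet("net-fix", frozenset({"drivers/net/foo.c"})),
    ChangeSet("console-a", frozenset({"drivers/char/vt.c"})),
    ChangeSet("console-b", frozenset({"drivers/char/vt.c",
                                      "drivers/char/keyboard.c"})),
]

if __name__ == "__main__":
    # "api-change" is not found, even if console-b calls the redefined API.
    print(sorted(file_overlap_deps(history, "console-b")))
```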

Now, bk could make this a little easier by allowing changesets to
be exported without any dependencies (à la GNU patch export, but
with metadata for commit messages).

The developer can then use 'bk undo' to remove the unnecessary changesets
for his patch, reapply it keeping the commit metadata, test it, and then
re-export a full bk patch with minimal dependencies.

Unfortunately, I know of no way to currently instruct bk to
do this dependency-less export.

--
TTFN
Roger.
Master of Peng Shui. (Ancient oriental art of Penguin Arranging)


From: Larry McVoy
To: linux-kernel
Subject: Re: using bitkeeper to backport subsystems?
Date: 2002-07-22 22:44:43

On Mon, Jul 22, 2002 at 11:29:41PM +0100, Roger Gammans wrote:
> With all due respect to Larry and the bk team, I think you'll
> find determining 'needed changesets' in this case is a _hard_ problem.

Thanks, we agree completely. It's actually an impossible problem
for a program since it requires semantic knowledge of the content
under revision control. And even then the program can get it wrong
(think about a change which reduces the depth of the stack, followed by
a change that won't work with the old stack depth; now you export only the
second change to the other tree and it breaks, yet it worked in the first tree).

> Now, bk could make this a little easier by allowing changesets to
> be exported without any dependencies (à la GNU patch export, but
> with metadata for commit messages).

That's trivial to do; we already have 'bk export -tpatch -r', which
does the patch part. Combine that with 'bk changes -vr' and you have
what you are talking about on the sending side. On the receiving side
we have 'bk import -tpatch' and 'bk comments', which do the other half.


From: Eric W. Biederman
To: linux-kernel
Subject: Re: using bitkeeper to backport subsystems?
Date: 2002-07-23 18:38:54

Larry McVoy writes:

> Thanks, we agree completely. It's actually an impossible problem
> for a program since it requires semantic knowledge of the content
> under revision control. And even then the program can get it wrong
> (think about a change which reduces the depth of the stack, followed by
> a change that won't work with the old stack depth; now you export only the
> second change to the other tree and it breaks, yet it worked in the first tree).

Perfection is impossible. However, there is a lot of independent code
in the Linux kernel. It has to be that way or maintenance would quickly
become impossible.

The last time this was suggested, the idea was to look at how far back into
the repository (up to a given limit) a current changeset could apply, with all
of its current dependencies.

But beyond that I suspect it would be easier to declare lack of dependencies.

drivers/net and drivers/ide are completely separate subtrees, at
least until you get ATA over IP. And even then the dependency
is on the IP layer.

Maybe independence should be shown by putting each independent chunk
into its own repository. And then building a working kernel tree
would just be a matter of checking out all of the parallel
repositories, into the appropriate location. Then the global tree
can just remember which version of all of the subtrees it was
tested with last.
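Eric's layout can be sketched as a top-level manifest that pins each independent subtree to the revision it was last tested with. The Python below is a hypothetical illustration of that bookkeeping only: the subtree paths come from Eric's example, while the revision labels and the helper names `checkout_plan` and `record_tested` are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical top-level manifest: subtree -> revision it was last tested with.
Manifest = Dict[str, str]


def checkout_plan(manifest: Manifest, root: str = "linux") -> List[Tuple[str, str]]:
    """List (destination directory, revision) pairs for building a full tree."""
    return [(f"{root}/{path}", rev) for path, rev in sorted(manifest.items())]


def record_tested(manifest: Manifest, subtree: str, rev: str) -> Manifest:
    """Return a new manifest with one subtree bumped after a successful test.

    Other subtrees keep their pinned revisions, so updating drivers/net
    never drags drivers/ide along with it.
    """
    if subtree not in manifest:
        raise KeyError(f"unknown subtree: {subtree}")
    updated = dict(manifest)
    updated[subtree] = rev
    return updated


tested = {"drivers/net": "net-r41", "drivers/ide": "ide-r17"}

if __name__ == "__main__":
    newer = record_tested(tested, "drivers/net", "net-r42")
    for dest, rev in checkout_plan(newer):
        print(dest, rev)
```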

Given that a fully independent program is likely to break because of
a buggy libc (whose exact version I have no business depending upon),
I think the insistence on global dependencies is just plain
silly; you can never find the entire set of dependencies.

So, Larry, please cope with the fact that perfect dependency modeling is
impossible, and set up a method that works in the real world. Or do
you have a way to model that my code only works on a magic test
machine that magically catches a page fault and does the right
thing, while all other machines page fault reliably?

Eric


From: Larry McVoy
To: linux-kernel
Subject: Re: using bitkeeper to backport subsystems?
Date: 2002-07-23 22:46:48

On Tue, Jul 23, 2002 at 12:38:54PM -0600, Eric W. Biederman wrote:
> The last time this was suggested, the idea was to look at how far back into
> the repository (up to a given limit) a current changeset could apply, with all
> of its current dependencies.
>
> But beyond that I suspect it would be easier to declare lack of dependencies.

[I'm going to argue with you up here, mostly just to explain how/why
BK works, but you can skip down to the next section and you'll see we have
a fair amount of agreement]

All of what you are saying makes perfect sense in a centralized system
like CVS, Perforce, Subversion, whatever. The reason it makes sense is
that there is exactly one copy of the truth, you can manipulate it in
the one location which has it, and that's that.

Sort of sounds like all those tools are better than BitKeeper, given
that description, right? Because BitKeeper is distributed, there is no
one place that you can do anything and force it upon everyone else.
You can only do things to your local revision history in a recorded
way and propagate that to anyone else who wants it.

Again, it sounds like the distributed nature of BK is causing all sorts
of problems, so why not just toss it? Centralized systems manage 99.9%
or more of the world's source, so they must be good enough. Maybe not.
It is the peer to peer nature of BK which allows all sorts of things to
work, from mundane stuff like performance (you operate against a local
copy of the history) to more complex things like work flow (it's trivial
to mimic Rational's unified change management system with a series of
repositories) to practical things like working both at home and at
work and not losing data.

Whether you agree or disagree with the value of the distributed nature
of BitKeeper, that's a basic part of how it works and it can be tweaked
but not thrown out. Consider it a "limitation" of the BitKeeper design.

OK, so now think about what you are asking. You want to move
changesets around out of order. Please explain to me how you are
going to synchronize two trees when you've done that. Right now,
we can use the fact that there is a strong ordering to do fast and
lightweight synchronization. Do an strace of a pull from bkbits.net when
there is nothing to pull and count the bytes that go across the wire.
It's tiny, probably about 5-6KB. Now do the same thing with CVS, the
amount of data is proportional to the number of files in the tree, i.e.,
dramatically more.

The reason we can do what we do is that a changeset actually implies the
existence of all the changesets which came before it. As soon as we do
the out of order stuff, we can no longer depend on that. The openlogging
kernel tree has 12,000 changesets in it. If I can't depend on ordering,
do you want me to compare all 12,000 to see if I need to update anything?
Or should I start doing the file by file comparison that CVS does?
No thanks. That sucks; we can quantify exactly how much it sucks, and it
is too much.
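The argument about ordering can be shown with a deliberately simplified Python model: each history is a linear list of changeset keys, oldest first (real BK histories also contain merges, which this ignores). Because holding a changeset implies holding everything before it, the receiver only needs to report its newest key; without that guarantee the sender has to see the receiver's entire list. The function names are hypothetical.

```python
from typing import List, Optional


def missing_given_tip(sender: List[str], receiver_tip: Optional[str]) -> List[str]:
    """Changesets the receiver lacks, knowing only its newest key.

    Valid only under the ordering rule: a repository holding receiver_tip
    also holds every changeset before it in the sender's history.
    """
    if receiver_tip is None:  # empty repository
        return list(sender)
    pos = sender.index(receiver_tip)  # ValueError if the histories diverged
    return sender[pos + 1:]


def missing_by_full_compare(sender: List[str], receiver: List[str]) -> List[str]:
    """Without ordering, every key the receiver holds must be compared."""
    have = set(receiver)
    return [key for key in sender if key not in have]


sender = ["cs1", "cs2", "cs3", "cs4"]
receiver = ["cs1", "cs2"]

if __name__ == "__main__":
    # Same answer; the first needs one key from the receiver, the second all of them.
    print(missing_given_tip(sender, receiver[-1]))
    print(missing_by_full_compare(sender, receiver))
```

In this model, a 12,000-changeset tree costs one key per no-op pull with ordering, versus 12,000 keys without it.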

I'm not saying "no, we won't fix it", I'm saying "understand why it is
the way it is and then suggest a fix". In other words, don't throw
the baby out with the bath water.

> drivers/net and drivers/ide are completely separate subtrees, at
> least until you get ATA over IP. And even then the dependency
> is on the IP layer.
>
> Maybe independence should be shown by putting each independent chunk
> into its own repository. And then building a working kernel tree
> would just be a matter of checking out all of the parallel
> repositories, into the appropriate location. Then the global tree
> can just remember which version of all of the subtrees it was
> tested with last.

Whoohoo! Agree completely, and we're building this, we call them nested
repositories and they work pretty much exactly as you describe. However,
even there we do get into problems. Here's how: suppose that you have a
nested repo for include/ppc and another for arch/ppc. You make a change
in both and you commit a sort of "super changeset" which binds those
changes together because one won't work without the other. Now you go
to pull the include/ppc directory for some reason and it will force you
to pull the arch/ppc directory. So the dependencies are reduced but
can still creep across the boundaries. Not doing so isn't an option
because we can all agree we have to have some way to say "these changes
which span these subrepositories must move as a unit".
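The dependency creep described here can be modeled as a closure over "super changesets". In this hypothetical Python sketch, which is not BitKeeper's nested-repository implementation, each super changeset is represented only by the set of subrepositories it binds together, and pulling one subrepository drags in every subrepository reachable through those bindings.

```python
from typing import List, Set


def pull_closure(requested: str, super_changesets: List[Set[str]]) -> Set[str]:
    """Subrepositories that must move together when pulling `requested`.

    Repeats until nothing changes, so bindings chain transitively:
    {a, b} plus {b, c} means pulling a also pulls c.
    """
    needed = {requested}
    changed = True
    while changed:
        changed = False
        for bound in super_changesets:
            if bound & needed and not bound <= needed:
                needed |= bound
                changed = True
    return needed


# One super changeset binding a change to include/ppc with one to arch/ppc.
bindings = [{"include/ppc", "arch/ppc"}]

if __name__ == "__main__":
    print(sorted(pull_closure("include/ppc", bindings)))
    print(sorted(pull_closure("drivers/net", bindings)))
```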

> Given that a fully independent program is likely to break because of
> a buggy libc (whose exact version I have no business depending upon),
> I think the insistence on global dependencies is just plain
> silly; you can never find the entire set of dependencies.

Agreed. And it's unlikely anyone would take explicit action to bind
their app to libc unless they really did need at least glibc 2.3
for some (probably bad, IMHO) reason. So you'd be OK there.

By the way, we have customers who maintain (large) embedded Linux
distributions in BitKeeper with literally hundreds of unrelated
repositories, i.e., one for the kernel, one for gcc, one for make,
etc. They are the motivation behind the nested stuff so they don't
have to do scripts which do things like

    for i in $(cat list_of_repos); do
        (cd ~/ws/"$i" && bk pull)
    done

> So, Larry, please cope with the fact that perfect dependency modeling is
> impossible, and set up a method that works in the real world.

It's high on our list and we are working on it. I get a little touchy
about it because some of what people say they want just won't work in
a distributed system, but as you suggested, there is more than one way
to do it and the nested stuff is a good start.

What is really going on...

cptnoskill
on July 24, 2002 - 10:15am

It seems some people are spending more time arguing than actually coding... :)

Why doesn't everyone just move on and get some work done?
One of the benefits of open source is that you can use it the way you see fit. If BK isn't working for you, why not use something else?

just MHO

What is really going on...

Anonymous
on July 26, 2002 - 9:29am

I'm going to have to agree with you here... I've been reading a lot of stuff on lkml lately and it's amazing how such intelligent people can act like 3rd graders (I might even be giving them too much credit there).

It's just one huge ego after another. I especially see this when one coder says something that isn't quite clear, or is wrong: there is at least one jackass to take shots at him with "if you would have read the code", or "if you were able to understand the code", or one of the best, "if you had as much kernel experience as I've had!" The list goes on.

The part that confuses me: why can't someone politely explain it to them, without making the person look like a moron? It's impolite to take shots at people. If I were such an experienced coder and someone did that to me, I would most likely not waste any more time helping out. When stuff like that happens, it takes the fun out of it, and isn't that what Linux was supposed to be all about? (Right, Linus?)

Ah, well. Maybe it's a perfect world I want to see? No, just kernel developers working more towards being a team than at each other's throats.

Grow up, guys.

In other news ....

gncuster
on July 24, 2002 - 12:44pm

Subversion has gone alpha. Maybe one day BKbits will need to open up the code.
