Re: git guidance

Previous thread: Relax permissions for reading hard drive serial number? by Dan Kegel on Thursday, November 29, 2007 - 8:13 am. (7 messages)

Next thread: [PATCH] Remove one useless extern declaration by Pierre Peiffer on Thursday, November 29, 2007 - 9:04 am. (1 message)
From: Jing Xue
Date: Thursday, November 29, 2007 - 8:52 am

But how is that supposed to work?  What happens when you make some
changes to a file and save it?  Do you want the "git file system" to
commit it right away or wait until you issue a "commit" command?
The first behavior would obviously be wrong, and the second would make
the "file system" not operationally transparent anyways. Right?

By the way, the only SCM I have worked with that tries to mount its
repository (or a view on top of it) as a file system is ClearCase with
its dynamic views. And, between the buggy file system implementation,
the intrusion on workflow, and the lack of scalability, at least in
the organization I worked for, it turned out to be a horrible,
horrible, horrible idea.

Cheers.
-- 
Jing Xue


--

From: Al Boldi
Date: Friday, November 30, 2007 - 11:50 pm

Not sure what you mean by operationally transparent?  It would be transparent 
for the updating client,  and the rest of the git-users would need to wait 
for the commit from the updating client; which is ok, as this transparency 
is not meant to change the server-side git-update semantic.


Judging an idea, based on a flawed implementation, doesn't prove that the 
idea itself is flawed.


You could probably do that, or you could instead use cp -al.  Both would 

Sure, you wouldn't want to change the git-engine update semantics, as that 
sits on the server and handles all users.  But what the git model is 
currently missing is a client manager.  Right now, this is being worked 
around by replicating the git tree on the client, which still doesn't 
provide the required transparency.

IOW, git currently only implements the server-side use-case, but fails to 
deliver on the client-side.  By introducing a git-client manager that 
handles the transparency needs of a single user, it should be possible to 
clearly isolate update semantics for both the client and the server, each 
handling their specific use-case.


Thanks!

--
Al

--

From: Phillip Susi
Date: Tuesday, December 4, 2007 - 3:21 pm

It isn't the implementation that is flawed, it is the idea.  The entire 
point of a change control system is that you explicitly define change 
sets and add comments to the set.  The filesystem was designed to allow 
changes to be made willy-nilly.  If your goal is to perform change 
control only with filesystem semantics, then you have a non-starter as 
their goals are opposing.  Requiring an explicit commit command is 
hardly burdensome, and otherwise, a git tree is perfectly transparent to 

It isn't missing a client manager, it was explicitly designed to not 
have one, at least not as a distinct entity from a server, because it 
does not use a client/server architecture.  This is very much by design, 
not a work around.

What transparency are you requiring here?  You can transparently read 
your git tree with all non git aware tools, what other meaning of 

Any talk of client or server makes no sense since git does not use a 
client/server model.  If you wish to use a centralized repository, then 
git can be set up to transparently push/pull to/from said repository if 
you wish via hooks or cron jobs.

--

From: Al Boldi
Date: Friday, December 7, 2007 - 10:35 am

Whether git uses the client/server model or not does not matter; what matters 
is that there are two distinct use-cases at work here:  one on the 

Again, this only handles the interface to/from the server/repository, but 
once you have pulled the sources, it leaves you without version control on 
the client.

By pulling the sources into a git-client manager mounted on some dir, it 
should be possible to let the developer work naturally/transparently in a 
readable/writeable manner, and only require his input when reverting locally 
or committing to the server/repository.


Thanks!

--
Al

--

From: Andreas Ericsson
Date: Thursday, December 6, 2007 - 11:24 am

Git is distributed. The repository is everywhere. No server is actually needed.

No, that's CVS, SVN and other centralized scm's. With git you have perfect
version control on each peer. That's the entire idea behind "fully

How is that different from what every SCM, including git, is doing today? The
user needs to tell the scm when it's time to take a snapshot of the current
state. Git is distributed though, so committing is usually not the same as
publishing. Is that lack of a single command to commit and publish what's
nagging you? If it's not, I completely fail to see what you're getting at,
unless you've only ever looked at repositories without a worktree attached,
or you think that git should work like an editor's "undo" functionality,
which would be quite insane.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
--

From: Al Boldi
Date: Friday, December 7, 2007 - 11:55 am

When you read server, don't read it as localized; a server can be 
distributed.  What distinguishes a server from an engine is that it has to 
handle a multi-user use-case.  How that is implemented, locally or remotely 

As explained before in this thread, replicating the git tree on the client 

You need to re-read the thread.


Thanks!

--
Al

--

From: Johannes Schindelin
Date: Thursday, December 6, 2007 - 1:22 pm

Hi,


I don't know why you write that, and then say thanks.  Clearly, what you 
wrote originally, and what Andreas pointed out, were quite obvious 
indicators that git already does what you suggest.

You _do_ work "transparently" (whatever you understand by that overused 
term) in the working directory, unimpeded by git.

And whenever it is time to revert or commit, you cry for help, invoking 
git.

So either you succeeded in making yourself misunderstood, or Andreas had 
quite the obvious and correct comment for you.

Not that difficult,
Dscho

--

From: Al Boldi
Date: Thursday, December 6, 2007 - 9:37 pm

If you go back in the thread, you may find a link to a gitfs client that 
somebody kindly posted.  This client pretty much defines the transparency 
I'm talking about.  The only problem is that it's read-only.

To make it really useful, it has to support versioning locally, disconnected 
from the server repository.  One way to implement this could be to commit 
every update unconditionally to a git repository that is created on the fly 
and is private to the gitfs client.

With this transparently created private scratch repository it should then be 
possible for the same gitfs to re-expose the locally created commits, all 
without any direct user-intervention.

Later, this same scratch repository could then be managed by the normal 
git-management tools/commands to ultimately update the backend git 
repositories.

BTW:  Sorry for my previous posts that contained the wrong date; it seems 
that hibernation sometimes advances the date by a full 24h.  Has anybody 
noticed this as well?


Thanks!

--
Al

--

From: Andreas Ericsson
Date: Friday, December 7, 2007 - 1:40 am

Earlier you said that you need to be able to tell git when you want to make
a commit, which means pretty much any old filesystem could serve as gitfs.
Now you're saying you want every single update to be committed, which would
make it mimic an editor's undo functionality. I still don't get what it is

That's exactly what's happening today. I imagine whoever wrote the gitfs
thing did so to facilitate testing, or as some form of intellectual
masturbation.


So, to get to the bottom of this, which of the following workflows is it you
want git to support?

### WORKFLOW A ###
edit, edit, edit
edit, edit, edit
edit, edit, edit
Oops I made a mistake and need to hop back to "current - 12".
edit, edit, edit
edit, edit, edit
publish everything, similar to just tarring up your workdir and sending out
### END WORKFLOW A ###

### WORKFLOW B ###
edit, edit, edit
ok this looks good, I want to save a checkpoint here
edit, edit, edit
looks good again. next checkpoint
edit, edit, edit
oh crap, back to checkpoint 2
edit, edit, edit
ooh, that's better. save a checkpoint and publish those checkpoints
### END WORKFLOW B ###
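
For illustration only, WORKFLOW B can be modelled in a few lines of Python.
The names below (CheckpointStore, edit, checkpoint, revert_to, publish) are
hypothetical and are not part of git. The sketch only shows that history
grows when the user asks for a checkpoint, never on a plain edit, and that
every checkpoint carries a message.

```python
class CheckpointStore:
    """Toy model of WORKFLOW B: snapshots are taken only on request."""

    def __init__(self):
        self.worktree = {}     # path -> content, freely editable
        self.checkpoints = []  # list of (message, snapshot) pairs

    def edit(self, path, content):
        # An edit changes the working tree but records no history.
        self.worktree[path] = content

    def checkpoint(self, message):
        # Copy the working tree so later edits cannot alter the snapshot.
        self.checkpoints.append((message, dict(self.worktree)))
        return len(self.checkpoints)  # 1-based checkpoint number

    def revert_to(self, number):
        _, snapshot = self.checkpoints[number - 1]
        self.worktree = dict(snapshot)

    def publish(self):
        # Publishing hands out the checkpoint messages in order.
        return [message for message, _ in self.checkpoints]


store = CheckpointStore()
store.edit("main.c", "v1")
store.checkpoint("first working version")
store.edit("main.c", "v2")
store.checkpoint("handle empty input")
store.edit("main.c", "v3, broken")
store.revert_to(2)               # "oh crap, back to checkpoint 2"
store.edit("main.c", "v4")
store.checkpoint("better fix")
published = store.publish()
```

The three messages in `published` are the whole history; the broken "v3"
state never shows up in it because nobody asked for it to be recorded.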

If you could just answer that question and stop writing "transparent" or
any synonym thereof six times in each email, we can possibly help you.

As it stands now though, nobody is very interested because you haven't
explained how you want this "transparency" of yours to work in an
everyday scenario.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
--

From: Al Boldi
Date: Friday, December 7, 2007 - 3:53 am

### WORKFLOW C ###
for every save on a gitfs mounted dir, do an implied checkpoint, commit, or 
publish (should be adjustable), on its privately created on-the-fly 
repository.
### END WORKFLOW C ###

For example:

  echo "// last comment on this file" >> /gitfs.mounted/file

should do an implied checkpoint, and make these checkpoints immediately 
visible under some checkpoint branch of the gitfs mounted dir.

Note, this way the developer gets version control without even noticing, and 
it works completely transparently with any kind of application.
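
A minimal sketch of the behaviour described for WORKFLOW C, assuming a toy
in-memory directory rather than a real mounted filesystem. AutoCheckpointDir
and its append method are hypothetical names and do not describe any
existing gitfs; the sketch only shows that every save implies a checkpoint
and that no message is ever supplied.

```python
class AutoCheckpointDir:
    """Toy model of WORKFLOW C: every save is an implied checkpoint."""

    def __init__(self):
        self.files = {}        # path -> current content
        self.checkpoints = []  # one full snapshot per save, no message

    def append(self, path, text):
        # Mirrors `echo ... >> file`: extend the file, then checkpoint.
        self.files[path] = self.files.get(path, "") + text
        self.checkpoints.append(dict(self.files))


gitfs_dir = AutoCheckpointDir()
gitfs_dir.append("file", "int main(void) { return 0; }\n")
gitfs_dir.append("file", "// last comment on this file\n")
```

After the two saves above there are two checkpoints, and the first one still
holds the file without the trailing comment.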


Thanks!

--
Al

--

From: Jakub Narebski
Date: Friday, December 7, 2007 - 4:47 am

It looks like it is WORKFLOW A (with the fact that each ',' is file

Why not use a versioning filesystem for that, for example ext3cow
(which looks surprisingly git-like, when you take into account that
for ext3cow history is linear and centralized, so one can use a date
or a sequential number to name commits)?

See GitLinks page on Git Wiki, "Other links" section:
  http://www.ext3cow.com/

A version control system is all about WORKFLOW B, where the programmer
controls when it is time to commit (and in private repository he/she
can then rewrite history to arrive at "Perfect patch series"[*1*]);
something that for example CVS failed at, requiring programmer to do
a merge if upstream has any changes when trying to commit.

[*1*] I have lost the link to the post on LKML about rewriting history
      to arrive at a perfect patch _series_. IIRC I first found it on
      this mailing list. I would be grateful if you could send me the
      link if you have it. TIA.

-- 
Jakub Narebski
ShadeHawk on #git
--

From: Al Boldi
Date: Friday, December 7, 2007 - 12:04 pm

Sure, Linus mentioned the cow idea before in this thread, but you would still 

Because WORKFLOW C is transparent, it won't affect other workflows.  So you 
could still use your normal WORKFLOW B in addition to WORKFLOW C, gaining an 
additional level of version control detail at no extra cost other than the 
git-engine scratch repository overhead.

BTW, is git efficient enough to handle WORKFLOW C?
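
One input to that question is how git stores file contents. In git's default
SHA-1 object format, a blob's id is the hash of a short header followed by
the content, and loose objects are zlib-compressed, so saving content that
has not changed maps to an object that already exists. The Python sketch
below reproduces that calculation; the helper names are hypothetical, and it
deliberately ignores the tree and commit objects that every commit also
writes, as well as packfiles.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Return the object id git assigns to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_size(content: bytes) -> int:
    """Return the size in bytes of the zlib-compressed loose object."""
    header = b"blob %d\0" % len(content)
    return len(zlib.compress(header + content))


first_save = git_blob_id(b"// last comment on this file\n")
second_save = git_blob_id(b"// last comment on this file\n")
same_object = first_save == second_save  # identical content, identical id
```

Because the id depends only on the content, a second save of identical data
adds no new blob. A save that changes even one byte produces a new blob for
the whole file until the repository is repacked.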


Thanks!

--
Al

--

From: Valdis.Kletnieks
Date: Friday, December 7, 2007 - 12:36 pm

Imagine the number of commits a 'make clean; make' will do in a kernel tree, as
it commits all those .o files... :)

--

From: Luke Lu
Date: Friday, December 7, 2007 - 3:07 pm

My guess is that Al is not really a developer (product management/
marketing?); what he has in mind is probably not an SCM but a backup
system a la Mac's Time Machine or NetApp's snapshots that also
supports disconnected commits. I think that git could be a suitable
engine for such systems, after a few tweaks to avoid compressing
already compressed blobs like jpeg, mp3 and mpeg etc.

__Luke
--

From: Al Boldi
Date: Friday, December 7, 2007 - 9:56 pm

.o files???

It probably goes without saying that gitfs should have some basic 
configuration file to set up its transparent behaviour, which would most 
probably contain an include / exclude file-filter mask and other basic 
configuration options.  But this is really secondary to the 
implementation, and the question remains whether git is efficient enough.
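
A minimal sketch of such an include / exclude mask, assuming shell-style
glob patterns. The function name should_track and its parameters are
hypothetical; nothing in the thread says an existing gitfs offers this
option.

```python
from fnmatch import fnmatch


def should_track(path, include=("*",), exclude=()):
    """Return True if path matches an include mask and no exclude mask."""
    if any(fnmatch(path, pattern) for pattern in exclude):
        return False
    return any(fnmatch(path, pattern) for pattern in include)


masks = ("*.o", "*.swp")
tracked = [
    path
    for path in ("kernel/sched.c", "kernel/sched.o", ".sched.c.swp")
    if should_track(path, exclude=masks)
]
```

With these masks only `kernel/sched.c` would be checkpointed. Excluded files
still need to be stored somewhere, which the filter does not address.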

IOW, how big is the git commit overhead as compared to a normal copy?


Thanks!

--
Al

--

From: Valdis.Kletnieks
Date: Friday, December 7, 2007 - 10:16 pm

But then it's not *truly* transparent, is it?

And that leaves another question - if you make a config file that excludes
all the .o files - then what's backing the .o files?  Those data blocks need
to be *someplace*.  Maybe you can do something ugly like use unionfs to
combine your gitfs with something else to store the other files...

But at that point, you're probably better off just creating a properly
designed versioning filesystem.

--

From: Al Boldi
Date: Saturday, December 8, 2007 - 3:41 am

Don't mistake transparency for some form of auto-heuristic.  Transparency 
only means that it inserts functionality without impeding your normal 


But gitfs is not about designing a versioning filesystem, it's about 
designing a transparent interface into git to handle an SCM use-case.


Thanks!

--
Al

--

From: Johannes Schindelin
Date: Saturday, December 8, 2007 - 4:13 am

Hi,


The question is not if git is efficient enough to handle workflow C, but 
if that workflow is efficient enough to help anybody.

Guess what takes me the longest time when committing?  The commit message.  
But it is really helpful, so there is a _point_ in writing one, and there 
is a _point_ in committing when I do it: it is a point in time where I 
expect the tree to be in a good shape, to be compilable, and to solve a 
specific problem which I describe in the commit message.

So I absolutely hate this "transparency".  Git _is_ transparent; it does 
not affect any of my other tools; they still work very well 
thankyouverymuch.

What your version of "transparency" would do: destroy bisectability, make 
an absolute gibberish of the history, and more!

Nobody could read the output of "git log" and form an understanding of what 
was done.  Nobody could read the commit message for a certain "git blame"d 
line that she tries to make sense of.

IOW you would revert the whole meaning of the term Source Code Management.

Hth,
Dscho

--

From: Andreas Ericsson
Date: Friday, December 7, 2007 - 5:30 am

So you *do* want an editor's undo function, but for an entire filesystem.
That's a handy thing to have every now and then, but it's not what git

One other thing that's fairly important to note is that this can never
ever handle changesets, since each write() of each file will be a commit
on its own. It's so far from what git does that I think you'd be better
off just implementing it from scratch, or looking at a versioned fs, like
Jakub suggested in his reply.

You're also neglecting one very important aspect of what an SCM provides
if you go down this road, namely project history. You basically have two
choices with this "implicit save on each edit":
* force the user to supply a commit message for each and every edit
* ignore commit messages altogether

Obviously, forcing a commit message each time is the only way to get some
sort of proper history to look at after it's done, but it's also such an
appalling nuisance that I doubt *anyone* will actually like that, and since
changesets aren't supported, you'll have "implement xniz api, commit 1 of X"
messages. Cumbersome, stupid, and not very useful.

Ignoring commit messages altogether means you ignore the entire history,
and the SCM then becomes a filesystem-wide "undo" cache. This could
of course work, but it's something akin to building a nuclear power plant
to power a single lightbulb.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
--

From: david
Date: Friday, December 7, 2007 - 2:17 pm

so if you have a script that does

echo "mail header" >tmpfile
echo "subject:" >>tmpfile
echo >>tmpfile
echo "body" >>tmpfile

you want to have four separate commits?

what if you have a perl script

open(my $out, '>', 'tmpfile') or die "cannot open tmpfile: $!";
print $out "mail header\n";
print $out "subject:\n\n";
print $out "body\n";
close $out;

how many separate commits do you think should take place?

what if $|=1 (unbuffered output, so that each print statement becomes 
visible to other programs immediately)?

what if the file is changed via mmap? should each byte/word written to 
memory be a commit? or when the mmap is closed? or when the kernel happens 
to flush the page to disk?

'recording every change to a filesystem' is a very incomplete definition 
of a goal.
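
The buffering question can be made concrete with a small Python sketch (used
here in place of Perl) that counts how many write() calls actually reach the
underlying sink. CountingSink and writes_seen are hypothetical names, and
the sink stands in for whatever layer a gitfs would hook; kernel-level
behaviour such as mmap or page flushing is not modelled.

```python
import io


class CountingSink(io.RawIOBase):
    """Raw stream that discards data and counts write() calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def writable(self):
        return True

    def write(self, data):
        self.calls += 1
        return len(data)  # report everything as written


def writes_seen(chunks, buffered):
    """Write the chunks and return how many writes reached the sink."""
    sink = CountingSink()
    out = io.BufferedWriter(sink) if buffered else sink
    for chunk in chunks:
        out.write(chunk)
    out.flush()
    return sink.calls


message = [b"mail header\n", b"subject:\n\n", b"body\n"]
buffered_writes = writes_seen(message, buffered=True)     # 1
unbuffered_writes = writes_seen(message, buffered=False)  # 3
```

The same three print statements reach the sink as one write or as three,
depending only on buffering, so "one commit per write" would give the same
program a different history from one run configuration to the next.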

David Lang
--

From: Björn
Date: Friday, December 7, 2007 - 3:00 pm

Ouch... That looks worse than "plain" per-file versioning. Not only do
you by definition get "broken" commits if there's a change that affects
two dependent files, you also get an insane amount of commits just for
testing stuff, or fixing bugs.

And unless you use some kind of union-fs on top (or keep ignored files
in a special unversioned area in your gitfs, which seems somewhat ugly),
you'll probably also have to track lots of files in the working
directory that are generated, unless you want to re-generate them after
each reboot. And that leads to even more absolutely useless revisions.

Just thinking of my vim .swp files (which I definitely don't want to
lose on a crash/power outage/pkill -9 .<ENTER> dammit) makes me scream
because of the gazillions of commits they will produce (and no, I don't
want them in some special out of tree directory).

Plus, I have vim set up to _replace_ files on write, so that I can more
easily use hard-linked copies without changing all copies at once _unless_
I explicitly want to, meaning that I'd get full remove/add commits,
which are absolutely useless. And trying to detect such patterns
(rename, then write the changed file with the old name and then delete
the renamed file) is probably not worth the trouble, because you
coincidentally might _want_ to have just these three steps recorded when
you happen to perform them manually. And if you go for heuristics,
you'll complain each time you get a false-positive/negative.


That said, out of pure curiosity I came up with the attached script
which just uses inotifywait to watch a directory and issue git commands
on certain events. It is extremely stupid, but seems to work. And at
least it hasn't got the drawbacks of a real gitfs regarding the need to
have a "separate" non-versioned storage area for the working directory,
because it simply uses the existing working directory wherever that
might be stored. It doesn't use GIT_DIR/WORK_DIR yet, but hey, should be
easy to add...
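
The attached script is not reproduced in this archive. As a rough stand-in,
the Python sketch below shows the same general idea with polling instead of
inotifywait: take a snapshot of an existing working directory, take another
one later, and report what changed. The function names scan and diff_states
are hypothetical, and the place where git commands would be issued is left
as a comment because the original script's commands are not known.

```python
import hashlib
import os


def scan(root):
    """Map each file under root (skipping .git) to a hash of its content."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune the repository metadata directory from the walk.
        dirnames[:] = [name for name in dirnames if name != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha1(handle.read()).hexdigest()
            state[os.path.relpath(path, root)] = digest
    return state


def diff_states(old, new):
    """Return sorted lists of (added, modified, removed) relative paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(
        path for path in set(old) & set(new) if old[path] != new[path]
    )
    # A watcher would issue its git commands here, once per reported path.
    return added, modified, removed
```

Calling `scan` twice and passing both results to `diff_states` gives one
batch of changes per polling interval. That batches several writes together,
unlike an inotify watcher, which is notified of each event separately.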

Feel free to mess ...

--

From: Phillip Susi
Date: Thursday, December 6, 2007 - 2:46 pm

It has been pointed out to you that it DOES.  Either that or nobody else 
understands your nebulous use of "transparency" so maybe you should 
define it like we've been asking you.  Furthermore, the comment you 
replied to said nothing about transparency, nor did your comment it was 
in reply to; rather it was pointing out the fact that your statement 
that git cannot perform version control on the client is patently 

Perhaps you should.  We have been trying to get you to explain how you 
think git isn't "transparent" while at the same time pointing out how we 
think it is.  You have failed to demonstrate any evidence to back up 
your claims, all of which have been shown to be false.


--

From: Martin Langhoff
Date: Friday, December 7, 2007 - 11:33 pm

I guess what he means is that when you write to the file -- from your
editor -- it can't be considered a commit. During an editing session
you might write a dozen times, only to commit it once you are happy

If you want a dumb-ish client CVS-style, you can try git-cvsserver.
But the git model is definitely superior -- "replicating the tree on
the client" is not a workaround but a central strategy.

Have you used git and other DSCMs much? From your writing, it sounds
like you may have misunderstood how some of the principles of git work
out in practice.

cheers,


m
--
