Linux: Continued Case For The O(1) Scheduler

Submitted by Jeremy
on July 4, 2002 - 8:28am

Discussion about backporting the O(1) scheduler to the 2.4 stable kernel [earlier story] continues on the lkml. Ingo Molnar, the scheduler's author, maintains that much more testing is required:

"it might be a candidate for inclusion once it has _proven_ stability and robustness (in terms of tester and developer exposure), on the same order of magnitude as the 2.4 kernel - but that needs time and exposure in trees like the -ac tree and vendor trees. It might not happen at all, during the lifetime of 2.4."

Joe Sloan was among those who countered Ingo's comments:

"Ingo, it's apparent you are refraining from pushing this O(1) scheduler - that's admirable, but don't swing too far in the other direction. The fact is, it's working well in 2.5, it's working well in the 2.4-ac tree, it's working well in the 2.4-aa tree, and Red Hat has been shipping it."

Personally, I've been using and enjoying the O(1) scheduler on my desktop server since January, when it became compatible with the preemptible kernel patch [earlier story]. The following discussion makes for an interesting read, with good points made on both sides.


From: Rob Landley
Subject: Re: [OKS] O(1) scheduler in 2.4
Date: 	Tue, 2 Jul 2002 21:11:31 -0400

On Monday 01 July 2002 10:48 pm, Tom Rini wrote:
> I assume you mean 2.4.60 here, and no, I don't think O1 scheduler should
> go into 2.4 ever.  We're aiming for a _stable_ series here.  Let me

Ah, monday morning virtue, overcompensating for 2.4.10.  It's the hangover 
speaking...

"We upgrade our kernel on a production machine without testing it first, and 
we get mad if anything actually CHANGED.  We want that upgrade to be a NOP, 
darn it!  We want it to be as if we never did it in the first place, that's 
why we do it..."

If you want stone tablet stability, why the heck are you upgrading your 
kernel?  Downloading the new version off of kernel.org generally means you're 
mucking about with a working box, making changes that are not 100% required.  
If a security vulnerability comes out, you have the source and can patch the 
specific bug in your version.  (If you're not up to that, you're probably 
using a vendor kernel, which is a whole 'nother can of worms.)

If you install new hardware or software, and it going "boing" would be a bad 
thing, you try it on a scratch box first.  If you don't, you deserve what you 
get.

I'm under the impression 2.4.19 is introducing chunks of Andre Hedrick's new 
IDE code.  So it's ok to upgrade something that can, in case of a bug, eat 
your data silently in a way that journaling won't detect.  Why?  LBA-48 and 
ATA-133, of course.  But scheduling, which is SUPPOSED to be 
non-deterministic from day one and could theoretically be brain-dead round 
robin without affecting anything but performance...  That's not safe to 
upgrade.  Right.

If you have a race condition in your code that a new scheduler triggers, 
ANYTHING could trigger it.  2.4.18 behaves horribly under load, try md5sum on 
an iso image and then pull up an xterm and su to another user.  It can take 
30 seconds.  (Yeah, that's mostly IO starvation rather than the scheduler, 
but still, how is the new scheduler going to do WORSE than this?)

The argument here is basically "don't change anything".  It's not exactly a 
series then, is it?  If you want trailing edge, 2.0 is still being 
maintained, let alone 2.2.  Those have a great excuse for not accepting 
anything new beyond a really obvious bugfix.  2.4 does not, because 2.6 isn't 
out yet.  Backporting of some things from 2.5 to 2.4 will occur until then, 
and O(1) is an obvious eventual candidate.

> stress that again, _stable_.  I'd hope that 2.4.60 is as slow in coming
> as 2.0.40 is.

So the fact that it's in Alan Cox's kernel (meaning Red Hat is shipping it in 
2.4.18-5.55, meaning that if more people aren't actually USING it yet than 
marcelo's 2.4, they will be soon), and andrea's kernel (meaning new VM 
development is being done with it in mind)...  It may not be "sufficiently 
tested" yet but it's GETTING a lot of testing.  You use anything EXCEPT a 
stock vanilla 2.4, you're probably getting O(1) at this point.

If the vendors are starting to ship the thing already, what is the DOWN side 
to integrating it?  The down side to NEVER integrating it is eventually fewer 
people using the kernel off of kernel.org.

Does this remind anybody else of the 0.90 software raid stuff?  At some point 
it makes more sense to keep the OLD one around as a patch for the 5% of the 
community that doesn't want to upgrade.  We're not there on the scheduler 
yet, but "should not happen" without a qualifier means "never"...

> > >c) I also suspect that it hasn't been as widely tested on !x86 as the
> > >stuff currently in 2.4.  And again, 2.4 is the stable tree.
> >
> > I know it is not a priority for 2.4, but say it will never happen...
>
> I won't say it will never happen, just that I don't think it should.
> It's a rather invasive thing (and as Ingo said, it's just not getting
> stable).

Ingo's main objection was that the patch is only 6 months old, and that 2.4 
is only now stabilizing and that bug squeezing and smoothing should be given 
a little longer to ensure that people have the option of NOT upgrading, and 
that those upgrading want improvements rather than critical "this just 
doesn't work" fixes.  And that's a fine argument.

But 2.6 isn't going to be out this year.  It's not even having its first 
freeze until October.  Traditionally, we've been running a year and a half 
between stable releases (and another six months to actually get the new one 
battle-tested to where the distros and at least 50% of the production boxes 
upgrade.)  We've got a year to eighteen months left on that cycle.  Are the 
distros going to hold off adding it to 2.4 for a year to 18 months?

The real question is, how much MORE conservative than the distros should the 
mainline kernels be?

Rob

From: Ingo Molnar
Subject: Re: [OKS] O(1) scheduler in 2.4
Date: 	Wed, 3 Jul 2002 10:35:26 +0200 (CEST)

On Tue, 2 Jul 2002, Rob Landley wrote:

> If you want stone tablet stability, why the heck are you upgrading your
> kernel? [...]

to get security and stability fixes.

> The argument here is basically "don't change anything".  It's not
> exactly a series then, is it?  If you want trailing edge, 2.0 is still
> being maintained, let alone 2.2.  Those have a great excuse for not
> accepting anything new beyond a really obvious bugfix.  2.4 does not,
> because 2.6 isn't out yet.  Backporting of some things from 2.5 to 2.4
> will occur until then, and O(1) is an obvious eventual candidate.

it might be a candidate for inclusion once it has _proven_ stability and
robustness (in terms of tester and developer exposure), on the same order
of magnitude as the 2.4 kernel - but that needs time and exposure in trees
like the -ac tree and vendor trees. It might not happen at all, during the
lifetime of 2.4.

Note that the O(1) scheduler isnt a security or stability fix, neither is
it a driver backport. It isnt a feature backport that enables hardware
that couldnt be used in 2.4 before. The VM was a special case because most
people agreed that it truly sucked, and even though people keep
disagreeing about that decision, the VM is in a pretty good shape now -
and we still have good correlation between the VM in 2.5, and the VM in
2.4. The 2.4 scheduler on the other hand doesnt suck for 99% of the
people, so our hands are not forced in any way - we have the choice of a
'proven-rock-solid good scheduler' vs. an 'even better, but still young
scheduler'.

if say 90% of Linux users on the planet adopt the O(1) scheduler, and in a
year or two there wont be a bigger distro (including Debian of course)
without the O(1) scheduler in it [which, admittedly, is happening
already], then it can and should perhaps be merged into 2.4. But right now
i think that the majority of 2.4 users are running the stock 2.4
scheduler.

> So the fact that it's in Alan Cox's kernel (meaning Red Hat is shipping
> it in 2.4.18-5.55, meaning that if more people aren't actually USING it
> yet than marcelo's 2.4, they will be soon), and andrea's kernel (meaning
> new VM development is being done with it in mind)...  It may not be
> "sufficiently tested" yet but it's GETTING a lot of testing.  You use
> anything EXCEPT a stock vanilla 2.4, you're probably getting O(1) at
> this point.

things like migration to a new kernel happen on a slightly slower scale
than the 6 months this patch has existed. I'd say in 1 year what you say
might be true. 70% of the Linux users are not running the 'very latest'
release.

also note that the O(1) scheduler patch in the Red Hat kernel rpm was a
stability fork done months ago, with stability fixes backported into it.  
The 2.4 O(1) patches being distributed now are more like direct backports
of the 2.5 scheduler - this way we can get testing and feedback even from
those people who do not want to (or cannot) run a 2.5 kernel due to the
massive IO changes being underway.

i do not say that the O(1) scheduler has bugs (if i knew about any i'd
have fixed it already :), i am simply saying that to be able to say to
Marcelo "it does not have bugs and does not introduce problems" it needs
more exposure. [ And if the author of a given piece of code says things
like this then it usually does not get merged ;-) ]

> not there on the scheduler yet, but "should not happen" without a
> qualifier means "never"...

we agree here.

> The real question is, how much MORE conservative than the distros should
> the mainline kernels be?

There's a natural 'feature race' between distros, so the distros can act
as an additional (and pretty powerful) testing tool for various kernel
features - and for which the distros are willing to spend resources and
take risks as well. In fact they also act as a 'user demand' filter, for
kernel features as well. And if all distros pick up a given feature, and
it's been in for more than 6 months, (instead of 'more than 6 months since
first patch') then Marcelo will have a much easier decision :-)

	Ingo

From: Bill Davidsen
Subject: Re: [OKS] O(1) scheduler in 2.4
Date: 	Wed, 3 Jul 2002 23:36:07 -0400 (EDT)

> it might be a candidate for inclusion once it has _proven_ stability and
> robustness (in terms of tester and developer exposure), on the same order
> of magnitude as the 2.4 kernel - but that needs time and exposure in trees
> like the -ac tree and vendor trees. It might not happen at all, during the
> lifetime of 2.4.

It has already proven to be stable and robust in the sense that it isn't
worse than the stock scheduler on typical loads and is vastly better on
some.
> 
> Note that the O(1) scheduler isnt a security or stability fix, neither is
> it a driver backport. It isnt a feature backport that enables hardware
> that couldnt be used in 2.4 before. The VM was a special case because most
> people agreed that it truly sucked, and even though people keep
> disagreeing about that decision, the VM is in a pretty good shape now -
> and we still have good correlation between the VM in 2.5, and the VM in
> 2.4. The 2.4 scheduler on the other hand doesnt suck for 99% of the
> people, so our hands are not forced in any way - we have the choice of a
> 'proven-rock-solid good scheduler' vs. an 'even better, but still young
> scheduler'.

Here I disagree. Sure behaves like a stability fix to me. On a system with
a mix of interactive and cpu-bound processes, including processes with
hundreds of threads, you just can't get reasonable performance balancing
with nice() because it is totally impractical to keep tuning a thread
which changes from hog to disk io to socket waits with a human in the
loop. The new scheduler notices this stuff and makes it work, I don't even
know for sure (as in tried it) if you can have different nice on threads
of the same process. 

This is not some neat feature to buy a few percent better this or that,
this is roughly 50% more users on the server before it falls over, and no
total bogs when many threads change to hog mode at once.

You will not hear me saying this about preempt, or low-latency, and I bet
that after I try lock-break this weekend I won't feel that I have to have
that either. The O(1) scheduler is self defense against badly behaved
processes, and the reason it should go in mainline is so it won't depend
on someone finding the time to backport the fun stuff from 2.5 as a patch
every time.

-- 
bill davidsen [email blocked]
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

From: Ingo Molnar
Subject: Re: [OKS] O(1) scheduler in 2.4
Date: 	Thu, 4 Jul 2002 08:56:01 +0200 (CEST)

On Wed, 3 Jul 2002, Bill Davidsen wrote:

> It has already proven to be stable and robust in the sense that it isn't
> worse than the stock scheduler on typical loads and is vastly better on
> some.

this is your experience, and i'm happy about that. Whether it's the same
experience for 90% of Linux users, time will tell.

> Here I disagree. Sure behaves like a stability fix to me. On a system
> with a mix of interactive and cpu-bound processes, including processes
> with hundreds of threads, you just can't get reasonable performance
> balancing with nice() because it is totally impractical to keep tuning a
> thread which changes from hog to disk io to socket waits with a human in
> the loop. The new scheduler notices this stuff and makes it work, I
> don't even know for sure (as in tried it) if you can have different nice
> on threads of the same process.

(yes, it's possible to nice() individual threads.)

> This is not some neat feature to buy a few percent better this or that,
> this is roughly 50% more users on the server before it falls over, and
> no total bogs when many threads change to hog mode at once.

are these hard numbers? I havent seen much hard data yet from real-life
servers using the O(1) scheduler. There was lots of feedback from
desktop-class systems that behave better, but servers used to be pretty
good with the previous scheduler as well.

> You will not hear me saying this about preempt, or low-latency, and I
> bet that after I try lock-break this weekend I won't feel that I have to
> have that either. The O(1) scheduler is self defense against badly
> behaved processes, and the reason it should go in mainline is so it
> won't depend on someone finding the time to backport the fun stuff from
> 2.5 as a patch every time.

well, the O(1) scheduler indeed tries to put up as much defense against
'badly behaved' processes as possible. In fact you should try to start up
your admin shells via nice -20, that gives much more priority than it used
to under the previous scheduler - it's very close to the RT priorities,
but without the risks. This works in the other direction as well: nice +19
has a much stronger meaning (in terms of preemption and timeslice
distribution) than it used to.

	Ingo
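
As an aside on Ingo's remark that individual threads can be niced: the sketch below is not from the thread. It assumes a current Linux kernel and Python 3.8 or later, where, as far as I know, setpriority() called with a kernel thread ID changes only that thread's nice value. The helper name run_niced_worker is made up for illustration. Raising the nice value (lowering priority) needs no privileges; a negative value such as the -20 Ingo suggests for admin shells normally requires root.

```python
import os
import threading


def run_niced_worker(nice_value):
    """Start one thread, renice only that thread, and report the nice
    values seen by the worker and by the main thread.

    Assumption: Linux, where the nice value is a per-thread attribute.
    """
    result = {}

    def worker():
        # Kernel thread ID of this thread (Linux-specific, Python 3.8+).
        tid = threading.get_native_id()
        # On Linux, PRIO_PROCESS with a thread ID targets that thread only.
        os.setpriority(os.PRIO_PROCESS, tid, nice_value)
        result["worker"] = os.getpriority(os.PRIO_PROCESS, tid)

    # A "who" of 0 refers to the caller, here the main thread.
    result["main_before"] = os.getpriority(os.PRIO_PROCESS, 0)
    t = threading.Thread(target=worker)
    t.start()
    t.join()
    result["main_after"] = os.getpriority(os.PRIO_PROCESS, 0)
    return result


if __name__ == "__main__":
    print(run_niced_worker(19))
```

If the assumption holds, the worker reports the requested nice value while the main thread's value stays the same before and after.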

From: J Sloan
Subject: Re: [OKS] O(1) scheduler in 2.4
Date: 	Thu, 04 Jul 2002 00:36:30 -0700

Ingo, it's apparent you are refraining from
pushing this O(1) scheduler - that's admirable,
but don't swing too far in the other direction.

The fact is, it's working well in 2.5, it's working
well in the 2.4-ac tree, it's working well in the
2.4-aa tree, and Red Hat has been shipping it.

It will soon be the case that most Linux users
are using O(1) - thus any poor clown who
downloads the standard src from kernel.org
has a large task ahead of him if he wants
similar functionality to the majority of
linux users. This divergence may not be a
good thing...

;-)

Joe


From: Robert Love
Subject: [PATCH] O(1) scheduler for 2.4.19-rc1
Date: 02 Jul 2002 10:11:35 -0700

Available at

	ftp://ftp.kernel.org/pub/linux/kernel/people/rml/sched/ingo-O1/sched-O1-rml-2.4.19-rc1-1.patch

and mirrors.

Aside from the resync to 2.4.19-rc1, the following changes are new since
the last release (most all pulled from 2.5):

	- reintroduce sync wake ups
	- whitespace cleanup, trivial cleanups
	- remove frozen lock and introduce new arch-specific
	  switch_mm() logic
	- new rq_lock and rq_unlock methods
	- wake_up optimization
	- nr_uninterruptible optimization for count_active_tasks
	- merge the task CPU affinity system calls
	- sched_yield bugfix
	- minor fixes

Compiles on x86 UP and SMP.

Since Ingo recently posted 2.4-ac resyncs, I will refrain.

As I am the one doing these 2.4 patches, I will invariably be asked
whether I intend for the O(1) scheduler to be merged into 2.4. The
answer is a strong NO.

Enjoy,

Robert Love
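
The changelog item about the task CPU affinity system calls refers, I believe, to what user space now sees as sched_setaffinity() and sched_getaffinity(). The sketch below is not part of the patch announcement. It assumes Linux and Python 3.3 or later, the function name pin_to_one_cpu is hypothetical, and the interface in the 2.4 backport may have differed in detail.

```python
import os


def pin_to_one_cpu():
    """Pin the caller to a single CPU, then restore the original mask.

    Returns (original, pinned, restored) as sets of CPU numbers.
    Assumption: Linux, where os.sched_*affinity wrap the system calls.
    """
    # A pid of 0 means the caller.
    original = os.sched_getaffinity(0)
    # Pick the lowest-numbered CPU we are currently allowed to run on.
    target = min(original)
    os.sched_setaffinity(0, {target})
    pinned = os.sched_getaffinity(0)
    # Put the original mask back so the caller is not left pinned.
    os.sched_setaffinity(0, original)
    restored = os.sched_getaffinity(0)
    return original, pinned, restored


if __name__ == "__main__":
    print(pin_to_one_cpu())
```

Pinning to a subset of the current mask does not need special privileges, which is why the sketch picks its target from the set the kernel already allows.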



From: venom
Subject: Re: [PATCH] O(1) scheduler for 2.4.19-rc1
Date: Wed, 3 Jul 2002 00:10:21 +0200 (CEST)

On 2 Jul 2002, Robert Love wrote:

> Since Ingo recently posted 2.4-ac resyncs, I will refrain.
>
> As I am the one doing these 2.4 patches, I will invariably be asked
> whether I intend for the O(1) scheduler to be merged into 2.4. The
> answer is a strong NO.

Of course, I think you know that you will also be asked WHY?

Also if I can imagine your reasons, as similar discussions have been
done for preemption patch and so on, and as I said at the times, I Agree.

2.5 is the place for this new and cool stuff.

Luigi



From: Robert Love
Subject: Re: [PATCH] O(1) scheduler for 2.4.19-rc1
Date: 02 Jul 2002 15:18:00 -0700

On Tue, 2002-07-02 at 15:10:
> Of course, I think you know that you will also be asked WHY?

Because I do not think 2.4 should be a breeding ground for every new
feature that whets someone's appetite. It should be stable and trusted
before anything else. We also have to worry about architecture
support. Let the scheduler be 2.5's thing.

> Also if I can imagine your reasons, as similar discussions have been
> done for preemption patch and so on, and as I said at the times, I Agree.

I do not think preemption should go in 2.4, either. It too is a 2.5
thing.

> 2.5 is the place for this new and cool stuff.

Agreed.

Robert Love
