Hi,
I've just started testing 2.6.26 on my desktop machine and have found that I get
a long pause (20 seconds or so) as the system boots. By pause I mean that the
boot messages that normally scroll by stop doing so for, as I say, about 20
seconds.
I've done a bisect and arrived at:
b0ed43360fdca227048d88a08290365cb681c1a8 is first bad commit
commit b0ed43360fdca227048d88a08290365cb681c1a8
Author: Hannes Reinecke <hare@suse.de>
Date: Tue Mar 18 14:32:28 2008 +0100
[SCSI] add scsi_host and scsi_target to scsi_bus
This patch implements scsi_host and scsi_target device types
and adds both to the scsi_bus.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
46e4fd28cc537ba700eb4f5652abf043fed4d76d M drivers
The config file for that kernel is attached as are the kernel log for booting
that kernel and the bisect log.
I've also noticed that it takes much longer for icons to appear on my desktop
when I plug a usb storage device in. (The icons are shown/hidden by a simple
sctript that is run via udev when the device is attached/detached.)
Let me know of any way I can help solve this.
Please cc me to any reply as I'm not subscribed.
Chris
--
Beauty is in the eye of the beerholder.
cc's added. Rafael, we might need to track this as a post-2.6.26 regression. --
Thanks for the reply, Andrew. I was beginning to think that I'd committed an LKML sin :-) Yes, it is about the same. On 2.6.25.4 (and earlier), the icons appear within a second or two, but with 2.6.26-rc2 or later, it takes about 20 seconds. I've now built and installed -rc5 and am seeing the same behaviour. I've also built -rc5 on my laptop. On that my kernel uses the older IDE drivers rather than libata. The boot proceeds with no pause, but I do see the same -- Beauty is in the eye of the beerholder. --
Well you did - you reported a bug. <watches wodim wibble for five minutes then report "CD/DVD-Recorder OK that makes sense. ata and usb use the scsi layer. IDE does not. --
It goes much better when the CD is inserted upside up. --
So I can close this regression if I send you a pack of CD ring labels with "other way up" on them for you to affix to the underside of your CDs ... ? James --
Would prefer double-sided CDs. Is this 20-second-delay thing known-about/expected/etc? --
I just discovered my budget only runs to a permanent marker for you to If it's not the scsi_bus_uevent problem, then no, it isn't. The odd thing is that it occurs in the middle of USB initialisation from Note the jump from 19 to 44 seconds. I'm a bit baffled as to how that could be SCSI related, but I suppose it could be udev related somehow. Perhaps turning on driver core debugging might help (that's CONFIG_DEBUG DRIVER ... it depends on CONFIG_DEBUG_KERNEL for activation). Hopefully the verbose kernel logs from here might tell us what's going on in the lacunae. James --
OK, here's a boot log with driver core debugging turned on. Hope it helps! (Oh, and thanks for adding the word lacunae to my vocabulary. I'd never com= e=20 across it before.) =2D-=20 Beauty is in the eye of the beerholder.
--Boundary-00=_NG3SIU63+Zdv2JV Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Because of this oddity, I've also built 2.6.26-rc5-git2 with verbose USB debugging on. A boot log from that kernel is attached. You will see lots of ETIME errors from usb. A log from a similarly verbose 2.6.25.5 kernel does not Triggered by James' bafflement and for my own peace of mind, I've run another bisect session. This time I started with v2.6.26 as good and v2.6.26-rc2 as bad. It highlighted the same patch as the "culprit". -- Beauty is in the eye of the beerholder.
I'm starting to think this might be a distribution problem. I tried booting my laptop on your config (only system I have with ata_piix) without an initrd (bit painful for fedora 9) and I didn't see any slow downs (in fact it was a lot faster than previously; makes note to file bug with redhat about their initramfs). The one thing the patch you identified does is to add the SCSI target to the BUS and type model for the generic device. What this does is to add extra uevents for the target. If your hotplug handlers aren't set up to expect this, they could be delaying the system progress. Actually, that's not really necessary. Bisection is taking you back through kernel history, so unless the bug has two separate causes, it's pretty likely to home in on the same section of code. James --
What distro is this being seen on? I know that openSUSE 11.0 betas did have a bug in their init scripts that was causing a long delay, much like this one. But it has been fixed in the later releases (hopefully...) thanks, greg k-h --
As I repied to James a few minutes ago, my desktop system was originally (the now lapsed) Peanut Linux, which was a Slackware derivative. But I've been running it for 3 or 4 years, upgrading and adding packages along the way. My udev setup is based on that in a fairly recent Slackware, although I can't now recall precisely which version. Thanks -- Beauty is in the eye of the beerholder. --
[Apologies to copy recipients - I accidentally replied to only James earlier
this evening. I've enclosed the text of my reply in square brackets at the
relevant places below, followed, where appropriate, by my findings from
tonight's investigations]
[OK, I'll start investigating this line. My desktop system was originally (the
now lapsed) Peanut Linux, which was a Slackware derivative. But I've been
running it for 3 or 4 years, upgrading and adding packages along the way. My
udev setup is based on that in a fairly recent Slackware, although I can't now
recall precisely which version.]
For the time being I'll try and isolate what it is that causes the delay in
icons appearing on my desktop when I plug in a usb-storage device. The script
that shows the icon is invoked by adding the script name to udev's run list.
Here is a typical rule from /etc/udev/rules.d/20-local.rules:
ACTION=="add", KERNEL=="sd[a-z][0-9]", SYSFS{idProduct}=="6545",
SYSFS{idVendor}=="0930", SYMLINK+="usbstickc%n", MODE="0666",
RUN+="/usr/bin/udevshowicon.sh"
I've added a couple of calls to logger (from util-linux-2.12r) to the script.
What this has shown is that, when I plug in a usb-storage device, there seems
to be a delay between the usb/scsi/usb-storage drivers setting up the device
and my icon-showing script being run. Here is the snippet from the kernel log
file:
Jun 10 20:27:45 upstairs kernel: [ 3200.592107] hub 3-0:1.0: state 7 ports 5 chg
0000 evt 0004
Jun 10 20:27:45 upstairs kernel: [ 3200.592107] ehci_hcd 0000:02:0a.2: GetStatus
port 2 status 001803 POWER sig=j CSC CONNECT
Jun 10 20:27:45 upstairs kernel: [ 3200.592107] hub 3-0:1.0: port 2, status
0501, change 0001, 480 Mb/s
Jun 10 20:27:45 upstairs kernel: [ 3200.695130] hub 3-0:1.0: debounce: port 2:
total 100ms stable 100ms status 0x501
Jun 10 20:27:45 upstairs kernel: [ 3200.746270] ehci_hcd 0000:02:0a.2: port 2
high speed
Jun 10 20:27:45 upstairs kernel: [ 3200.746279] ehci_hcd 0000:02:0a.2: GetStatus ...... You can eliminate this 5-second delay by adding options usb-storage delay_use=0 to /etc/modprobe.conf. Of course, this will make the other delays just stand out that much more sharply... Alan Stern --
Hi everyone, I've done that poking around and found something that to me looks interesting. Before plugging in a usb stick, I changed udev's log_priority to "debug" (using udevcontrol). Attached is the snippet from /var/log/messages that shows all the diagnostics emitted by udev when the stick was plugged in. (This is with 2.6.26-rc5-git5). You will see that there are several instances of a message about udev unsuccesfully waiting for a file ioerr_cnt to appear deep down in the guts of /sys. Eventually, it does appear, but only after 21 seconds have elapsed. On 2.6.25.6, the file appears almost as soon as udev starts looking for it. The log extract showing this is also attached as is an extract from the kernel log for 2.6.26-rc5-git5. By the way, the main udev-rules file that is installed is the one for slackware that comes with the udev package from kernel.org. So the file these tests were run with are that from udev-122, which I upgraded to earlier this evening (more in hope than expectation :-) -- Beauty is in the eye of the beerholder.
This doesn't look good. It looks like a udev rule in your system is looking for a file that will never show up. That's a short way to a long delay time :) Try commenting out the rule that does this and see if things are fixed. thanks, greg k-h --
Actually, there's something seriously wrong here: ioerr_cnt is a property of the device, not of the host (as in it will never appear under .../host4 but under .../host4/targetX:X:X/4:X:X:X/ioerr_cnt). Perhaps an investigation of why udevd-event is looking under host4/ is in order. James --
So the change which commit b0ed43360fdca227048d88a08290365cb681c1a8 "[SCSI] add scsi_host and scsi_target to scsi_bus" introduced is leading the udev scripts onto a wrong trail? (This commit was the one which Chris found by bisection, see http://lkml.org/lkml/2008/6/1/2) -- Stefan Richter -=====-==--- -==- -=-== http://arcgraph.de/sr/ --
Yes, I think so ... what it does is make us get bus events for the target as well as the lun, whereas without it we only get them for the lun. I think something is misparsing the new target event and this is where the trouble is coming from. James --
Hi James, I assume that since we now the source of this problem, you can create it at will. I only mention because if you need me to test a patch, it will have to be before Tuesday, June 17 - I go on holiday (vacation) on that day. Thanks -- Beauty is in the eye of the beerholder. --
Yes, we can; thanks! The problem though is that it's in the udev scripts, so there's nothing we can fix in the kernel, we now need to persuade the various distros to fix it (if they're actually broken), so it might take a while. James --
I blame a badly written udev rule :) thanks, greg k-h --
Yes, that's it. It's this rule, which appears as the second rule in the slackware rules file that's distributed with udev (and it's been there since udev-97 in August 2006): ACTION=="add", SUBSYSTEM=="scsi", WAIT_FOR_SYSFS="ioerr_cnt" Commenting that rule out eliminates the pause in the boot and in the appearance of icons on my desktop. Thanks, -- Beauty is in the eye of the beerholder. --
It's called a workaround, not a fix. -- Stefan Richter -=====-==--- -==- -=-== http://arcgraph.de/sr/ --
FWIW, Gentoo's udev 119 has this:
$ grep -r ioerr_cnt /etc/udev/
/etc/udev/rules.d/05-udev-early.rules:ACTION=="add",
KERNEL=="[0-9]*:[0-9]*", SUBSYSTEM=="scsi", WAIT_FOR_SYSFS="ioerr_cnt"
Whatever this rule is good for... Maybe the pattern should be
"[0-9]+(:[0-9]+){3}" to match logical units only.
--
Stefan Richter
-=====-==--- -==- -=-==
http://arcgraph.de/sr/
--
Actually no; looks like these are just glob(7) patterns. Then the pattern might already be OK since host devices' and target devices' names don't start with decimal numbers. -- Stefan Richter -=====-==--- -==- -=-== http://arcgraph.de/sr/ --
For Slackware 12.0.0, the rule is: ACTION=="add", KERNEL=="[0-9]*:[0-9]*", SUBSYSTEM=="scsi", WAIT_FOR_SYSFS="ioerr_cnt" Using that rule instaed of the old one in the Slackware rules file included with udev appears to result in the same behaviour as with the old rule and kernel < 2.6.26. Unless someone warns me otherwise, I think I'll go with that on my (homebrewed) system, so thanks to everyone for your help. Chris -- Beauty is in the eye of the beerholder. --
In fact the log shows that the 30-second delay occurs in the middle of ohci1394 (Firewire) initialization, not USB initialization: Jun 8 05:26:35 upstairs kernel: ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[22] MMIO=[fbfff000-fbfff7ff] Max Packet=[2048] IR/IT contexts=[4/8] Jun 8 05:26:35 upstairs kernel: ohci_hcd 0000:02:0a.0: urb f71cb680 path 2 ep1in 5e160000 cc 5 --> status -62 Jun 8 05:26:35 upstairs kernel: hub 1-2:1.0: transfer --> -62 Jun 8 05:27:08 upstairs kernel: driver: '0000:02:0b.0': driver_bound: bound to device 'ohci1394' Jun 8 05:27:08 upstairs kernel: bus: 'pci': really_probe: bound device 0000:02:0b.0 to driver ohci1394 Those -62 errors from ohci_hcd are red herrings. To test this, try booting with "nousb". Alan Stern --
It's also worth noting that the first 17:25:44 message is highly suspicious. It is the initial output of the ohci-hcd driver, indicating that the udev/hotplug system had just loaded that driver. This doesn't seem to have anything to do with the SCSI stack. Whatever caused the delay, it also managed to pause the other hotplug task which was in the midst of loading uhci-hcd. Alan Stern --
Sorry folks, I'm unexpectedly away from the computers that these "problems" occur on. I should be back at home tomorrow and will answer you all as quickly as possible then. Thanks, -- Beauty is in the eye of the beerholder. --
Ah sorry, that's a legacy of when I had considerably less ram than 2GB. I only use firewire on the rare occasion that I want to grab some video from my camera and I use the parallel port even less frequently. So one of the last things When I was bisecting, I assessed good or bad based on whether I experienced the long pause during boot up. You may have seen my reply to James a short while ago that I have bisected again and arrived at the same patch. Thanks, Chris -- Beauty is in the eye of the beerholder. --
On an unrelated note: If the driver was automatically loaded in a hotplug-like fashion (based on PCI IDs and module aliases), you could prevent it with "blacklist ohci1394" in one of the modprobe configuration files. -- Stefan Richter -=====-==--- -==- -=-=- http://arcgraph.de/sr/ --
It already is ... as I asked in the actual bug (but no-one seems to have noticed, so I assume the bug entry isn't set up correctly). Does this still persist with 2.6.26-rc5? The reason for asking is that it looks very much like a silent manifestation of our scsi_bus_uevent() problem that got fixed in 2.6.26-rc5. James --
The bug reporter has no Bugzilla account, so I couldn't add him to the CC list of the bug. Thanks, Rafael --
I do have a bugzilla account, but for some reason I couldn't log in to it to report this bug, so I mailed LKML to get the ball rolling. I can now log in so I've added myself to the CC list. -- Beauty is in the eye of the beerholder. --
Yes, I still get the boot pause and the delay in desktop icons appearing when a usb-storage device is plugged in with -rc5. A boot log is attached. Is there some sort of verbose scsi boot messaging that I can switch on? I've looked but I'm damned if I can find anything in 'make menuconfig'. -- Beauty is in the eye of the beerholder.
SCSI_CONSTANTS and SCSI_LOGGING might help. Both live in the SCSI options menu. Read the Kconfig help on how to use SCSI_LOGGING. Can you please also enable CONFIG_PRINTK_TIME ? Using the timestamps in /var/log/messages is ok but not the most reliable place to get somewhat accurate time stamps of events. hth, grant --
I've enabled CONFIG_PRINTK_TIME and will look into the SCSI options. -- Beauty is in the eye of the beerholder. --
We're tracking it as http://bugzilla.kernel.org/show_bug.cgi?id=10861 . Thanks, Rafael --
| Greg KH | Og dreams of kernels |
| Jens Axboe | [PATCH 31/33] Fusion: sg chaining support |
| Arnd Bergmann | Re: finding your own dead "CONFIG_" variables |
| Mark Brown | [PATCH 2/2] Subject: natsemi: Allow users to disable workaround for DspCfg reset |
| Tony Breeds | [LGUEST] Look in object dir for .config |
git: | |
| Brian Downing | Re: Git in a Nutshell guide |
| John Benes | Re: master has some toys |
| Matthias Lederhofer | [PATCH 4/7] introduce GIT_WORK_TREE to specify the work tree |
| Alexander Sulfrian |
