Completely reproducible... 2.6.24-rc3 kernel boots, and normal messages are seen on console as far as disks found and partitions on each. However, once /dev is populated and the boottime scripts attempt to check filesystem status, no partitions on either of the two disks attached to the SCSI controller are seen. Dropping into a single-user root shell confirms the sudden "blindness": fdisk can't open /dev/sda. When I reboot on 2.6.24-rc2, everything works normally. System environment is Debian Etch. Both 2.6.24-rc2 and -rc3 were built from the respective unaltered kernel.org source trees, using the same kernel configuration modulo saying "no" to CONFIG_SENSORS_I5K_AMB and CONFIG_PID_NS in -rc3. No problems with -rc3 on an x86 box. -- ------------------------------------------------------------------------ Bob Tracy | "They couldn't hit an elephant at this dist- " rct@frus.com | - Last words of Union General John Sedgwick, | Battle of Spotsylvania Court House, U.S. Civil War ------------------------------------------------------------------------ -
Added to the list, http://bugzilla.kernel.org/show_bug.cgi?id=9457 . Thanks, Rafael -
I was out of town last week, and will be out this week as well. Won't be able to do the bisection until next week at the earliest, but I have remote access to the box if there's anything useful to be done that doesn't require a reboot. No logs available for the "no sd access" case: I'd have to rig up something to record the console output during boot if that's needed. Here's hoping someone else is seeing this or can replicate it in the meantime. -
Snap. 2.6.24-rc2 works fine. 2.6.24-rc3 boots on Alpha but once /dev is populated no partitions of the scsi sub-system are seen. Looks like ide sub-system similarly affected. Managed to get boot log. Follows below (with output of various /proc info). Cheerz Michael.
Linux version 2.6.24-rc3 (mjc@alpha) (gcc version 4.1.3 20071019 (prerelease) (Debian 4.1.2-17)) #1 Mon Nov 26 19:28:58 NZDT 2007
Booting on Tsunami variation Monet using machine vector Monet from SRM
Major Options: EV67 LEGACY_START VERBOSE_MCHECK
Command line: ro root=/dev/sda3 console=ttyS0
memcluster 0, usage 1, start 0, end 215
memcluster 1, usage 0, start 215, end 131062
memcluster 2, usage 1, start 131062, end 131072
freeing pages 215:384
freeing pages 930:131062
reserving pages 930:932
4096K Bcache detected; load hit latency 21 cycles, load miss latency 127 cycles
Console graphics on hose 0
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 130167
Kernel command line: ro root=/dev/sda3 console=ttyS0
PID hash table entries: 4096 (order: 12, 32768 bytes)
Using epoch = 2000
Turning on RTC interrupts.
Console: colour VGA+ 80x25
console [ttyS0] enabled
Dentry cache hash table entries: 131072 (order: 7, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 6, 524288 bytes)
Memory: 1030896k/1048496k available (2786k kernel code, 15216k reserved, 370k data, 168k init)
Mount-cache hash table entries: 512
net_namespace: 120 bytes
NET: Registered protocol family 16
PCI: Bridge: 0001:01:08.0
IO window: 8000-8fff
MEM window: 09000000-090fffff
PREFETCH window: disabled.
SMC37c669 Super I/O Controller found @ 0x3f0
Linux Plug and Play Support v0.97 (c) Adam Belay
SCSI subsystem initialized
NET: Registered protocol family 2
IP route cache hash table entries: 8192 (order: 3, 65536 bytes)
TCP established hash table entries: 32768 (order: 6, 524288 bytes)
TCP bind hash table entries: 32768 (order: 5, 262144 bytes)
TCP: Hash tables ...
On Sat, 01 Dec 2007 11:30:01 +1300 I guess this is where things go bad. scsi_id is part of udev. Perhaps some sysfs nodes aren't being created correctly. Random guess: what is your setting of CONFIG_SCSI_SCAN_ASYNC and what -
Thanks for the confirmation of the error condition. As best I can recall, your boot log is substantially the same as what I saw. Finally got back in town. Starting the git-bisect process. I've got a relatively slow network connection, and the PWS 433au isn't exactly what I would call "fast" by modern standards, so bear with me while I get things set up and crank through this. The clone of the 2.6 tree will take several more hours to finish downloading. I anticipate the best pace I'll be able to manage after that is two iterations in a 24-hour period. --
once you are done with the download of the initial cloned git repository (which is 200MB+), all the bisection steps will be local and you'll be only limited by kernel rebuild speed and by bootup and testing speed, not by network bandwidth. ( once you have the cloned repository i'd suggest for you to keep it - that way you can track subsequent kernels via "git-pull" and it uses a very network-efficient delta protocol. ) Ingo --
ACK. Have tested two kernels in the past 24 hours, and the third is building as I type this. The builds seem to be taking about 3 hours each. First two tests good, so the offending commit is somewhere in the last 25% (roughly) of the changes between -rc2 and -rc3: git says 82 revisions left to test. Might have this painted into a corner in Will do... I'm in the fortunate position of having enough disk space on my Alpha that I can maintain multiple trees for this kind of effort. --
OK. Finally have this thing painted into a corner: git has identified 6f37ac793d6ba7b35d338f791974166f67fdd9ba as the first bad commit. From "git bisect log", this corresponds to
# bad: [6f37ac793d6ba7b35d338f791974166f67fdd9ba] Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Here's the full log:
git-bisect start
# good: [9aae299f7fd1888ea3a195cfe0edef17bb647415] Linux 2.6.24-rc2
git-bisect good 9aae299f7fd1888ea3a195cfe0edef17bb647415
# bad: [f05092637dc0d9a3f2249c9b283b973e6e96b7d2] Linux 2.6.24-rc3
git-bisect bad f05092637dc0d9a3f2249c9b283b973e6e96b7d2
# good: [e6a5c27f3b0fef72e528fc35e343af4b2db790ff] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
git-bisect good e6a5c27f3b0fef72e528fc35e343af4b2db790ff
# good: [42614fcde7bfdcbe43a7b17035c167dfebc354dd] vmstat: fix section mismatch warning
git-bisect good 42614fcde7bfdcbe43a7b17035c167dfebc354dd
# bad: [a052f4473603765eb6b4c19754689977601dc1d1] Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/x86
git-bisect bad a052f4473603765eb6b4c19754689977601dc1d1
# good: [d8e5219f9f5ca7518eb820db9f3d287a1d46fcf5] CRISv10 improve and bugfix fasttimer
git-bisect good d8e5219f9f5ca7518eb820db9f3d287a1d46fcf5
# good: [d90bf5a976793edfa88d3bb2393f0231eb8ce1e5] [NET]: rt_check_expire() can take a long time, add a cond_resched()
git-bisect good d90bf5a976793edfa88d3bb2393f0231eb8ce1e5
# good: [2a113281f5cd2febbab21a93c8943f8d3eece4d3] kconfig: use $K64BIT to set 64BIT with all*config targets
git-bisect good 2a113281f5cd2febbab21a93c8943f8d3eece4d3
# good: [2e2cd8bad6e03ceea73495ee6d557044213d95de] CRISv10 memset library add lineendings to asm
git-bisect good 2e2cd8bad6e03ceea73495ee6d557044213d95de
# bad: [6f37ac793d6ba7b35d338f791974166f67fdd9ba] Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
git-bisect bad 6f37ac793d6ba7b35d338f791974166f67fdd9ba
# good: [2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3] CRISv10 fasttimer: Scrap ...
On Thu, 6 Dec 2007 18:16:12 -0600 (CST)
commit 6f37ac793d6ba7b35d338f791974166f67fdd9ba
Merge: 2f1f53b... d90bf5a...
Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Wed Nov 14 18:51:48 2007 -0800
Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/n
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NET]: rt_check_expire() can take a long time, add a cond_resched()
[ISDN] sc: Really, really fix warning
[ISDN] sc: Fix sndpkt to have the correct number of arguments
[TCP] FRTO: Clear frto_highmark only after process_frto that uses it
[NET]: Remove notifier block from chain when register_netdevice_notifier f
[FS_ENET]: Fix module build.
[TCP]: Make sure write_queue_from does not begin with NULL ptr
[TCP]: Fix size calculation in sk_stream_alloc_pskb
[S2IO]: Fixed memory leak when MSI-X vector allocation fails
[BONDING]: Fix resource use after free
[SYSCTL]: Fix warning for token-ring from sysctl checker
[NET] random : secure_tcp_sequence_number should not assume CONFIG_KTIME_S
[IWLWIFI]: Not correctly dealing with hotunplug.
[TCP] FRTO: Plug potential LOST-bit leak
[TCP] FRTO: Limit snd_cwnd if TCP was application limited
[E1000]: Fix schedule while atomic when called from mii-tool.
[NETX]: Fix build failure added by 2.6.24 statistics cleanup.
[EP93xx_ETH]: Build fix after 2.6.24 NAPI changes.
[PKT_SCHED]: Check subqueue status before calling hard_start_xmit
I'm struggling to see how any of those could have broken block device
mounting on alpha. Are you sure you bisected right?
--
Based on what's in that commit, it *does* appear something went wrong with bisection. If the implicated commit is the next one in time sequence relative to # good: [2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3] CRISv10 fasttimer: Scrap INLINE and name timeval_cmp better then the test of whether I bisected correctly is as simple as applying the commit and seeing if things break, because I'm running on the kernel corresponding to 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3 right now. Let me give that a try and I'll report back. Worst case, I'll have to start over and write off the past four days... Sorry about this... --
Gad. I trust the second time will be faster. git-bisect _is_ very error prone. I find one of the problems is that each step is so far apart in time that you forget what you were doing. Did I Not appropriate ;) Thanks for helping out. --
i have a fully automated bootup-hang bisection script. It is based on "git-bisect run". I run the script, it builds and boots kernels fully automatically, and when the bootup fails (the script notices that via the serial log, which it continuously watches - or via a timeout, if the system does not come up within 10 minutes it's a "bad" kernel), the script raises my attention via a beep and i power cycle the test box. (yeah, i should make use of a managed power outlet to 100% automate it) So i don't have to make a single manual decision anytime during the bisection. But the scripts are very much tied to my ad-hoc test environment so it would not be of much general use. Ingo --
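The core of such a script is mapping the outcome of one boot attempt to the exit status that "git bisect run" expects: 0 for good, 1 through 127 (except 125) for bad, and 125 for "cannot test this revision, skip it". What follows is only a minimal sketch of that one step, not Ingo's actual script. The function name classify_boot_log, the captured serial log file, and the 'login:' marker are all assumptions made for illustration.

```shell
# classify_boot_log LOGFILE
# Hypothetical helper: turn a captured serial-console log into a
# "git bisect run" exit status.
#   0   -> good  (boot reached the marker)
#   1   -> bad   (log exists, marker never appeared)
#   125 -> skip  (no usable log, e.g. the build or harness failed)
classify_boot_log() {
    log="$1"

    # Empty or missing log: this revision could not be tested at all.
    [ -s "$log" ] || return 125

    # Assumed marker: userspace prints a login prompt once boot completes.
    if grep -q 'login:' "$log"; then
        return 0
    fi

    # A log without the marker is treated like a hang or timeout.
    return 1
}
```

A wrapper that builds the kernel, boots it, saves the console output to a file, and then exits with the status of classify_boot_log could be handed to "git bisect run". The build and boot steps are site-specific and are left out here.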
Thanks for the kind words... The above-mentioned test verified that the bisection was/is correct: 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3 works, and 6f37ac793d6ba7b35d338f791974166f67fdd9ba doesn't. Now I've got to figure out why. "git diff 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3 6f37ac793d6ba7b35d338f791974166f67fdd9ba" produced a relatively short patch (18,437 bytes). The list of involved files:
diff --git a/drivers/char/random.c b/drivers/char/random.c
diff --git a/drivers/isdn/sc/card.h b/drivers/isdn/sc/card.h
diff --git a/drivers/isdn/sc/packet.c b/drivers/isdn/sc/packet.c
diff --git a/drivers/isdn/sc/shmem.c b/drivers/isdn/sc/shmem.c
diff --git a/drivers/net/arm/ep93xx_eth.c b/drivers/net/arm/ep93xx_eth.c
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
diff --git a/drivers/net/fs_enet/Kconfig b/drivers/net/fs_enet/Kconfig
diff --git a/drivers/net/fs_enet/Makefile b/drivers/net/fs_enet/Makefile
diff --git a/drivers/net/netx-eth.c b/drivers/net/netx-eth.c
diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
diff --git a/include/net/sock.h b/include/net/sock.h
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
diff --git a/net/core/dev.c b/net/core/dev.c
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
Current state of the source tree is the 6f37ac... version, so I'll start backing out the above diffs in related groups and continue until I've got a working kernel. For lack of an obvious target, I'll start with the seemingly innocuous change to sysctl_check.c. I'll report back when I've got something.
That was quick :-). Backing out the sysctl_check.c diff gives me a
working kernel. Beats the #$%@! out of me how/why, though.
Michael Cree: could you try backing out the diff below from your
2.6.24-rc3 tree and see if things are now working for you?
Here's "uname -a", just to confirm (maybe) I'm running on what I say
works:
Linux smirkin 2.6.24-rc2-g6f37ac79-dirty #2 Fri Dec 7 08:03:12 CST 2007 alpha
Here's the diff I backed out (patch -R). It's short...
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 5a2f2b2..4abc6d2 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -738,7 +738,7 @@ static struct trans_ctl_table trans_net_table[] = {
{ NET_ROSE, "rose", trans_net_rose_table },
{ NET_IPV6, "ipv6", trans_net_ipv6_table },
{ NET_X25, "x25", trans_net_x25_table },
- { NET_TR, "tr", trans_net_tr_table },
+ { NET_TR, "token-ring", trans_net_tr_table },
{ NET_DECNET, "decnet", trans_net_decnet_table },
/* NET_ECONET not used */
{ NET_SCTP, "sctp", trans_net_sctp_table },
--
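Backing out a single file's share of a larger diff, as described above, can be sketched as follows. This is an illustration rather than the exact commands used in the thread: back_out_file is a hypothetical helper name, and it uses "git apply -R" where the message above used "patch -R"; either should reverse-apply the same diff.

```shell
# back_out_file GOOD BAD PATH
# Hypothetical helper: reverse-apply, in the working tree, only the
# changes that PATH received between commits GOOD and BAD.
# Assumes it is run from the top of the repository with BAD checked out
# and PATH unmodified.
back_out_file() {
    good="$1"
    bad="$2"
    path="$3"

    # Limit the diff to one path, then apply it in reverse from stdin.
    git diff "$good" "$bad" -- "$path" | git apply -R -
}
```

For the case in this thread the call would be: back_out_file 2f1f53bdc6531696934f6ee7bbdfa2ab4f4f62a3 6f37ac793d6ba7b35d338f791974166f67fdd9ba kernel/sysctl_check.c. Every other file stays at its 6f37ac... state, which is what makes the test isolate that one change.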
reverting this makes the kernel image shorter by 8 bytes - so perhaps some alignment issue somewhere? Or something gets overflown? Does any of this get actually used by your bootup? Ingo --
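One way to see where 8 bytes could come from: a C string literal occupies its character count plus one byte for the terminating NUL, so "tr" needs 3 bytes and "token-ring" needs 11. A small sketch of that arithmetic follows; literal_size is a hypothetical helper name, and any padding the compiler or linker may add is ignored.

```shell
# literal_size STRING
# Bytes a C string literal with this text occupies: its length plus the
# terminating NUL. Assumes single-byte (ASCII) characters.
literal_size() {
    echo $(( ${#1} + 1 ))
}

old=$(literal_size "tr")
new=$(literal_size "token-ring")
echo "old=$old new=$new delta=$(( new - old ))"
```

The difference between the two literals is 8 bytes, which is consistent with the size change reported above, although that alone does not show that the string is the only thing that moved.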
i'm not sure how to do direct debugging on udev, so i can only guess about what effect on the kernel side could have caused this. One bad hack would be to "probe" udevd's behavior by changing the NET_TR entry in various ways: "tr" -> "token-ring" # breaks "tr" -> "tr" # works "tr" -> "token-rin0" # ? (1) "tr" -> "TR" # ? (2) the question is, does tweak (1) and tweak (2) work or break? but it would be a lot more effective i guess to get some udevd expert's attention on this ... Ingo --
Could we get the output of: ls -l /sys/block/sda/ and: grep . /sys/block/sda/*/dev ? Kay --
Here are the requested items for the 2.6.24-rc2-g6f37ac79-dirty kernel (the working one with the sysctl_check.c patch reverted):
smirkin:/# ls -l /sys/block/sda
total 0
-r--r--r-- 1 root root 8192 Dec 7 08:36 capability
-r--r--r-- 1 root root 8192 Dec 7 08:36 dev
lrwxrwxrwx 1 root root 0 Dec 7 08:36 device -> ../../devices/pci0000:00/0000:00:14.0/0000:01:09.0/host0/target0:0:0/0:0:0:0
drwxr-xr-x 2 root root 0 Dec 7 08:36 holders
drwxr-xr-x 3 root root 0 Dec 7 08:36 queue
-r--r--r-- 1 root root 8192 Dec 7 08:36 range
-r--r--r-- 1 root root 8192 Dec 7 08:36 removable
drwxr-xr-x 3 root root 0 Dec 7 08:36 sda1
drwxr-xr-x 3 root root 0 Dec 7 08:36 sda2
drwxr-xr-x 3 root root 0 Dec 7 08:36 sda3
drwxr-xr-x 3 root root 0 Dec 7 08:36 sda4
drwxr-xr-x 3 root root 0 Dec 7 08:36 sda5
drwxr-xr-x 3 root root 0 Dec 7 08:36 sda6
drwxr-xr-x 3 root root 0 Dec 7 08:36 sda7
-r--r--r-- 1 root root 8192 Dec 7 08:36 size
drwxr-xr-x 2 root root 0 Dec 7 08:36 slaves
-r--r--r-- 1 root root 8192 Dec 7 08:36 stat
lrwxrwxrwx 1 root root 0 Dec 7 08:36 subsystem -> ../../block
--w------- 1 root root 8192 Dec 7 08:36 uevent
smirkin:/# grep . /sys/block/sda/*/dev
/sys/block/sda/sda1/dev:8:1
/sys/block/sda/sda2/dev:8:2
/sys/block/sda/sda3/dev:8:3
/sys/block/sda/sda4/dev:8:4
/sys/block/sda/sda5/dev:8:5
/sys/block/sda/sda6/dev:8:6
/sys/block/sda/sda7/dev:8:7
Assuming /sys/block even exists for the non-working case, I'll forward that info in a few hours when I can get home to reboot the machine. --
Yes (conference is now over). I backed out the sysctl_check patch from 2.6.24-rc3 and, indeed, got a working kernel. The working kernel (was probably 2.6.24-rc3 less sysctl_check patch, but might have been a 2.6.23 variant) has the following in /sys/block
alpha:~# ls -l /sys/block/
total 0
drwxr-xr-x 5 root root 0 2007-12-08 08:55 fd0
drwxr-xr-x 6 root root 0 2007-12-08 08:55 hde
drwxr-xr-x 5 root root 0 2007-12-08 08:55 hdf
drwxr-xr-x 10 root root 0 2007-12-08 08:55 sda
drwxr-xr-x 9 root root 0 2007-12-08 08:55 sdb
alpha:~# ls -l /sys/block/sda
total 0
-r--r--r-- 1 root root 8192 2007-12-08 08:55 capability
-r--r--r-- 1 root root 8192 2007-12-08 08:55 dev
lrwxrwxrwx 1 root root 0 2007-12-08 08:55 device -> ../../devices/pci0001:01/0001:01:06.0/host0/target0:0:1/0:0:1:0
drwxr-xr-x 2 root root 0 2007-12-08 08:55 holders
drwxr-xr-x 3 root root 0 2007-12-08 08:55 queue
-r--r--r-- 1 root root 8192 2007-12-08 08:55 range
-r--r--r-- 1 root root 8192 2007-12-08 08:55 removable
drwxr-xr-x 3 root root 0 2007-12-08 08:55 sda1
drwxr-xr-x 3 root root 0 2007-12-08 08:55 sda2
drwxr-xr-x 3 root root 0 2007-12-08 08:55 sda3
drwxr-xr-x 3 root root 0 2007-12-08 08:55 sda4
drwxr-xr-x 3 root root 0 2007-12-08 08:55 sda5
-r--r--r-- 1 root root 8192 2007-12-08 08:55 size
drwxr-xr-x 2 root root 0 2007-12-08 08:55 slaves
-r--r--r-- 1 root root 8192 2007-12-08 08:55 stat
lrwxrwxrwx 1 root root 0 2007-12-08 08:55 subsystem -> ../../block
--w------- 1 root root 8192 2007-12-08 08:55 uevent
alpha:~# grep . /sys/block/sda/*/dev
/sys/block/sda/sda1/dev:8:1
/sys/block/sda/sda2/dev:8:2
/sys/block/sda/sda3/dev:8:3
/sys/block/sda/sda4/dev:8:4
/sys/block/sda/sda5/dev:8:5
The broken kernel (2.6.24-rc3) has the following in /sys/block
alpha:~# ls -l /sys/block/
total 0
drwxr-xr-x 5 root root 0 Dec 8 09:22 fd0
drwxr-xr-x 6 root root 0 Dec 8 09:22 hde
drwxr-xr-x 5 root root 0 Dec 8 09:23 hdf
drwxr-xr-x 10 root root 0 Dec 8 09:22 sda
drwxr-xr-x 9 ...
Yeah, that looks all fine. What distro is that, and what's the udev version? You are booting your kernel with an initramfs? Is the udev daemon (still) running while it fails? If you run /sbin/udevtrigger, do the nodes appear? Kay --
Mine is Debian Etch, normally with the latest released or -rcX kernel from kernel.org. Updates current as of about 18 hours ago. Udev package version is 0.105-4. The RELEASE-NOTES file in /usr/share/doc/udev I can answer the above later when I'm back in front of the machine, but even in the "not good" case, I still see the following messages from the /etc/rcS.d/S03udev file: Starting the hotplug events dispatcher udevd. Synthesizing the initial hotplug events. This is where udevtrigger gets called, followed by the load_input_modules and create_dev_makedev functions, then... Waiting for /dev to be fully populated. which is where udevsettle gets called. None of the above appear to be exiting abnormally for the bad case, but I'll definitely take a closer look at what MAKEDEV (/dev/MAKEDEV --> /sbin/MAKEDEV) is doing. In particular, Debian MAKEDEV is looking at /proc/devices to decide what to do, so maybe "cat /proc/devices" would be useful to look at for the broken case. --
Yes, and there's something else I forgot to mention that may be significant... For the bad case, in addition to udevd, "ps -ef" shows a "sh -e /lib/udev/net.agent" running with a PPID of 1. This process doesn't exit until I reboot. If this is normal under the circumstances, please disregard. --
Does SysRq-T show where it hangs? Kay --
Ummm... No. I didn't have the CONFIG_MAGIC_SYSRQ flag set, so I set it, and recompiled the kernel. Guess what - now the system comes up normally without any problem. The block devices appear in /dev. To recap: without CONFIG_MAGIC_SYSRQ on the 2.6.24-rc3 kernel the missing block devices error in /dev occurs and the init scripts fall over on startup, and with CONFIG_MAGIC_SYSRQ the system comes up normally. To answer the earlier questions about distro, and udev version, my system is similar to Bob's, except that I am running Debian testing/lenny which comes with udev version 114 (dpkg reports udev version 0.114-2). I am running an EV67 variant CPU. I do not run an initramfs - I have the necessary drivers for the various discs compiled into the kernel and use the root kernel option to point to the required root partition. When running the broken kernel udev is running (according to 'ps') and executing /sbin/udevtrigger manually generates a number of errors of the form: scsi_id[<pid>]: scsi_id: unable to access '/block' The missing /dev/* entries do not appear. Cheerz Michael. --
Incredible... Toggling CONFIG_MAGIC_SYSRQ works for me too, so I'm finally able to reproduce the problem (which is the main positive result so far ;-) There are lots of possible reasons why this happens, but at the moment I honestly have no idea. For now I have reassigned the bug #9457 to myself and will gradually hack into udev... Ivan. --
Thanks... Let me know if there's anything useful I can do to help. --Bob T. --
It turns out to be yet another strncpy() bug that indeed shows up only with certain src/dst alignments and breaks kobject_get_path(). Ugh... Hopefully I'll have a patch tomorrow. Ivan. --
Verified that 6f37ac793d6ba7b35d338f791974166f67fdd9ba is the next commit after the "good" kernel I'm running now. The build is running, and I should have an answer for us in a few hours. --
the bisection log looks healthy so far - with nicely alternating good/bad bisection points. Barring the possibility that the bug is non-deterministic, i'd guess the bisection points are OK, at least judging from their statistical properties. but ... i went over the diffs too, and i fail to see how they could affect the bootup path of an Alpha box, which i suspect has no networking dependency up to the failure point. Ingo --
