Netbooting NetBSD in Previous

Started by cuby, January 08, 2023, 09:58:12 AM

cuby

I'm experimenting with alternative OSes for black hardware again, this time NetBSD.

I checked that the NetBSD network boot loader works on real hardware (it has a problem with the mounted NFS root, but is able to mount the FS). Currently, only non-Turbo slabs are supported.

When running in Previous, the NetBSD bootloader is not receiving packets. When enabling some debugging in Previous, I see that the packets are received by slirp and decoded as DHCP rfc1533_cookie and Previous prepares a reply packet, but this never seems to arrive in the emulation.

When the en_get function in the NetBSD bootloader is executed, Previous complains:

[DMA] Channel Ethernet Receive: Error! DMA not enabled!
[EN] Receiving packet: Error! Receiver overflow (DMA disabled)!

So it seems there's a problem with the DMA behavior?

The NetBSD bootloader ethernet code can be found in this file (it did not change very much between the versions, none of the versions I tested works and all show a very similar behavior).

I'll try to debug this a bit further, let's see what I can find...

andreas_g

The Ethernet DMA channels are a bit mysterious! I experienced a similar behavior while doing some experiments with Ethernet speed. Normally the Ethernet receive DMA channel is in chaining mode (after one buffer is full, next buffer is set up automatically from start and stop registers by hardware and an interrupt is generated; the OS then sets new values for start and stop and sets the chaining bit in the DMA CSR). If the speed is too high, the end of the chain is reached (last buffer of chain is full; DMA is automatically disabled). Normally that would not be a problem and the OS would re-enable the DMA channel, but it seems with Ethernet the OS does not re-enable the DMA channel.
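To make the chaining behavior described above concrete, here is a minimal sketch in C. It is not the code from Previous' dma.c: the struct, the field names and the three functions are hypothetical and only model the description (one active buffer, one queued buffer, a chaining bit, and automatic disable when the chain runs out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one chained DMA channel.
 * Names follow the description in this post, not Previous' dma.c. */
typedef struct {
    uint32_t next, limit;   /* buffer currently being filled */
    uint32_t start, stop;   /* buffer queued by the OS for chaining */
    bool enabled;           /* channel accepts data */
    bool chain;             /* chaining bit: a queued buffer is available */
    bool irq;               /* interrupt pending for the OS */
} dma_channel;

/* OS side: provide the next buffer and set the chaining bit. */
static void dma_queue_buffer(dma_channel *ch, uint32_t start, uint32_t stop)
{
    assert(start < stop);
    ch->start = start;
    ch->stop = stop;
    ch->chain = true;
}

/* Hardware side: the current buffer is full. */
static void dma_buffer_full(dma_channel *ch)
{
    ch->irq = true;
    if (ch->chain) {
        /* Continue with the queued buffer; the OS must queue a new one. */
        ch->next = ch->start;
        ch->limit = ch->stop;
        ch->chain = false;
    } else {
        /* End of chain reached: the channel disables itself. */
        ch->enabled = false;
    }
}

/* Hardware side: the device delivers n bytes.
 * Returns false if the data is dropped because the channel is disabled. */
static bool dma_receive(dma_channel *ch, uint32_t n)
{
    if (!ch->enabled)
        return false;
    uint32_t room = ch->limit - ch->next;
    ch->next += (n < room) ? n : room;  /* excess bytes are ignored in this sketch */
    if (ch->next == ch->limit)
        dma_buffer_full(ch);
    return true;
}
```

In this model, if packets fill two buffers before the OS calls dma_queue_buffer() again, the channel ends up disabled and every later dma_receive() fails. That corresponds to the "DMA not enabled" situation from the log, assuming the real hardware behaves as described.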

The Ethernet channels have some function to repeat buffers in case of collisions or similar errors. But I have not been able to reverse-engineer the functionality or find detailed information about it. Maybe it can be reverse-engineered from real hardware. It involves the registers named saved_next, saved_limit, saved_start and saved_stop in Previous (see dma.c). I'm not sure that is the source of the problem, but it might be.

It might also be worth checking what happens before the DMA channel gets disabled.

cuby

Quote from: andreas_g on January 09, 2023, 12:36:05 AM
The Ethernet DMA channels are a bit mysterious! I experienced a similar behavior while doing some experiments with Ethernet speed. Normally the Ethernet receive DMA channel is in chaining mode (after one buffer is full, next buffer is set up automatically from start and stop registers by hardware and an interrupt is generated; the OS then sets new values for start and stop and sets the chaining bit in the DMA CSR). If the speed is too high, the end of the chain is reached (last buffer of chain is full; DMA is automatically disabled). Normally that would not be a problem and the OS would re-enable the DMA channel, but it seems with Ethernet the OS does not re-enable the DMA channel.
Thanks for the hints Andreas! Yes, I can't really say I understand what the NetBSD bootloader code tries to do... but it could well be that it's a timing problem which never shows up on real hardware and the NetBSD developers got away with this for the last 17 years or so :). The curse of too little hardware diversity – but maybe this is also the reason why the bootloader doesn't work on Turbo machines (apart from some differences in the network chip if I'm not mistaken).

Quote:
The Ethernet channels have some function to repeat buffers in case of collisions or similar errors. But I have not been able to reverse-engineer the functionality or find detailed information about it. Maybe it can be reverse-engineered from real hardware. It involves the registers named saved_next, saved_limit, saved_start and saved_stop in Previous (see dma.c). I'm not sure that is the source of the problem, but it might be.
Good idea - I'll investigate. I saw some strange effects with unaligned addresses in the saved_xxx pointers, but this was not consistent.

Quote:
It might also be worth checking what happens before the DMA channel gets disabled.
Noted, thanks! :)

andreas_g

Quote from: cuby on January 09, 2023, 01:21:27 AM
Yes, I can't really say I understand what the NetBSD bootloader code tries to do... but it could well be that it's a timing problem which never shows up on real hardware and the NetBSD developers got away with this for the last 17 years or so :).

You can play around with timings by changing ENET_IO_DELAY and ENET_IO_SHORT in ethernet.c (lines 312 and 313).

cuby

Looks like it's a bug in the bootloader after all.

For non-Turbo machines, it sets rxdma->dd_saved_next to 0 in l. 308 of boot.en.c.

If I change the line to read rxdma->dd_saved_next = dma_buffers[0]; Previous does netboot! The NetBSD kernel then crashes later during startup when setting up DMA, it seems. More to debug...

I also made a quick video of the boot process.

andreas_g

Quote from: cuby on January 09, 2023, 02:57:51 AM
Looks like it's a bug in the bootloader after all.

Not necessarily. If it works on real hardware it also needs to work with Previous. What happens if you add the following line to dma_enet_write_memory() right before TRY (dma.c, line 835)?

dma[CHANNEL_EN_RX].saved_next = dma[CHANNEL_EN_RX].next;

cuby

Quote from: andreas_g on January 09, 2023, 03:16:42 AM
Not necessarily. If it works on real hardware it also needs to work with Previous. What happens if you add the following line to dma_enet_write_memory() right before TRY (dma.c, line 835)?

dma[CHANNEL_EN_RX].saved_next = dma[CHANNEL_EN_RX].next;

That fixes the problem, thanks! Tested with the binary bootloader from NetBSD 9. So the real hardware seems to perform some automatic init of saved_next? That sounds a bit strange: if you check the Turbo-specific code in the bootloader, saved_next is explicitly set there (but I'm not sure if this code has ever been tested)...

There's still a problem when the loaded kernel binary size is large (more than ca. 3.5 MB). This results in a "CPU halted" error message from Previous, but I guess that's a different problem...

The NetBSD kernel crash after a successful boot seems to be related to accesses to mmap'ed registers at addresses 0x02004000...003 which don't seem to be emulated (no entry in ioMemTabNEXT.c). The last lines of the log are:

channel SCSI:
DMA CSR write at $02000010 val=$18 PC=$001532f4
DMA from mem to dev
DMA: unknown command!
channel SCSI:
DMA CSR write at $02000010 val=$00 PC=$001532f6
DMA from mem to dev
DMA no command
channel SCSI:
DMA Next write at $02004010 val=$deadbeef PC=$00153454
channel SCSI:
DMA Limit write at $02004014 val=$deadbeef PC=$00153458
Bus error $02004000 PC=$00153464 /Users/me/Projects/NeXT/previous-dev/previous-code/src/ioMem.c at 413
Bus error $02004001 PC=$00153464 /Users/me/Projects/NeXT/previous-dev/previous-code/src/ioMem.c at 423
Bus error $02004002 PC=$00153464 /Users/me/Projects/NeXT/previous-dev/previous-code/src/ioMem.c at 413
Bus error $02004003 PC=$00153464 /Users/me/Projects/NeXT/previous-dev/previous-code/src/ioMem.c at 423

andreas_g

Can you point me to the place in the NetBSD sources where 0x02004000 is accessed? Maybe there is some comment. I suspect there is no real register, but also no bus error. You could try just populating the address with
{ 0x02004000, SIZE_LONG, DMA_Saved_Next_Read, DMA_Saved_Next_Write },

cuby

Quote from: andreas_g on January 09, 2023, 04:01:28 AM
Can you point me to the place in the NetBSD sources where 0x02004000 is accessed?

I was trying to find the exact location, but somehow debug symbols don't show as expected in my disassembler. I think it's one of the nd_bsw4 calls in nextdma_setup_curr_regs (.../dev/nextdma.c), but I need to dig through the macros that are used there to find out the referenced addresses.

Here's the NetBSD source...

Adding the line you proposed made it crash when trying to access 0x02004004...007, then 4008 and 400c... adding entries for all of these lets the kernel proceed further.
Another case of strange register aliases as with Plan 9 earlier?

Now Previous hangs (with a spinning wheel of death) after "scsibus0: waiting 2 seconds for devices to settle"...

andreas_g

Found the place: https://github.com/NetBSD/src/blob/635c4e7ee7570e1e42d915233cface78a330cd48/sys/arch/next68k/dev/nextdma.c#L409

So it should be like this:
{ 0x02004000, SIZE_LONG, DMA_Saved_Next_Read, DMA_Saved_Next_Write },
{ 0x02004004, SIZE_LONG, DMA_Saved_Limit_Read, DMA_Saved_Limit_Write },

cuby

Quote from: andreas_g on January 09, 2023, 04:19:12 AM
So it should be like this:
{ 0x02004000, SIZE_LONG, DMA_Saved_Next_Read, DMA_Saved_Next_Write },
{ 0x02004004, SIZE_LONG, DMA_Saved_Limit_Read, DMA_Saved_Limit_Write },
There are also writes to 0x02004008 and 0x0200400c which happen in nextdma_setup_cont_regs, I assume l. 464/465?

However, adding the (I think) obvious entries for these
        { 0x02004008, SIZE_LONG, DMA_Saved_Start_Read, DMA_Saved_Start_Write },
        { 0x0200400c, SIZE_LONG, DMA_Saved_Stop_Read, DMA_Saved_Start_Write },
crashes the kernel, the debug output shows bogus values for the saved_xxx registers (see screenshot). [edit] The bogus values are set by the NetBSD code in the lines just above these, so that's probably expected...

Maybe I fail to understand something here...


[edit]Stupid me, copy-and-paste error in the 000c line, but the kernel still hangs at the scsibus0 line...
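For reference, the idea behind these table entries can be sketched as a small, self-contained C model. It is not the table from ioMemTabNEXT.c: the entry layout, the handler names generated by the macro, and the helpers io_lookup(), io_read_long() and io_write_long() are all hypothetical. The sketch only illustrates that each address needs its own read/write pair and that an address without an entry results in a bus error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*io_read_fn)(void);
typedef void (*io_write_fn)(uint32_t);

/* Hypothetical table entry: one long-word register with its handlers. */
typedef struct {
    uint32_t addr;
    io_read_fn read;
    io_write_fn write;
} io_entry;

/* Generates a backing variable plus a matching read/write handler pair. */
#define REG_HANDLERS(name)                                      \
    static uint32_t name;                                       \
    static uint32_t read_##name(void) { return name; }          \
    static void write_##name(uint32_t v) { name = v; }

REG_HANDLERS(saved_next)
REG_HANDLERS(saved_limit)
REG_HANDLERS(saved_start)
REG_HANDLERS(saved_stop)

/* Each register gets its own pair; pairing the stop address with the
 * start write handler would silently clobber saved_start. */
static const io_entry io_table[] = {
    { 0x02004000, read_saved_next,  write_saved_next  },
    { 0x02004004, read_saved_limit, write_saved_limit },
    { 0x02004008, read_saved_start, write_saved_start },
    { 0x0200400c, read_saved_stop,  write_saved_stop  },
};

static const io_entry *io_lookup(uint32_t addr)
{
    for (size_t i = 0; i < sizeof io_table / sizeof io_table[0]; i++)
        if (io_table[i].addr == addr)
            return &io_table[i];
    return NULL;  /* unmapped address */
}

/* Both helpers return 0 on success and -1 to signal a bus error. */
static int io_write_long(uint32_t addr, uint32_t val)
{
    const io_entry *e = io_lookup(addr);
    if (e == NULL)
        return -1;
    e->write(val);
    return 0;
}

static int io_read_long(uint32_t addr, uint32_t *out)
{
    assert(out != NULL);
    const io_entry *e = io_lookup(addr);
    if (e == NULL)
        return -1;
    *out = e->read();
    return 0;
}
```

With the handlers paired correctly, a write to 0x0200400c changes only saved_stop, and a later read of 0x02004008 still returns the untouched saved_start value.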

Screenshot 2023-01-09 at 12.33.40.png

cuby

Quote from: cuby on January 09, 2023, 04:34:43 AM
the kernel still hangs at the scsibus0 line...

The last log entries are
DMA Start read at $02004018 val=$deadbee0 PC=$0000fc78
channel SCSI:
DMA Stop read at $0200401c val=$deadbee0 PC=$0000fc84
channel SCSI:
DMA SStart read at $02004008 val=$deadbee0 PC=$0000fd0a
channel SCSI:
DMA SStop read at $0200400c val=$deadbee0 PC=$0000fd14
channel SCSI:
DMA CSR read at $02000010 val=$00 PC=$0000fd9e
IO write at $0211400c val=00 PC=$00014128
IO write at $0211400b val=48 PC=$00014128

No crash this time, though (but maybe busy waiting, thus the wheel of death in Previous? Just guessing...).

[edit] The code seems to hang in the call to kpause. Setting SCSI_DELAY to 0 then results in

esp0: invalid state: 7 [intr 10, phase(c 3, p 3)]
0x020x_xxxx seems to be NEXT_P_DEV_SPACE, 0x021x_xxxx NEXT_P_DEV_BMAP according to NetBSD's sys/arch/next68k/include/cpu.h?
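The address split guessed above can be written down as a small C helper. The base addresses and the 1 MB region size are assumptions derived only from the address patterns in this post (0x020x_xxxx and 0x021x_xxxx), not copied from NetBSD's cpu.h, and the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, inferred from the 0x020x_xxxx / 0x021x_xxxx patterns. */
#define DEV_SPACE_BASE  0x02000000u
#define DEV_BMAP_BASE   0x02100000u
#define DEV_REGION_SIZE 0x00100000u

/* The two regions are assumed to be adjacent and equally sized. */
static_assert(DEV_SPACE_BASE + DEV_REGION_SIZE == DEV_BMAP_BASE,
              "regions are expected to be contiguous");

typedef enum {
    REGION_OTHER,
    REGION_DEV_SPACE,  /* 0x020x_xxxx */
    REGION_DEV_BMAP    /* 0x021x_xxxx */
} io_region;

/* Classify a physical address by the region it falls into. */
static io_region classify_io_address(uint32_t addr)
{
    if (addr >= DEV_SPACE_BASE && addr < DEV_SPACE_BASE + DEV_REGION_SIZE)
        return REGION_DEV_SPACE;
    if (addr >= DEV_BMAP_BASE && addr < DEV_BMAP_BASE + DEV_REGION_SIZE)
        return REGION_DEV_BMAP;
    return REGION_OTHER;
}
```

Under these assumptions, the DMA register accesses in the log ($02000010, $0200400c) fall into the first region and the two final IO writes ($0211400b, $0211400c) into the second.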

andreas_g

Please apply this patch on top of the latest revision of branch_softfloat (r1270) and test if it fixes the problem with kpause():

diff -ru a/src/cpu/newcpu.c b/src/cpu/newcpu.c
--- a/src/cpu/newcpu.c 2023-01-05 18:15:58
+++ b/src/cpu/newcpu.c 2023-01-09 19:18:53
@@ -2414,7 +2414,6 @@
  if (regs.stopped)
  return;
  regs.stopped = 1;
- set_special(SPCFLAG_STOP);
 #ifndef WINUAE_FOR_HATARI
  if (cpu_last_stop_vpos >= 0) {
  cpu_last_stop_vpos = vpos;
@@ -2425,7 +2424,6 @@
 static void m68k_unset_stop(void)
 {
  regs.stopped = 0;
- unset_special(SPCFLAG_STOP);
 #ifndef WINUAE_FOR_HATARI
  if (cpu_last_stop_vpos >= 0) {
  cpu_stopped_lines += vpos - cpu_last_stop_vpos;
@@ -6259,11 +6257,7 @@
  run_other_MPUs();
 
  /* We can have several interrupts at the same time before the next CPU instruction */
- /* We must check for pending interrupt and call do_specialties_interrupt() only */
- /* if the cpu is not in the STOP state. Else, the int could be acknowledged now */
- /* and prevent exiting the STOP state when calling do_specialties() after. */
- /* For performance, we first test PendingInterruptCount, then regs.spcflags */
- while ( ( PendingInterrupt.time <= 0 ) && ( PendingInterrupt.pFunction ) && ( ( regs.spcflags & SPCFLAG_STOP ) == 0 ) ) {
+ while ( ( PendingInterrupt.time <= 0 ) && ( PendingInterrupt.pFunction ) ) {
  CALL_VAR(PendingInterrupt.pFunction); /* call the interrupt handler */
  }
 
@@ -6390,11 +6384,7 @@
  run_other_MPUs();
 
  /* We can have several interrupts at the same time before the next CPU instruction */
- /* We must check for pending interrupt and call do_specialties_interrupt() only */
- /* if the cpu is not in the STOP state. Else, the int could be acknowledged now */
- /* and prevent exiting the STOP state when calling do_specialties() after. */
- /* For performance, we first test PendingInterruptCount, then regs.spcflags */
- while ( ( PendingInterrupt.time <= 0 ) && ( PendingInterrupt.pFunction ) && ( ( regs.spcflags & SPCFLAG_STOP ) == 0 ) ) {
+ while ( ( PendingInterrupt.time <= 0 ) && ( PendingInterrupt.pFunction ) ) {
  CALL_VAR(PendingInterrupt.pFunction); /* call the interrupt handler */
  }
  /* Previous: for now we poll the interrupt pins with every instruction.

cuby

Quote from: andreas_g on January 09, 2023, 11:33:08 AM
test if it fixes the problem with kpause()

It does, the emulation now continues (and then fails with the SCSI error I mentioned above, of course) instead of hanging.

Great, thanks again! :)


andreas_g

Great! So next let's find out where in the NetBSD code this message is printed:
esp0: invalid state: 7 [intr 10, phase(c 3, p 3)]