A new benchmark tool

Started by verdraith, January 27, 2023, 04:03:58 PM


What configuration should NSBench use as a baseline?

25MHz 68030 NeXTcomputer with 16MB RAM
7 (43.8%)
33MHz 68040 NeXTcube with 64MB RAM
1 (6.3%)
25MHz 68040 NeXTstation Color with 64MB RAM
4 (25%)
33MHz 68040 NeXTcube with 64MB RAM + 64MB Dimension with NSBench on the colour display.
4 (25%)

Total Members Voted: 16

Voting closed: February 06, 2023, 01:20:17 PM

verdraith

Quote from: zombie on January 30, 2023, 03:08:31 PM
I like making the benchmark the highest pinnacle of black hardware performance released. The turbo next dimension.
I was thinking the same thing.  But I can also see how baselines from stock hardware could be used to show the performance benefits of, say, using a Dimension board.   In the end, I decided to make it into a poll... I'll go with whatever wins :)

Quote from: zombie on January 30, 2023, 03:08:31 PM
Also, I suspect you will get different results if you run it on NS 2.0 vs OS 4.2.
Yes, but probably not for the reason you're thinking.

NXFactor exercises the Display PostScript subsystem, which has the following dependencies:
  • Performance of the core DPS subsystem,
  • CPU speed,
  • RAM access speed,
  • Disk seek and read/write speeds (for when stuff gets dumped into swap on systems with low memory).
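
Each of those tests essentially boils down to timing a batch of PostScript operators plus a final round trip to the server.  A minimal sketch of how a line test might look (my illustration, not NSBench's actual code, and the header names are from memory):

#import <appkit/View.h>
#import <dpsclient/wraps.h>      /* single-operator pswrap functions */
#import <dpsclient/dpsNeXT.h>    /* NXPing() */
#import <sys/time.h>

/* Hypothetical line test in a View subclass: stroke a batch of lines,
 * then force a round trip so the DPS server finishes drawing before
 * the timer stops. */
- (long)timeLineTest
{
    struct timeval start, end;
    int i;

    [self lockFocus];
    gettimeofday(&start, NULL);

    for (i = 0; i < 10000; i++) {
        PSnewpath();
        PSmoveto(0.0, 0.0);
        PSlineto(100.0, 100.0);
        PSstroke();
    }
    NXPing();                    /* wait for the DPS server to catch up */

    gettimeofday(&end, NULL);
    [self unlockFocus];

    return (end.tv_sec - start.tv_sec) * 1000
         + (end.tv_usec - start.tv_usec) / 1000;
}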

In fact, when I run NSBench on Previous configured as a 25MHz '040 Cube with 16MB RAM, I initially get around 240KB of free memory once the OS loads the NSBench program and bundle images into RAM.  DPS exercises the swap file when I execute the NXFactor benchmark, immediately creating a disk I/O bottleneck that skews the results.

Right now, NSBench is only compiled on NEXTSTEP 3.3, so it will not use any of the OPENSTEP frameworks whatsoever:
asmodai@nova:/Users/asmodai/Projects/NSBench/NSBench.app $ uname -a
OPENSTEP nova 4.2 NeXT Mach 4.2: Tue Jan 26 11:21:50 PST 1999; root(rcbuilder):Objects/mk-183.34.4.obj~2/RELEASE_I386 I386 Intel 486
asmodai@nova:/Users/asmodai/Projects/NSBench/NSBench.app $ otool -L NSBench
NSBench:
        /usr/shlib/libMedia_s.A.shlib (minor version 12)
        /usr/shlib/libNeXT_s.C.shlib (minor version 89)
        /usr/shlib/libsys_s.B.shlib (minor version 62)
asmodai@nova:/Users/asmodai/Projects/NSBench/NSBench.app $
I deliberately avoided using MiscKit, Foundation, etc.  In fact, I even implemented my own String class to avoid external dependencies beyond the three linked libraries.
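
For flavour, a dependency-free string class on 3.3 need only subclass Object from the runtime.  A hypothetical sketch (not the actual NSBench class):

#import <objc/Object.h>
#import <string.h>
#import <stdlib.h>

@interface BenchString : Object
{
    char *bytes;                 /* owned C string */
}
- initFromCString:(const char *)str;
- (const char *)cString;
- free;
@end

@implementation BenchString
- initFromCString:(const char *)str
{
    [super init];
    bytes = (char *)malloc(strlen(str) + 1);
    strcpy(bytes, str);
    return self;
}

- (const char *)cString
{
    return bytes;
}

- free
{
    if (bytes)
        free(bytes);             /* the C function, not the method */
    return [super free];
}
@end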

So any differing results could -- on a system with zero load at rest and with sufficient RAM to avoid page faults and swapping -- be placed squarely on the shoulders of anything other than the OPENSTEP frameworks.  There are a lot of other variables at play.

Let's look at the results of both NEXTSTEP 3.3 and OPENSTEP 4.2 running in identically-configured VMware virtual machines (all listed times are in milliseconds).

Test        OPENSTEP 4.2    NEXTSTEP 3.3
Line        153             140
Curve       78              90
Fill        47              40
Transpose   90              50
Composite   542             490
Userpath    332             340
Text        110             120
Window      107             90
The biggest differences there are transposition and compositing.  There could have been a change between DPS 125.19 (NS3.3) and DPS 184 (OS4.2) that affects those DPS operations, but we don't know.  Absent that knowledge, confirmation bias kicks in.

To give you an idea of how things can skew results, here are the differences between OS 4.2 and OS 4.0 Mecca.
Test        OPENSTEP 4.2    OPENSTEP 4.0 'Mecca'
Line        153             167
Curve       78              87
Fill        47              55
Transpose   90              89
Composite   542             594
Userpath    332             334
Text        110             113
Window      107             450
What threw the window move results off?  I'd guess the fact that Mecca has that background image thing going on, and I didn't change it.  In fact, when I set it to the default solid colour, the window move test takes 260ms.

For anything meaningful to be construed from the differences between NEXTSTEP 3.3, OPENSTEP 4.0, and OPENSTEP 4.2, we would need to be aware of the exact differences between those releases.  The number of variables involved is too great.

Ideally, I should refactor NSBench so that there are TWO baselines: one for NEXTSTEP 3.3, and another for OPENSTEP 4.2.  Any port of NSBench to another platform should have a baseline for that platform, e.g. NSBench on NEXTSTEP 2.2 should have a baseline for NEXTSTEP 2.2.  In fact, I think that is precisely what I will do.
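
In code terms, the idea is something like this (the names are mine, and the numbers are simply lifted from the tables above):

/* One baseline per OS release, selected at run time. */
struct baseline {
    const char *os;
    long line_ms, curve_ms, fill_ms, transpose_ms;
    long composite_ms, userpath_ms, text_ms, window_ms;
};

static const struct baseline baselines[] = {
    { "NEXTSTEP 3.3", 140, 90, 40, 50, 490, 340, 120,  90 },
    { "OPENSTEP 4.2", 153, 78, 47, 90, 542, 332, 110, 107 },
};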

Benchmarking library load times would also be highly misleading, given that loading object files, libraries, and so on is I/O-bound (disk I/O, memory access speed, swap I/O, and the rest).  It wouldn't give an indication of the performance of the kernel itself.

The point you appear to be trying to make -- that OPENSTEP is bloat and is causing loading to be slower -- is confirmation bias.  In simple terms, WriteNow would not be loading any OPENSTEP framework whatsoever -- it's linked against libNeXT_s and libsys_s only.

I'd need to dig into the sources to see if `rld' uses ref counting.  If it does, then `rld_unload' might not trigger Mach to immediately remove the object from memory as something else could be using it.

It probably does use ref counting, as shared libraries were a big thing in Unix at the time: a shared library is kept in main memory for as long as it's needed, so other programs using the same library don't have to load it every time they're executed, with the added bonus of lower overall memory usage.  In simple terms, even if WriteNow made use of OPENSTEP frameworks, they would already be in memory because Workspace loads them when it starts up.
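
As a toy illustration of the ref-counting idea (hypothetical code; this is not what rld actually looks like):

#include <stdlib.h>

struct shlib {
    const char *name;
    void       *mapping;   /* shared text segment */
    int         refs;      /* number of live users */
};

/* Hypothetical teardown; a real loader would unmap the segment. */
static void unmap_and_destroy(struct shlib *lib)
{
    free(lib);
}

static void shlib_retain(struct shlib *lib)
{
    lib->refs++;
}

static void shlib_release(struct shlib *lib)
{
    /* rld_unload would only drop the count; the object stays
     * resident until the last user lets go. */
    if (--lib->refs == 0)
        unmap_and_destroy(lib);
}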

The speed of an application starting up has many variables and factors; benchmarking it is potentially meaningless, as it could boil down to a mix of cache hits vs misses and the I/O that results from a miss.

Quote from: zombie on January 30, 2023, 03:08:31 PM
Anyway, this is a super cool project, thank you for doing it!

It's actually been a lot of fun.  I haven't really done any stuff involving Display PostScript before.  Most of my code has been with Ye Olde AppKit, OpenStep AppKit/Foundation, and occasionally MiscKit.

I've even managed to write a usable DPS userpath test!

Next step will be CoreMark, which will be interesting to hack up in a portable way.  I'm hoping I can use ObjC's List and HashTable alongside plain C linked lists and hash tables.
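
Those are the pre-OpenStep collection classes from libobjc; usage is roughly this (an illustrative fragment, from memory):

#import <objc/Object.h>
#import <objc/List.h>
#import <objc/HashTable.h>

void collections_demo(void)
{
    id obj = [[Object alloc] init];

    /* List: an ordered collection of objects. */
    List *items = [[List alloc] init];
    [items addObject:obj];
    obj = [items objectAt:0];        /* fetch it back by index */
    [items free];

    /* HashTable: keyed storage; "*" means C-string keys. */
    HashTable *table = [[HashTable alloc] initKeyDesc:"*"];
    [table insertKey:"line" value:obj];
    obj = (id)[table valueForKey:"line"];
    [table free];

    [obj free];
}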

It might end up being CoreMark+ObjC, but it ought to be fun nonetheless. :)
Lisp Hacker

verdraith

Quote from: MindWalker on January 30, 2023, 01:51:16 PM
Ok, that CPU makes sense, I just saw it said it was a 486 and was stunned how that could be *that* much faster than my Sparc  ;D

My original plan was to invoke the wrath of
__asm__ __volatile__("cpuid\t\n"...)
But GCC is clever and knows that `cpuid` requires an i586, resulting in an i586 CPU subtype in the Mach-O header, which the Mach loader balks at.  I didn't feel like doing low-level assembler hacks, as the values involved in userland could potentially be changed by context switches.
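
For reference, the full probe would have looked something like this (a sketch using the usual GCC inline-asm idiom, not verbatim from NSBench):

/* Ask the CPU to identify itself.  Emitting this opcode is exactly
 * what flags the binary as requiring an i586. */
static void cpu_id(unsigned int leaf, unsigned int out[4])
{
    __asm__ __volatile__("cpuid"
                         : "=a"(out[0]), "=b"(out[1]),
                           "=c"(out[2]), "=d"(out[3])
                         : "a"(leaf));
}

/* Leaf 0 returns the vendor string spread across EBX, EDX, ECX. */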

I suspect that in order to do *any* sort of processor identification for either Motorola or Intel CPUs, I'd need to write a Mach kernel module... the question then becomes "is it worth the effort and the risk of killing the kernel dead?".  I think the answer to that is "nah."

Maybe one day I'll explore that, but not until I have other stuff done first :)
I'd also need to figure out how that stuff works on SPARC and PA-RISC.  I believe SPARC has the CPU type enumerated in the PROM... thankfully, there are kernel sources available, so I can find out.
I have no idea how HP did it, though.

-- edit --
Actually, what would be neat is a programmatic way of identifying whether something is running in Previous.  It is possible to identify VMware (and possibly other virtualisation software) by poking at obscure I/O ports (the VMware backdoor and the like), so it might be neat to have something similar in Previous -- perhaps a value set in a NetInfo directory or something?
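
For the curious, the classic VMware check is roughly this (x86 only; a hypothetical sketch, and note the privileged IN instruction faults on real hardware, hence the signal handler):

#include <signal.h>
#include <setjmp.h>

static jmp_buf probe_env;

static void probe_fault(int sig)
{
    longjmp(probe_env, 1);       /* crude, but fine for a probe */
}

/* Probe the VMware "backdoor" I/O port: EAX holds the magic "VMXh",
 * ECX the command (0x0A = get version), EDX the port number. */
static int running_in_vmware(void)
{
    unsigned int version, magic;

    signal(SIGSEGV, probe_fault);
    if (setjmp(probe_env))
        return 0;                /* the IN faulted: no VMware here */

    __asm__ __volatile__("inl %%dx, %%eax"
                         : "=a"(version), "=b"(magic)
                         : "a"(0x564D5868), "c"(0x0A), "d"(0x5658));

    return magic == 0x564D5868;  /* VMware echoes the magic in EBX */
}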
Lisp Hacker

zombie

A bit like figuring out whether we're living in a simulation or not.  Does the network device maybe have a custom name, or could it be given one, to make the check easy?  Something of that sort.

verdraith

A lot of stuff on Black hardware is locked away in the kernel, without any means to get at it aside from writing a kernel module.

I don't want to resort to disklabels either, as they can be changed.  Any means of identifying that we're running inside Previous would by necessity have to be independent of any operating system.  Maybe there's an unused trap or something regarding the CPU... something that can be accessed from user mode.

Actually, what happens if one invokes a trap with an out-of-bounds callback number?  Would it fault and cause an illop?
Lisp Hacker

zombie

I do not recall whether any of the old bench apps had the ability to identify the CPU.

I forget -- at boot, does the POST output list the CPU?  Maybe you could just scrape it from there?

verdraith

Version 0.2 is now available at https://github.com/Asmodai/NeXT-NSBench/releases/tag/v0.2

This version includes my new DPS userpath test, as well as fixing CPU type/subtype lookup so that SPARC now shows as 'SPARC' and not 'Generic'.
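
For context, the type/subtype pair comes straight from Mach.  Querying it looks something like this (a sketch of one way to do it, not necessarily what NSBench does):

#include <stdio.h>
#include <mach/mach.h>

void print_cpu(void)
{
    struct host_basic_info info;
    unsigned int count = HOST_BASIC_INFO_COUNT;

    if (host_info(host_self(), HOST_BASIC_INFO,
                  (host_info_t)&info, &count) == KERN_SUCCESS)
        printf("cpu_type=%d cpu_subtype=%d\n",
               info.cpu_type, info.cpu_subtype);
}

Mapping those numbers to names like 'SPARC' is then just a lookup over the CPU_TYPE_* / CPU_SUBTYPE_* constants.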
Lisp Hacker

verdraith

Version 0.3 is now available at https://github.com/Asmodai/NeXT-NSBench/releases/tag/v0.3

This version includes a window resize test.  Once you've finished the NXFactor testing, the results are viewable via the 'Log...' button.

The NXFactor benchmark is pretty much done.  I want to do some UI cleanup and add some graphing to the NXFactor results so there's a visual comparator between your machine, the baseline, other common machines etc.
Lisp Hacker

verdraith

Started initial work on CoreMark.

The 'simple' variant works.

Next steps are to verify timing (so I don't end up with a benchmark that takes days to run on m68k) and get it to use the NXZone variants of `malloc`/`free` et al.
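
For anyone unfamiliar with zones, usage is roughly this (sizes are illustrative, and the header names are from memory):

#import <objc/zone.h>        /* NXCreateZone() and friends */
#import <mach/mach_init.h>   /* vm_page_size */

void zone_demo(void)
{
    /* A private zone keeps the benchmark's allocations together. */
    NXZone *zone = NXCreateZone(vm_page_size, vm_page_size, YES);

    void *block = NXZoneMalloc(zone, 1024);   /* instead of malloc() */
    NXZoneFree(zone, block);                  /* instead of free()   */

    NXDestroyZone(zone);                      /* tears the zone down */
}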
Lisp Hacker

verdraith

So, there's a huge caveat with CoreMark that I think needs to be highlighted.

I'll need to modify CoreMark's main routine, which is a big no-no with regards to sending results to EEMBC.

Due to the massive disparity in processor speeds involved here -- from 25MHz to 4GHz and beyond -- I need wrapper code that computes the number of benchmark iterations that can be performed in one second.  This value is then multiplied so that the actual test takes 25 seconds.

This means that the main routine now needs to take and return a structure with various bits of info that get passed back up the chain to the test runner, as opposed to the current method where the number of iterations is a compile-time constant.
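
In essence, the calibration wrapper is this (hypothetical names; run_iterations() stands in for a call into the CoreMark kernel):

#include <time.h>

#define TARGET_SECONDS 25

extern void run_iterations(long n);   /* hypothetical: runs one batch */

long calibrate_iterations(void)
{
    long    probe = 1000;              /* arbitrary starting batch */
    clock_t start, elapsed;

    /* Double the batch size until one batch runs for at least a second. */
    for (;;) {
        start = clock();
        run_iterations(probe);
        elapsed = clock() - start;
        if (elapsed >= CLOCKS_PER_SEC)
            break;
        probe *= 2;
    }

    /* Iterations per second, scaled to the 25-second target run. */
    return (long)((double)probe * CLOCKS_PER_SEC / elapsed) * TARGET_SECONDS;
}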

This invalidates the CoreMark results as far as submissions to the EEMBC go.

I'm going to fork CoreMark into NeXTMark to highlight that the results should not be submitted to anyone but, well, I guess us.
Lisp Hacker

MindWalker

Tested 0.3 on my mono NeXTstation -- working as expected.  From the scores it seems that NEXTSTEP's drivers for the SPARC graphics are not that fast...

That "Don't panic" message  ;D

[Screenshots attached: 20230203_195049.jpg, 20230203_195657.jpg]

verdraith

I love that there appears to be a typo in the hostname :)

The next version I release will give more detailed information on what is going on with the Display PostScript stuff, for example:

Starting tests.
Line: time taken: 380 ms, raw: 2631.5789, factor: 327.6328
Timer 1: Trials: 10000  App: 0.0200s  Server: 0.1280s  ServerPct: 86.49  Total: 0.1480s

Curve: time taken: 190ms, raw: 5263.1579, factor: 342.9527
Timer 1: Trials: 5000  App: 0.0200s  Server: 0.0960s  ServerPct: 82.76  Total: 0.1160s

Fill: time taken: 170 ms, raw: 5882.3529, factor: 186.3999
Timer 1: Trials: 5000  App: 0.0200s  Server: 0.0640s  ServerPct: 76.19  Total: 0.0840s

Transpose: time taken: 90 ms, raw: 11111.1111, factor: 740.2867
Timer 1: Trials: 150  App: 0.0000s  Server: 0.0800s  ServerPct: 100.00  Total: 0.0800s

Composite: time taken: 740 ms, raw: 1351.3514, factor: 471.6759
Timer 1: Trials: 10000  App: 0.0300s  Server: 0.4960s  ServerPct: 94.30  Total: 0.5260s

Userpath: time taken: 560 ms, raw: 1785.7143, factor: 295.7215
Timer 1: Trials: 10000  App: 0.0100s  Server: 0.3200s  ServerPct: 96.97  Total: 0.3300s

Text: time taken: 310 ms, raw: 3225.8065, factor: 317.2352
Timer 1: Trials: 10000  App: 0.0100s  Server: 0.1120s  ServerPct: 91.80  Total: 0.1220s

Window move: time taken: 170 ms, raw: 5882.3529, factor: 253.4415
Timer 1: Trials: 1100  App: 0.0000s  Server: 0.1600s  ServerPct: 100.00  Total: 0.1600s

Window resize: time taken: 70 ms, raw: 14285.7143, factor: 354.0853
Timer 1: Trials: 81  TotalWall: 0.070001

NXFactor = 365.492401
Tests complete, run time = 10760 ms.
---------------------------------------------------------

'App' is how much drawing time was spent in the actual app, and 'server' is drawing time spent in the DPS server.
Lisp Hacker

verdraith

Version 0.4 is now available.

DPS engine metrics are collected and displayed in the NXFactor log window, and... CoreMark!

Quad-fat binary is available at https://github.com/Asmodai/NeXT-NSBench/releases/tag/v0.4
Lisp Hacker

andreas_g

I've just run the NXFactor test in Previous (25MHz 68030, 64MB). I got an NXFactor of 0.96. Did you normalize the test using real hardware or did you use Previous?

verdraith

I used Previous 2.6 (softfloat branch) compiled on my GNU/Linux laptop for the time being as I don't own any NeXT hardware (yet).

I'd expect there to be some variance, as the DPS system has quite a few variables that can affect the result.  Can you DM me the NXFactor result log?

Looks like the 25MHz 68030 cube is winning the poll, so I'll want someone to run it on that and give me the NXFactor and NXMark results at some point :)
Lisp Hacker

andreas_g

Previous is quite accurate in simulating many parts of the NeXT hardware. The one area where it is definitely not correct is timing. I do not recommend using Previous as a reference for anything related to timings.