Spacious Mind rating test reloaded

spacious_mind · Post by **spacious_mind** » Tue Mar 19, 2024 7:36 pm

Tibono2 wrote: ↑Sun Mar 17, 2024 10:58 am I use the LibreOffice Calc application, not MS Excel, so maybe the comment below is just a compatibility issue on my side:
I noticed your search formula (to retrieve a move's score) ends with 1 as its last argument; as a consequence, any input (even an unlisted one) would retrieve a value from the data list.
Using copy/paste to feed the moves doesn't trigger the drop-list control, and can lead to unnoticed issues. I would rather use false() as the last search argument, meaning a "not found" move would result in a #N/A value and so require fixing the input. If I am correct, 0 can be an alternative to false().
Just an additional consistency control, not a mandatory one.

Kind regards,
Eric

Hi Eric

Copying and pasting should have no impact on the spreadsheet. I have protected all the relevant fields. Unless LibreOffice works differently, you should be able to just copy the White moves and paste them into the fields in Column F where the White moves are, and then do the same in Column H for the Black moves. I find this a quick and useful way to check previous results. I usually highlight all the White moves from the Computer score tab and just paste them for White in the Computer Test tab, and then do the same for Black.

Best regards
Nick
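To make the difference between the two search modes discussed above concrete, here is a minimal Python sketch. The actual spreadsheet formula is not shown in the thread, so this only imitates the general behaviour of a sorted-range search (last argument 1) versus an exact search (last argument 0 or false()). The move list, the scores and the function names are all made up for illustration.

```python
from bisect import bisect_right

# Hypothetical score table for one position, sorted by move text as a
# sorted-range search requires. Moves and scores are invented.
SCORES = [("Bc4", 12.0), ("Nf3", 20.0), ("d4", 15.0), ("e4", 18.0)]


def lookup_approximate(move):
    """Imitate a sorted-range search (last argument 1): return the score of
    the largest key that is <= move, even when move itself is not listed."""
    keys = [key for key, _ in SCORES]
    i = bisect_right(keys, move)
    return SCORES[i - 1][1] if i > 0 else None


def lookup_exact(move):
    """Imitate an exact search (last argument 0 or false()).
    None stands in for the spreadsheet's #N/A error."""
    return dict(SCORES).get(move)


# "Nf4" is not in the list: the approximate search silently returns the
# score of "Nf3", while the exact search reports the input as not found.
print(lookup_approximate("Nf4"), lookup_exact("Nf4"))
```

With an exact search, a pasted move that is misspelled or unlisted shows up as an error instead of quietly picking up a neighbouring move's score.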

Tibono2 · Post by **Tibono2** » Wed Mar 20, 2024 1:26 pm

Hello Nick,

here is a first set of results: the King Performance levels.
In the first tab, where scores are aggregated, I left the ones I got from my "reloaded" fork, if any useful.
The scores from your "revised" test are just stored below.
Small cosmetic fix: game 05, computer score tab, header total score should get H37 cell value instead of H36.

My comments: for really bad moves, I like your idea of scoring a constant penalty (-15). I suspect you apply this starting from a threshold of about -5 lost points?

It is very hard to tune the test for very low Elo. I suspect a standardization effort is required; this means running the test on a number of weak devices. I started doing that (Novag MK I, Delta-1, Fidelity CC7...). Once I have more data I will share it. A concern I spotted: a weak device can stubbornly keep choosing the same move over several consecutive positions of a test game (such as the very same pawn move), regardless of how the position changes over the following moves. By chance, this pawn move can be good enough for a while, and result in irrelevant scores. Of course, more test games would give an opportunity for some balance.

Kind regards,
Eric
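The "same move over several consecutive positions" concern described above could be flagged automatically. The following is only a sketch of one possible check, not something taken from the test sheets: it finds the longest run of identical consecutive answers in a device's move list. The function name and the sample moves are hypothetical.

```python
def longest_repeat_run(moves):
    """Return (move, length) for the longest run of identical consecutive
    answers; (None, 0) for an empty list."""
    best_move, best_len = None, 0
    run_move, run_len = None, 0
    for move in moves:
        if move == run_move:
            run_len += 1
        else:
            run_move, run_len = move, 1
        if run_len > best_len:
            best_move, best_len = run_move, run_len
    return best_move, best_len


# Hypothetical answers from a weak device over five consecutive positions.
answers = ["e4", "a3", "a3", "a3", "Nf3"]
print(longest_repeat_run(answers))
```

A long run does not prove the score is irrelevant, but it marks the positions worth a second look.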

spacious_mind · Post by **spacious_mind** » Wed Mar 20, 2024 3:52 pm

Hi Eric,

I agree it is not easy to get every computer to a satisfactory rating level, as their performance varies with each game since each game is unique. But I think you can get pretty close; it just needs the average of several games. What I don't want to do is pick and choose a game, or pick and choose a formula dependent on a game. All I am doing at the moment is picking any games of that era, and there are not that many games pre-1800.
The formulas must be the same for every game that is evaluated. Which is what I have done for these tests.

I am in the process of adding 3 more games to the first five, and this set of 8 will be called the "Renaissance School" as they are all Philidor and earlier. What really throws the first 5 tests is game 1, where all the weak computers score so highly, but I think there are going to be other games where computers score higher or lower than expected. Therefore, having enough tests should average it all out. In all these tests you can get a feel for what the computer understands and does not understand, so I think they are all valuable and great to compare on a spreadsheet.

In about 3 or 4 weeks I should have more games ready to share. I am also working on the next school, which of course will cover the "Romantic School". While my other laptops are busy doing the games analysis, I am currently playing some games on the first test game of the "Romantic School" (multitasking like crazy here!):

As you can see from the above test, run under the same conditions as the other 5 tests you have, the low computers score quite correctly, but this time it is the Mephisto TM Lyon that doesn't like it.

The R30 King 2.5 scores again pretty much as expected! The Enterprise S is a little low too, but then, as you know, in the next game it could be a little high again.

The -15 that I am using is based on a 50% penalty, meaning that with a high rating of 3800 Elo the penalty is -1900. This can of course be adjusted, as can the point at which the penalty kicks in. Currently I have it set at around a 1.7 pawn loss, which to me is enough to lose a game, but since other programs also make such losses, the -15 penalty seems to balance out quite well at the moment, and everyone gets the same penalty, including the TM Lyon.
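As a rough illustration of the penalty rule just described, here is a small Python sketch. Only the numbers (-15, 50%, 3800, about 1.7 pawns) come from the post; the function names are hypothetical, the use of "greater than or equal" at the threshold is an assumption, and how the sheet converts points to Elo is not shown in the thread.

```python
PENALTY_POINTS = -15.0          # flat penalty quoted in the post
PENALTY_THRESHOLD_PAWNS = 1.7   # "around a 1.7 pawn loss" (approximate)
TOP_RATING_ELO = 3800           # high rating quoted in the post
PENALTY_FRACTION = 0.5          # the 50% penalty


def move_score(table_score, loss_in_pawns):
    """Return the flat penalty once the evaluation loss reaches the
    threshold, otherwise the move's normal score from the data table.
    Using >= at the threshold is an assumption."""
    if loss_in_pawns >= PENALTY_THRESHOLD_PAWNS:
        return PENALTY_POINTS
    return table_score


def penalty_in_elo():
    """The 50% penalty expressed against the 3800 Elo top rating."""
    return -PENALTY_FRACTION * TOP_RATING_ELO


print(move_score(27.9, 0.0), move_score(3.0, 2.0), penalty_in_elo())
```

Keeping the threshold and the penalty as separate constants matches the point made in the post that both can be adjusted independently.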

But I think in order to make other adjustments we need more test games to understand the averages.

Also, I was pleased to see that USCF finally did some adjustments to allow for weaker players. These fit in nicely with any low scores in these tests:

Senior Master 2400 and up
National Master 2200–2399
Expert 2000–2199
Class A 1800–1999
Class B 1600–1799
Class C 1400–1599
Class D 1200–1399
Class E 1000–1199
Class F 800–999
Class G 600–799
Class H 400–599
Class I 200–399
Class J 100–199

The above should compare nicely with your King test, where Fun 0 is suitable for Class J players. It is a pity that FIDE can't do the same, unless I missed it. Anyway, I shall be using USCF going forward because of the above.
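The USCF class bands listed above can be expressed as a simple lookup, for example to label a test result. This is only a sketch; the function name is hypothetical, and ratings below 100 are treated as "not covered" because the list stops at Class J.

```python
# Lower bound of each band, highest first, copied from the list above.
USCF_CLASSES = [
    (2400, "Senior Master"),
    (2200, "National Master"),
    (2000, "Expert"),
    (1800, "Class A"),
    (1600, "Class B"),
    (1400, "Class C"),
    (1200, "Class D"),
    (1000, "Class E"),
    (800, "Class F"),
    (600, "Class G"),
    (400, "Class H"),
    (200, "Class I"),
    (100, "Class J"),
]


def uscf_class(rating):
    """Return the class name for a rating, or None below 100."""
    for floor, name in USCF_CLASSES:
        if rating >= floor:
            return name
    return None


print(uscf_class(150), uscf_class(1450), uscf_class(2400))
```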

Anyway, Eric, I think we should hold back on a final tweaking of penalties etc. until we have a few more test games, and then we can do a final mass tweak?

Best regards
Nick

spacious_mind · Post by **spacious_mind** » Wed Mar 20, 2024 5:43 pm

Tibono2 wrote: ↑Wed Mar 20, 2024 1:26 pm Small cosmetic fix: game 05, computer score tab, header total score should get H37 cell value instead of H36.

Hi Eric,

I made the correction on Game 5. Here is the link:

https://www.spacious-mind.com/forum_rep ... evised.zip

Regards
Nick

spacious_mind · Post by **spacious_mind** » Thu Mar 21, 2024 9:22 am

Tibono2 wrote: ↑Wed Mar 20, 2024 1:26 pm Hello Nick,

It is very hard to tune the test for very low Elo. I suspect a standardization effort is required; this means running the test on a number of weak devices. I started doing that (Novag MK I, Delta-1, Fidelity CC7...). Once I have more data I will share it. A concern I spotted: a weak device can stubbornly keep choosing the same move over several consecutive positions of a test game (such as the very same pawn move), regardless of how the position changes over the following moves. By chance, this pawn move can be good enough for a while, and result in irrelevant scores. Of course, more test games would give an opportunity for some balance.

Kind regards,
Eric

Yes, please test Mk1, Delta-1 etc. I only tested Delta-1 on the new test sheet I created so it is missing on our current 5 tests. Anyone else that wants to test weak ones, please do!

Best regards
Nick

fourthirty · Post by **fourthirty** » Fri Mar 22, 2024 7:40 pm

spacious_mind wrote: ↑Sat Mar 16, 2024 6:13 pm It has been 5 years since the last time I posted or visited a Forum. I hope you are all doing well. I had been very busy in recent years as a result of a promotion where I ended up being the head of my business area for North America for the Spanish company I worked for.

Wow, I just read this. Welcome back Nick! Glad you are doing well.

Greg

spacious_mind · Post by **spacious_mind** » Fri Mar 22, 2024 10:16 pm

fourthirty wrote: ↑Fri Mar 22, 2024 7:40 pm Wow, I just read this. Welcome back Nick! Glad you are doing well.

Greg

Thanks Greg, I am glad to be back.

Tibono2 · Post by **Tibono2** » Sat Mar 23, 2024 7:21 pm

Hello Nick, hi all,

A set of 3 weak chess computers tests. Scores look rather fine over five test games.

Best regards,
Eric

spacious_mind · Post by **spacious_mind** » Sat Mar 23, 2024 7:31 pm

Tibono2 wrote: ↑Sat Mar 23, 2024 7:21 pm Hello Nick, hi all,

A set of 3 weak chess computers tests. Scores look rather fine over five test games.

Best regards,
Eric

Hi Eric,

Wow you are quick. The scores are not bad over 5 tests. Can't wait to get the other test sheets completed so we can fine tune the minus scoring across all the test sheets. I have almost finished the analysis for 4 tests, then 7 more to go.

Tibono2 · Post by **Tibono2** » Wed Mar 27, 2024 4:31 pm

Hello,

next batch completed, 3 weak ones (Boris, Fidelity CC10, Conic CC).
Have fun,
Eric

spacious_mind · Post by **spacious_mind** » Wed Mar 27, 2024 8:02 pm

Tibono2 wrote: ↑Wed Mar 27, 2024 4:31 pm Hello,

next batch completed, 3 weak ones (Boris, Fidelity CC10, Conic CC).
Have fun,
Eric

Thanks Eric, it will be another week before I can share games. Still busy doing evals. I see that all 3 still overperformed.

Best regards
Nick

Tibono2 · Post by **Tibono2** » Thu Mar 28, 2024 8:36 pm

Hello Nick,
I suspect a typo (or maybe an un-rolled-back test) in game1, score tab, D26 cell (STOCKFISH 16-1 on i7 PC with 30s search can't have moved 21.Qxf7+).
Teaser: I spotted it easily using a score_analysis worksheet I am currently developing; I am still testing it and exploring some best practices for using it.
I plan to share it soon...
Also keeping on performing the tests with some additional devices; slightly increasing the expected strength. Mike Johnson's Novag Chess Champion Super System III is being tortured...

Kind regards,
Eric

spacious_mind · Post by **spacious_mind** » Thu Mar 28, 2024 10:07 pm

Tibono2 wrote: ↑Thu Mar 28, 2024 8:36 pm Hello Nick,
I suspect a typo (or maybe an un-rolled-back test) in game1, score tab, D26 cell (STOCKFISH 16-1 on i7 PC with 30s search can't have moved 21.Qxf7+).
Teaser: I spotted it easily using a score_analysis worksheet I am currently developing; I am still testing it and exploring some best practices for using it.
I plan to share it soon...
Also keeping on performing the tests with some additional devices; slightly increasing the expected strength. Mike Johnson's Novag Chess Champion Super System III is being tortured...
Kind regards,
Eric

Hi Eric

Ignore the Stockfish results. I forgot I used 60 PV on its test, which probably does not give its fastest score for 30 seconds, and it's possible I made a typo on move 21. If you want to run SF16.1 on your PC then please do.

Yes, please share your worksheet when you are ready to do so.

Regards
Nick

Tibono2 · Post by **Tibono2** » Sat Mar 30, 2024 5:45 pm

Hi Nick,

I am stepping forward with my analysis tool, now duplicating it for test game number 4.
It enabled me to discover an issue in game 4 with White's move #25: Rxa5 is granted 0 points whilst the G1-DATA value is set to 27,90.
The issue is with the score search area: $'G1-DATA'.$CM$2:$CN$39 (it should extend to row $40).
BR,
Eric
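The range problem reported above (a search area ending one row too early) can be reproduced in a few lines. This is a sketch under stated assumptions: only the Rxa5 value of 27.90 comes from the post, while the other rows, the placement of Rxa5 on row 40, the fallback of 0 points and the function names are all hypothetical.

```python
# Hypothetical data column: row number -> (move, score). Only the Rxa5
# entry (27.90) is taken from the post; putting it on row 40 is assumed.
SHEET = {row: (f"move{row}", 1.0) for row in range(2, 40)}
SHEET[40] = ("Rxa5", 27.90)


def score_in_range(move, first_row, last_row, default=0.0):
    """Exact-match search limited to rows first_row..last_row inclusive.
    A move outside the range falls back to the default score."""
    for row in range(first_row, last_row + 1):
        entry = SHEET.get(row)
        if entry is not None and entry[0] == move:
            return entry[1]
    return default


def rows_outside_range(first_row, last_row):
    """List rows that hold data but are not covered by the search range."""
    return sorted(r for r in SHEET if r < first_row or r > last_row)


print(score_in_range("Rxa5", 2, 39))   # range stops at row 39
print(score_in_range("Rxa5", 2, 40))   # range extended to row 40
print(rows_outside_range(2, 39))
```

A check like rows_outside_range, run once per game sheet, would catch this kind of off-by-one range before any device is tested.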

Tibono2 · Post by **Tibono2** » Sat Mar 30, 2024 6:18 pm

...and next to the above, I re-uploaded those results (Boris & Conic CC did play 25.Rxa5).
Best, Eric

HIARCS Chess Forums
