Spacious Mind rating test reloaded

spacious_mind
Senior Member
Posts: 4018
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama
Contact:

Re: Spacious Mind rating test reloaded

Post by spacious_mind »

Tibono2 wrote: Sun Mar 17, 2024 10:58 am I use the LibreOffice Calc application, not MS Excel, so the comment below may just be a compatibility issue on my side:
I noticed your search formula (to retrieve a move's score) ends with 1 as its last argument; as a consequence, any input (even an unlisted one) would retrieve a value from the data list.
Using copy/paste to feed the moves doesn't trigger the drop-list control, and can lead to unnoticed issues. I would rather use false() as the last search argument, meaning a "not found" move would result in a #N/A value and so require fixing the input. If I am correct, 0 can be an alternative to false().
Just an additional consistency control, not a mandatory one.

Kind regards,
Eric
Hi Eric

Copying and pasting should have no impact on the spreadsheet. I have protected all the relevant fields. Unless LibreOffice works differently, you should be able to just copy the White moves and paste them into the fields in Column F where the White moves are, and then do the same in Column H for the Black moves. I find this a quick and useful way to check previous results. I usually highlight all the White moves from the Computer score tab and just paste them for White in the Computer Test tab, and then do the same for Black.

Best regards
Nick
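
The lookup point raised in the quoted comment can be sketched in Python. This is only a rough model of how a sorted (last argument 1) versus exact (last argument 0 or false()) search behaves, not the spreadsheet's actual implementation; the move list and scores below are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical move/score table, sorted by move text (values invented for illustration).
SCORES = [("Bxf7+", 10.0), ("Nf3", 5.0), ("Qh5", 2.5)]
KEYS = [move for move, _ in SCORES]


def approximate_lookup(move):
    """Rough model of a sorted search (last argument 1): returns the score of
    the closest entry at or below the key, even if the move is not listed."""
    index = bisect_right(KEYS, move) - 1
    return SCORES[index][1] if index >= 0 else None


def exact_lookup(move):
    """Rough model of an exact search (last argument 0 / false()): returns
    None, standing in for #N/A, when the move is not listed."""
    return dict(SCORES).get(move)


if __name__ == "__main__":
    # "Nf4" is not in the table: the sorted search silently returns Nf3's score,
    # while the exact search flags the input as not found.
    print(approximate_lookup("Nf4"))
    print(exact_lookup("Nf4"))
```

With the exact variant, a mistyped or pasted move that is not in the data list shows up as "not found" instead of quietly picking up a neighbouring score.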
Tibono2
Full Member
Posts: 713
Joined: Mon Jan 16, 2017 7:55 pm
Location: France
Contact:

Post by Tibono2 »

Hello Nick,

here is a first set of results: the King Performance levels.
In the first tab, where scores are aggregated, I left the ones I got from my "reloaded" fork, in case they are useful.
The scores from your "revised" test are just stored below.
Small cosmetic fix: game 05, computer score tab, header total score should get H37 cell value instead of H36.

My comments: for really bad moves, I like your idea of scoring a constant penalty (-15). I suspect you apply it from a threshold of around -5 lost points?

It is very difficult to tune the test for very low Elo. I suspect a standardization effort is required, which means running the test on a number of weak devices. I have started doing that (Novag MK I, Delta-1, Fidelity CC7...). Once I have more data I will share it. One concern I spotted: a weak device can stubbornly keep choosing the same move over several consecutive positions of a test game (such as the very same pawn move), regardless of how the position changes over the following moves. By chance, this pawn move can be good enough for a while and result in irrelevant scores. Of course, more test games would give an opportunity for some balance.

Kind regards,
Eric
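
One way to spot the "same move over several consecutive positions" concern in a list of recorded answers is sketched below in Python. The function name, the `min_run` threshold of 3 and the sample moves are assumptions made for illustration; they are not part of the test spreadsheet.

```python
from itertools import groupby


def repeated_move_runs(moves, min_run=3):
    """Return (move, start_index, run_length) for every run of identical
    consecutive answers that is at least min_run long."""
    runs = []
    index = 0
    for move, group in groupby(moves):
        length = len(list(group))
        if length >= min_run:
            runs.append((move, index, length))
        index += length
    return runs


if __name__ == "__main__":
    # Hypothetical answers of a weak device over five consecutive test positions.
    answers = ["e4", "h3", "h3", "h3", "Nf3"]
    print(repeated_move_runs(answers))
```

Flagged runs could then be reviewed by hand to judge whether the scores they earned are meaningful.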

Post by spacious_mind »

Hi Eric,

I agree it is not easy to get every computer to a satisfactory rating level, as their performance varies from game to game since each game is unique. But I think you can get pretty close; it just needs the average of several games. What I don't want to do is pick and choose a game, or pick and choose a formula dependent on a game. All I am doing at the moment is picking any games of that era, and there are not that many games pre-1800.
The formulas must be the same for every game that is evaluated, which is what I have done for these tests.

I am in the process of adding 3 more games to the first five, and this set of 8 will be called the "Renaissance School", as they are all Philidor and earlier. What really throws the first 5 tests is game 1, where all the weak computers score so highly, but I think there are going to be other games where computers score higher or lower than expected. Therefore, it is a matter of having enough tests to average it all out. But in all these tests you can get a feel for what the computer does and does not understand, so I think they are all valuable and great to compare on a spreadsheet.

In about 3 or 4 weeks I should have more games ready to share. I am also working on the next school, which of course will cover the "Romantic School". While my other laptops are busy doing the game analysis, I am currently playing some games on the first test game of the "Romantic School" (multitasking like crazy here :) !):

Image

As you can see from the above test, run under the same conditions as the other 5 tests you have, the low computers score quite correctly, but this time it is the Mephisto TM Lyon that doesn't like it :) The R30 King 2.5 again scores pretty much as expected! The Enterprise S is a little low too, but then, as you know, in the next game it could be a little high again.

The -15 that I am using is based on a 50% penalty, meaning that with a high rating of 3800 Elo the penalty is -1900. This can of course be adjusted, as can the point at which the penalty kicks in. Currently I have it set at around a 1.7 pawn loss, which to me is enough to lose a game, but since other programs also make such losses, the -15 penalty seems to balance out quite well at the moment, and everyone gets the same penalty, including the TM Lyon :)
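
The penalty rule described above can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet's formula: the function name is hypothetical, and `pawn_loss` is assumed to mean the evaluation drop of the played move relative to the best move. Only the -15 penalty, the roughly 1.7 pawn threshold and the 50% of 3800 figure come from the post.

```python
PENALTY_POINTS = -15.0    # constant penalty quoted in the post
PENALTY_THRESHOLD = 1.7   # approximate pawn loss at which the penalty kicks in


def move_score(table_score, pawn_loss,
               threshold=PENALTY_THRESHOLD, penalty=PENALTY_POINTS):
    """Return the flat penalty for a move that loses at least `threshold`
    pawns (assumed: versus the best move), otherwise its normal table score."""
    if pawn_loss >= threshold:
        return penalty
    return table_score


if __name__ == "__main__":
    print(move_score(8.0, 0.3))   # small loss: the move keeps its table score
    print(move_score(8.0, 2.5))   # large loss: flat penalty applies
    print(0.5 * 3800)             # the 50% of 3800 Elo figure mentioned above
```

Both `threshold` and `penalty` are left as parameters because the post says each can be adjusted once more test games are available.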

But I think in order to make other adjustments we need more test games to understand the averages.

Also, I was pleased to see that USCF finally made some adjustments to allow for weaker players. These fit in nicely with any low scores in these tests:

Senior Master 2400 and up
National Master 2200–2399
Expert 2000–2199
Class A 1800–1999
Class B 1600–1799
Class C 1400–1599
Class D 1200–1399
Class E 1000–1199
Class F 800–999
Class G 600–799
Class H 400–599
Class I 200–399
Class J 100–199

The above should compare nicely with your King test, where Fun 0 is suitable for Class J players. Pity that FIDE can't do the same, unless I missed it. Anyway, I shall be using USCF going forward because of the above.
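
For anyone wanting to label test results automatically, the class list above can be turned into a small Python lookup. This is a sketch only: the function name is hypothetical, and the boundaries are copied from the list as posted.

```python
# (rating floor, class name), taken from the USCF list in the post, highest first.
USCF_CLASSES = [
    (2400, "Senior Master"),
    (2200, "National Master"),
    (2000, "Expert"),
    (1800, "Class A"),
    (1600, "Class B"),
    (1400, "Class C"),
    (1200, "Class D"),
    (1000, "Class E"),
    (800, "Class F"),
    (600, "Class G"),
    (400, "Class H"),
    (200, "Class I"),
    (100, "Class J"),
]


def uscf_class(rating):
    """Return the class name for a rating, or None below the lowest floor."""
    for floor, name in USCF_CLASSES:
        if rating >= floor:
            return name
    return None


if __name__ == "__main__":
    for rating in (150, 1250, 2250):
        print(rating, uscf_class(rating))
```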

Anyway, Eric, I think we should hold back on the final tweaking of penalties etc. until we have a few more test games, and then we can do a final mass tweak?

Best regards
Nick

Post by spacious_mind »

Tibono2 wrote: Wed Mar 20, 2024 1:26 pm Small cosmetic fix: game 05, computer score tab, header total score should get H37 cell value instead of H36.
Hi Eric,

I made the correction on Game 05. Here is the link:

https://www.spacious-mind.com/forum_rep ... evised.zip

Regards
Nick

Post by spacious_mind »

Tibono2 wrote: Wed Mar 20, 2024 1:26 pm Hello Nick,

It is very difficult to tune the test for very low Elo. I suspect a standardization effort is required, which means running the test on a number of weak devices. I have started doing that (Novag MK I, Delta-1, Fidelity CC7...). Once I have more data I will share it. One concern I spotted: a weak device can stubbornly keep choosing the same move over several consecutive positions of a test game (such as the very same pawn move), regardless of how the position changes over the following moves. By chance, this pawn move can be good enough for a while and result in irrelevant scores. Of course, more test games would give an opportunity for some balance.

Kind regards,
Eric
Yes, please test Mk1, Delta-1 etc. I only tested Delta-1 on the new test sheet I created so it is missing on our current 5 tests. Anyone else that wants to test weak ones, please do!

Best regards
Nick
fourthirty
Full Member
Posts: 763
Joined: Fri Dec 06, 2013 8:46 pm
Location: San Francisco

Post by fourthirty »

spacious_mind wrote: Sat Mar 16, 2024 6:13 pm It has been 5 years since the last time I posted or visited a Forum. I hope you are all doing well. I had been very busy in recent years as a result of a promotion where I ended up being the head of my business area for North America for the Spanish company I worked for.
Wow, I just read this. Welcome back Nick! Glad you are doing well.

Greg

Post by spacious_mind »

fourthirty wrote: Fri Mar 22, 2024 7:40 pm Wow, I just read this. Welcome back Nick! Glad you are doing well.

Greg
Thanks Greg, I am glad to be back.
Nick

Post by Tibono2 »

Hello Nick, hi all,

A set of tests for 3 weak chess computers. Scores look rather fine over five test games.

Best regards,
Eric

Post by spacious_mind »

Tibono2 wrote: Sat Mar 23, 2024 7:21 pm Hello Nick, hi all,

A set of tests for 3 weak chess computers. Scores look rather fine over five test games.

Best regards,
Eric
Hi Eric,

Wow, you are quick. The scores are not bad over 5 tests. Can't wait to get the other test sheets completed so we can fine-tune the minus scoring across all the test sheets. I have almost finished the analysis for 4 tests, with 7 more to go.
Nick

Post by Tibono2 »

Hello,

next batch completed, 3 weak ones (Boris, Fidelity CC10, Conic CC).
Have fun,
Eric

Post by spacious_mind »

Tibono2 wrote: Wed Mar 27, 2024 4:31 pm Hello,

next batch completed, 3 weak ones (Boris, Fidelity CC10, Conic CC).
Have fun,
Eric
Thanks Eric, it will be another week before I can share games. Still busy doing evals. I see that all 3 still overperformed.

Best regards
Nick

Post by Tibono2 »

Hello Nick,
I suspect a typo (or maybe an un-rolled-back test) in game1, score tab, D26 cell (STOCKFISH 16-1 on i7 PC with 30s search can't have moved 21.Qxf7+).
Teaser: I spotted it easily using a score_analysis worksheet I am currently developing; I am still testing it and exploring some best practices for using it.
I plan to share it soon...
Also keeping on performing the tests with some additional devices; slightly increasing the expected strength. Mike Johnson's Novag Chess Champion Super System III is being tortured... :mrgreen:
Kind regards,
Eric

Post by spacious_mind »

Tibono2 wrote: Thu Mar 28, 2024 8:36 pm Hello Nick,
I suspect a typo (or maybe an un-rolled-back test) in game1, score tab, D26 cell (STOCKFISH 16-1 on i7 PC with 30s search can't have moved 21.Qxf7+).
Teaser: I spotted it easily using a score_analysis worksheet I am currently developing; I am still testing it and exploring some best practices for using it.
I plan to share it soon...
Also keeping on performing the tests with some additional devices; slightly increasing the expected strength. Mike Johnson's Novag Chess Champion Super System III is being tortured... :mrgreen:
Kind regards,
Eric
Hi Eric

Ignore the Stockfish results. I forgot I used 60 PV on its test, which is probably not its fastest score for 30 seconds, and it's possible I made a typo on move 21. If you want to run SF16.1 on your PC then please do.

Yes, please share your worksheet when you are ready to do so.

Regards
Nick

Post by Tibono2 »

Hi Nick,

I am moving forward with my analysis tool, now duplicating it for test game number 4.
It enabled me to discover an issue in game 4 with White's move 25: Rxa5 is granted 0 points whilst the G1-DATA value is set to 27,90.
The issue is with the score search area: $'G1-DATA'.$CM$2:$CN$39 (it should expand to $40).
BR,
Eric

Post by Tibono2 »

...and next to the above, I re-uploaded those results (Boris & Conic CC did play 25.Rxa5).
Best, Eric