Hi Eric,
I agree it is not easy to get every computer to a satisfactory rating level, as their performance varies from game to game and each game is unique. But I think you can get pretty close if you average over several games. What I don't want to do is pick and choose a game, or pick and choose a formula depending on the game. All I am doing at the moment is taking any games of that era, and there are not that many games pre-1800.
The formulas must be the same for every game that is evaluated, which is what I have done for these tests.
I am in the process of adding 3 more games to the first five, and this set of 8 will be called the "Renaissance School" as they are all Philidor and earlier. What really throws the first 5 tests is game 1, where all the weak computers score so highly, but I think there are going to be other games where computers score higher or lower than expected. So it is a matter of having enough tests to average it all out. In all these tests you can get a feel for what the computer understands and does not understand, so I think they are all valuable and great to compare on a spreadsheet.
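To illustrate the averaging idea, here is a minimal sketch. The function name and the numbers are placeholders made up for the example, not results from the tests; the first value stands in for an outlier like game 1.

```python
from statistics import mean

def average_rating(per_game_ratings):
    """Return the mean of one computer's per-game rating estimates."""
    if not per_game_ratings:
        raise ValueError("need at least one game")
    return mean(per_game_ratings)

# Placeholder values only (NOT real test results): one inflated game, four typical ones.
ratings = [2100, 1500, 1600, 1550, 1450]
print(average_rating(ratings))  # 1640
```

With only five games the one high score still lifts the average noticeably, which is why more test games help.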
In about 3 or 4 weeks I should have more games ready to share. I am also working on the next school, which of course will be the "Romantic School". While my other laptops are busy doing the game analysis, I am currently playing some games on the first test game of the "Romantic School" (multitasking like crazy here!):
As you can see from the above test, run under the same conditions as the other 5 tests you have, the low computers score quite correctly, but this time it is the Mephisto TM Lyon that doesn't like it.
The R30 King 2.5 again scores pretty much as expected! The Enterprise S is a little low too, but as you know, in the next game it could be a little high again.
The -15 that I am using is based on a 50% penalty, meaning that with a high rating of 3800 Elo the penalty is -1900. This can of course be adjusted, as can the point at which the penalty kicks in. Currently I have it set at around a 1.7 pawn loss, which to me is enough to lose a game, but since other programs also make such losses, the -15 penalty seems to balance out quite well at the moment, and everyone gets the same penalty, including the TM Lyon.
But I think in order to make other adjustments we need more test games to understand the averages.
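As a rough sketch of the penalty arithmetic described above: 50% of a 3800 ceiling gives 1900 points, applied once a move loses about 1.7 pawns. How this maps onto the -15 value in the spreadsheet is not spelled out here, so the code only reproduces the 50% / 3800 / 1.7 figures. The names, and the choice of "greater than or equal" for the threshold, are assumptions made for illustration.

```python
MAX_RATING = 3800         # top of the rating scale mentioned above
PENALTY_FRACTION = 0.50   # the "50% penalty"
LOSS_THRESHOLD = 1.7      # approximate pawn loss at which the penalty kicks in

def blunder_penalty(pawn_loss,
                    max_rating=MAX_RATING,
                    fraction=PENALTY_FRACTION,
                    threshold=LOSS_THRESHOLD):
    """Return the rating penalty (zero or negative) for one move.

    Assumption: the penalty applies when the loss is at or above the threshold.
    """
    if pawn_loss >= threshold:
        return -fraction * max_rating
    return 0.0

print(blunder_penalty(2.0))  # -1900.0
print(blunder_penalty(0.5))  # 0.0
```

Both the fraction and the threshold are parameters, so either can be adjusted later without touching the rest of the formula.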
Also, I was pleased to see that USCF finally made some adjustments to allow for weaker players. These fit in nicely with any low scores in these tests:
Senior Master 2400 and up
National Master 2200–2399
Expert 2000–2199
Class A 1800–1999
Class B 1600–1799
Class C 1400–1599
Class D 1200–1399
Class E 1000–1199
Class F 800–999
Class G 600–799
Class H 400–599
Class I 200–399
Class J 100–199
The above should compare nicely with your King test, where Fun 0 is suitable for Class J players. Pity that FIDE can't do the same, unless I missed it. Anyway, I shall be using USCF going forward because of the above.
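For the spreadsheet comparison, the class list above can be turned into a simple lookup. This is only a sketch: the function name is made up, and ratings below 100 return None because the list above does not cover them.

```python
# (rating floor, class name) pairs, highest first, copied from the list above.
USCF_CLASSES = [
    (2400, "Senior Master"),
    (2200, "National Master"),
    (2000, "Expert"),
    (1800, "Class A"),
    (1600, "Class B"),
    (1400, "Class C"),
    (1200, "Class D"),
    (1000, "Class E"),
    (800, "Class F"),
    (600, "Class G"),
    (400, "Class H"),
    (200, "Class I"),
    (100, "Class J"),
]

def uscf_class(rating):
    """Return the class name for a rating, or None if it is below 100."""
    for floor, name in USCF_CLASSES:
        if rating >= floor:
            return name
    return None

print(uscf_class(1650))  # Class B
print(uscf_class(150))   # Class J
```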
Anyway, Eric, I think we should hold back on any final tweaking of penalties etc. until we have a few more test games, and then we can do a final mass tweak?
Best regards
Nick