Are our dedicated chess computers in fact 300-350 Elo weaker?

mclane
Senior Member
Posts: 1600
Joined: Sun Jul 29, 2007 9:04 am
Location: Luenen, germany, US of europe

Post by mclane »

I was confronted with the idea that our dedicated chess computers
in fact have ratings 300-350 Elo lower than the SSDF list suggests.

E.g. the MM5 at 1575 Elo.
What seems like a fairy tale today may be reality tomorrow.
Here we have a fairy tale of the day after tomorrow....
Steve B
Site Admin
Posts: 10140
Joined: Sun Jul 29, 2007 10:02 am
Location: New York City USofA

Re: Our dedicated chess computers in fact 300-350 elo weaker

Post by Steve B »

mclane wrote:I was confronted with the idea that our dedicated chess computers
in fact have ratings 300-350 Elo lower than the SSDF list suggests.

E.g. the MM5 at 1575 Elo.
you are saying that the SSDF ratings for dedicated chess computers are all overstated by 300-350 Elo?

SSDF ratings correlated quite nicely with Selective Search ratings for more than 30 years

they can't both be all wrong

Doubtful Regards
Steve
mclane
Senior Member
Posts: 1600
Joined: Sun Jul 29, 2007 9:04 am
Location: Luenen, germany, US of europe

Post by mclane »

This is not my opinion. I was confronted with this opinion in a discussion about
ratings in the CSS forum.

My opinion is that, because dedicated computers regularly played games against humans, the ratings were always calibrated at that time, while in the later days of computer chess
the ratings are not calibrated anymore.

I remember that the Super Constellation, MM2 and MM4 or MM5 were quite often
playing against humans.

Porzer Open, Aegon tournament, in chess clubs, ...
There were the position test ratings (Bratko-Kopec, Bednorz-Tönnissen etc.)
and the rating lists of the SSDF and the British rating lists.

I do believe in the 1900 Elo, not in the numbers that are 300 or 350 Elo weaker.
ricard60
Senior Member
Posts: 1285
Joined: Thu Aug 09, 2007 2:46 pm
Location: Puerto Ordaz

Post by ricard60 »

One way to get an idea of a machine's Elo is to play it against humans, but we can also test it on positions. There are a lot of position tests, like the Colditz, BT-2450, BT-2630, BS-2830 and others. With those tests you can also get an idea of the strength of the machine. If you run those tests today, you will get the same result as if you run them 5 years from now. So why change the Elo of the machines?

Elo regards

Ricardo
Larry
Senior Member
Posts: 2269
Joined: Wed Aug 01, 2007 8:42 am
Location: Gosford, NSW Australia

Post by Larry »

I think you'll find the problem with rating the dedicateds is that only
the early games against the same person have a lot of meaning. Once
the owner figures out how to beat the comp, he can more or less go
ahead and beat it at will, on any level, because he has long since found
holes in its knowledge. It seems to me that the ratings are a fair
reflection of the strength of each comp.
Nick made a comment about this a few months ago, when he said that
a given dedicated chess comp would not be so easy to beat if someone
other than yourself made the first several moves for you.
L
spacious_mind
Senior Member
Posts: 3999
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

The problem is that, for example, the 40/40 list and other modern lists are no longer realistic. They are only worthwhile as a ranking of engines against engines for comparison and totally useless for anything else.

MM5, per the list you are used to, is rated at 1982 Elo. You could test it by playing it against any of the engines listed below running at your fastest PC speed, and MM5 would end up with a performance of 1500 Elo.

Image

You could try it: play any of the above on an Intel Core i7 against the Mephisto MM5 20 times at 40/40, 30 seconds or 2hr/40 and see what the results are. Heck, you could even use that Athlon 64 X2 4600+ (2.4 GHz), which is about 4 times slower than an i7 (about 4x50=200 Elo), and the results would still be the same.

MM5 would be lucky to get one or two draws, never mind a win. So the score would be engine 19 points and MM5 1 point.

Impossible, right? Considering that on that list MM5 has the same Elo rating?
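
As a rough cross-check of those numbers, the standard Elo expectancy formula can be inverted to see what rating gap a 19-1 score implies. The sketch below is only an illustration: the function names are made up for this example, and it assumes the 20-game score and the 1982 list rating quoted above.

```python
import math


def expected_score(diff: float) -> float:
    """Expected score for a player rated `diff` Elo above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def implied_diff(score: float) -> float:
    """Rating difference implied by a score fraction (0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


# Assumption: the engine scores 19 out of 20 against MM5, as in the post.
engine_score = 19 / 20
gap = implied_diff(engine_score)

# Assumption: the engine is listed at the same 1982 Elo as MM5.
mm5_performance = 1982 - gap

print(f"implied gap: {gap:.0f} Elo, MM5 performance: {mm5_performance:.0f}")
```

Under those assumptions the implied gap is a bit over 500 Elo, which puts MM5's performance in such a match at roughly 1470, in the same region as the 1500 figure mentioned above. Two players with equal ratings would be expected to score 50% each.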

Do you remember the days and the reports in CS&S etc. where top Grandmasters played and lost against Genius 6 or Rebel 8 or MChess Pro? And that was on hardware as lowly as a Pentium 300 or 450? So what chance would these Grandmasters have today against, for example, Prodeo?

Image

Well, after this match the Grandmaster would probably lose a couple of hundred points against Prodeo and be lucky to escape with a final rating of 2400-2500. (About the same loss as what you are seeing with MM5.)

So it doesn't surprise me at all that the engine people have completely lost their sense of reality. To me, a good list matches the ability of the programs against humans. It's for this reason that I stopped taking other lists seriously and stick to my own line-in-the-sand list, which I created a couple of years ago by taking the data from Schachcomputer.Info and freezing it in time.

Computer rating lists nowadays only interest me if I can match the Elo to some degree with humans. Everything else is, in my opinion, uninteresting.

http://www.spacious-mind.com/html/ratin ... ments.html

Best regards
Nick
Steve B
Site Admin
Posts: 10140
Joined: Sun Jul 29, 2007 10:02 am
Location: New York City USofA

Post by Steve B »

spacious_mind wrote:The problem is that, for example, the 40/40 list and other modern lists are no longer realistic. They are only worthwhile as a ranking of engines against engines for comparison and totally useless for anything else.
which is all that interests me
I own many computers and I want to know how they will play against each other ..couldn't care less how they play against humans or against PC engines or against other modified computers
the SSDF and Selective Search lists remain accurate and very meaningful to this day.. as do the BT test suites ..and other indicia of rating.. all of which correlate well with each other and have done so for 30+ years

spacious_mind wrote:
Computer rating lists nowadays only interest me if I can match the Elo to some degree with humans. Everything else is, in my opinion, uninteresting.
and they only interest me if they can tell me how they will play against other dedicated computers..only mildly interested in performance vs. humans
as mentioned elsewhere, humans can manipulate their play against computers and over time create false results, increasing their win percentage
Selective Search on occasion would publish a rating list vs. humans and I never paid much attention to it
Hallsworth eventually dropped publishing the lists due to lack of submissions

you would change the age-old definition of a dedicated chess computer and now you would change their historical ratings

you're doing well Nick
:wink:


Alt Left Regards
Steve
Last edited by Steve B on Thu Jun 29, 2017 11:32 am, edited 1 time in total.
spacious_mind
Senior Member
Posts: 3999
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

I seem to be the only one, Steve, who has stuck to their historical ratings. The SSDF doesn't; its list is out of proportion and has been for years. Why do you think they show some old defunct program and hardware at the bottom of the list and the latest, newest program at the top? So how do you compare the old defunct program and hardware to the hardware at the top of the list? You can't; there is no comparison.
Nick
Steve B
Site Admin
Posts: 10140
Joined: Sun Jul 29, 2007 10:02 am
Location: New York City USofA

Post by Steve B »

spacious_mind wrote:
I seem to be the only one, Steve, who has stuck to their historical ratings. The SSDF doesn't; its list is out of proportion and has been for years. Why do you think they show some old defunct program and hardware at the bottom of the list and the latest, newest program at the top? So how do you compare the old defunct program and hardware to the hardware at the top of the list? You can't; there is no comparison.

I don't compare old dedicated computers to modern PC engines
I can't speak to the modern testing methods of the SSDF today
I guess Lars Sandin could do that
not sure he would take an old dedicated computer and play it against a PC engine for a rating
I think he does that with dedicated computers that are PC engine based like the Phoenix computers


I am interested though in the BT test suite performance of some of the modern PC engines you listed with a 1500 rating
my guess is they would score much higher than 1500, which I think is an indication that something is not quite right with their rating
not sure of that though...
any tests like that available?

Drilling Down Regards
Steve
spacious_mind
Senior Member
Posts: 3999
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

There are plenty of tests. You can try my rating test, which would accurately show their results on those tests for all and everything ;)

The error in modern lists is that they seem to think there is this magical ceiling of 3400 Elo, and god forbid that someone gets past it. That is what is ruining the previously good ratings of old programs. If there is a ceiling, you should be reaching it through diminishing returns, meaning that if Komodo is shown today as 9 points better than Stockfish at that ceiling, then perhaps in reality, through diminishing returns, the difference between them is really 1 point plus 2 digits behind the decimal.

It needs a total reinvention of the rating calculation system, as it doesn't work.

Besides, we humans buy the computers, so we have every right to know accurately where we stand on any list.

Best regards
Nick
Steve B
Site Admin
Posts: 10140
Joined: Sun Jul 29, 2007 10:02 am
Location: New York City USofA

Post by Steve B »

spacious_mind wrote:
The error in modern lists is that they seem to think there is this magical ceiling of 3400 Elo, and god forbid that someone gets past it. That is what is ruining the previously good ratings of old programs.
OK so your beef is with the modern rating lists and not the older established ones like Selective Search ( or the old SSDF lists) which NEVER had older computers play against MODERN PC engines
is that correct?
I didn't get that from your first post
if so then that's my bad and I can see your point

Missed Your Point (I think) Regards
Steve
paulwise3
Senior Member
Posts: 1505
Joined: Tue Jan 06, 2015 10:56 am
Location: Eindhoven, Netherlands

Post by paulwise3 »

spacious_mind wrote: The error in modern lists is that they seem to think there is this magical ceiling of 3400 Elo, and god forbid that someone gets past it. That is what is ruining the previously good ratings of old programs.
You found the solution Nick!
So this means our rating of dedicated machines is ok, and that programs like Komodo and Stockfish are thus rated 350 points too low!!! :-P

Rating the rating system regards,
Paul
2024 Special thread: viewtopic.php?f=3&t=12741
2024 Special results and standings: https://schaakcomputers.nl/paul_w/Tourn ... 25_06.html
If I am mistaken, it must be caused by a horizon effect...
donkeylane
Full Member
Posts: 679
Joined: Mon Aug 29, 2016 8:31 pm
Location: Cheshunt, Hertfordshire, UK

Post by donkeylane »

I subscribed to Selective Search magazine for a few years, and following the games in the magazine by dedicateds against rated humans, there is no doubt in my mind the ratings are not far off. Of course, if you play a good dedicated often enough you will find a weakness in an opening line and could beat it almost at will, but whenever you play another human, whether in club games or at a higher level, that is not how the real scenario works. There are plenty of tests, including giving the computer positions out of the Chess Informant periodical, which I have done, and the R30 for one has done well.
spacious_mind
Senior Member
Posts: 3999
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

Steve B wrote:
spacious_mind wrote:
The error in modern lists is that they seem to think there is this magical ceiling of 3400 Elo, and god forbid that someone gets past it. That is what is ruining the previously good ratings of old programs.
OK so your beef is with the modern rating lists and not the older established ones like Selective Search ( or the old SSDF lists) which NEVER had older computers play against MODERN PC engines
is that correct?
I didn't get that from your first post
if so then that's my bad and I can see your point

Missed Your Point (I think) Regards
Steve
Hi Steve,

Yes, I believe that the problem lies in keeping all the programs within a range. Therefore, the stronger the top end gets, the more you lower the ratings at the bottom and create some crazy justification for why that needs to be done. I.e. miraculously the chess player of today is sooooooo much better than the chess player from 10 years ago and he can now beat all the dedicated computers etc... which you know is bs.

The challenge at the top end is that if you use today's formula, then all of a sudden you might see the R30 rated at 2550 or something, and no one is going to like that either.

Therefore the rating calculation that was created years ago is no longer adequate for today's top engines and fast computer speeds.

Best regards
Nick
spacious_mind
Senior Member
Posts: 3999
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

When I mentioned in an earlier post that I seemed to be the only one who has stayed faithful to the old ratings: take a look at this CCR list from 1995, which was shortly before PLY (SSDF) decided to mess around (the founders of the messing around) and downgrade chess computers in order to make space at that time for the new generation of chess software and Winboard engines.

Image

Back then there were about 1000 times more dedicated computer players than we have today. But I guess a committee of two can decide that if the shoe doesn't fit, change it.

So show me a list today that comes as faithfully close to what the experts of the past reported as what I showed here:

http://www.spacious-mind.com/html/ratin ... ments.html

Look at the USCF ratings and compare them to the above CCR list and you will get my point.

At the time when I did this a couple of years ago, I was soooo tempted to raise the start base even higher to make the USCF ratings resemble the CCR list even more closely, but I resisted the temptation because the Info list for dedicated chess computers is the best there is today in terms of the number of games played and collected. So I used Info's list as an accurate start base, even though over the years that list swayed with the wind as well: starting high, then adjusting downwards by 100 Elo to suit the SSDF, and then a few years ago adjusting back upwards again by 100 Elo.

Even today Info has this self-inflicted barrier where under no circumstances should a dedicated computer program be listed above 2400 Elo. The only exception is the R40, which no one has except for Steve :) So no one really cares that it lies above 2400 today :) So probably even Hiarcs 1%, which easily won their tournament, continues to fit below 2400 although overall it trounces every other dedicated chess computer :)

So who is to say what is really right when a few can decide for the rest of the world that the number 100 is appropriate for a reduction or an increase? So the moral of the story is, "If the shoe doesn't fit, chop off some toes and then it will fit nicely!" ;)
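
One property of the Elo formula is worth keeping in mind here: predicted results depend only on rating differences, so shifting a whole list up or down by 100 changes nothing about how the machines on it are expected to score against each other. A minimal sketch with made-up ratings (the machine names and numbers are hypothetical, not taken from any list):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


# Hypothetical ratings, for illustration only.
ratings = {"machine_a": 2000.0, "machine_b": 1800.0}

# Shift the whole list down by 100 Elo.
shifted = {name: rating - 100.0 for name, rating in ratings.items()}

before = expected_score(ratings["machine_a"], ratings["machine_b"])
after = expected_score(shifted["machine_a"], shifted["machine_b"])

print(f"before shift: {before:.4f}, after shift: {after:.4f}")
```

Both values come out the same, so games played inside the list cannot settle the size of such an offset. It only matters once the list is tied to an outside pool, such as rated humans.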

Best regards
Nick