Alternative (informal) rating methods for dedicated machines

Reinfeld
Member
Posts: 486
Joined: Thu Feb 17, 2011 3:54 am
Location: Tacoma, WA

Alternative (informal) rating methods for dedicated machines

Post by Reinfeld »

It seems to be received wisdom among the assembled here that computer vs rated human is the truest method. Matching machine against machine meets with a certain amount of skepticism, even in the context of the hard-working Hallsworth efforts.

Machine-v-machine tournaments are wonderful, partly because they yield verifiable results. They are utterly VALID on one level, in the sense that they provide clear data. The debate revolves around the worth of the data, the potential bias of operators, etc. Obviously, as posters have said elsewhere, the number of games is an important factor. Other factors include the depth and accuracy of opening books, time controls and programmer style.
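
To put a rough number on the games factor, here is a minimal Python sketch of how wide the uncertainty on a match-based rating is for small samples. The 60% score, the 1800-rated opponent and the plain binomial approximation (which ignores draws) are illustrative assumptions of mine, not drawn from any published data:

import math

def perf_diff(score_fraction):
    # Elo difference implied by a score fraction strictly between 0 and 1
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)

def rating_interval(score_fraction, n_games, opponent_rating, z=1.96):
    # Approximate 95% interval for a rating estimated from one match,
    # using a plain binomial standard error (draws ignored)
    s = min(max(score_fraction, 1e-6), 1.0 - 1e-6)
    se = math.sqrt(s * (1.0 - s) / n_games)
    low = opponent_rating + perf_diff(max(s - z * se, 1e-6))
    mid = opponent_rating + perf_diff(s)
    high = opponent_rating + perf_diff(min(s + z * se, 1.0 - 1e-6))
    return round(low), round(mid), round(high)

# A 60% score against an 1800-rated machine: 20 games vs 200 games
for n in (20, 200):
    print(n, rating_interval(0.60, n, 1800))

Twenty games leave an interval several hundred points wide; two hundred games shrink it to roughly a hundred.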

And yet, the approach seems crude. Suppose a Morsch program, known for tactical ability, randomly opts for an opening that requires a "solid" approach, based on positional factors. Would it perform as well in that context? Would it be beaten by a Lang program geared toward long-term factors? A human GM would naturally choose lines of play that suit his style, and try to shift a match (or tournament) to his preferred methods. Presumably, he would try to avoid the types of positions where his opponent excels. Shouldn't we expect the same of a program?

The opening book aspect is a topic in itself. I've watched good machines tie themselves in knots against weaker machines by random selection of a bad line. I haven't researched this, but it would be interesting to see examples of even-strength machines with smaller vs larger opening books. (It also strikes me that one testing method, similar to the Spacious Mind experiments, would limit machines to particular variations. For example, does anyone know whether machine-v-machine tournaments have opted for a single line, say the Ruy Lopez Exchange?)

And yet, which opening books, and at what point in recent chess history were they derived? What happens when an older machine automatically runs through its programmed paces, and gets belted by a crushing refutation of a once-popular line, since demolished by master praxis? It's been a minor sidelight of mine to seek the literary sources of certain openings used by machines (I have a vast library, much more extensive than my computer collection, which makes the exercise more interesting.)

Logic dictates that my old reliable first-love Excellence, constructed ca. 1985, cannot have opening knowledge after that date (apart from its own "innovations.") Ergo, when I test its knowledge, I rely on the first edition of BCO, MCO 12, Informators from that period, individual opening manuals and older volumes (the late 19th-century Freeborough and Ranken manual is great for old gambit lines). If I'm looking for victory, I'll jump into NCO, or later editions of MCO, and see if I can find something the computer doesn't know.

What else could programmers rely on? My books are dotted with footnotes that spot the moment when Excellence and other machines leave the book, and I've combed through sources, guessing at which volumes the programmers used.

But I digress from the point of this conversation. One alternative rating method is the BT tests - a series of positions handed to machines, with ratings based on solving times. The faster the solution, the stronger the machine, so the thinking goes.
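
For what it's worth, here is a minimal sketch of the arithmetic behind a timed-solution test, assuming the commonly quoted BT-2450 convention (30 positions, a 15-minute cap per position, rating = 2450 minus the average solution time in seconds). The positions and constants differ between versions of the test, so treat the numbers as illustrative:

CAP_SECONDS = 15 * 60     # 900 seconds per position
BASE_RATING = 2450
NUM_POSITIONS = 30

def bt_rating(solve_times):
    # solve_times: seconds per position, None where the machine never solved it
    assert len(solve_times) == NUM_POSITIONS
    capped = [min(t, CAP_SECONDS) if t is not None else CAP_SECONDS
              for t in solve_times]
    return BASE_RATING - sum(capped) / NUM_POSITIONS

# Example: 20 positions solved in 10 minutes each, 10 never solved
times = [600] * 20 + [None] * 10
print(round(bt_rating(times)))   # 2450 - (20*600 + 10*900)/30 = 1750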

I'm not so sure about the validity of that approach. Something else strikes me, though. Numerous books and software programs attempt to provide the reader with a way to test himself. Examples are legion, from pure quizzes to vague scoring methods known only to the author. British GM Daniel King writes a regular column using this method. Off the top of my head, I can think of a few more:

The old Larry Evans book, "What's the Best Move?", is an opening manual based on a quiz format; the author assigns points based on the reader's guesses. The Chessmaster series of software programs includes similar features, such as the diagnostic rating exam.

An experiment might match a dedicated machine against several of those rating-measurement examples. The combined averages of the various tests would yield a prospective rating for the machine. I'm thinking about how to apply this approach. Suggestions welcome.
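
As a starting point, the combination could be a simple weighted average, with each estimate weighted by the number of positions or games it rests on. A minimal sketch under that assumption; the weighting scheme and the test figures below are invented for illustration, not an established method:

def combined_rating(estimates):
    # estimates: list of (rating, sample_size) pairs, weighted by sample size
    total = sum(n for _, n in estimates)
    return sum(r * n for r, n in estimates) / total

tests = [
    (1750, 30),   # hypothetical timed-solution test, 30 positions
    (1900, 24),   # hypothetical quiz-book score mapped to a rating
    (1820, 50),   # hypothetical software diagnostic exam
]
print(round(combined_rating(tests)))   # about 1818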

- R.
"You have, let us say, a promising politician, a rising artist that you wish to destroy. Dagger or bomb are archaic and unreliable - but teach him, inoculate him with chess."
– H.G. Wells
Mike Watters
Member
Posts: 429
Joined: Fri Sep 26, 2008 12:31 pm
Location: Milton Keynes

Re: Alternative (informal) rating methods for dedicated machines

Post by Mike Watters »

Reinfeld wrote:It seems to be received wisdom among the assembled here that computer vs rated human is the truest method. Matching machine against machine meets with a certain amount of skepticism, even in the context of the hard-working Hallsworth efforts.
Up until 2005 Selective Search included a computer vs rated human rating alongside the computer vs computer ratings. So, for instance, your old reliable Excellence had a rating versus humans of 1828, based on 57 games. Quite major rating differences were flagged up for some machines, based on samples of around 200 human games.
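
For reference, the usual performance-rating arithmetic gives a feel for how a figure like 1828 from 57 games comes about: average opponent rating plus the Elo difference implied by the overall score. This is only a sketch with invented opponents and results, not necessarily how Selective Search actually calculated it:

import math

def performance_rating(opponent_ratings, total_score):
    # average opponent rating plus the Elo difference implied by the score
    n = len(opponent_ratings)
    avg_opp = sum(opponent_ratings) / n
    s = min(max(total_score / n, 1e-6), 1.0 - 1e-6)
    return avg_opp - 400.0 * math.log10(1.0 / s - 1.0)

# Invented example: 57 games against club players averaging 1780, scoring 31.5/57
opponents = [1780] * 57
print(round(performance_rating(opponents, 31.5)))   # about 1817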
ricard60
Senior Member
Posts: 1285
Joined: Thu Aug 09, 2007 2:46 pm
Location: Puerto Ordaz

Re: Alternative (informal) rating methods for dedicated machines

Post by ricard60 »

Mike Watters wrote:
Reinfeld wrote:It seems to be received wisdom among the assembled here that computer vs rated human is the truest method. Matching machine against machine meets with a certain amount of skepticism, even in the context of the hard-working Hallsworth efforts.
Up until 2005 Selective Search included a computer vs rated human rating alongside the computer vs computer ratings. So for instance your old reliable Excellence had a rating v humans of 1828 based on 57 games. Some quite major differences of rating were flagged up for some machines based on samples of around 200 human games.
In computer vs computer rating, Excellence has 1781. I have always found that the computer vs computer rating is between 40 and 70 points below the computer vs human rating for the same machine.