EloStat is a rating calculation program, and not only for chess engines.
Published with the permission of Dr. Frank Schubert (Germany).
Version 1.3 is included in Arena.
The program ELOstat was written by Dr. Frank Schubert. Frank is 36 years old and lives in Dresden, in the eastern part of Germany. A physicist by profession, he has been interested in computer chess for many years. The program produces a comprehensive statistical evaluation of PGN databases and generates an ELO rating list similar to the Swedish SSDF list. It was written primarily to evaluate computer chess games but can also be applied to ‘human’ chess databases. ELOstat uses an iterative algorithm introduced by Ken Thompson. All programs start with the same ELO value. The whole database is then treated as a single huge tournament, and the ELO performance of each program is calculated. These new ELO values replace the start values, and the procedure is repeated until all ELO values remain constant. Additionally, ELOstat calculates the margins of error of the ELO mean values at 95% statistical confidence. These margins are an important indicator of whether a program's ELO mean value is already reliable or still uncertain because too few games have been played. More details about the program, along with other statistical information, can be found in the extensive readme file that comes with the program. Frank is currently working on a new version of ELOstat with many more statistical features.
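The iteration described above can be sketched in Python. This is a minimal illustration under stated assumptions, not ELOstat's actual code: it converts a score fraction into a rating difference with the logistic Elo curve (ELOstat may use a different table), clamps scores of 0% or 100% so the difference stays finite, and re-centres the list so its mean stays at the start value. Plain replacement of old values by performances can flip back and forth forever (two players simply swap values each round), so the sketch averages the old rating and the new performance, a damping step that is also our assumption. The names `iterate_ratings` and `expected_diff` are hypothetical.

```python
import math
from collections import defaultdict


def expected_diff(score):
    """Rating difference implied by a score fraction (logistic Elo curve).

    Scores are clamped to [0.01, 0.99] (an assumption) so that a 100% or 0%
    score does not produce an infinite difference.
    """
    score = min(max(score, 0.01), 0.99)
    return -400.0 * math.log10(1.0 / score - 1.0)


def iterate_ratings(games, start=2000.0, tol=0.01, max_iter=1000):
    """games: list of (white, black, result), result = 1, 0.5 or 0 for white."""
    # Treat the whole database as one tournament: collect every player's
    # opponents (one entry per game) and total points.
    opponents = defaultdict(list)
    points = defaultdict(float)
    for white, black, result in games:
        opponents[white].append(black)
        opponents[black].append(white)
        points[white] += result
        points[black] += 1.0 - result

    # All players start with the same rating.
    ratings = {p: start for p in opponents}
    for _ in range(max_iter):
        new = {}
        for p, opps in opponents.items():
            avg_opp = sum(ratings[o] for o in opps) / len(opps)
            performance = avg_opp + expected_diff(points[p] / len(opps))
            # Damping (assumption): average old rating and new performance.
            new[p] = 0.5 * (ratings[p] + performance)
        # Re-centre so the mean rating stays at the start value (assumption).
        shift = start - sum(new.values()) / len(new)
        new = {p: r + shift for p, r in new.items()}
        converged = max(abs(new[p] - ratings[p]) for p in new) < tol
        ratings = new
        if converged:  # stop once all ratings remain (practically) constant
            break
    return ratings
```

For example, if A beats B three games to one, the iteration settles at a gap of 400·log10(3), about 191 points, split symmetrically around the start value of 2000.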
Dr. Frank Schubert wrote about the new version 1.2:
"Some comments: Version 1.2 is a special version of ELOStat, produced only to work properly with Arena. It has only the 'Rating List' option; the options 'Tournament' and 'Single Competition' have been removed. Thus, if you are working with a 32-bit version of version 1.1, there is no need to update."
Dr. Frank Schubert wrote about the new version 1.3 (translated from German):
"Attached you will find the latest version. The most important change is a new, better routine for calculating the confidence intervals. (In German, 'Konfidenzintervalle' can also be called 'Vertrauensintervalle'.) Essentially, this is the statistical error of the calculated Elo ratings; in the rating list, these are the two columns following the Elo rating. The more games are played, the smaller they become. In my view, any rating list without such error margins is worthless, because one cannot tell how reliable it is."
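Why the margins shrink with more games can be seen from a rough normal approximation (a delta-method sketch, not the method ELOstat uses). Assume the logistic Elo curve, a score fraction s over N games, and opponent ratings treated as exact. The per-game variance is at most s(1 − s), because draws only reduce it:

$$D(s) = -400\,\log_{10}\!\left(\frac{1}{s} - 1\right), \qquad \frac{dD}{ds} = \frac{400}{\ln 10 \; s(1-s)}, \qquad \sigma_s \le \sqrt{\frac{s(1-s)}{N}}$$

$$\text{margin}_{95\%} \approx 1.96\,\sigma_s\,\frac{dD}{ds} \le \frac{1.96 \cdot 400}{\ln 10}\cdot\frac{1}{\sqrt{N\,s(1-s)}}$$

At s = 0.5 this gives roughly ±68 Elo after 100 games and ±34 Elo after 400 games: four times as many games halve the margin.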
The readme files have been updated (available in English and German).
The history of ELOstat is also very interesting!
Changes from version 1.2 to 1.3:
maximum number of different players/programs increased to 1500
algorithm for calculating the confidence intervals completely changed: it now uses the so-called nonparametric ABC method (approximate bootstrap confidence) by Efron and Tibshirani. Many thanks to Dr. Jeff Lischer (US) for drawing my attention to this fantastic method, and to all users who pointed out the shortcomings of the old method.
some minor bugs in the individual statistics output fixed
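The ABC method is an analytic approximation to bootstrap confidence intervals. The following Python sketch does not implement ABC. It uses the plain percentile bootstrap, a simpler relative, to show the underlying idea: resample the games with replacement many times, recompute the rating estimate for each resample, and read the 95% interval off the sorted estimates. It estimates only the rating difference implied by one player's score against an opponent (or field) of known strength, with the same logistic curve and clamping assumptions as before. It returns margins below and above the estimate, similar in spirit to the two columns following the Elo value. The function names are hypothetical.

```python
import math
import random


def elo_diff(score):
    """Rating difference implied by a score fraction (logistic curve, clamped)."""
    score = min(max(score, 0.01), 0.99)
    return -400.0 * math.log10(1.0 / score - 1.0)


def bootstrap_interval(results, n_boot=2000, level=0.95, seed=1):
    """Percentile-bootstrap interval for the Elo difference.

    results: game scores (1, 0.5 or 0) from one player's point of view.
    Returns (estimate, lower_margin, upper_margin); lower_margin is negative.
    """
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(results)
    estimates = []
    for _ in range(n_boot):
        # Resample n games with replacement and re-estimate the difference.
        sample = [rng.choice(results) for _ in range(n)]
        estimates.append(elo_diff(sum(sample) / n))
    estimates.sort()
    alpha = (1.0 - level) / 2.0
    lower = estimates[int(round(alpha * (n_boot - 1)))]
    upper = estimates[int(round((1.0 - alpha) * (n_boot - 1)))]
    point = elo_diff(sum(results) / n)
    return point, lower - point, upper - point
```

Running it on the same proportion of wins, draws and losses with four times as many games yields an interval roughly half as wide, matching the observation that the margins shrink as more games are played.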