Duplicates not found
I created a small database with 2 pairs of identical games (PGN attached). However, when I run the duplicate check, HCE Pro says there are no duplicates.
Attachment: duplicates.pgn.zip (1.93 KiB)
Re: Duplicates not found
The same on my Mac. Maybe the problem has to do with the annotations.
When I copy game #4, strip the annotations and insert it in the database then HCE Pro finds a duplicate.
Re: Duplicates not found
Although the games are identical, they have different UTCTime and UTCDate tags.
It would be natural for HCE to treat the games as different when they appear to have been played on different dates and at different times.
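To see which tags actually differ between two games that look identical, something like the following could help. It is only a minimal sketch using the standard library; the function names `parse_tags` and `differing_tags` are made up for illustration and have nothing to do with how HCE works internally.

```python
import re

# One PGN tag pair per line, e.g. [UTCDate "2023.01.15"]
TAG_RE = re.compile(r'^\[[ \t]*(\w+)[ \t]+"(.*)"[ \t]*\][ \t]*$', re.MULTILINE)

def parse_tags(game_text: str) -> dict:
    """Collect the tag pairs of a single game into a dict."""
    return dict(TAG_RE.findall(game_text))

def differing_tags(game_a: str, game_b: str) -> dict:
    """Map each tag whose value differs to its (value_a, value_b) pair.

    A tag missing from one game shows up with None on that side.
    """
    a, b = parse_tags(game_a), parse_tags(game_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

Run on the two games of a suspected pair, it should list only the tags that keep them from matching.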
Re: Duplicates not found
After some further investigation:
The Games were played on chess.com, then they were uploaded to LiChess for analysis.
And this is the trouble with LiChess PGNs: they are not consistent. Some omit the Date and Time tags and use only the UTCDate and UTCTime tags. Others use both. Some are even missing the Site tag, which is part of the Seven Tag Roster and is mandatory in any PGN.
So what happened is that the LiChess PGN retained the original Date/Time tags from chess.com but also included the UTC equivalents, which were set to the download Date/Time, and both tag pairs eventually appeared in the resulting PGN, with different values.
Handle LiChess PGNs with some care. They also have other issues, like the handling of 960 castling rights. Maybe not important for you, but a huge issue for 960 players.
The HCE programmers could probably make some LiChess-specific workaround, but it should rather be a task for LiChess to clean up their act. A hundred million games played a month, and not being able to deliver a PGN according to standard is just silly.
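For anyone who wants to check their own files, here is a rough sketch that reports which Seven Tag Roster tags (Event, Site, Date, Round, White, Black, Result) a game lacks. The helper name `missing_roster_tags` is hypothetical, and the function expects the text of one game at a time.

```python
import re

SEVEN_TAG_ROSTER = ("Event", "Site", "Date", "Round", "White", "Black", "Result")

# Captures the tag name at the start of a tag-pair line, e.g. [Site "..."]
TAG_NAME_RE = re.compile(r'^\[[ \t]*(\w+)[ \t]+"', re.MULTILINE)

def missing_roster_tags(game_text: str) -> list:
    """Return the Seven Tag Roster tags absent from one game, in roster order."""
    present = set(TAG_NAME_RE.findall(game_text))
    return [tag for tag in SEVEN_TAG_ROSTER if tag not in present]
```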
Re: Duplicates not found
@Bxh7
Great detective work! Thanks for digging deep into this.
Next time I will make sure to take the games directly from chess.com.
Re: Duplicates not found
For annotated games we do indeed compare all the comments and tags, as it is impossible to guess what to ignore.
I don't see a good solution here (except for always downloading your games from one source). The basic idea of duplicate finder was to avoid any data loss, so that users can be sure nothing important is removed. Perhaps we should have another option of fuzzy matching, but I don't think many people would like to carefully review thousands of games.
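To illustrate what a fuzzy match could look like, here is a minimal sketch that reduces the movetext to the bare main-line moves, so that two games can be compared while ignoring comments, variations, NAGs and move numbers. This is an assumption about one possible approach, not a description of HCE's duplicate finder, and `movetext_key` is a made-up name. It throws away exactly the annotations the strict check is meant to protect, so any matches would still need a manual review.

```python
import re

def movetext_key(movetext: str) -> str:
    """Reduce movetext to main-line moves plus result, for loose comparison."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)   # brace comments
    text = re.sub(r";[^\n]*", " ", text)          # rest-of-line comments
    # Drop variations innermost first, so nested ones are handled too.
    while True:
        stripped = re.sub(r"\([^()]*\)", " ", text)
        if stripped == text:
            break
        text = stripped
    text = re.sub(r"\$\d+", " ", text)            # numeric annotation glyphs
    text = re.sub(r"\b\d+\.+", " ", text)         # move numbers: "1." and "1..."
    text = re.sub(r"[!?]+", "", text)             # suffixes such as "!" or "?!"
    return " ".join(text.split())
```

Two games would then count as candidate duplicates when their keys (and, say, the player names) are equal.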
Re: Duplicates not found
In the meantime I've found a temporary workaround - a simple script that removes the UTCTime and UTCDate tags.
That helped HCE find all the duplicates in my current database.
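The original script was not posted, so the following is only a sketch of what such a workaround might look like. It drops every line that consists solely of a UTCDate or UTCTime tag pair and leaves everything else untouched. The names `strip_utc_tags` and `strip_file` are illustrative, and UTF-8 encoding is an assumption.

```python
import re
from pathlib import Path

# Matches a whole tag-pair line such as: [UTCDate "2023.01.15"]
UTC_TAG = re.compile(r'^\[\s*(UTCDate|UTCTime)\s+"[^"]*"\s*\]\s*$')

def strip_utc_tags(pgn_text: str) -> str:
    """Return pgn_text with all UTCDate/UTCTime tag lines removed."""
    kept = [line for line in pgn_text.splitlines() if not UTC_TAG.match(line)]
    return "\n".join(kept) + "\n"

def strip_file(src: str, dst: str) -> None:
    """Read a PGN file, strip the UTC tags, and write the result to dst."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(strip_utc_tags(text), encoding="utf-8")
```

Writing to a new file rather than overwriting the source keeps the original PGN available in case the UTC tags are needed later.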