LondonFrau Admin
Posts : 1295 Reputation : 3534 Join date : 2010-02-27 Location : ???
Subject: MAY DAY Stockfish Development Versions (Mon May 02, 2016 11:47 am)
Author: Marco Costalba
Date: Sun May 1 15:10:33 2016 +0200
Timestamp: 1462108233

Fix a warning with MSVC

Introduced by 2dd24dc4e618dc7b ("Use popcount intrinsic with Intel")

No functional change.


Author: joergoster
Date: Sun May 1 14:30:50 2016 +0200
Timestamp: 1462105850

Fix LazySMP when searching to a fixed depth.

Currently, helper threads will only search up to the specified depth limit. Now let them search until the main thread has finished the specified depth. On the other hand, we don't want to pick a thread with a higher search depth. This may be considered cheating. ;-)

No functional change.


Author: erbsenzaehler
Date: Sun May 1 14:18:16 2016 +0200
Timestamp: 1462105096

Use popcount intrinsic with Intel compiler

It seems that icc used our fallback version of popcount. Now use intrinsics.

icc version 16.0.2 (gcc version 5.3.0 compatibility)
bmi2 compile
uname -r: 4.5.1-1-ARCH

20xbench gives a nice speedup:

    ./stockfish-icc-master  2161515 +- 34462
    ./stockfish-icc-sse42   2260857 +- 50349


Author: Krgp
Date: Sun May 1 14:11:28 2016 +0200
Timestamp: 1462104688

Remove useless -mbmi flag in Makefile

I could not find any documentation indicating that prepending -mbmi to -mbmi2 is necessary or gives any benefit. Instead, the first referenced page [link not preserved] says:

The following built-in functions are available when -mbmi is used. All of them generate the machine instruction that is part of the name.

    unsigned int __builtin_ia32_bextr_u32(unsigned int, unsigned int);
    unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long);

The following built-in functions are available when -mbmi2 is used. All of them generate the machine instruction that is part of the name.

    unsigned int _bzhi_u32 (unsigned int, unsigned int)
    unsigned int _pdep_u32 (unsigned int, unsigned int)
    unsigned int _pext_u32 (unsigned int, unsigned int)
    unsigned long long _bzhi_u64 (unsigned long long, unsigned long long)
    unsigned long long _pdep_u64 (unsigned long long, unsigned long long)
    unsigned long long _pext_u64 (unsigned long long, unsigned long long)

and the second referenced page [link not preserved] says: "... The real optimization comes from being able to use pext (parallel bit extract), which can implement several bextr expressions in parallel."

Apart from that, we don't pass all of -msse -msse2 -msse3 -msse4.2 etc., but just -msse3 (or -msse4.2).

As for the speedup, it is within noise level: this pull request is actually a reversal of mcostalba#198, in which prepending -mbmi to -mbmi2 was claimed to be 0.3% faster, whereas here removing -mbmi gives a 0.4% speed gain.
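For readers wondering what "fallback version of popcount" versus "intrinsics" means in erbsenzaehler's commit, here is a minimal sketch. It is not Stockfish's actual code: the function names popcount_fallback, popcount_fast and check_popcount are made up for illustration, and the compiler check is an assumption that only covers GCC-compatible compilers through __builtin_popcountll.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable fallback: clears the lowest set bit on each
// iteration, so the loop runs once per set bit.
inline int popcount_fallback(std::uint64_t b) {
    int count = 0;
    while (b) {
        b &= b - 1;  // drop the lowest set bit
        ++count;
    }
    return count;
}

// Hypothetical dispatcher: uses the compiler builtin where it is known to
// exist (assumption: GCC-compatible compilers), otherwise the fallback.
inline int popcount_fast(std::uint64_t b) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(b);
#else
    return popcount_fallback(b);
#endif
}

// Small sanity check that both paths agree on a few inputs.
inline void check_popcount() {
    assert(popcount_fast(0) == popcount_fallback(0));
    assert(popcount_fast(0xFFULL) == popcount_fallback(0xFFULL));
    assert(popcount_fast(~0ULL) == popcount_fallback(~0ULL));
}
```

With a hardware target enabled, the builtin can typically compile down to a single popcnt instruction, which is presumably where the bench difference in the commit comes from.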
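The 20xbench figures in the popcount commit can be put into perspective with a little arithmetic. The sketch below makes two assumptions the post does not confirm: that the second line (stockfish-icc-sse42) is the patched build, and that the value after "+-" behaves like a standard deviation of independent runs. The names BenchResult, relative_speedup and combined_spread are hypothetical.

```cpp
#include <cmath>

// One line of bench output as quoted in the commit message.
struct BenchResult {
    double mean;    // reported figure
    double spread;  // value after "+-"; its exact meaning is not stated in the post
};

// Relative change of `patched` over `base`; 0.05 would mean 5% faster.
inline double relative_speedup(const BenchResult& base, const BenchResult& patched) {
    return (patched.mean - base.mean) / base.mean;
}

// Spread of the difference between two results, ASSUMING the runs are
// independent and the spreads act like standard deviations.
inline double combined_spread(const BenchResult& a, const BenchResult& b) {
    return std::sqrt(a.spread * a.spread + b.spread * b.spread);
}
```

Plugging in {2161515, 34462} and {2260857, 50349} gives a relative speedup of about 4.6%. The raw difference is 99342, against a combined spread of roughly 61014 under the assumptions above.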
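Since Krgp's commit message leans on pext (parallel bit extract), a rough sketch of what that operation computes may help. This is only an illustration, not code from Stockfish: pext_reference and pext64 are hypothetical names, and the preprocessor guard is an assumption about when _pext_u64 is usable (x86-64 builds with -mbmi2, where the compiler defines __BMI2__).

```cpp
#include <cstdint>
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>
#endif

// Portable reference: gathers the bits of `value` selected by `mask` into
// the low bits of the result, preserving their relative order.
inline std::uint64_t pext_reference(std::uint64_t value, std::uint64_t mask) {
    std::uint64_t result = 0;
    for (std::uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
        std::uint64_t lowest = mask & (~mask + 1);  // isolate lowest set mask bit
        if (value & lowest)
            result |= out_bit;
        mask &= mask - 1;  // move on to the next mask bit
    }
    return result;
}

// Hypothetical wrapper: uses the hardware instruction when the build enables
// BMI2 (assumption: __BMI2__ is defined), otherwise the portable loop.
inline std::uint64_t pext64(std::uint64_t value, std::uint64_t mask) {
#if defined(__BMI2__) && defined(__x86_64__)
    return _pext_u64(value, mask);
#else
    return pext_reference(value, mask);
#endif
}
```

For example, pext64(0xB2, 0xF0) extracts the upper nibble of the low byte and returns 0xB. A single bextr can only pull out one contiguous bit field, whereas one pext call with a scattered mask collects several fields at once, which is the point made in the quoted remark.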
_________________ Bettina...............The greatest happiness of life is the conviction that we are loved.
Victor Hugo