Houdini Pro is intended for power users with high-end hardware.
The main differences from the Standard version are:
• Houdini Pro supports up to 32 threads.
• Houdini Pro supports up to 32 GB of hash memory (32768 MB).
• Houdini Pro supports Large Memory Pages.
• Houdini Pro is NUMA-aware.
Large Memory Pages
Houdini Pro will use so-called large memory pages if the operating
system provides them. Depending on the hash table size, the speed gain
may be between 5% and 15%.
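The usual explanation for this gain is that every memory page needs its own address-translation (TLB) entry, so covering the hash table with far fewer, larger pages reduces TLB misses during the scattered hash probes of the search. The sketch below just counts pages for a few hash sizes. It assumes 4 KB standard pages and 2 MB large pages, the common sizes on x86-64 Windows; the actual large page size is reported by the operating system, and the function name is hypothetical.

```python
# Rough page count for a hash table of a given size.
# Assumptions: 4 KB standard pages and 2 MB large pages (typical on x86-64);
# the real large page size depends on the OS and CPU.
STANDARD_PAGE = 4 * 1024          # bytes
LARGE_PAGE = 2 * 1024 * 1024      # bytes (assumed)


def pages_needed(hash_mb: int, page_size: int) -> int:
    """Return how many pages of page_size bytes are needed to cover hash_mb megabytes."""
    size = hash_mb * 1024 * 1024
    return -(-size // page_size)  # ceiling division


for hash_mb in (256, 2048, 32768):
    small = pages_needed(hash_mb, STANDARD_PAGE)
    large = pages_needed(hash_mb, LARGE_PAGE)
    print(f"{hash_mb:>6} MB hash: {small:>9} standard pages vs {large:>6} large pages")
```

For a 2048 MB hash this gives 524288 standard pages against 1024 large pages, which is why larger hash tables tend to benefit more.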
To enable this feature in Windows, you need to modify the Group Policy for your account:
1. Run gpedit.msc (or search for "Group Policy").
2. Under "Computer Configuration", "Windows Settings", "Security Settings", "Local Policies", click "User Rights Assignment".
3. In the right pane, double-click the option "Lock Pages in Memory".
4. Click "Add User or Group" and add your account or "Everyone".
5. You may have to log off or reboot for the change to take effect.
You'll also need to run your chess GUI with administrative rights ("Run as Administrator") or disable UAC in Windows.
Very often large pages will only be available shortly after booting
Windows. After a while the Windows memory becomes too fragmented for
large page allocation, and Houdini will fall back to standard memory
page usage.
You can test the availability of large pages with the lp command. Run Houdini in a command window (simply by double-clicking the executable), type lp
and press Enter. Houdini will produce a summary of the number of
allocated large pages as a function of the large page size. This command
can take several minutes on a system with a lot of RAM (16 GB or more),
so be patient.
NUMA-awareness
Most motherboards with multiple CPU sockets use the so-called NUMA (Non-Uniform Memory Access) architecture.
Houdini Pro detects the NUMA configuration at start-up and will adapt
its memory management and thread interaction based on the different
NUMA nodes that are available.
The speed gain can be 5% to 15%, depending on the number of cores, the motherboard, and the CPU brand.
Running Multiple Houdini Pro instances
If you run multiple Houdini Pro instances simultaneously, by default they
will compete for the resources of the same NUMA nodes. To avoid this,
set the Numa Offset parameter to a different value in each Houdini instance.
For example, if you want to run two Houdini instances with 6 threads
each on 12-core hardware, you should use Numa Offset 1 for the second
instance so that it will allocate its 6 threads on the second NUMA node.
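To make the role of the offset concrete, here is a hypothetical model of thread placement. It is not Houdini's actual scheduling code: it simply assumes that an instance fills NUMA nodes in order, starting at node (Numa Offset mod number of nodes), with nodes numbered from 0. The function and parameter names are illustrative only.

```python
# Hypothetical model of how a NUMA offset moves an instance's threads onto
# a different NUMA node. NOT Houdini's real algorithm: it assumes threads
# fill nodes in order, starting at node (numa_offset mod num_nodes).


def assign_nodes(threads: int, numa_offset: int,
                 cores_per_node: int, num_nodes: int) -> list[int]:
    """Return the NUMA node index chosen for each thread of one engine instance."""
    nodes = []
    for t in range(threads):
        # Move to the next node each time the current one is full.
        node = (numa_offset + t // cores_per_node) % num_nodes
        nodes.append(node)
    return nodes


# Two instances with 6 threads each on 12 cores split over 2 NUMA nodes.
first = assign_nodes(threads=6, numa_offset=0, cores_per_node=6, num_nodes=2)
second = assign_nodes(threads=6, numa_offset=1, cores_per_node=6, num_nodes=2)
print(first)   # [0, 0, 0, 0, 0, 0]
print(second)  # [1, 1, 1, 1, 1, 1]
```

Under this model, leaving both instances at offset 0 would put all 12 threads on node 0, which is the contention the offset is meant to avoid.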
Some Real Performance Data
The test system was a 16-core dual AMD Opteron-6128 box running at the stock 2.0 GHz speed.
The autotune command was used as a benchmark to measure the impact of Large Pages and NUMA-awareness.
Hash memory was set to 2048 MB and 16 threads were used.
| Configuration | Best Split Depth | Average Node Speed | Speed Gain |
|---|---|---|---|
| Standard | 14 | 13600 kN/s | baseline |
| With Large Pages | 14 | 14900 kN/s | +10% |
| With NUMA and Large Pages | 12 | 16200 kN/s | +20% |
On this system Houdini Pro with NUMA and Large Pages was about 20% faster than the Standard version.
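The Speed Gain column is each configuration's average node speed divided by the Standard baseline. The short sketch below recomputes it from the kN/s figures in the benchmark table; it gives about +9.6% and +19.1%, which the table reports as roughly +10% and +20%. The `speed_gain` helper is introduced here for illustration only.

```python
# Recompute the relative speed gain of each configuration from the
# average node speeds (kN/s) measured in the benchmark table.
results = {
    "Standard": 13600,
    "With Large Pages": 14900,
    "With NUMA and Large Pages": 16200,
}
baseline = results["Standard"]


def speed_gain(speed: float, base: float) -> float:
    """Relative speed-up in percent: (speed / base - 1) * 100."""
    return (speed / base - 1.0) * 100.0


for config, knps in results.items():
    print(f"{config:<26} {knps:>6} kN/s  {speed_gain(knps, baseline):+5.1f}%")
```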