I'm wondering if the thread parallelism I'm seeing is the limit of what is possible. I've read and applied all relevant articles, and addressed the single thread limitation of Norgate Data. My test case optimization looks like so:
On this dual-CPU machine, note that all 12 threads on Node1 (second CPU) are used. The 12 threads on Node0 remain idle. When I disable affinity for all threads on Node1, the AmiBroker workload moves to Node0. I never see busy threads across both nodes at the same time.
Turning off NUMA in the BIOS and reverting to SMP puts perhaps 15% of the workload on the first 12 threads and 75% on the second 12 threads. Run times are slightly worse.
Is AmiBroker compiled to take advantage of multiple CPUs?
Is the use of static vars binding the workload to a single CPU?
Does anyone have code I can run that is known to run on multiple CPUs?
88% (244 sec out of 277 sec) of time in your optimization is spent waiting for DATA PLUGIN, and not actual optimization or formula execution.
AFL execution is perfectly parallel (it takes only 12% of time).
As explained here:
Amdahl’s law says that if 95% of your program runs in multiple threads and only 5% of it is serial (single-threaded), the maximum achievable speedup regardless of how many CPUs and how many cores you have is 20x (20 times).
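To put numbers on this, here is a minimal Python sketch of Amdahl's law. The function names are made up for illustration and are not part of AmiBroker. It assumes the 244 sec of data plugin wait is entirely serial and that only the remaining 33 sec of AFL execution can be spread over more threads, so the speedup is measured against the current 277 sec run.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for a given number of workers."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(serial_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without limit."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    # Textbook case from the quote: 5% serial work.
    print(f"5% serial, limit: {amdahl_limit(0.05):.1f}x")

    # Assumption: the 244 sec plugin wait out of 277 sec is all serial.
    serial = 244 / 277
    for workers in (12, 24):
        print(f"{workers} threads: {amdahl_speedup(serial, workers):.3f}x")
    print(f"limit: {amdahl_limit(serial):.3f}x")
```

Under these assumptions the ceiling is 277/244, roughly 1.14x, and going from 12 to 24 threads changes the predicted speedup by less than 1%. That is why getting both CPUs busy would barely change the run time while the data plugin wait dominates.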
Don't use plugins when you count on speed. Disconnect from the external source and use local data only. Make sure your "in memory cache" is large enough to keep ALL symbols in RAM. Also, don't use Norgate functions, as they are slow.
A thread on this very subject already exists, so please don't post duplicates. Continue in the existing thread: