Slow optimizations with Norgate Data

Hello,

I have subscribed to the Norgate US Stocks Diamond Package for the very first time, and everything works great except for the speed of optimizations. I realize that using a third-party plugin is going to slow down the process, but sometimes it's so slow that it makes the whole task quite impractical.

I would like to know if there is something that can be done to reduce the time.

Regarding AmiBroker, I am not sure whether it is possible to calculate fewer metrics in the backtest, and whether that would speed up the process.

Regarding the Norgate plugin, I am not able to find how to configure the initial date for the downloaded data. For example, for the S&P 500 from 1957 the watchlist contains more than 1800 symbols; if I could download data only from 1980 onwards, the number of symbols would be smaller.

Any ideas to improve processing times are welcome.

Thanks in advance.

Best regards.

If you want fewer symbols in your watchlist, you could just write a simple exploration to find symbols that were members of the S&P 500 index on or after 1-Jan-1980 (or whatever your desired start date is).
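A minimal sketch of such an exploration (untested; it assumes the standard Norgate include file and its NorgateIndexConstituentTimeSeries function, and 800101 is 1-Jan-1980 in AmiBroker's DateNum format):

#include_once "Formulas\Norgate Data\Norgate Data Functions.afl"

// Was this symbol an index member on any bar on or after 1-Jan-1980?
member = NorgateIndexConstituentTimeSeries("S&P 500");
afterStart = DateNum() >= 800101;
wasMember = LastValue( Highest( member AND afterStart ) );

// Output one row per qualifying symbol
Filter = Status("lastbarinrange") AND wasMember;
AddColumn( wasMember, "S&P 500 member since 1980", 1.0 );

Run it as an Exploration over the S&P 500 Current & Past watchlist, then select all results and use "Add results to watch list" to build the smaller universe.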

For reference, my Platinum package has data back to 1990, and the S&P 500 Current & Past watchlist has 1213 symbols.

You could also use the Code Check & Profile tool in the AFL Editor to analyze your code and find areas where you might be able to speed up execution.


Read this article:
http://www.amibroker.com/kb/2017/10/06/limits-of-multithreading/

and look at the numbers reported in the Info tab. These are super important and will tell you WHAT takes the most time. The number of metrics does not contribute even 1% of the backtesting time.

If you want help, provide numbers because "slow" or "fast" does not really mean anything.
Please follow this advice: How to ask a good question

Are you using Norgate functions, for example, NorgateOriginalCloseTimeSeries? If so, you can improve the speed with an approach like this:

function OriginalPrice()
{
  // Cache the result per symbol in a static variable so the
  // Norgate plugin is queried only on the first access.
  staticVarKey = "OriginalPrice_" + Name();
  price = StaticVarGet(staticVarKey);
  if (typeof(price) != "array")
  {
    price = NorgateOriginalCloseTimeSeries();
    StaticVarSet(staticVarKey, price);
  }
  return price;
}
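A hypothetical usage sketch (NorgateOriginalCloseTimeSeries returns the unadjusted close; the $5 threshold is just an illustrative example):

// Cached unadjusted close, as opposed to the adjusted close in C
unadjusted = OriginalPrice();
priceFilter = unadjusted > 5; // e.g. require a $5+ price at signal time

Since static variables persist between runs, remember that stale values will be reused until you clear them (for example with StaticVarRemove) or restart AmiBroker.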

mradtke,

Thanks for your answer.

I was wondering if it was possible to limit the date range for downloading data, as in AmiQuote. If not, I agree with you: the easiest way is to build a new list from an exploration.

I'm also checking out the Code Check & Profile tool now; I was not familiar with it. It seems quite useful for finding where the time is spent.

Best regards.

Tomasz,

Thanks for your answer. You're right, slow or fast doesn't give any useful information.

I have tested one of the formulas from the article http://www.amibroker.com/kb/2017/10/06/limits-of-multithreading/ on the S&P 100 Current & Past Norgate watchlist (227 symbols), from 31/12/2009 to 31/12/2019, daily periodicity.

> First version - Regular

period = Optimize( "period", 10, 2, 102, 1 );
Buy = Cross( C, MA( C, period ) );
Sell = Cross( MA( C, period ), C );

Image 001

Image 002

> Second version - Using Norgate provided function

#include_once "Formulas\Norgate Data\Norgate Data Functions.afl"
period = Optimize( "period", 10, 2, 501, 1 );
Buy = Cross( C, MA( C, period ) ) AND NorgateIndexConstituentTimeSeries("S&P 100");
Sell = Cross( MA( C, period ), C );

Image 003

Image 004

From the Code Check & Profile window I can see that the Norgate function is by far the most time-consuming one. According to the Info tab, it slows down other parts of the process as well, apart from the formula execution.

Finally, the Norgate version ran 14 times slower than the regular one, which makes it quite impractical for optimizing larger formulas.

How could I improve this simple formula?

Thanks in advance.

Best regards

Steve,

Thanks for your answer.

I'm just using NorgateIndexConstituentTimeSeries. Following your idea, I tried to code a custom function around that Norgate function, but I couldn't figure out something that works...

Could you give me some extra help about that?

Thanks in advance.

Best regards.

Clearly DATA ACCESS consumes almost all time:
image

Out of the 12.48 seconds it took, data access is responsible for 11.73 seconds, which is 94% of the entire optimization time. In other words, the actual optimization took UNDER 1 second.

AmiBroker just WAITED 11.73 seconds for data.

As TJ mentioned, data access is expensive, and it is made worse when the data is not cached. Historical quotation data is cached internally by AmiBroker, but the output of functions like NorgateIndexConstituentTimeSeries is not quotation data and is not cached. You can improve performance by storing the non-cached data in static array variables on first access; subsequent accesses can fetch the data from the static arrays, which is faster.

I haven't tested it, but the function below should improve your performance. Use this function in place of NorgateIndexConstituentTimeSeries.

function IndexConstituentTimeSeries(indexName)
{
  // Cache the constituent flags per index and per symbol, so the
  // Norgate plugin is queried only once for each symbol.
  staticVarKey = "IndexConstituentTimeSeries_" + indexName + "_" + Name();
  array = StaticVarGet(staticVarKey);
  if (typeof(array) != "array")
  {
    array = NorgateIndexConstituentTimeSeries(indexName);
    StaticVarSet(staticVarKey, array);
  }
  return array;
}

Steve,

Thanks for your answer and further explanation.

I've complemented the previous system with your code:

#include_once "Formulas\Norgate Data\Norgate Data Functions.afl"

function IndexConstituentTimeSeries(indexName)
{
  staticVarKey = "IndexConstituentTimeSeries_" + indexName + "_" + Name();
  array = StaticVarGet(staticVarKey);
  if (typeof(array) != "array")
  {
    array = NorgateIndexConstituentTimeSeries(indexName);
    StaticVarSet(staticVarKey, array);
  }
  return array;
}

period = Optimize( "period", 10, 2, 501, 1 );
Buy = Cross( C, MA( C, period ) ) AND IndexConstituentTimeSeries("S&P 100");
Sell = Cross( MA( C, period ), C );

and these were the results:

Image 1
Image 2

The optimization time has come down from 169 to 75 seconds, which is great.

I've compared the results from both systems; they are identical except for the number of trades (263 for the present code and 212 for the previous one).

Image 3

Image 4

I'm not sure where that difference comes from; I will investigate it.

Anyway, thanks for your idea, it will be very useful.

Regards.

Still, 71 seconds out of those 75 seconds is just WAITING FOR DATA.
In other words, AmiBroker needed only 4 SECONDS to run the optimization.

If you are into optimizations, it is recommended to TURN OFF the external data source (plugin) and run only the native AmiBroker database ("Data source" set to "(local database)").
You may also want to increase the "in-memory cache" setting in Preferences so it is larger than the number of symbols under test.


Tomasz,

Thanks for your additional comments.

I tested both proposals, but I got similar results. I guess I'm doing something wrong, but I can't figure out what.

1) Data source:

Image 1
Image 2

2) In-memory cache (before this test, the values were 10,000 symbols and 526 megabytes)

Image 3
Image 4

I tested the same code:

#include_once "Formulas\Norgate Data\Norgate Data Functions.afl"

function IndexConstituentTimeSeries(indexName)
{
  staticVarKey = "IndexConstituentTimeSeries_" + indexName + "_" + Name();
  array = StaticVarGet(staticVarKey);
  if (typeof(array) != "array")
  {
    array = NorgateIndexConstituentTimeSeries(indexName);
    StaticVarSet(staticVarKey, array);
  }
  return array;
}

period = Optimize( "period", 10, 2, 501, 1 );
Buy = Cross( C, MA( C, period ) ) AND IndexConstituentTimeSeries("S&P 100");
Sell = Cross( MA( C, period ), C );

Is it possible that the NorgateIndexConstituentTimeSeries function is slowing everything down anyway? I can't do without it, since I need the historical index constituents in the backtest.

Regards.

Do you really need 14000 bars per symbol (53 years)? You can improve performance by reducing the number of bars in the database settings. Decide how many years of data you will use in your optimizations, and adjust the number of bars accordingly.

I don't know how many symbols you are testing or how much data per symbol you use. Chances are that 4GB is not enough to fit all your data.
Check Tools->Performance Monitor to see whether ALL symbols are in the RAM cache.

Also check whether you are using padding or not. Padding is costly because the data can't be used directly; it needs to be pre-processed (padded) first.

Tomasz,

Thanks for your answer.

I'm testing 227 symbols from the S&P 100 Current & Past Norgate watchlist, from 31/12/2009 to 31/12/2019. I use end-of-day data, I'm not padding, and all symbols are in the RAM cache:

Captura 1

Do you think I could try something else?

Regards.

So you are optimizing 10 years of data for 227 symbols, with 500 optimization steps (500 backtests), in 74 seconds.
On average, a single backtest (analysing 590,200 data points) takes 74/500 seconds = 0.148 sec on your end.
That gives roughly 4 million data points per second of throughput.
If that's the case, what do you really expect? Realistically?

Tomasz,

Thanks for your answer.

I didn't have a concrete expectation of how long the optimization should take, but since I have to optimize a system larger than the one in the example, I was wondering how to make the process faster.

However, if the indicated time is reasonable, I feel satisfied to know that I'm not doing something wrong.

Thanks to the rest of the users for your answers as well.

Best regards.

As I wrote, for maximum speed you can change from the plugin to "(local database)". This will run the backtest without ever touching anything outside AmiBroker and will be the fastest option. As indicated earlier, any plugin-exposed function is going to slow down the backtest, so the earlier-suggested approach of storing plugin-exposed data in static variables gives you a nice boost (2x).

Backtest speed depends on many factors, including CPU speed, RAM bandwidth, the amount of data used (number of bars), the formula itself, and the options used (for example, Monte Carlo simulation turned on or off). It is all discussed here:



http://www.amibroker.com/kb/2017/10/06/limits-of-multithreading/

When I get some time, I will provide some more "baseline" performance numbers so people can compare against the speed they are getting, in case they doubt whether their performance is optimal.


This topic was automatically closed 100 days after the last reply. New replies are no longer allowed.