How to set up a database considering index constituents

Hello everybody,
I'm new to AmiBroker and am trying to understand how to set up a database that takes index constituents into account. I'm aware of Norgate's great solution, but I would also like to run backtests on non-US indices like Eurostoxx 600, HDAX, etc.

I know that it will be a very time-consuming task, but as far as I know there is no Norgate-like source that covers non-US indices.

Up to now I have a monthly time series of index constituents. It contains all tickers that were part of an index in a particular month. Format is: MM.YYYY,ticker,ticker,ticker,...
So I know the composition of the index on a monthly basis.
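
A minimal sketch of how a file in that format could be turned into a lookup table, in case it helps to see it spelled out. The tickers are made up, and the assumption is that every line is exactly `MM.YYYY,ticker,ticker,...` with no header row:

```python
import csv
import io
from datetime import date


def parse_constituents(text):
    """Parse 'MM.YYYY,ticker,ticker,...' lines into {(year, month): set of tickers}."""
    membership = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip empty lines
        month_str, year_str = row[0].strip().split(".")
        key = (int(year_str), int(month_str))
        membership[key] = {t.strip() for t in row[1:] if t.strip()}
    return membership


def is_member(membership, ticker, day):
    """True if the ticker was listed as a constituent in the month containing 'day'."""
    return ticker in membership.get((day.year, day.month), set())


# Hypothetical sample data, not real index compositions.
sample = "01.2000,AAA,BBB,CCC\n02.2000,AAA,CCC,DDD\n"
membership = parse_constituents(sample)
print(is_member(membership, "BBB", date(2000, 1, 17)))  # True
print(is_member(membership, "BBB", date(2000, 2, 3)))   # False
```

A month that is missing from the file is treated as "not a member", which may or may not be what you want for gaps in the history.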

Before tackling the challenge of getting historical data, especially for delisted stocks, I am trying to understand how I can represent index membership in the source data. According to other postings there is a way to do this with the aux1 and aux2 fields, but I do not fully understand how this works in a backtest.

Thanks in advance!


Interesting project, and indeed very time consuming. I personally wonder whether it will be worth your time, because you will bump into many problems that you cannot solve and will end up with something that is not 100% correct.

Usually index constituents are in the index because of their trading volume. You might consider using that as well. Of course, that would only help if you are aiming for the big cap indexes and not the small cap indexes.

Anyway, the way I understand how the aux fields were used (there might very well be other solutions) is that you set the aux field to "true" or "false". If the stock was in the index, it's true, and you can check for that. I do not know whether you could, for instance, write information like "CAC40" in the aux fields and look for that. My guess is that this is actually what you are after: check, in your script, whether the stock was in that particular index and base your trading logic on that.

I hope someone else will give you the exact solution, but I just wanted to share my thoughts and suggest the volume idea in case you had not thought of it.

I went through a similar exercise before Norgate's index constituent data was available. I used Excel and wrote a macro to batch process each individual CSV, dropping a 1 or 0 into the Aux 1 field depending on whether the symbol was in the index at that time. It's possible, but it was very time-consuming, and maintaining it was also an issue.
Good luck!
Tony R
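
For anyone who prefers a script over an Excel macro, the batch step described above could look roughly like this. It is only a sketch under assumptions: the price CSV has a header row with a `Date` column in `YYYY-MM-DD` form, the new column is simply called `Aux1`, and `membership` is a hypothetical `{(year, month): set of tickers}` lookup. The import format on the AmiBroker side would have to be mapped to match.

```python
import csv
import io


def add_aux1_flag(price_csv, ticker, membership):
    """Return the CSV text with an extra Aux1 column: 1 if 'ticker' was an
    index member in the month of that row's date, otherwise 0."""
    reader = csv.DictReader(io.StringIO(price_csv))
    out = io.StringIO()
    fields = list(reader.fieldnames) + ["Aux1"]
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        year, month, _day = row["Date"].split("-")
        members = membership.get((int(year), int(month)), set())
        row["Aux1"] = "1" if ticker in members else "0"
        writer.writerow(row)
    return out.getvalue()


# Hypothetical membership table and price rows, for illustration only.
membership = {(2000, 1): {"AAA", "BBB"}, (2000, 2): {"AAA"}}
prices = "Date,Close\n2000-01-31,10.0\n2000-02-01,10.5\n"
flagged = add_aux1_flag(prices, "BBB", membership)
print(flagged)
```

In practice you would loop over one file per symbol and write the result back to disk. The maintenance problem remains: every constituent change means regenerating the affected files.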

Hello,

You may want to consider the ODBC plugin for storing and retrieving index constituent information.
I think that might be more efficient:

  1. You may store information for as many Indices as you wish
    (One security might be part of many indices at any point of time)
  2. You use only as much storage as needed.
    (if you go with AUX1/AUX2, they might take up space on every data point)

With Regards

Sanjiv Bansal
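
To make the two points above concrete, here is a rough sketch of a membership table that stores one row per membership period instead of one value per bar. It uses Python's built-in `sqlite3` purely for illustration; the table name, column names, index names and tickers are all hypothetical, and how the ODBC plugin would query such a table is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE index_membership (
        index_name TEXT NOT NULL,
        symbol     TEXT NOT NULL,
        date_from  TEXT NOT NULL,   -- ISO date, inclusive
        date_to    TEXT             -- ISO date, exclusive; NULL = still a member
    )
""")
# Hypothetical rows: one security can belong to several indices at once.
conn.executemany(
    "INSERT INTO index_membership VALUES (?, ?, ?, ?)",
    [
        ("INDEX_A", "AAA", "2000-01-01", None),
        ("INDEX_A", "BBB", "2000-01-01", "2003-07-01"),
        ("INDEX_B", "AAA", "2001-05-01", None),
    ],
)


def indices_for(conn, symbol, day):
    """Return the names of all indices 'symbol' belonged to on ISO date 'day'."""
    rows = conn.execute(
        """
        SELECT index_name FROM index_membership
        WHERE symbol = ? AND date_from <= ?
          AND (date_to IS NULL OR ? < date_to)
        ORDER BY index_name
        """,
        (symbol, day, day),
    )
    return [r[0] for r in rows]


print(indices_for(conn, "AAA", "2002-03-15"))  # ['INDEX_A', 'INDEX_B']
print(indices_for(conn, "BBB", "2004-01-02"))  # []
```

Storing dates as ISO text keeps string comparison and date order identical, which is why the plain `<=` and `<` comparisons work here.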

Hello Henri,
thanks for your response!

You are right that it will not be 100% correct, but in the end the question is how much a backtest result is affected by data that is 99% or 95% correct.

Regarding volume: do you mean filtering by cap size?

Hello TonyR,
thanks for your response!

I had the idea of using aux1 as a multiplier. IMO this should work for momentum systems.

How did you use aux1 in your backtest?

Yes, I mean filtering by cap size. In my experience it will get you pretty close.

I was also a big believer that data needed to be 100% correct. Of course, we do not want data errors or split errors, but I recently tested with Norgate, which basically is 100% correct on index members, and comparing those results against the data without index information surprised me. There was hardly any noticeable difference in the trading system results.

We always worry about stocks being in an index, but the biggest problem comes from stocks getting into an index. Usually they are on their way up and for that reason end up in the index. Because they were not in the index yet, we would not have traded them before they were added.

So that is actually a strange thing: stocks might be on their way up and, halfway, have enough volume to be tradable for us, but we skip them because they have not been added to the index yet?

As said, I was a firm believer in indexes. They are a nice way to start, especially in Europe, for finding the bigger stocks traded in euros, and they save you some time sifting through, but I think you might be missing some good opportunities by doing it that way.

And as said, I am not sure if you already have Norgate, but if so, run a test on the current S&P 500, run it again on current and past constituents, and look at the differences. It will depend on your trading system, but I am pretty confident you will find results similar to mine.

Adding to Sumangalam's solution: I am not sure where he means you should store the information, but you can store it in an SQL database and retrieve it from there. That will introduce problems, such as identical symbols for different stocks, but it can be managed.

Anyway, in my book it is too much work for the results :wink:

Very interesting thoughts!

Regarding survivorship bias, there is a nice post on this blog: https://teddykoker.com/2019/05/creating-a-survivorship-bias-free-sp-500-dataset-with-python/
On the other hand, I completed Urban's AmiBroker course, and in one of his sessions he tested his Bollinger system on a database from Norgate and on one without index constituents. If I recall correctly, the difference was 12% vs. 14%. This confirms your findings.

Re the S&P 500: if you are familiar with R, you can also check this blog article, and for Python (from the comments of that blog) this actively maintained GitHub repository, where you'll also find a recently updated .csv file.

And if you add things like the start date of a trading system, which can also have a big influence, you will introduce another 12% vs. 14% situation.

I will go into that a little deeper. It deviates from the original subject, but in a way it is still interesting, I think.

[Screenshot]

This is a trading system that I have and actually trade. G is the date, H to L are the trading days of the week, Monday to Friday (not even going to talk about that difference :wink:).

As you can see, if I had started trading this system on the first of June 2000, a Monday, I would have roughly a 20.5% CAGR.

Had I started trading this same system 2 months later, on the 1st of August 2000, I would have a 21.5% CAGR.

So the date you actually start backtesting can already make an almost 5% difference! So basically there is also some "luck" involved in picking the right starting date.

Therefore, I would say, don't worry too much about survivorship bias.
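
For readers who want to check the start-date effect on their own equity curve, CAGR is just (end value / start value)^(1 / years) - 1. A small sketch follows; the equity values are invented for illustration and are NOT taken from the screenshot. Note that it only re-measures one fixed equity curve from two start points, whereas a real backtest started later may also take different trades.

```python
from datetime import date


def cagr(start_value, end_value, start_day, end_day):
    """Compound annual growth rate between two dates (365.25-day years)."""
    years = (end_day - start_day).days / 365.25
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical equity values: the system is in a drawdown two months in.
equity = {
    date(2000, 6, 1): 100.0,
    date(2000, 8, 1): 90.0,
    date(2020, 6, 1): 4000.0,
}
end_day = date(2020, 6, 1)
for start_day in (date(2000, 6, 1), date(2000, 8, 1)):
    rate = cagr(equity[start_day], equity[end_day], start_day, end_day)
    print(start_day, f"{rate:.1%}")
```

Starting just after a dip flatters the number, and starting just before one hurts it, even though nothing about the system changed.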

And, another screenshot, just because I was running some tests and wanted to check for myself what the difference would be.

[Screenshot]

All the same trading system, all from January 2000 until the present. C&P = current and past constituents.

Interestingly enough, both tests show fewer signals when only using current constituents. This might very well be survivorship bias: because this is a momentum system, some stocks that get into the index have good momentum, and you now pick them up before they are actually in the index. If I added a volume filter, I would filter that out. I checked TSLA, for example: it was a trade in 2013, and I have no idea whether it already belonged to the index (S&P 500) at that time.

Also, the average profit per trade is better in the current versions. Again, apparently because you pick up some stocks that are not in the index yet.

Both good reasons to look further than the index itself.

But look at the annual return: the S&P does 1% better but the Nasdaq does 2% worse. I could go very deeply into the how and why, but all I want to point out is that yes, having index constituents can be useful, but I doubt it is worth all the work you are putting in.

But everyone is different. Some people cannot do without it, and that is fine with me. I also worried a lot about all this until I bumped into Norgate and found that it is far less important than I thought. For me, that is; others might think very differently :wink:

Since I find it an interesting subject, I did some more research.

[Screenshot]

Left side, current and past. Sorted on %profit since I would assume that the survivorship bias will introduce some big winners that you would otherwise not have had.

On the right side I marked 5 trades. The question to start with is: were those stocks in the index (S&P 500) at that time? I did a quick test, and I think they all already belonged to the index.

MU 2010
BKNG 2009
FB 2012
ILMN 2008
CTXS 2000

That is what my test shows for when those stocks would have been added to the S&P 500.

So that shows, if those dates are correct, that at least those 5 trades have nothing to do with survivorship bias.

Then the question is: why were those trades not taken by the trading system?

My idea, at this moment is, "pickluck".

It will depend a lot on the trading system, but if a trading system has a maximum of 5 positions and those positions are filled, you could miss some trades, or get into them a few days or weeks later, which changes the outcome of the trading system.
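
A toy illustration of that position-cap effect, with everything simplified: signals arrive as (day number, symbol), every position is held for a fixed number of days, and a signal is simply skipped when all slots are taken. The symbols, the cap of 2 and the 10-day holding period are arbitrary choices for the example, not anything from my system.

```python
def take_signals(signals, max_positions, hold_days):
    """signals: list of (day, symbol) sorted by day.
    Returns (taken, skipped) under a fixed cap on open positions."""
    open_positions = []  # list of (exit_day, symbol)
    taken, skipped = [], []
    for day, symbol in signals:
        # Close positions whose holding period has ended by this day.
        open_positions = [(x, s) for x, s in open_positions if x > day]
        if len(open_positions) < max_positions:
            open_positions.append((day + hold_days, symbol))
            taken.append((day, symbol))
        else:
            skipped.append((day, symbol))
    return taken, skipped


signals = [(1, "AAA"), (1, "BBB"), (2, "CCC"), (3, "DDD"), (12, "EEE")]
taken, skipped = take_signals(signals, max_positions=2, hold_days=10)
print(taken)    # [(1, 'AAA'), (1, 'BBB'), (12, 'EEE')]
print(skipped)  # [(2, 'CCC'), (3, 'DDD')]
```

Add or remove a single symbol from the universe and the set of taken trades can shift, which is the "pickluck" I mean: the difference comes from which signals happened to find a free slot, not from index membership as such.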

My 2 cents: survivorship bias exists, but I question whether its influence is as great as we give it credit for.

Very interesting and insightful!

For what it's worth, using a testing universe like S&P 500 Current & Past with no check for index membership is a very poor way to do your baseline testing. You should instead use some sort of "all stocks" universe, which in the Norgate world might translate to a watchlist that you create from the Stock Market / Equity group in NDU. Since this topic is no longer about AmiBroker but rather testing philosophy, I'll leave it at that.

Index membership is tested using the Norgate code for it. Therefore I now realize I did not even have to check whether those symbols were in the index at the time of the trade; the Norgate code already did that for me.

That means the test will have fewer symbols in the early years. Only symbols that were already in the index in 2000 are included in the test. Symbols that were taken over or went bankrupt are not in the current S&P, so they are not in the test.

So the tests on the current S&P use true survivors, and the script only trades them when they are actually in the index. That also means there are no trades from before those stocks entered the index. I have no idea how much that will change the results, but I will look into it.

Is there anything against a topic "changing"? If so, we should stop the discussion or take it elsewhere.

I opened a new thread to discuss the pros and cons of considering index constituents.
