Survivorship bias: why only delisting date?

I'm trying to set up a survivorship-bias-free database for the stocks of the S&P 500 index (using AmiQuote and Yahoo quotes). Wikipedia has a table with not only the present constituents but also the changes since 1999. This can be imported into Excel and, using the procedure set out in "Import ASCII", easily put into the “Delisting date” field of each stock. That works like a charm!

However I'm stumbling over two problems:

  1. I think that the “Delisting date” is provided for the reason I’m using it for, i.e. to eliminate survivorship bias. However, there is no corresponding "First listing date" field, so stocks will be backtested before they were part of the index. How can there be a survivorship-bias-free database without such a field? Am I missing something?

I thought about just manually deleting the quotes before the date the stock was listed in the S&P 500. However then no indicator using quotes from before the listing day would work.

  2. I tried using the proposed code exitLastBar = datetime() >= GetFnData("DelistingDate"); (see Closing trades) for selling delisted stocks. However, this condition doesn’t work: “exitLastBar” is always true if the stock hasn’t been delisted ("DelistingDate" is empty, i.e. has the value zero, and “datetime()” is always larger). It’s easy to work around this problem. However, everything in the AmiBroker environment is so perfectly thought through that it seems a bit odd that this code is proposed. I’m probably missing something.
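One way to work around the second problem is to ignore an empty delisting date explicitly. The following is only a minimal sketch: it assumes, as observed above, that GetFnData("DelistingDate") returns zero when the field is empty, and the MACD signals are placeholders for a real system.

```afl
// Placeholder entry/exit rules, for illustration only
Buy  = Cross( MACD(), 0 );
Sell = Cross( 0, MACD() );

// Assumption: an empty "Delisting date" field is reported as 0
dd = GetFnData( "DelistingDate" );

// True only when a delisting date is set AND the bar is on or after it
exitLastBar = dd != 0 AND DateTime() >= dd;

// Do not open new trades on or after the delisting date; force an exit there
Buy  = Buy AND NOT exitLastBar;
Sell = Sell OR exitLastBar;
```

For symbols without a delisting date, exitLastBar stays false on every bar, so the regular signals are left untouched.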

Now my question: Has anybody found and is willing to share a solution to exclude stocks from backtesting before they were listed in the S&P 500?

The term "delisted" means that a stock can no longer be traded. It has nothing to do with whether or not the stock is a member of an index. For example, a stock could be dropped from the S&P 500 index, but still be part of the Russell 1000.

If you want to backtest historically-accurate index constituents, I suggest you check out the offerings from Norgate Data. They are currently the only data provider that I'm aware of that has historical index membership data for US and Australian markets at a reasonable price.


Thank you for your reply, mradtke. Of course it is a disappointment that it appears not to be possible to set up a survivorship-bias-free database for an index comprising a limited number of stocks. I cannot see how the field "DelistingDate" could be useful for the purpose you mentioned: no private person can build a complete stock universe with many thousands of delisted stocks.

I checked out Norgate and their price is absolutely reasonable for what they offer. However, for survivorship-bias-free data I need the platinum subscription, and that's a bit of an overkill given that for now I'm only interested in S&P 500 stocks.

I might just do my backtesting with only the delisting date or go with the original plan and delete quotes before the date the stock was listed in the S&P 500. I can live with the fact that indicators are not working immediately after a stock is listed.

Thanks again!

@Canary, the topic has been discussed elsewhere on the forum, and it may be worth your time to review those discussions. One possible suggestion, depending on the strategy you want to trade: create a watch list with the 500 most liquid stocks on the U.S. exchanges and use that as a substitute for the S&P 500.

That type of list will capture most of the SP500 and may be a simple method of testing your strategies.

It is worth mentioning that on any given day 60 or 70 of the top 500 most liquid stocks are often mid-cap or small-cap stocks making news (and not members of the SP500 index).

Depending on how much work you want to do, you could also store the "is in S&P 500 index" data in one of the Aux fields in the database.
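A rough sketch of how such a flag could be used in a formula follows. It assumes a convention that is not part of the thread: Aux1 holds 1 on bars where the symbol was an index member and 0 otherwise. The MACD signals are placeholders.

```afl
// Assumption: Aux1 was imported per bar as 1 (index member) or 0 (not a member)
inIndex = Nz( Aux1 ) > 0;

// Placeholder rules: trade only while the symbol is flagged as a member
Buy  = Cross( MACD(), 0 ) AND inIndex;
Sell = Cross( 0, MACD() ) OR NOT inIndex;
```

Compared with a single date field, a per-bar flag can represent a stock that joined, left and later rejoined the index.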

@portfoliobuilder. I cannot imagine that that would be more accurate than what I currently have (current and delisted S&P500 stocks, the only thing missing is the start date of the listing; I might misunderstand what you mean, but just filtering liquid stocks from a large stock universe will not provide any of these).
@mradtke. That would work and is probably doable (there have been 277 index changes since 1999). Importing the delisting date with $FORMAT was just so easy.

You will also have to think about storing the symbols under something other than the bare ticker. Duplicate symbols don't work, I think. Norgate solves that by adding the delisting date to the symbol name; CSI solves it by using the CSI Number in their database.

But you need to solve that problem if you download old data from somewhere.

I'm actually working on this as well. I'm using: GitHub - fja05680/sp500: Current and Historical Lists of S&P 500 components since 1996 as the base for sp500 through time.

I haven't gotten this far yet, but I was simply going to query the csv with afl by date, then loop through the tickers as part of my buy/sell condition.

This won't be the quickest piece of code, but it should get the job done for my needs.

Great idea here, thanks for sharing!

That is NOT true. It is easy to set up such a database. You get historical listings of, say, the S&P 500 from the internet. Then you create, say, yearly watch lists, each holding the index constituents for a given year. Then you use InWatchList() or InWatchListName() to retrieve constituents on a yearly basis and exit symbols that drop out of the index.

I wrote about this on this very forum and you would find it if you just searched:

Coding examples were presented many times on old YahooGroups, for example this: [amibroker] InWatchListName Help (was Re: InWatchList help)


@Henri Do you mean that tickers get assigned to other stocks after delisting? I hadn't thought about that. It could of course seriously mess up my backtesting. I will look into that.
@Pinecone That looks like a great source. Thanks for sharing!
@Tomasz Thank you for pointing me in the right direction. I need some time to study your answer and look into the posts you mentioned to understand what you mean. I do search the forum and have found many answers, often from you, mradtke, portfoliobuilder and many others. It is not always easy though to find the right search terms. For those like me, who have not reached a level of very high proficiency, it is not so easy to even recognise how useful some posts may be.

@Canary, Tomasz is suggesting that you create a lot of watch lists and then query them based on year in your code. This is a decent idea, though it seems there would be a lot of leg-work*, and to perfect it you would need more than yearly watch lists. The S&P 500 changes multiple times a year.

*I haven't looked at watchlist files, might be possible to generate these more quickly outside of AMI if they aren't proprietary like the DB.

More quickly?

You do not need to create watchlists outside.
There is CategoryCreate() function.
Have you read about it?
Have you heard of looping?
No?

Cool, there's a way to do it inside ami. Great find @fxshrat, thanks for sharing. :wink:

As documented in the manual, watchlists are stored in plain text ASCII files
http://www.amibroker.com/guide/h_watchlist.html

Quote from manual:

Watch lists are now stored as text files inside "Watchlists" folder inside database. The folder contains of any number of .TLS files with watch lists themselves and index.txt that defines the order of watch lists. You can add your own .tls file (one symbol per line) and AmiBroker will update index.txt automatically (adding any new watch lists at the end). The .TLS files can also be open in AmiQuote.

They can be created programmatically (CategoryCreate) but also externally just by saving text file into "Watchlists" subdirectory.
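As a rough illustration of the programmatic route, the sketch below creates one watch list and fills it from a comma-separated string. The list name and the tickers are hypothetical placeholders, not real constituents. It assumes an AmiBroker version that provides CategoryCreate() and that the symbols are already present in the database.

```afl
// Hypothetical placeholders: replace with a real list name and real constituents
wlName  = "WL1996";
tickers = "AAA,BBB,CCC";

// Create the watch list, then look up its index by name
CategoryCreate( wlName, categoryWatchlist );
wl = CategoryFind( wlName, categoryWatchlist );

// Walk the comma-separated string and add each symbol to the watch list
for( i = 0; ( sym = StrExtract( tickers, i ) ) != ""; i++ )
{
    CategoryAddSymbol( sym, categoryWatchlist, wl );
}
```

Repeating this once per year (or per quarter) gives the set of lists that InWatchListName() can then query.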


Tomasz, I've seen some of the solutions you point to (here is another useful one). But only now, with the ticker lists from the data source provided by Pinecone, do I realize how they can be used.
I wonder, however, whether these solutions are really free of survivorship bias. Using watch lists that each cover a year, as you propose, will include up to a year of data from before a stock is added to or removed from an index. This might appear irrelevant, but the prospect of being included in an important index might push the stock price up in exactly that period of up to a year, and so cause an even greater bias than data from a random period would.

You can go for quarterly lists, but the law of diminishing returns says that at some point it does not make sense to get down to a microscopic level. Yearly is a good balance between accuracy and the amount of work required. If you want to go microscopic, you can just buy the Norgate database.

Just some findings to finish this post:

I did set up an S&P 500 database with yearly watch lists using the guidance of this thread and used the following code for a sample backtest:

Allow = IIF( DateNum() <= 961231, InWatchListName("WL1996"),
        IIF( DateNum() <= 971231, InWatchListName("WL1997"),
        IIF( DateNum() <= 981231, InWatchListName("WL1998"),
        IIF( DateNum() <= 991231, InWatchListName("WL1999"),
        IIF( DateNum() <= 1001231, InWatchListName("WL2000"),
        IIF( DateNum() <= 1011231, InWatchListName("WL2001"),
        IIF( DateNum() <= 1021231, InWatchListName("WL2002"),
        IIF( DateNum() <= 1031231, InWatchListName("WL2003"),
        IIF( DateNum() <= 1041231, InWatchListName("WL2004"),
        IIF( DateNum() <= 1051231, InWatchListName("WL2005"),
        IIF( DateNum() <= 1061231, InWatchListName("WL2006"),
        IIF( DateNum() <= 1071231, InWatchListName("WL2007"),
        IIF( DateNum() <= 1081231, InWatchListName("WL2008"),
        IIF( DateNum() <= 1091231, InWatchListName("WL2009"),
        IIF( DateNum() <= 1101231, InWatchListName("WL2010"),
        IIF( DateNum() <= 1111231, InWatchListName("WL2011"),
        IIF( DateNum() <= 1121231, InWatchListName("WL2012"),
        IIF( DateNum() <= 1131231, InWatchListName("WL2013"),
        IIF( DateNum() <= 1141231, InWatchListName("WL2014"),
        IIF( DateNum() <= 1151231, InWatchListName("WL2015"),
        IIF( DateNum() <= 1161231, InWatchListName("WL2016"),
        IIF( DateNum() <= 1171231, InWatchListName("WL2017"),
        IIF( DateNum() <= 1181231, InWatchListName("WL2018"),
        IIF( DateNum() <= 1191231, InWatchListName("WL2019"),
        IIF( DateNum() <= 1201231, InWatchListName("WL2020"),
        IIF( DateNum() <= 1211231, InWatchListName("WL2021"),
        IIF( DateNum() <= 1221231, InWatchListName("WL2022"),
        0)))))))))))))))))))))))))));
Buy = Cross( MACD(), 0 );
Sell = Cross( 0, MACD() );

// Enter only while the symbol is in that year's watch list; exit once it is not
Buy = Allow AND Buy;
Sell = NOT Allow OR Sell;
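The same yearly filter can also be written as a loop instead of a nested IIf chain. This is only a sketch, assuming the watch lists are named WL1996 through WL2022 as in the code above. Note that the exit here uses NOT Allow, so a position is closed once the symbol is no longer in the list for the current year.

```afl
yr = Year();   // calendar year of each bar
Allow = 0;

// For each year, Allow takes that year's watch-list membership on that year's bars
for( y = 1996; y <= 2022; y++ )
{
    member = InWatchListName( "WL" + NumToStr( y, 1.0, False ) );
    Allow = IIf( yr == y, member, Allow );
}

Buy  = Allow AND Cross( MACD(), 0 );
Sell = NOT Allow OR Cross( 0, MACD() );
```

Adding another year then only requires a new watch list and a change to the loop bound.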

However, I came to the conclusion that it just doesn’t work. As Henri pointed out above, you need to download the data from somewhere. I was hoping for Yahoo historical data. However, although they keep delisted symbols in their database, they do not keep their quotes. AmiQuote thus indicates that the data is imported, but closer inspection reveals that 0 bars have been imported. Further, ticker symbols are indeed assigned to new stocks: for instance, CPWR was used for Compuware up to 2011 but is now used for the Ocean Thermal Energy Corporation.

I’m afraid that there is no alternative to Norgate if survivorship-bias-free data is desired. I hope this insight saves some time for others who are trying the same.

Having EOD data for delisted stocks in the database is the key to solving the survivorship bias problem. Yahoo Finance does not keep EOD data for delisted stocks. In my database, around 6% of the symbols are delisted (329 out of 5117). This site does keep track of delisted US stock tickers, but I still find some missing.
Latest Delisted Stocks | Stock Analysis
Tiingo has EOD data for delisted stocks from 2015 onwards. Without delisted stocks in the database, a backtest is prone to survivorship bias. If you choose to manage your own database of delisted stocks, you need to manage the ticker names. For example, HP has become HPE and HPQ, and someone else has taken over HP. So what to do with the old HP? One solution is to give it another meaningful name, e.g. HP-delisted.


The Wikipedia change list is also quite wrong (missing lots of changes).

Ticker re-use is common. For example, the ticker AB has been reused at least 6 times so far for companies completely unrelated to each other.

We've spent thousands of hours of research on historical databases, ticker changes, corporate actions and historical index constituents.

@Peter2407 Regarding "HP" - Hewlett Packard never had this ticker.

Prior to May 6 2002, it was NYSE:HWP. Hewlett-Packard Co merged with Compaq at this time (with Hewlett-Packard being the surviving entity) and changed its ticker to HPQ.

In Nov 2015, it demerged NYSE:HPE (Hewlett Packard Enterprise Co) and changed its company name to HP Inc.

Norgate has a continuous history for NYSE:HPQ back to its 1961 NYSE listing. It has been an S&P 500 member since Oct 1974.

HP has been used as a ticker by Helmerich & Payne Inc (an Oil & Gas company) since the 1960s and has no relationship with Hewlett-Packard.
