Since I recently discovered AmiBroker and am now rewriting all my prior Python analyses in AFL, I figured it's a good time to better validate my dataset. I have previously been using Polygon.io, and wanted to ensure the data is correct and complete.
So, I used Norgate Platinum to pull daily data for all common stocks and ETFs from 2004 to present (the Polygon 1-min data goes back to 2004). I then went back and tried to match the intraday data from Polygon to the date ranges from Norgate, and was surprised to find thousands of both missing tickers and missing days/years of data from Polygon. I resolved some of this once I realized Norgate uses the last known ticker name, while Polygon has a separate feed each time a ticker name changes. This was an annoying and painful process that required yet another data provider, but I did it because I thought it would resolve the issue.
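For anyone attempting the same reconciliation, here is a minimal Python sketch of the rename step: collapsing each historical ticker onto its last known symbol, so per-name feeds can be grouped under one last-known-name series. The function name, the `(old_symbol, new_symbol)` input format, and the example symbols are all hypothetical; neither provider's API is used here. It also assumes a symbol is never reused by a different company, which is not true in general, so a real version would need to key on effective dates as well.

```python
def resolve_final_symbols(renames):
    """Map every historical symbol to its last known symbol.

    `renames` is an iterable of (old_symbol, new_symbol) pairs
    (hypothetical input format, e.g. built from a ticker-change list).
    """
    successor = dict(renames)
    final = {}
    for symbol in successor:
        seen = {symbol}
        current = symbol
        # Follow the rename chain until a symbol has no further successor.
        while current in successor:
            current = successor[current]
            if current in seen:  # guard against cyclic or reused symbols
                break
            seen.add(current)
        final[symbol] = current
    return final


# Hypothetical example: AAA was renamed to BBB, which was later renamed to CCC.
example = resolve_final_symbols([("AAA", "BBB"), ("BBB", "CCC"), ("XXX", "YYY")])
```

With this mapping, each per-name intraday series can be grouped under its final symbol before comparing date ranges against the daily reference data.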
Unfortunately, even after resolving every issue I could possibly think of, I am still left with thousands of missing date ranges intraday, many from delisted tickers, or those that still trade but only on OTC now instead of a major exchange.
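A rough Python sketch of how the missing date ranges can be found for a single ticker: take the daily bars as the reference calendar and report every run of reference days that has no intraday data. The function name and the plain lists of `date` objects are assumptions for illustration; loading the data from either provider is left out.

```python
from datetime import date


def missing_ranges(reference_days, intraday_days):
    """Return (start, end) runs of reference days with no intraday data.

    Runs are contiguous in terms of the reference calendar, so weekends
    and holidays (absent from both inputs) do not split a run.
    """
    have = set(intraday_days)
    ranges = []
    run_start = run_end = None
    for day in sorted(set(reference_days)):
        if day in have:
            # A covered day closes any open run of missing days.
            if run_start is not None:
                ranges.append((run_start, run_end))
                run_start = run_end = None
        else:
            if run_start is None:
                run_start = day
            run_end = day
    if run_start is not None:
        ranges.append((run_start, run_end))
    return ranges


# Made-up example: six trading days in the daily data, three with intraday bars.
daily = [date(2024, 1, d) for d in (2, 3, 4, 5, 8, 9)]
intraday = [date(2024, 1, d) for d in (2, 3, 9)]
gaps = missing_ranges(daily, intraday)
```

Here `gaps` contains a single run from 2024-01-04 to 2024-01-08, because the weekend in between is not part of the reference calendar.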
I'm sure others here understand how frustrating that is. I need the 1-min data to validate my strategy, and I doubt I will be confident enough to trade it without survivorship-bias-free data. So I'm trying to evaluate my options now, which seem to be signing up for service after service until the data is filled (eyeing eSignal, IQFeed). None of the services are that clear on the availability of delisted tickers, so it could be a huge waste of time.
Besides just ranting, I'm wondering if anyone here has gone through this same experience to create a stellar long-term 1-min dataset, and if you have any advice for me.
Update2: I set up the AmiBroker eSignal plugin, and it seems to be working. My plan is to pull in a watchlist from the Norgate data (common stocks, ETFs), force backfill them all with the eSignal plugin, export the data (or at least the date ranges for each ticker), and validate that the tickers/date ranges I get from eSignal match those from Norgate.
Update3: Each time I force backfill, I get a varying number of intraday bars back (within the bar count threshold). I am not sure if this is an AmiBroker or eSignal issue though; I need to learn a bit more.
eSignal has two different methods of retrieving histories: a normal request and "extended history". To get "extended history" you have to be subscribed to extended history and have it enabled in the settings. I don't know what you are using now.
Those two methods may lead to different results.
Also:
Do NOT trigger "Force backfill" frequently for the same symbol because it is AGAINST eSignal terms of service and they have the right to limit responses to protect their servers or even cut you off entirely from the feed if you abuse the service that way.
Thank you for the reference @Stefan-GER. I like that this service provides the list of tickers and dates. I checked my "tricky" ticker there and it is not available in their bundle. I should be able to compare it with Norgate using the list provided though. They provide the "most liquid" tickers, so I'm guessing the "tricky" ticker did not qualify for whatever that definition is.
If all else fails, I may have to lower my expectations of a "complete" historical dataset. I may be shooting myself in the foot, but I'll put a bit more effort into it, as it should be a one-time effort. That said, I may contact them to see if they can share their liquidity definition, so at the very least I could add a filter to ignore these tickers in historical testing and realtime trading.
@Tomasz Thanks. Yes, I'm using the extended history add-on with AmiBroker configured to use it. As far as I'm aware, or can find, there were no terms of service given to me by eSignal that limit access other than the 500 concurrent symbol limit. I will check with their support to clarify and make sure I comply. If this is true, it puts eSignal back into the dead-end bucket.
I talked with eSignal support. They confirmed the only limit is 500 concurrent symbols monitored. There is no limit to pulling historical data of any kind. This is at least what they told me in writing. I also couldn't find other limits in the terms I was given. Maybe this is a change from the past.
Update4: I created the ~15K ticker watchlist from Norgate and imported it into AmiBroker. I decided to first do just a Daily pull using the eSignal plugin, which only took about 20 minutes. The exact results were that 11965 of 15084 symbols were available from eSignal, about 79%. Polygon also had around 80%. Each provider's 80% is a different subset of tickers, so combining data from multiple providers may be required. I've arbitrarily decided to be satisfied if I can get 95% coverage.
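To make the coverage arithmetic concrete, below is a small Python sketch that computes the fraction of a reference universe covered by the union of several providers and lists what is still missing. The function name and the tiny example sets are made up; only the 11965 and 15084 figures come from the pull described above.

```python
def coverage(universe, *provider_symbol_sets):
    """Return (covered_fraction, missing_symbols) for the union of providers.

    Symbols a provider has that are not in the reference universe are ignored.
    """
    universe = set(universe)
    covered = set().union(*provider_symbol_sets) & universe
    return len(covered) / len(universe), universe - covered


# Single-provider figure from the eSignal daily pull: 11965 of 15084 symbols.
esignal_fraction = round(11965 / 15084, 3)

# Made-up example with two providers covering different subsets.
reference = {"A", "B", "C", "D", "E"}
provider_1 = {"A", "B", "C", "X"}
provider_2 = {"C", "D"}
fraction, missing = coverage(reference, provider_1, provider_2)
```

The `missing` set is the useful output in practice: it is the list of tickers to look for at the next provider, and it shrinks as providers are added until the 95% target is reached or no further provider helps.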
Interesting side note, I noticed one ticker, ABV.C, not available in eSignal's charting, but AmiBroker was able to pull that same ticker's data in. Weird, but yay AmiBroker.
Interesting side note 2: daily data pulled by the eSignal plugin is split-adjusted, while intraday data is not.
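This matters when comparing intraday prices against the daily series, since the two will disagree for any date before a split. A minimal Python sketch of back-adjusting unadjusted prices is shown below. The function name, the `(day, price)` and `(ex_date, ratio)` tuple formats, and the example values are assumptions for illustration, with `ratio` meaning new shares per old share (4.0 for a 4-for-1 split). It handles splits only; whether the daily data also includes dividend adjustments is something to check separately.

```python
from datetime import date


def split_adjust(bars, splits):
    """Back-adjust prices so bars before each split match post-split prices.

    bars:   list of (day, price) with unadjusted prices
    splits: list of (ex_date, ratio), ratio = new shares per old share
    """
    adjusted = []
    for day, price in bars:
        factor = 1.0
        for ex_date, ratio in splits:
            # Only bars strictly before the ex-date trade at the pre-split price.
            if day < ex_date:
                factor *= ratio
        adjusted.append((day, price / factor))
    return adjusted


# Made-up example: a 4-for-1 split effective 2020-08-31.
raw = [(date(2020, 8, 28), 400.0), (date(2020, 8, 31), 100.0)]
adj = split_adjust(raw, [(date(2020, 8, 31), 4.0)])
```

Volume would need the inverse adjustment (multiplied by the same factor) if it is part of the comparison.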
It is not in the end-user terms of service. It is in the eSignal API/SDK docs, which are private and available to developers only. The regular support guys that you talk to have no idea about the eSignal SDK.
No worries, I've done my due diligence with them to adhere to limits. If I can't pull extended data from the extended data service, they have violated their contract with me, their service is not as claimed or useful for my purpose, and I'll just grab a refund and move on. No skin off my back.
You need to understand them. Pulling gigabytes of data every few seconds costs bandwidth and hardware resources. Multiply that by thousands of users doing the same…
I get your point, it's a hard technical problem. That said, it's what I've paid for, confirmed by their own representatives. They throttle the throughput anyway, so it's probably more like 1-5MB/s. I use Polygon the exact same way and it's fine; they encourage it, and have even helped me troubleshoot higher throughput in the past. It's why I use their service and not something with severe rate limiting.
I have needs that need to be understood too. Nothing nefarious happening here. Just trying to arrive at a viable backtest solution using ethically sourced capitalism and some duct tape. I don't want to get too philosophical in the forum though.
For what it is worth, I was not talking about normal use, but only about repeated force backfill of the entire history on the very same symbol done every few seconds. I hope that this clarifies my previous post.
No worries. I can see how repeated backfill of one symbol could be interpreted as an intentional denial of service. I backfilled a few times manually (not scripted) just to debug the issues I was seeing.