# Plot suspiciously long wicks in a different color

I use IQFeed, and have noticed that they consistently send bad ticks which can be removed by using the "Force Backfill" function.

However, I must first detect these bad ticks before I can click the Force Backfill button.

This code looks for wicks that are more than 5 times the size of the body and also more than 2 times the previous bar's ATR, and replaces the offending High or Low value with the median of the preceding 3 bars.

I would appreciate any ideas about detecting them better, or optimizing the function so it runs as fast as possible.

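``````
// Experimental code to fix bad wicks from IQFeed.
// Coded by Peter Deal
// 20230524

// NOTE: the function declaration line is missing from this copy of the post.
// The name FixBadWicks is a placeholder, not necessarily the author's.
function FixBadWicks()
{
// To smooth out bad ticks from the H/L fields using a 3-bar median. Assumes that O and C fields are ok.
// 20230524
CBT		= Max( O, C ); // candle body top
CBB		= Min( O, C ); // candle body bottom
Body	= CBT - CBB;
WickT	= H - CBT; // upper wick length
WickB	= CBB - L; // lower wick length
ATR10 	= Ref( ATR(10), -1 ); // previous bar's ATR, so the current bar cannot affect its own test
TestT	= WickT > 5 * Body AND WickT > 2 * ATR10;
TestB	= WickB > 5 * Body AND WickB > 2 * ATR10;
H 		= IIf( TestT, Median( Ref( H, -1 ), 3 ), H );
L 		= IIf( TestB, Median( Ref( L, -1 ), 3 ), L );
return TestT OR TestB;
}

SetBarFillColor( colorLightOrange );
// Original candles in orange, drawn behind (z-order -1)
Plot( C, "Close", colorLightOrange, styleCandle | styleNoLabel | styleNoTitle, Null, Null, Null, -1 );
// Assumed call (also missing from this copy): the second Plot is presumably meant to draw the corrected candles
FixBadWicks();
Plot( C, "Close", colorDefault, styleCandle, Null, Null, Null, 0 );
``````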

Many thanks!

I just realized that using the median value of the past 3 bars won't work with successive bad ticks, because the median itself is then computed from bad values. Here's a corrected version that uses ATR to estimate where the bar's wicks are more likely to be.

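``````
// Experimental code to fix bad wicks from IQFeed.
// Coded by Peter Deal
// 20230524v2

// NOTE: the function declaration line is missing from this copy of the post.
// The name FixBadWicks is a placeholder, not necessarily the author's.
function FixBadWicks()
{
// To smooth out bad ticks from the H/L fields using ATR to guess where the wick should be. Assumes that O and C fields are ok.
// 20230524
CBT		= Max( O, C ); // candle body top
CBB		= Min( O, C ); // candle body bottom
Body	= CBT - CBB;
WickT	= H - CBT; // upper wick length
WickB	= CBB - L; // lower wick length
ATR30 	= Ref( ATR(30), -1 ); // previous bar's ATR, so the current bar cannot affect its own test
TestT	= WickT > 5 * Body AND WickT > 2 * ATR30;
TestB	= WickB > 5 * Body AND WickB > 2 * ATR30;
H 		= IIf( TestT, Max( CBT, L + ATR30 ), H );
L 		= IIf( TestB, Min( CBB, H - ATR30 ), L ); // note: uses the H already adjusted on the line above
return TestT OR TestB;
}

SetBarFillColor( colorLightOrange );
// Original candles in orange, drawn behind (z-order -1)
Plot( C, "Close", colorLightOrange, styleCandle | styleNoLabel | styleNoTitle, Null, Null, Null, -1 );
// Assumed call (also missing from this copy): the second Plot is presumably meant to draw the corrected candles
FixBadWicks();
Plot( C, "Close", colorDefault, styleCandle, Null, Null, Null, 0 );
``````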

First of all, thank you for asking a good question that succinctly explains the intent.

To set the context, previously Tomasz posted:

On page 16, that whitepaper clarifies:

"the purpose of this paper is to overview the subject of high frequency data filtering and briefly describe the methodologies employed by Tick Data, Inc. This paper is not intended to fully disclose the filtering process, as full disclosure is not appropriate in a paper intended as an overview on the subject."

Consider a scenario where, by happenstance, several bad ticks arrive in succession: would ATR still be effective? No, because the ATR itself would be inflated by those abnormal ranges. To fix that, you would need to recompute the ATR iteratively, using the corrected Highs and Lows instead of the raw ones.
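One way to picture that iterative reset is a bar-by-bar loop that keeps its own running ATR and only ever feeds it cleaned values. This is a minimal sketch, not a tested fix: `Period`, the variable names, and the rule of capping a flagged wick at one running ATR beyond the body are all assumptions made for illustration. The 5x body and 2x ATR thresholds are the ones from the code earlier in the thread.

```afl
// Sketch only: Period, the names, and the capping rule are assumptions.
Period   = 30; // smoothing length for the running ATR (assumed)
BodyMult = 5;  // wick vs. body threshold, as in the code earlier in the thread
AtrMult  = 2;  // wick vs. ATR threshold, as in the code earlier in the thread

CBT  = Max( O, C ); // candle body top
CBB  = Min( O, C ); // candle body bottom
Body = CBT - CBB;

FixedH = H; // cleaned copies; the original H and L arrays are left untouched
FixedL = L;
BadBar = 0; // becomes 1 on bars where a wick was flagged

RunATR = H[ 0 ] - L[ 0 ]; // running ATR, seeded with the first bar's range

for( i = 1; i < BarCount; i++ )
{
    // Test against the ATR known before this bar, once it has warmed up
    if( i >= Period )
    {
        WickT = H[ i ] - CBT[ i ];
        WickB = CBB[ i ] - L[ i ];

        if( WickT > BodyMult * Body[ i ] AND WickT > AtrMult * RunATR )
        {
            FixedH[ i ] = CBT[ i ] + RunATR; // cap the wick at one ATR (assumed rule)
            BadBar[ i ] = 1;
        }

        if( WickB > BodyMult * Body[ i ] AND WickB > AtrMult * RunATR )
        {
            FixedL[ i ] = CBB[ i ] - RunATR; // cap the wick at one ATR (assumed rule)
            BadBar[ i ] = 1;
        }
    }

    // True range from the cleaned values, so a flagged wick cannot inflate the ATR
    TR1 = Max( FixedH[ i ], C[ i - 1 ] ) - Min( FixedL[ i ], C[ i - 1 ] );
    RunATR = ( ( Period - 1 ) * RunATR + TR1 ) / Period; // Wilder-style smoothing
}
```

Because each bar's true range is computed from `FixedH` and `FixedL`, a run of consecutive bad ticks is tested against an ATR that the earlier bad ticks never touched. Like the original code, this assumes the O and C fields are trustworthy. The cleaned arrays could then be drawn with `PlotOHLC( O, FixedH, FixedL, C, ... )`.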

Statistically speaking, bad ticks are nothing but outliers. AFAIK, outliers in a dataset can be detected easily using percentiles or Z-scores. Z-scores seem a good fit because they measure a data point against the spread of the dataset (unlike the median, which describes its central tendency).

Basically, a Z-score tells us how many standard deviations a data point is away from the mean. A positive Z-score indicates that the data point is above the mean, while a negative Z-score indicates that it is below the mean. The magnitude of the Z-score indicates the distance from the mean in terms of standard deviations.

Here's the formula to calculate the Z-score:

``````
z = ( x - μ ) / σ
``````

where:

• x is the data point
• μ is the mean of the dataset
• σ is the standard deviation of the dataset

Algorithm to calculate Z-score:

1. Calculate the mean (μ) and the standard deviation (σ) of the dataset.
2. Choose the data point (x) for which you want to calculate the Z-score.
3. Subtract the mean (μ) from the data point (x): (x - μ)
4. Divide the result by the standard deviation (σ): (x - μ) / σ

Therefore, data points whose Z-scores lie beyond a certain threshold (e.g., 2 or 3 in absolute value) are considered outliers (bad ticks).
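Applied to wicks, the steps above might look something like the following array-based sketch. `Lookback` and `ZLimit` are assumed values, not recommendations from this thread, and the shape marker is just one possible way to show the result. Since wick lengths cannot be negative, only the upper tail is tested.

```afl
// Sketch only: Lookback and ZLimit are assumed values, not from the thread.
Lookback = 50; // bars used for the mean and standard deviation
ZLimit   = 3;  // flag wicks more than 3 standard deviations above the mean

CBT   = Max( O, C ); // candle body top
CBB   = Min( O, C ); // candle body bottom
WickT = H - CBT;     // upper wick length
WickB = CBB - L;     // lower wick length

// Statistics come from the preceding bars only, so the bar being tested
// does not inflate its own mean or standard deviation.
MeanT = Ref( MA( WickT, Lookback ), -1 );
SdT   = Ref( StDev( WickT, Lookback ), -1 );
MeanB = Ref( MA( WickB, Lookback ), -1 );
SdB   = Ref( StDev( WickB, Lookback ), -1 );

// z > ZLimit rewritten as ( x - mean ) > ZLimit * sd,
// which avoids dividing by a zero standard deviation.
TestT = SdT > 0 AND ( WickT - MeanT ) > ZLimit * SdT;
TestB = SdB > 0 AND ( WickB - MeanB ) > ZLimit * SdB;

Suspect = TestT OR TestB;

Plot( C, "Price", colorDefault, styleCandle );
// Small circle above the High of each suspect bar
PlotShapes( IIf( Suspect, shapeSmallCircle, shapeNone ), colorOrange, 0, H, 12 );
```

Two caveats: wick lengths are strongly skewed rather than normally distributed, so a limit of 2 or 3 is only a rough starting point, and earlier bad ticks inside the lookback window still widen the standard deviation, which is the same contamination problem described above for ATR.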


Hi Cougar, thank you for your very thoughtful comments and input. It's funny that I already coded a Z-score function a few years ago, but never considered applying it to the wicks!

Yesterday I was trading NVDA and saw some very long wicks in the premarket that were actually legitimate. Although it was satisfying to see the long wicks highlighted, the legitimately tall wicks made me think about false positives and that I should spend some more time thinking about how to proceed.