Data outlier issue

How do i fix or skip bad data outlier that is messing up my charts as shown below? (monhtly/quarterly charts)

badoutlierdataiqfeed

Try it by using "standard deviation * some_multiplier"...

In the below code I only filter High array;

Do the same for O, L, C arrays if required...

/// Filter large outliers in price data by code
/// Here High array is filtered only
/// Do the same for O, L, C arrays if required...
/// adjust "mult" variable to your needs
/// @link https://forum.amibroker.com/t/data-outlier-issue/7922
/// snippet by fxshrat
mult = 10;

bi = BarIndex();
fvb = FirstVisibleValue(bi);
lvb = LastVisibleValue(bi);

stdperiod = lvb-fvb+1;//BarCount-1;//

prevH = Ref(HighestSince(bi == bi[fvb], H), -1);
//prevH = Ref(HighestSince(bi == bi[0], H), -1); 
//prevH = Ref(H,-1); 

Hsigma = mult * StDev(H, stdperiod);
// uncomment below line if stdperiod = lvb-fvb+1;
Hband = prevH + Hsigma[lvb]; 
// uncomment below line if stdperiod = BarCount-1;
//Hband = prevH + LastValue(Hsigma);
printf( "Hsigma %g, HBand %g", Hsigma, Hband );

Hcond = H < Hband;
H = SparseExpand( Hcond, SparseCompress( Hcond, H ) );
H = IIf( IsNull(H), Ref(H, -1), H );

Plot( C, "Price", colorDefault, styleBar, Null, Null, 0, 0, -20 );

Before
450

After
518

3 Likes

BTW, instead of upper three overcomplicating lines you would just need this single line

H = IIf( H < Hband, H, Ref(H, -1) );
3 Likes

How do i apply this formula so that the bad data is eliminated and my chart is populated right?
(Should i overlay or insert linked in the chart window and save it as a template?)

I was looking for some sort of code which goes to the DB and modifies the high value that is messing up the chart - so the data is fixed for once and all.

I also checked with IQFEED folks and they said they dont see this bad at their end and so i right clicked on "OK" and clicked "Force BackFill" assuming that will fix the problem - but it is persisting - am i doing something wrong here?

@uvdsatish you can use the Quote Editor to fix the outlier value directly in the DB.
Obviously, after this manual change you should no longer query/import the same past data from your data provider.

1 Like

Obviously the spike remains in IQfeed's data feed after each backfill.


So if you do not want to apply annoying manual data manipulation via quote editor each time after backfill simply create an include file saved to Formulas\Include folder.

E.g. save below sample code as FilterDataSpikes.afl to Include folder.
And e.g. in your applied main AFL(s) at the top (1st line) add following #include line:

 #include <FilterDataSpikes.afl>

If you are too lazy to write upper line each time then add upper line to code snippets.


Include file content:

/// THIS IS INCLUDE FILE
/// Filter large outliers in OHLC price data by code
///
/// Save the code to AFL file e.g. named "FilterDataSpikes.afl"
/// save that file to Formulas/Include folder,
/// in your applied main AFL(s) (e.g. in chart pane) at the top (1st line) add:
/// #include <FilterDataSpikes.afl>
///
/// @link https://forum.amibroker.com/t/data-outlier-issue/7922/6
/// snippet by fxshrat
stdmult = 10;// set stdev multiplier value
stdperiod = BarCount-1;

array1 = O;
array2 = H;
array3 = L;
array4 = C;

for( i = 1; i <= 4; i++ ) {
	arr = VarGet("array" + i);
	prevhighest = Ref(Highest(arr),-1);
	sigma = stdmult * StDev(arr, stdperiod);
	VarSet( "band" + i, prevhighest + LastValue(sigma) );
}

altOC = IIf(L < band3, Max(L, Ref(C, -1)), Min(H, Ref(C, -1)));
O = IIf( O < band1, O, altOC );
C = IIf( C < band4, C, altOC );
H = IIf( H < band2, H, Max(O, C) );
L = IIf( L < band3, L, Min(O, C) );

4 Likes

I tried the quote editor - but surprisingly there is discrepancy between what quote editor is showing vs what the chart is displaying - this is interesting!Capture-dataerror

I tried the code stuff as well as suggested - i "clicked edit formula" and used the include line - but no change in the chart as cited below:
codedataerror

@uvdsatish
Look at your quarterly chart value is 70000 where monthly & weekly is 1800.
Right click on quarterly chart, select "Delete indicator" & remove the unnecessary AFL.
Also take a look in your code need of ‚ÄústyleNoRescale‚ÄĚ when plotting Horizontal line if via afl.

1 Like

In the include file simply lower the value of stdmult variable! Obviously 10 times stdev is too much. So change it to 5 or 3 or ...

Also using stdev is just an example! You have to decide what you do consider as unacceptable data spike You but not us!

Instead of

sigma = stdmult * StDev(arr, stdperiod);

You could change to a fix value

sigma = 1000; // stdmult * StDev(arr, stdperiod);

If you think that "current" bar's highest value shall not be 1000 points higher than previous highest high value.

Or change to

sigma = 5 * prevHighest; // stdmult * StDev(arr, stdperiod);

if you want that "current" bar's highest value shall not be 5 times higher than previous highest high value.

Or use some percentile calculation and so on.....

Again, it is up to you how and what you want to filter.


As aside...Also I just looked at the symbol you are using... it seems to be Russell 2000 and in 2008 at the start of the year the price range was around 700 but your spike value is at 70000. So to me it looks like as if in the feed a comma is missing at some data point in 2008.


I do not know what you are talking about in regards to "unnecessary". Apparently there is a data error in the price data. styleNorescale will not adjust formula calculations being dependent on OHLC.. arrays. They will still include the spike data in their calculation! If you do not want that then you have to filter the spike data (or have to remove manually every time).


BTW you could make the include file dependent on symbols of a Watchlist (such as WL "FilterDataOutliers" ). Then the include file's calculations are only applied to symbols which are members of that WL. So create a watchlist named FilterDataOutliers and if you initially witness a symbol having spikes that you do not want to accept then simply move that symbol to that watchlist...

And again StDev is just an example!

/// Only symbols of watchlist named "FilterDataOutliers" will be considered
if( InWatchListName( "FilterDataOutliers" ) ) 
{
/// THIS IS INCLUDE FILE
/// Filter large outliers in OHLC price data by code
///
/// Save the code to AFL file e.g. named "FilterDataSpikes.afl"
/// save that file to Formulas/Include folder,
/// in your applied main AFL(s) (e.g. in chart pane) at the top (1st line) add:
/// #include <FilterDataSpikes.afl>
///
/// @link https://forum.amibroker.com/t/data-outlier-issue/7922/11
/// snippet by fxshrat
	stdmult = 5;// set stdev multiplier value
	stdperiod = BarCount-1;

	array1 = O;
	array2 = H;
	array3 = L;
	array4 = C;

	for( i = 1; i <= 4; i++ ) {
		arr = VarGet("array" + i);
		prevhighest = Ref(Highest(arr),-1);
		sigma = stdmult * StDev(arr, stdperiod);// or use FIX value such as 1000 etc or 5*prevhighest
		VarSet( "band" + i, prevhighest + LastValue(sigma) );		
		printf( "\nband%g: %g, sigma:%g", i, VarGet( "band" + i ), LastValue(sigma) );
	}

	altOC = IIf(L < band3, Max(L, Ref(C, -1)), Min(H, Ref(C, -1)));
	O = IIf( O < band1, O, altOC );
	C = IIf( C < band4, C, altOC );
	H = IIf( H < band2, H, Max(O, C) );
	L = IIf( L < band3, L, Min(O, C) );
}
4 Likes

@fxshrat
Unnecessary afl mean something he copy from some random site.
Sorry, due to my writing style you get me wrong.

1 Like

Thank you - the coding solved the problem!
But - i am still curious - why the IQFEED charts/data doesnt have the outlier, why the Amibroker quoteseditor doesnt have the data - but the chart has that outlier bar!

Please listen... it does have it there. If the formula solves the issue by setting new target OHLC arrays then the outlier data is definitely there within that symbol.

How to get to that outlier line in quote editor:
Before comment line

 #include <FilterDataSpikes.afl>

of chart pane's AFL and then save that AFL (click Apply button)

Now

  1. Set the chart interval to base time interval of DB (quote editor interval seems to show 1-minute TF). So set chart interval to 1-minute TF.
  2. Now scroll to the spike of year 2008 in that 1-minute chart.
  3. Then select the spike bar in the chart
  4. Open quote editor
  5. Click "Go to selected" button at the left bottom of quote editor
  6. Check the last row at the bottom showing up after that click
  7. There you should see the outlier within OHLC columns' cells of that row.

Next test (checking whether IQfeed has fixed the data)... disconnect from IQFeed first.

Repeat upper 7 steps.

Step 8 onwards:

  1. click that spike row in quote editor
  2. click delete button of quote editor
  3. spike should be removed now
  4. close quote editor and reconnect to IQfeed.
  5. Do backfill again and check whether outlier data shows up in chart again.
  6. If it does then that erroneous data is still sent by IQFeed.
  7. Ask IQfeed (whatever you want about it)
  8. Uncomment #include <FilterDataSpikes.afl> again and save AFL
1 Like

For what it is worth, bad data points in historical data can and should be reported to IQFeed as that is the only proper place to fix them. See for example: http://forums.iqfeed.net/index.cfm?page=topic&topicID=2579 and this http://forums.iqfeed.net/index.cfm?page=topic&topicID=3201

1 Like