Code to identify Clusters


I am trying to identify and group nearby signals into what I call clusters. It looks something like the chart above. Each PF has a value PL. If the previous PF is PL bars (or fewer) before it, then the two PFs belong to the same cluster. The chart is self-explanatory.
Below is the code that does it. The result is perfect, but I want to remove the loop. Sample data is attached. Open it in a 4-hour interval chart.

_TRACE( "!CLEAR!" );
_SECTION_BEGIN( "Price" );
SetChartOptions( 0, chartShowArrows | chartShowDates | chartWrapTitle );
_N( Title = StrFormat( "{{NAME}} - {{INTERVAL}} {{DATE}} Open %g, Hi %g, Lo %g, Close %g (%.1f%%) {{VALUES}} ", O, H, L, C, SelectedValue( ROC( C, 1 ) ) ) );
PlotOHLC( O, H, L, C, "Close", colorRed, styleBar , Null, Null, 0, 1, 1 );
_SECTION_END();
SetBarsRequired( sbrAll, sbrAll );
largenumber = 1e9;
_TRACE( "BarIndex()=" + BarIndex() );
PL = 0;

PF =/*Cluster_1*/( BarIndex() == 24 || BarIndex() == 27 || BarIndex() == 30 || BarIndex() == 31 || BarIndex() == 33 || BarIndex() == 40 )
                 || /*Cluster_2*/( BarIndex() == 55 || BarIndex() == 57 || BarIndex() == 58 )
                 || /*Cluster_3*/( BarIndex() == 70 || BarIndex() == 71 )
                 || /*Cluster_4*/( BarIndex() == 82 )
                 || /*Cluster_5*/( BarIndex() == 103 || BarIndex() == 108 || BarIndex() == 115 )
                 || /*Cluster_6*/( BarIndex() == 117 || BarIndex() == 119 );

//cluster_1
PL = IIf( BarIndex() == 24, 15, PL );
PL = IIf( BarIndex() == 27, 20, PL );
PL = IIf( BarIndex() == 30, 21, PL );
PL = IIf( BarIndex() == 31, 26, PL );
PL = IIf( BarIndex() == 33, 15, PL );
PL = IIf( BarIndex() == 40, 15, PL );

//cluster_2
PL = IIf( BarIndex() == 55, 10, PL );
PL = IIf( BarIndex() == 57, 20, PL );
PL = IIf( BarIndex() == 58, 23, PL );

//cluster_3
PL = IIf( BarIndex() == 70, 10, PL );
PL = IIf( BarIndex() == 71, 11, PL );

//cluster_4
PL = IIf( BarIndex() == 82, 7, PL );

//cluster_5
PL = IIf( BarIndex() == 103, 20, PL );
PL = IIf( BarIndex() == 108, 25, PL );
PL = IIf( BarIndex() == 115, 26, PL );

//cluster_6
PL = IIf( BarIndex() == 117, 1, PL );
PL = IIf( BarIndex() == 119, 3, PL );



barsSincePF = Ref( BarsSince( PF ), -1 );
ClusterID = PF * Cum( PF );

previousClusterID = ValueWhen( PF, ClusterID, 2 );
//how do i remove this loop?
for( i = 1; i < 50; i++ )
{
    clusterID = IIf( PF, IIf( clusterID != 1, IIf( barsSincePF < PL, previousClusterID, clusterID ), 1 ), Null );
    previousClusterID = ValueWhen( PF, ClusterID, 2 );
    clusterID = IIf( PF, IIf( clusterID != 1, IIf( clusterID != previousClusterID, IIf( ( clusterID - previousClusterID ) > 1, previousClusterID + 1, clusterID ), clusterID ), clusterID ), Null );
}
clusterID = IIf( IsNull( clusterID ), largenumber, clusterID );

barsSinceLastCluster = BarsSinceCompare( clusterID, "<", clusterID );
clusterID = IIf( clusterID == largenumber, null, clusterID );
clusterSize = IIf( clusterID == 1, Cum( PF ), Sum( PF, barsSinceLastCluster ) );
SizeOfLargestCluster = LastValue( Highest( clusterSize ) );

_TRACE( "clusterID=" + clusterID + ", PL="+PL+", clusterSize=" + clusterSize + ", barsSinceLastCluster=" + barsSinceLastCluster + ", SizeOfLargestCluster=" + SizeOfLargestCluster );


atr1 = ATR( 5 ) * 0.8;
PlotShapes( PF * shapeCircle, colorPink, 0, L - atr1, 0 );
Plot( BarIndex(), "barindex", colorwhite, styleNoLine + styleNoLabel + styleNoRescale );

@AlgoEnthusiast, at present the loop seems useless since it does not change any value on each iteration. So you can comment it out... but this is probably not what you had in mind.
Unfortunately, I do not understand the purpose of so many hard-coded values, so I cannot add anything more useful.

It seems to me that in the code with the commented loop (red arrow) the signals are identical to those of the posted formula.
Note that the code within the { } (that I left on purpose) is still executed once.


Thank you for responding. Without the loop, ClusterID, ClusterSize, etc. are not correct.

ChatGPT just gave up after 1 hour and 28 tries. Anyone?

Hi,
Decided to have a go at this as an exercise in programming.
Some of your code makes no sense to me so I just started from scratch after trying to get an idea of what you are trying to do. I haven't tried to match your output exactly.
Assuming that a PL value is the number of preceding bars without a PF that are required for that element to count as the start of a new cluster, I came up with the ClusterStart calculation in my code.

You also seem to want to know, I assume on a continuous basis, how many bars have passed since the end of a cluster. Effectively we need a ClusterEnd indicator. As I see it, this needs to 'look into the future', since you can't know whether a cluster has actually ended until the next PL element arrives and its value shows that enough bars have elapsed to start a new cluster.
IMHO this code needs to be treated with caution. Anyway, to continue the exercise this is what I came up with. It seems a bit messy but does what I intended.
Hopefully there might be useful ideas here for you. Went overboard with comments as usual.
John.

Plot( Close, "Close", colorBlack, styleBar );
_N( Title = StrFormat( "{{NAME}} - {{INTERVAL}} {{DATE}} Open %g, Hi %g, Lo %g, Close %g (%.1f%%) {{VALUES}} ", O, H, L, C, SelectedValue( ROC( C, 1 ) ) ) );

bi = BarIndex();
PL = 0;
Off = FirstVisibleValue( bi ) - 20;  // position it on any arbitrary chart
//cluster_1
PL = IIf( bi == ( Off + 24 ), 15, PL );
PL = IIf( bi == ( Off + 27 ), 20, PL );
PL = IIf( bi == ( Off + 30 ), 21, PL );
PL = IIf( bi == ( Off + 31 ), 26, PL );
PL = IIf( bi == ( Off + 33 ), 15, PL );
PL = IIf( bi == ( Off + 40 ), 15, PL );

//cluster_2
PL = IIf( bi == ( Off + 55 ), 10, PL );
PL = IIf( bi == ( Off + 57 ), 20, PL );
PL = IIf( bi == ( Off + 58 ), 23, PL );

//cluster_3
PL = IIf( bi == ( Off + 70 ), 10, PL );
PL = IIf( bi == ( Off + 71 ), 11, PL );

//cluster_4
PL = IIf( bi == ( Off + 82 ), 7, PL );

//cluster_5
PL = IIf( bi == ( Off + 103 ), 20, PL );
PL = IIf( bi == ( Off + 108 ), 25, PL );
PL = IIf( bi == ( Off + 115 ), 26, PL );

//cluster_6
PL = IIf( bi == ( Off + 117 ), 1, PL );
PL = IIf( bi == ( Off + 119 ), 3, PL );

PF = IsTrue( PL );  // create boolean version. PL could be used directly as boolean.

CumPF = Cum( PF );
ClusterStart = ( Ref( BarsSince( PF ), -1 ) >= ValueWhen( PF, PL ) OR CumPF == 1 ) AND PF;

PL_r = Reverse( PL );  // start of ClusterEnd calc
PL_r[0] = 1;  // add last bar signal to initiate backwards count
ClusterFinished = Reverse( Ref( BarsSince( PL_r ) >= ValueWhen( PL_r, PL_r ), -1 ) );  // state array
ClusterEnd = ExRem( ClusterFinished, NOT ClusterFinished );  // impulse array
ClusterEnd[0] = 0;  // clear spurious signal at first bar

InCluster = Flip( ClusterStart, ClusterEnd ) OR ClusterEnd;  // With start and end can confine outputs to within cluster

CurrentClusterID = IIf( InCluster, Cum( ClusterStart ), 0 );
ClusterPointCount = IIf( InCluster, Nz( CumPF - ValueWhen( ClusterStart, Ref( CumPF, -1 ) ) ), 0 );
HighestClusterPointCount = Highest( ClusterPointCount );
BarsSinceClusterEnd = Nz( BarsSince( ClusterEnd ) );

// Plots (I like to put stuff on the chart to visualize it better)
Title += "\n" + "CurrentClusterID=" + CurrentClusterID + ", ClusterPointCount=" + ClusterPointCount +
         ", BarsSinceClusterEnd=" + barsSinceClusterEnd + ", HighestClusterPointCount=" + HighestClusterPointCount;
PlotOffset = L - ATR(5) * 0.5;
PlotShapes( IIf( PF, shapeSmallCircle, shapeNone ), colorOrange, 0, PlotOffset );
// also decided to plot PL values directly on chart whilst building code
for( x = FirstVisibleValue( bi ); x <= LastVisibleValue( bi ); x++ )
{
    if( PF[x] )
    {
        PlotText( NumToStr( PL[x], 1.0 ), x, PlotOffset[x], colorBlack, colorDefault, -25 );
        //PlotText( NumToStr( ClusterPointCount[x], 1.0 ), x, PlotOffset[x], colorRed, colorDefault, -38 );
    }
}
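
For cross-checking an array-based version, it can help to have a plain bar-by-bar reference that applies the rule from the first post directly: a PF starts a new cluster when the previous PF is more than PL bars back. The following is only a sketch under stated assumptions, not a replacement for either formula above. It assumes the hard-coded bar numbers from the first post are array indexes (hence SetBarsRequired( sbrAll, sbrAll )), and the names RefClusterID, RefClusterStart, RefClusterEnd, curID and lastPF are made up for this illustration. It uses one pass over the bars, which is a different loop from the 50-iteration one in the question.

```afl
// Sketch only: explicit-loop reference for checking cluster assignment.
// Assumption: bar numbers below are array indexes, as in the first post.
SetBarsRequired( sbrAll, sbrAll );
bi = BarIndex();

// PL at each PF bar (values copied from the thread)
PL = 0;
PL = IIf( bi == 24, 15, PL );   PL = IIf( bi == 27, 20, PL );   PL = IIf( bi == 30, 21, PL );
PL = IIf( bi == 31, 26, PL );   PL = IIf( bi == 33, 15, PL );   PL = IIf( bi == 40, 15, PL );
PL = IIf( bi == 55, 10, PL );   PL = IIf( bi == 57, 20, PL );   PL = IIf( bi == 58, 23, PL );
PL = IIf( bi == 70, 10, PL );   PL = IIf( bi == 71, 11, PL );
PL = IIf( bi == 82, 7, PL );
PL = IIf( bi == 103, 20, PL );  PL = IIf( bi == 108, 25, PL );  PL = IIf( bi == 115, 26, PL );
PL = IIf( bi == 117, 1, PL );   PL = IIf( bi == 119, 3, PL );
PF = PL > 0;

RefClusterID    = Null;  // cluster number at PF bars, Null elsewhere
RefClusterStart = 0;     // 1 at the first PF of each cluster
RefClusterEnd   = 0;     // 1 at the last PF of each cluster
curID  = 0;              // running cluster counter
lastPF = -1;             // array index of the most recent PF, -1 = none yet

for( i = 0; i < BarCount; i++ )
{
    if( PF[ i ] )
    {
        // new cluster if this is the first PF, or the previous PF is more than PL bars back
        if( lastPF < 0 OR ( i - lastPF ) > PL[ i ] )
        {
            curID++;
            RefClusterStart[ i ] = 1;

            // the previous cluster is only known to have ended now, one PF later
            if( lastPF >= 0 ) RefClusterEnd[ lastPF ] = 1;
        }

        RefClusterID[ i ] = curID;
        lastPF = i;
    }
}

// treat the last PF in the data as the end of the final cluster
if( lastPF >= 0 ) RefClusterEnd[ lastPF ] = 1;

Plot( RefClusterID, "RefClusterID", colorBlue, styleDots | styleNoLine | styleOwnScale );
```

With the thread's data this rule groups bars 24-40 as cluster 1, 55-58 as 2, 70-71 as 3, 82 as 4, 103-115 as 5 and 117-119 as 6, which matches the cluster comments in the first post. To compare against the array version, drop the PL lines here and reuse the PL array from that formula instead. Note that RefClusterEnd is written at an earlier bar only once a later PF has been seen, so it carries the same look-ahead caveat as ClusterEnd, and the handling of the final cluster at the end of the data is a choice made for this sketch.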