Doing *huge* explorations? You can see up to a 20x speed increase in the next version

Typically, explorations run in the blink of an eye. But recently I found out that when you run an exploration that produces hundreds of columns and millions of rows, it may take a significant amount of time just to produce the huge result set.

Take, for example, an exploration like this:

for( i = 0; i < 650; i++ )
{
   AddColumn( C, "C " + i );
}
Filter = 1;

It can easily produce gigabytes of data. In version 6.31 (and earlier), such an exploration, when run on just 20 years of DJIA components, requires 37 seconds using 12 threads to produce 90 million data cells (500MB worth of text).

Although 37 seconds may not look too long, I wasn't happy with that, so I profiled the code to find out what consumes the most time. As it turned out, AFL execution took only 0.35 seconds. The rest was spent in the Microsoft C runtime library's sprintf() function, which formats numbers as strings.
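To get a rough feel for how much time text formatting alone can take, a micro-benchmark along these lines can help. It is only a sketch in C, not the profiling setup used for the numbers above: `time_formatting` is a hypothetical helper, the fixed "%.2f" format is an assumption, and clock() measures CPU time of the calling process rather than wall time across 12 threads.

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: formats `count` numbers with snprintf("%.2f") and
   returns the CPU time spent in seconds. The total number of characters
   produced is stored in *bytes_out so the work cannot be optimized away. */
static double time_formatting(long count, size_t *bytes_out)
{
    char buf[64];
    size_t total = 0;

    clock_t start = clock();
    for (long i = 0; i < count; i++) {
        /* Values 0.00, 0.01, 0.02, ... stand in for exploration cells. */
        int n = snprintf(buf, sizeof buf, "%.2f", (double)i * 0.01);
        if (n > 0)
            total += (size_t)n;
    }
    clock_t end = clock();

    *bytes_out = total;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with a count in the tens of millions and comparing the result against the time of the whole run gives a first hint of whether formatting dominates.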

It turned out that I wasn't alone in being unhappy with floating point number formatting speed. Other engineers (including Google's) wrote their own conversions that are much faster (see https://github.com/miloyip/dtoa-benchmark). The best, state-of-the-art one was approx. 9x faster than sprintf. I tried that one with the exploration and it reduced the run time from 37 seconds down to 5 seconds. Impressive, but I still wasn't happy. I thought that it must be possible to do it quicker.

So I wrote my own procedure, and ... my formatting function is more than 2 times faster than the "state-of-the-art" one and got the exploration time from 37 seconds down to 1.80 seconds. That is 20x faster than the sprintf version.
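The actual AmiBroker routine is not shown in this post. Purely as an illustration of why a specialized formatter can beat a general-purpose sprintf(), here is a minimal sketch in C that handles only one case: fixed two-decimal output. The name `format_fixed2`, the two-decimal assumption, and the 1e15 cutoff are all hypothetical choices made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (NOT AmiBroker's routine): writes v with exactly two
   decimals into out and returns the number of characters written. */
static int format_fixed2(double v, char *out, size_t cap)
{
    int neg = v < 0;
    double a = neg ? -v : v;

    /* NaN, infinities, huge magnitudes and small buffers take the slow,
       general path so the fast path can stay simple. */
    if (!(a < 1e15) || cap < 24)
        return snprintf(out, cap, "%.2f", v);

    /* Scale to an integer number of hundredths, rounding half up. */
    uint64_t scaled = (uint64_t)(a * 100.0 + 0.5);
    uint64_t frac = scaled % 100;
    uint64_t whole = scaled / 100;

    /* Emit characters least-significant first, then reverse into out. */
    char tmp[24];
    int n = 0;
    tmp[n++] = (char)('0' + frac % 10);
    tmp[n++] = (char)('0' + frac / 10);
    tmp[n++] = '.';
    do {
        tmp[n++] = (char)('0' + whole % 10);
        whole /= 10;
    } while (whole);
    if (neg)
        tmp[n++] = '-';

    for (int i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

With this sketch, format_fixed2(1234.567, buf, sizeof buf) writes "1234.57" and returns 7. The speed comes from skipping format-string parsing and doing only integer digit extraction; the price is that values at or very near a rounding tie may come out differently than with sprintf().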

I thought I would share this with you because things like that are done in AmiBroker constantly, but I almost never tell such stories because I am just too busy. And since this happens under the hood, many people don't notice and conclude that "not much changed".

(Note: all measurements apply to the 32-bit version. The 64-bit version is even faster because SSE2 instructions are used.)

I also want to share with you a curious thing: I have looked up the word "perfectionist" in the dictionary, and the definition that appears is: "Tomasz". :wink:

If Tomasz were to code Windows 10, it would fit on a single 1.44MB floppy disk :wink:

But seriously, @Tomasz, instead of rewriting certain parts of MS's buggy operating system (as described here), it would be easier for you to create an alternative one from scratch --> AmiBroker OS :slight_smile:

Hi @Tomasz,

I would like to take this opportunity to thank you and to say how much I appreciate your work.

AmiBroker is already accurate and fast in all aspects and kudos to you for making Explorations even faster.

What impresses me the most is AmiBroker's power to Optimize! Seriously, Tomasz, there is nothing else on planet Earth that can achieve accurate optimization of complex trading systems like AmiBroker does, with such remarkable speed and ease.

Thank you

Thanks Tom, sounds exciting.

Curious: with the new chip architecture of AMD Epyc Rome and fairly affordable dual-CPU board configs, do you ever see going past 32 threads at any point?

-S

@Sean I can unlock more than 32 threads per analysis window, no problem with that, but I have no such hardware to test on.

Oh, it's per analysis window... I forgot.

My build will start off NOT with the flagship CPUs but probably with 2x 16- or 24-core Romes, until the prices come down and they start getting sold used. Maybe someday I'll go for those 64-core beasts.
-S

We've found something similar... binary-to-text conversions, or vice versa, introduce a massive performance hit (minimum 0.5 but typically 1-2 orders of magnitude).

Floating point to ASCII presents some interesting rounding corner cases too.
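One such corner case can be shown in a few lines of C. This is only a sketch: `naive_fixed2` and `libc_fixed2` are hypothetical helper names, and the results described below assume IEEE 754 double arithmetic. The decimal literal 2.675 is stored as a binary value slightly below 2.675, so the C library rounds it down, while the common "multiply by 100 and add 0.5" shortcut rounds it up because the multiplication itself rounds to exactly 267.5.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: rounds to two decimals by scaling and adding 0.5.
   Assumes 0 <= v < 1e15 to keep the example short. */
static void naive_fixed2(double v, char *out, size_t cap)
{
    uint64_t scaled = (uint64_t)(v * 100.0 + 0.5);
    snprintf(out, cap, "%llu.%02llu",
             (unsigned long long)(scaled / 100),
             (unsigned long long)(scaled % 100));
}

/* Reference: lets the C library round the exact binary value of v. */
static void libc_fixed2(double v, char *out, size_t cap)
{
    snprintf(out, cap, "%.2f", v);
}
```

For the input 2.675, naive_fixed2 produces "2.68" while libc_fixed2 produces "2.67". Neither is a bug in isolation, but a fast formatter has to decide which behavior it wants to match.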

I love AmiBroker!
Thank you!
