Time consuming loading of a CSV file into a matrix

Hi Mates,
for my last project need to load some data from a csv file. The file is basically a time series of multiple (more than 60) parameters for each day.

I've written a function taking some parts from the knowledge base and the manual that works but I'm not completely satisfied of it because in my opinion it is quite slow (see the code below). Considering that every ticker has its own different parameter's file and that I'm gonna run portfolio optimization, I'm trying to tweak the code for optimal performance before starting the heavy job.

I've profiled the code but I don't understand exactly what's the most time consuming part hence where to focus my efforts (see screenshot). In particular it is not clear to me why if fgets+strToDatetime+StrExtract (the most time consuming functions) totally require around 20ms then the total time of the function is around 440ms.

So my questions are:

  • in your opinion is the function time efficient?
  • if not then how can I try to improve it?
  • considering the profile, where the time is wasted and how you understand it?

Many thanks for any insight or suggestion

SCREENSHOT
Screenshot

FORMULA

//	csvLoadProfiling.afl
function loadCSVIntoMx(sourcePath, fileName)
{
	sourceFile =  sourcePath + fileName;
	fh = fopen(sourceFile, "r");
	rowID = 0;
	
	dataMx = Matrix(BarCount, 68,  Null);
	
	if (fh != 0)
	{
		while (! feof(fh))
		{
			rowStr = fgets(fh);
			if (rowID < BarCount AND NOT StrMatch(rowStr, ""))
			{
				if (rowID == 0)
				{
					rowID ++;
					continue; //skip header
				}
				else
				{
					dataMx[rowID-1][0] = StrToDateTime(StrExtract(rowStr, 0));
					for (colID = 1; colID <= 66; colID++)
					{
						dataMx[rowID-1][colID] = StrToNum(StrExtract(rowStr, colID));
					}
					rowID ++;
				}
			}
		}
		fclose(fh);
	}
	else
	{
		printf("\nERROR   file not found");
		printf("\n" + sourceFile);
	}
	return dataMx;	
}

path =  "C:\\Program Files\\AmiBroker\\Formulas\\Tests\\";
file = "csvData.csv";
loadCSVIntoMx(path, file);

RAW CSV DATA HERE

Even though you posted a function code, it still seems partial.

What i don't understand is why MX has one axis size as BarCount.

If you had a WatchList and want to store data for each symbol, the max size would be Status("stocknum")

AFL Function Reference - STATUS (amibroker.com)

You currently have 68 columns and around 6500 rows in the CSV file. ( Quite large)
but depending on the timeframe, BarCount can run into 100s of K

1 Like

There are three solutions to make it fast:

  1. Use MxFromString (but then your file should be in proper matrix format, not CSV)
    AFL Function Reference - MXFROMSTRING

  2. Use existing code but DO NOT re-run it all the time. Instead run it once and store matrix to persistent static variable (StaticVarSet)

  3. Use ODBC plugin with CSV driver https://www.amibroker.com/odbc.html

  4. Store data into mySQL database and access using ODBC plugin

2 Likes

Hi nsm51, you're right the function is part of a larger program so I had to take it out and adapt it into a new file to make it "forum compatilble", so maybe this is the reason of the misunderstood.
I have one of this file for each ticker, but I don't need to aggregate together into a bigger 100s k line matrix, because every ticker will run on a dedicated instance of the program.

Thank you Tomasz for the insight, I'll try the solutions you suggested. I think that 1 and 2 should do the trick for my needs.

And regarding the profiling, have you any idea about why the timings are so different (between the sum of the elementary functions execution time and my custom function's execution time) ?

This topic was automatically closed 100 days after the last reply. New replies are no longer allowed.