Removing thousands separator inside of quotes in a CSV file

I thought I might have to do strange things with CSV files one day, and today was the day. I was importing some transactions from Fidelity into composites for charting. Here's a sample of the CSV file:

Date,Investment,Transaction Type,Amount,Shares/Unit
06/30/2021,VANG SM VAL IDX INST,Exchange In,"1,248.46","29.940"

I had to write code to get rid of the commas inside the quotes because I couldn't use StrReplace().

I created a function to do this, since maybe it's not the last time:

function RemoveCommaInQuotes( String )
{
	// Removes unwanted commas inside of quotes, e.g. in a CSV file where the commas are needed so can't use StrReplace().
	// Useful for CSV files, e.g. exported Fidelity transactions.
	// 20210812

	InQuotes = 0;
	NewString = "";
	
	for( i = 0; i < StrLen( String ) ; i++ )
	{
		Char = StrMid( String, i, 1 );
		
		if( Char == "\"" )
		{
			InQuotes = 1 - InQuotes;
			Char = "";
		}
		if( Char == "," AND InQuotes ) Char = "";
		
		NewString += Char;
	}

	return NewString;	
}

I hope this helps someone. :sunglasses:

1 Like

AFAICS, you would not need StrReplace.
But also you do not need to iterate entire length of a line for n-number of lines as in your case.
I think that could be quite slow on thousands of lines (e.g. 1,000 lines x 50 chars = 50,000 iterations of nested loop).

You would just have to iterate columns (via StrExtract() function).
So this one below could be faster on thousands of lines since it iterates columns (e.g. 1,000 lines x 7 columns = 7,000 iterations) so less number of substrings.

function RemoveCommaInQuotes(line, num_columns) {
	// https://forum.amibroker.com/t/removing-thousands-separator-inside-of-quotes-in-a-csv-file/27121/2
	// by fxshrat@gmail.com
	local i, col, col_new, cs_col, line_new;
	quote = "\"";
	for ( i = 0; i < num_columns; i++ ) {				
		VarSetText("col"+i, StrExtract(line,i));				
	}
	for ( i = 0, cs_cols = ""; i < num_columns; i++ ) {				
		col = VarGetText("col"+i);
		prev_col = VarGetText("col"+(i-1));	
		// skip columns that e.g. look like this: "1
		if (  StrFind(StrLeft(col,1), quote) AND 
			! StrFind(StrRight(col,1),quote) ){
			continue;
		}
		// if there is previous column that e.g. looks like this: "1
		// then combine it with current column
		if (  StrFind(StrLeft(prev_col,1), quote) AND 
			! StrFind(StrRight(prev_col,1),quote) ) {
			col_new = prev_col+col;	
		// else just use current column	
		} else col_new = col;
		//
		cs_cols += col_new+",";
	}
	line_new = StrTrim(cs_cols,",");
	return line_new;
}

So since in your file example you have overall 5 columns and 2 columns where there could be thousands separator you would have to insert not 5 but 7 (5+2) as num_columns argument of upper function.

num_columns = 7;
cs_lines = "";

if ( ParamTrigger("Create File", "Click HERE")) {

	fh = fopen( "C:\\filename.txt", "r" );	
	if( fh ) {		
		while( ! feof( fh ) ) {
			line = RemoveCommaInQuotes(fgets(fh),num_columns);	
			cs_lines += line;	
		}
		fclose( fh );
	} else	{
		Error( "ERROR: file can not be open" );
	}
	
	fh2 = fopen("C:\\filename_new.txt", "w" );
	if( fh2 ) { 
		fputs( cs_lines, fh2 );
		fclose( fh2 );
	} else 	{ 
		Error("ERROR: file can not be open"); 
	} 
}

Before:
26

After:
27

1 Like

Another version with two functions less:

function RemoveCommaInQuotes(line, num_columns) {
	// https://forum.amibroker.com/t/removing-thousands-separator-inside-of-quotes-in-a-csv-file/27121/3
	// by fxshrat@gmail.com
	local i, col, col_new, cs_col, line_new;
	quote = "\"";
	for ( i = 0; i < num_columns; i++ ) {				
		VarSetText("col"+i, StrExtract(line,i));				
	}
	for ( i = 0, cs_cols = ""; i < num_columns; i++ ) {				
		col = VarGetText("col"+i);
		prev_col = VarGetText("col"+(i-1));	
		// skip columns that look like this: "1
		if (  StrFind(col,quote) AND 
			! StrFind(StrRight(col,1),quote) ){
			continue;
		}
		// if there is previous column that e.g. looks like this: "1
		// then combine with current column
		if (  StrFind(prev_col,quote) AND 
			! StrFind(StrRight(prev_col,1),quote) ) {
			col_new = prev_col+col;
		// else just use current column	
		} else col_new = col;
		//
		cs_cols += col_new+",";
	}
	line_new = StrTrim(cs_cols,",");
	return line_new;
}
2 Likes

Wow, this is great. It was interesting code to read... I hadn't thought of skipping most of the string instead of examining every character. I'm sure this would be faster.

Also learned about "continue" which I hadn't seen before, so I'm grateful that you used it so I could go look it up.

Thanks! :slightly_smiling_face:

1 Like

@PeterD,

Here is shortest/fastest version so far:

function RemoveCommaInQuotes(line) {
	// https://forum.amibroker.com/t/removing-thousands-separator-inside-of-quotes-in-a-csv-file/27121/5
	// by fxshrat@gmail.com
	local i, col, cs_cols, quote, replace, result;
	quote = "\"";
	replace = StrReplace(line, ","+quote, ";");
	for ( i = 0, cs_cols = ""; i < 3; i++ ) {
		col = StrExtract(replace, i, ';'); 
		if ( i > 0 ) col = StrReplace(col, ",", "");
		cs_cols += col +",";
	}	
	result = StrTrim(cs_cols,",");
	return StrReplace(result, quote, "");
}

So you can use StrReplace very well too.

NOTE: upper function works on file data such as following one:

Before:

Date,Investment,Transaction Type,Amount,Shares/Unit
06/30/2021,VANG SM VAL IDX INST,Exchange In,"1,248.46","29.940"
06/31/2021,VANG SM VAL IDX INST,Exchange In,"159.12","1,126.83"

After:

Date,Investment,Transaction Type,Amount,Shares/Unit
06/30/2021,VANG SM VAL IDX INST,Exchange In,1248.46,29.940
06/31/2021,VANG SM VAL IDX INST,Exchange In,159.12,1126.83

NOTE: num_columns argument has been removed.

2 Likes

Amazing... I hadn't thought of replacing the comma and quote together and using the semicolon. That's a very clever way of doing it!

Thanks again for the education!

This topic was automatically closed 100 days after the last reply. New replies are no longer allowed.