On 2019-09-22 00:42, Markos wrote:
Hi,

I have a table.csv file with the following structure:

, Polyarene conc ,, mg L-1 ,,,,,,,
Spectrum, Py, Ace, Anth,
1, "0,456", "0,120", "0,168"
2, "0,456", "0,040", "0,280"
3, "0,152", "0,200", "0,280"

I open it as a dataframe with the command:

data = pd.read_csv('table.csv', sep=',', skiprows=1)

and the variable "data" has the structure:

   Spectrum     Py    Ace   Anth
0         1  0,456  0,120  0,168
1         2  0,456  0,040  0,280
2         3  0,152  0,200  0,280

I copy the numeric fields to an array with the command:

data_array = data.values[:, 1:]

And the data_array variable ends up holding the fields as strings:

[['0,456' '0,120' '0,168']
 ['0,456' '0,040' '0,280']
 ['0,152' '0,200' '0,280']]

The only way I found to change the comma "," to a dot "." was using the
replace() method:

for i, line in enumerate(data_array):
    data_array[i] = [float(element.replace(',', '.'))
                     for element in data_array[i]]

But I'm wondering if there is another, more efficient way to make this
change without having to iterate over all elements of the array with a
for loop.
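
A loop-free sketch, assuming data_array is the string array shown above:
NumPy's vectorized string functions can do the replacement and the cast
over the whole array at once.

```python
import numpy as np

# Example array of strings with comma decimal separators,
# matching the structure shown above.
data_array = np.array([['0,456', '0,120', '0,168'],
                       ['0,456', '0,040', '0,280'],
                       ['0,152', '0,200', '0,280']])

# Replace commas with dots across the whole array, then cast to float.
numeric = np.char.replace(data_array, ',', '.').astype(float)
```

This pushes the iteration down into NumPy's C code instead of a Python
for loop.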

I'm also wondering if there would be any benefit to making this
modification in the dataframe before extracting the numeric fields to the array.
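
For comparison, a sketch of doing the conversion in the dataframe first,
using pandas' vectorized string methods (the dataframe below is built by
hand to mirror the table shown above, rather than read from the file):

```python
import pandas as pd

# Hand-built dataframe matching the structure shown above.
data = pd.DataFrame({'Spectrum': [1, 2, 3],
                     'Py':   ['0,456', '0,456', '0,152'],
                     'Ace':  ['0,120', '0,040', '0,200'],
                     'Anth': ['0,168', '0,280', '0,280']})

# Convert each string column: comma to dot, then to float.
for col in ['Py', 'Ace', 'Anth']:
    data[col] = data[col].str.replace(',', '.').astype(float)

# The extracted array is now numeric, no per-element loop needed.
data_array = data.values[:, 1:]
```

One benefit of converting in the dataframe is that the columns keep their
names and dtypes, so later pandas operations work on real numbers.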

Please, any comments or tips?

I'd suggest doing all of the replacements in the CSV file first, something like this:

import re

with open('table.csv') as file:
    csv_data = file.read()

# Convert the decimal points and also make them look numeric.
csv_data = re.sub(r'"(-?\d+),(\d+)"', r'\1.\2', csv_data)

with open('fixed_table.csv', 'w') as file:
    file.write(csv_data)
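
Another option worth noting, assuming quoted fields like those shown
above: read_csv accepts a decimal parameter, so pandas can treat the
comma inside the quoted fields as the decimal mark with no
pre-processing at all. A minimal sketch using an in-memory copy of the
sample data:

```python
import io
import pandas as pd

# In-memory copy of the sample file (first row skipped as before).
csv_text = (', Polyarene conc ,, mg L-1\n'
            'Spectrum,Py,Ace,Anth\n'
            '1,"0,456","0,120","0,168"\n'
            '2,"0,456","0,040","0,280"\n'
            '3,"0,152","0,200","0,280"\n')

# decimal=',' makes the parser read "0,456" as the float 0.456;
# the quoting keeps the field commas separate from the separators.
data = pd.read_csv(io.StringIO(csv_text), skiprows=1, decimal=',')
```

With this, the numeric columns arrive as floats directly and no
replace() step is needed.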
--
https://mail.python.org/mailman/listinfo/python-list