On 8/2/2017 1:05 PM, MRAB wrote:
On 2017-08-02 16:05, Daiyue Weng wrote:
Hi, I am trying to remove extra quotes from a large set of strings (a
list of strings), where each original string looks like,

"""str_value1"",""str_value2"",""str_value3"",1,""str_value4"""


I would like to remove the start and end quotes and the extra pairs of quotes
around each string value, so the result will look like,

"str_value1","str_value2","str_value3",1,"str_value4"


and then join the strings with newlines.

I have tried the following code,

for line in str_lines[1:]:
    strip_start_end_quotes = line[1:-1]
    splited_line_rem_quotes = strip_start_end_quotes.replace('\"\"', '"')
    str_lines[str_lines.index(line)] = splited_line_rem_quotes

for_pandas_new_headers_str = '\n'.join(splited_lines)

Do you actually need the list of strings joined up like that into one string, or will the one string just be split again into multiple strings?

but it is really slow (running for ages) if the list contains over 1
million string lines. I am thinking about a fast way to do that.

[snip]

The problem is the line:

     str_lines[str_lines.index(line)]

It does a linear search through str_lines until it finds a match for the line.

To find the 10th line it must search through the first 10 lines.

To find the 100th line it must search through the first 100 lines.

To find the 1000th line it must search through the first 1000 lines.

And so on.

In Big-O notation, the performance is O(n**2).
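
To make that cost concrete, here is a small sketch that counts how many elements the linear searches examine in total when every line is looked up by value. The helper name count_index_comparisons is made up for illustration, and the count assumes all lines are distinct.

```python
def count_index_comparisons(lines):
    """Count elements examined if each line is located by a linear search.

    Assumes every line is distinct, so the search for the line at
    position i (0-based) examines i + 1 elements before it matches.
    """
    total = 0
    for line in lines:
        # list.index() scans from the front and stops at the first match.
        total += lines.index(line) + 1
    return total


lines = ["line %d" % i for i in range(1000)]
print(count_index_comparisons(lines))  # 1000 * 1001 // 2 = 500500
```

For n lines the total is n * (n + 1) / 2, which is why a million lines takes so long.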

The Pythonic way of doing it is to put the results into a new list:


new_str_lines = str_lines[:1]

for line in str_lines[1:]:
     strip_start_end_quotes = line[1:-1]
     splited_line_rem_quotes = strip_start_end_quotes.replace('\"\"', '"')
     new_str_lines.append(splited_line_rem_quotes)


In Big-O notation, the performance is O(n).
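
The same single pass can also be written as a list comprehension. This is only a sketch: the helper name strip_extra_quotes and the 'header' sample line are assumptions for illustration, and it assumes the lines carry no trailing newline (otherwise line[1:-1] would drop the newline rather than the closing quote).

```python
def strip_extra_quotes(line):
    # Drop the outer quotes, then collapse each doubled quote to a single one.
    return line[1:-1].replace('""', '"')


str_lines = [
    'header',  # hypothetical first line, kept unchanged
    '"""str_value1"",""str_value2"",""str_value3"",1,""str_value4"""',
]

# Keep the first line as it is and clean every later line exactly once.
new_str_lines = str_lines[:1] + [strip_extra_quotes(line) for line in str_lines[1:]]

print(new_str_lines[1])
# "str_value1","str_value2","str_value3",1,"str_value4"
```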

Making a slice copy of all but the first member of the list is also unnecessary. Use an iterator instead.

lineit = iter(str_lines)
new_str_lines = [next(lineit)]
for line in lineit:
    ...
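
Filled out, the iterator version might look like the sketch below. It assumes the loop body is the same as in the earlier loop and that str_lines is not empty (next() raises StopIteration on an empty list); the function name clean_lines and the sample data are made up for illustration.

```python
def clean_lines(str_lines):
    """Return a new list: first line unchanged, later lines cleaned."""
    lineit = iter(str_lines)
    # Consume the first line from the iterator; assumes at least one line.
    new_str_lines = [next(lineit)]
    # The loop continues from the second line, with no slice copy needed.
    for line in lineit:
        new_str_lines.append(line[1:-1].replace('""', '"'))
    return new_str_lines


sample = [
    'header',  # hypothetical first line
    '"""a"",""b"",1"',
    '"""c"",2,""d"""',
]
result = '\n'.join(clean_lines(sample))
print(result)
```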


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
