On Sunday, August 30, 2015 at 1:16:12 PM UTC-4, MRAB wrote:
> On 2015-08-30 17:31, kbtyo wrote:
> > On Saturday, August 29, 2015 at 10:50:18 PM UTC-4, MRAB wrote:
> >> On 2015-08-30 03:05, kbtyo wrote:
> >> > I am using Jupyter Notebook and Python 3.4. I have a data structure in 
> >> > the format, (type list):
> >> >
> >> > [{'AccountNumber': N,
> >> > 'Amount': '0',
> >> >   'Answer': '12:00:00 PM',
> >> >    'ID': None,
> >> >    'Type': 'WriteLetters',
> >> >    'Amount': '10',
> >> >    {'AccountNumber': Y,
> >> >        'Amount': '0',
> >> >        'Answer': ' 12:00:00 PM',
> >> >         'ID': None,
> >> >        'Type': 'Transfer',
> >> >        'Amount': '2'}]
> >> >
> >> > The end goal is to write this out to CSV.
> >> >
> >> > For the above example the output would look like:
> >> >
> >> > AccountNumber, Amount, Answer, ID, Type, Amount
> >> > N,0,12:00:00 PM,None,WriteLetters,10
> >> > Y,2,12:00:00 PM,None,Transfer,2
> >> >
> >> > Below is the function that I am using to write out this data structure. 
> >> > Please excuse any indentation formatting issues. The data structure is 
> >> > returned through the function "construct_results(get_just_xml_data)".
> >> >
> >> > The data that is returned is in the format as above. 
> >> > "construct_headers(get_just_xml_data)" returns a list of headers. 
> >> > Writing out the row for "headers_list" works.
> >> >
> >> > The list comprehension "data" is to maintain the integrity of the column 
> >> > headers and the values for each new instance of the data structure 
> >> > (where the keys in the dictionary are the headers and values - row 
> >> > instances). The keys in this specific data structure are meant to check 
> >> > if there is a value instance, and if there is not - place an ''.
> >> >
> >> > def write_to_csv(results, headers):
> >> >
> >> >      headers = construct_headers(get_just_xml_data)
> >> >      results = construct_results(get_just_xml_data)
> >> >      headers_list = list(headers)
> >> >
> >> >      with open('real_csv_output.csv', 'wt') as f:
> >> >          writer = csv.writer(f)
> >> >          writer.writerow(headers_list)
> >> >          for row in results:
> >> >              data = [row.get(index, '') for index in results]
> >> >          writer.writerow(data)
> >> >
> >> >
> >> >
> >> > However, when I run this, I receive this error:
> >> >
> >> > ---------------------------------------------------------------------------
> >> > TypeError                                 Traceback (most recent call last)
> >> > <ipython-input-747-7746797fc9a5> in <module>()
> >> > ----> 1 write_to_csv(results, headers)
> >> >
> >> > <ipython-input-746-c822437eeaf0> in write_to_csv(results, headers)
> >> >        9         writer.writerow(headers_list)
> >> >       10         for item in results:
> >> > ---> 11             data = [item.get(index, '') for index in results]
> >> >       12         writer.writerow(data)
> >> >
> >> > <ipython-input-746-c822437eeaf0> in <listcomp>(.0)
> >> >        9         writer.writerow(headers_list)
> >> >       10         for item in results:
> >> > ---> 11             data = [item.get(index, '') for index in results]
> >> >       12         writer.writerow(data)
> >> >
> >> > TypeError: unhashable type: 'dict'
> >> >
> >> >
> >> > I have done some research, namely, the following:
> >> >
> >> > https://mail.python.org/pipermail//tutor/2011-November/086761.html
> >> >
> >> > http://stackoverflow.com/questions/27435798/unhashable-type-dict-type-error
> >> >
> >> > http://stackoverflow.com/questions/1957396/why-dict-objects-are-unhashable-in-python
> >> >
> >> > However, I am still perplexed by this error. Any feedback is welcomed. 
> >> > Thank you.
> >> >
> >> You're taking the index values from 'results' instead of 'headers'.
> >
> > Would you be able to elaborate on this? I partially understand what you 
> > mean. However, each dictionary (of results) has the same keys to map to 
> > (aka, headers when written out to CSV). I am wondering if you would be able 
> > to explain how the index is being used in this case?
> >
> In the list comprehension on line 11, you have "item.get(index, '')".
> 
> What is 'index'?
> 
> You have "for index in results" in the list comprehension, and 'results' 
> is a list of dicts, therefore 'index' is a _dict_.
> 
> That means that you're trying to look up an entry in the 'item' dict
> using a _dict_ as the key.
> 
> Oh, and incidentally, line 12 should be indented to the same level as
> line 11.

Yes, as mentioned in my OP, please forgive the formatting issues with indentation.
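
To check that I understand both points (iterate over the headers rather than over 'results', and write each row inside the loop), here is a minimal sketch. The `path` parameter and the idea of passing the headers in directly are my own assumptions for illustration, not part of my original function:

```python
import csv

def write_to_csv(results, headers, path='real_csv_output.csv'):
    # Freeze the column order once so every row is written in the same order.
    headers_list = list(headers)
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(headers_list)
        for row in results:
            # Look up each header name (a string) in the row dict,
            # instead of iterating over 'results' (a list of dicts).
            data = [row.get(key, '') for key in headers_list]
            # Inside the loop: one output line per row.
            writer.writerow(data)
```

With this version each key passed to `row.get` is a string, so the "unhashable type: 'dict'" error should not occur.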

I feel that I need to provide some context to avoid any confusion over why I 
chose this approach. 

My original task was to parse an XML data structure stored in a CSV file with 
other data types and then add the elements back as headers and the text as row 
values. I went back to the drawing board and created a "results" list of 
dictionaries in which each key maps to a list of values, using this: 

def convert_list_to_dict(get_just_xml_data): 
    d = {} 
    for item in get_just_xml_data(get_all_data): 
        for k, v in item.items(): 
            try: 
                d[k].append(v) 
            except KeyError: 
                d[k] = [v] 
    return d 

This creates a dictionary for each XML tag - for example: 
{ 
 'Number1': ['0'], 
 'Number2': ['0'], 
 'Number3': ['0'], 
 'Number4': ['0'], 
 'Number5': ['0'], 
 'RepgenName': [None], 
 'RTypes': ['Execution', 'Letters'], 
 'RTID': ['3', '5']} 
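
As a self-contained illustration of that grouping step, here is a sketch using collections.defaultdict instead of try/except. The function name `group_values_by_key` and the sample records are made up for this example; my real input comes from get_just_xml_data:

```python
from collections import defaultdict

def group_values_by_key(records):
    # Map each key to the list of values seen for it, in input order.
    grouped = defaultdict(list)
    for record in records:
        for key, value in record.items():
            grouped[key].append(value)
    return dict(grouped)

# Hypothetical records standing in for the parsed XML items.
sample = [
    {'Number1': '0', 'RTypes': 'Execution', 'RTID': '3'},
    {'RTypes': 'Letters', 'RTID': '5'},
]
grouped = group_values_by_key(sample)
```

Keys that appear in only some records end up with shorter lists, which is why the columns can have different lengths.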


I then used this to create a "headers" set (to prevent duplicates from being 
added) and the list of dictionaries that I mentioned in my OP. 


I achieve this via: 

import csv

# just headers
def construct_headers(convert_list_to_dict):
    header = set()
    # newline='' is what the csv module expects in Python 3 ('rU' is deprecated)
    with open('real.csv', newline='') as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            # get_just_xml_data(get_all_data)
            xml_data = convert_list_to_dict(get_just_xml_data)
            row.update(xml_data)
            header.update(row.keys())
    return header

# get all of the results
def construct_results(convert_list_to_dict):
    results = []
    # newline='' is what the csv module expects in Python 3 ('rU' is deprecated)
    with open('real.csv', newline='') as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            # get_just_xml_data(get_all_data)
            xml_data = convert_list_to_dict(get_just_xml_data)
            row.update(xml_data)
            results.append(row)
    return results
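
Since the results are dictionaries, csv.DictWriter may be a more direct fit than building each row by hand. The following is only a sketch: `write_dict_rows` and the sample rows are hypothetical, and sorting the field names is my own choice to get a stable column order:

```python
import csv
import io

def write_dict_rows(rows, out):
    # Collect every key seen in any row, then sort for a stable column order.
    fieldnames = sorted({key for row in rows for key in row})
    # restval fills in columns that a given row does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames

# Hypothetical rows standing in for the output of construct_results.
rows = [
    {'AccountNumber': 'N', 'Type': 'WriteLetters'},
    {'AccountNumber': 'Y', 'Amount': '2'},
]
buffer = io.StringIO()
write_dict_rows(rows, buffer)
lines = buffer.getvalue().splitlines()
```

DictWriter matches values to columns by key, so the order in which keys appear in each dictionary does not matter.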


I guess I am still using the headers list that was originally written out. My 
initial thought is to just write out the values corresponding to each 
transaction. For example, given this data structure: 

{ 
 'Number1': ['0'], 
 'Number2': ['0'], 
 'Number3': ['0'], 
 'Number4': ['0'], 
 'Number5': ['0'], 
 'RPN': [None], 
 'RTypes': ['Execution', 'Letters'], 
 'RTID': ['3', '5']} 


I would like to get a CSV like this: 

Number1, Number2, Number3, Number4, Number5, RPN, RTypes, RTID 

0, 0, 0, 0, 0, None, Execution, 3 
None, None, None, None, None, None, Letters, 5 

I am wondering how I would achieve this when the headers set is unordered 
(should I sort it before writing it out?). Also, since I have millions of 
transactions, I want to make sure that the values for each header are placed 
under the right column, row after row. Any guidance would be very helpful. 
Thanks.
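
One approach I am considering, as a rough sketch only: sort the headers once, then use itertools.zip_longest to turn the per-header lists into rows, padding the shorter lists. The function name `write_columns` is made up, the fill value '' is an assumption, and sorting changes the column order from the one I listed above:

```python
import csv
import io
from itertools import zip_longest

def write_columns(columns, out):
    # Sort the header names once so every row uses the same column order.
    headers = sorted(columns)
    writer = csv.writer(out)
    writer.writerow(headers)
    # zip_longest yields one tuple per output row and pads short columns.
    for row in zip_longest(*(columns[h] for h in headers), fillvalue=''):
        writer.writerow(row)
    return headers

columns = {
    'Number1': ['0'], 'Number2': ['0'], 'Number3': ['0'],
    'Number4': ['0'], 'Number5': ['0'], 'RPN': [None],
    'RTypes': ['Execution', 'Letters'], 'RTID': ['3', '5'],
}
buffer = io.StringIO()
write_columns(columns, buffer)
lines = buffer.getvalue().splitlines()
```

Note that csv.writer writes None as an empty field, so the missing values come out blank rather than as the text "None".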