Write tables from Word (.docx) to Excel (.xlsx) using xlsxwriter

2020-05-27 Thread BBT
I am trying to parse a word (.docx) for tables, then copy these tables over to 
excel using xlsxwriter. This is my code:

from docx.api import Document
import xlsxwriter
 
document = Document('/Users/xxx/Documents/xxx/Clauses Sample - Copy v1 - for 
merge.docx')
tables = document.tables
 
wb = xlsxwriter.Workbook('C:/Users/xxx/Documents/xxx/test clause 
retrieval.xlsx')
Sheet1 = wb.add_worksheet("Compliance")
index_row = 0
 
print(len(tables))
 
for table in document.tables:
data = []
keys = None
for i, row in enumerate(table.rows):
text = (cell.text for cell in row.cells)
 
if i == 0:
keys = tuple(text)
continue
row_data = dict(zip(keys, text))
data.append(row_data)
#print (data)
#big_data.append(data)
Sheet1.write(index_row,0, str(row_data))  
index_row = index_row + 1
 
print(row_data)
 
wb.close()


This is my desired output: https://i.stack.imgur.com/9qnbw.png

However, here is my actual output: https://i.stack.imgur.com/vpXej.png

I am aware that my current output produces a list of string instead.

Is there anyway that I can get my desired output using xlsxwriter?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Write tables from Word (.docx) to Excel (.xlsx) using xlsxwriter

2020-05-27 Thread BBT
On Thursday, 28 May 2020 01:36:26 UTC+8, Peter Otten  wrote:
> BBT wrote:
> 
> > I am trying to parse a word (.docx) for tables, then copy these tables
> > over to excel using xlsxwriter. This is my code:
> > 
> > from docx.api import Document
> > import xlsxwriter
> >  
> > document = Document('/Users/xxx/Documents/xxx/Clauses Sample - Copy v1 -
> > for merge.docx') tables = document.tables
> >  
> > wb = xlsxwriter.Workbook('C:/Users/xxx/Documents/xxx/test clause
> > retrieval.xlsx') Sheet1 = wb.add_worksheet("Compliance")
> > index_row = 0
> >  
> > print(len(tables))
> >  
> > for table in document.tables:
> > data = []
> > keys = None
> > for i, row in enumerate(table.rows):
> > text = (cell.text for cell in row.cells)
> >  
> > if i == 0:
> > keys = tuple(text)
> > continue
> > row_data = dict(zip(keys, text))
> > data.append(row_data)
> > #print (data)
> > #big_data.append(data)
> > Sheet1.write(index_row,0, str(row_data))
> > index_row = index_row + 1
> >  
> > print(row_data)
> >  
> > wb.close()
> > 
> > 
> > This is my desired output: https://i.stack.imgur.com/9qnbw.png
> > 
> > However, here is my actual output: https://i.stack.imgur.com/vpXej.png
> > 
> > I am aware that my current output produces a list of string instead.
> > 
> > Is there anyway that I can get my desired output using xlsxwriter?
> 
> I had to simulate docx.api. With that caveat the following seems to work:
> 
> import xlsxwriter
>  
> # begin simulation of
> # from docx.api import Document
> 
> class Cell:
> def __init__(self, text):
> self.text = text
> 
> class Row:
> def __init__(self, cells):
> self.cells = [Cell(c) for c in cells]
> 
> class Table:
> def __init__(self, data):
> self.rows = [
> Row(row) for row in data
> ]
> 
> class Document:
> def __init__(self):
> self.tables = [
> Table([
> ["Hello", "Test"],
> ["est", "ing"],
> ["gg", "ff"]
> ]),
> Table([
> ["Foo", "Bar", "Baz"],
> ["ham", "spam", "jam"]
> ])
> ]
> 
> document = Document()
> 
> # end simulation
> 
> wb = xlsxwriter.Workbook("tmp.xlsx")
> sheet = wb.add_worksheet("Compliance")
>  
> offset = 0
> for table in document.tables:
> for y, row in enumerate(table.rows):
> for x, cell in enumerate(row.cells):
> sheet.write(y + offset, x, cell.text)
> offset +=  len(table.rows) + 1  # one empty row between tables
> 
> wb.close()


Hi Peter, thank you for your efforts :) 

However, what if there are many tables in the word document, it would be 
tedious to have to code the texts in the tables one by one. Can I instead, call 
on the word document and let Python do the parsing for tables and its contents? 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Write tables from Word (.docx) to Excel (.xlsx) using xlsxwriter

2020-05-27 Thread BBT
On Thursday, 28 May 2020 02:40:49 UTC+8, Peter Otten  wrote:
> BBT wrote:
> 
> > On Thursday, 28 May 2020 01:36:26 UTC+8, Peter Otten  wrote:
> >> BBT wrote:
> >> 
> >> > I am trying to parse a word (.docx) for tables, then copy these tables
> >> > over to excel using xlsxwriter. This is my code:
> >> > 
> >> > from docx.api import Document
> >> > import xlsxwriter
> >> >  
> >> > document = Document('/Users/xxx/Documents/xxx/Clauses Sample - Copy v1
> >> > - for merge.docx') tables = document.tables
> >> >  
> >> > wb = xlsxwriter.Workbook('C:/Users/xxx/Documents/xxx/test clause
> >> > retrieval.xlsx') Sheet1 = wb.add_worksheet("Compliance")
> >> > index_row = 0
> >> >  
> >> > print(len(tables))
> >> >  
> >> > for table in document.tables:
> >> > data = []
> >> > keys = None
> >> > for i, row in enumerate(table.rows):
> >> > text = (cell.text for cell in row.cells)
> >> >  
> >> > if i == 0:
> >> > keys = tuple(text)
> >> > continue
> >> > row_data = dict(zip(keys, text))
> >> > data.append(row_data)
> >> > #print (data)
> >> > #big_data.append(data)
> >> > Sheet1.write(index_row,0, str(row_data))
> >> > index_row = index_row + 1
> >> >  
> >> > print(row_data)
> >> >  
> >> > wb.close()
> >> > 
> >> > 
> >> > This is my desired output: https://i.stack.imgur.com/9qnbw.png
> >> > 
> >> > However, here is my actual output: https://i.stack.imgur.com/vpXej.png
> >> > 
> >> > I am aware that my current output produces a list of string instead.
> >> > 
> >> > Is there anyway that I can get my desired output using xlsxwriter?
> >> 
> >> I had to simulate docx.api. With that caveat the following seems to work:
> >> 
> >> import xlsxwriter
> >>  
> >> # begin simulation of
> >> # from docx.api import Document
> >> 
> >> class Cell:
> >> def __init__(self, text):
> >> self.text = text
> >> 
> >> class Row:
> >> def __init__(self, cells):
> >> self.cells = [Cell(c) for c in cells]
> >> 
> >> class Table:
> >> def __init__(self, data):
> >> self.rows = [
> >> Row(row) for row in data
> >> ]
> >> 
> >> class Document:
> >> def __init__(self):
> >> self.tables = [
> >> Table([
> >> ["Hello", "Test"],
> >> ["est", "ing"],
> >> ["gg", "ff"]
> >> ]),
> >> Table([
> >> ["Foo", "Bar", "Baz"],
> >> ["ham", "spam", "jam"]
> >> ])
> >> ]
> >> 
> >> document = Document()
> >> 
> >> # end simulation
> >> 
> >> wb = xlsxwriter.Workbook("tmp.xlsx")
> >> sheet = wb.add_worksheet("Compliance")
> >>  
> >> offset = 0
> >> for table in document.tables:
> >> for y, row in enumerate(table.rows):
> >> for x, cell in enumerate(row.cells):
> >> sheet.write(y + offset, x, cell.text)
> >> offset +=  len(table.rows) + 1  # one empty row between tables
> >> 
> >> wb.close()
> > 
> > 
> > Hi Peter, thank you for your efforts :)
> > 
> > However, what if there are many tables in the word document, it would be
> > tedious to have to code the texts in the tables one by one. Can I instead,
> > call on the word document and let Python do the parsing for tables and its
> > contents?
> 
> I don't understand. You have docx.api available, so you can replace
> 
> the "simulation" part of my example with just these two lines: 
> 
> from docx.api import Document
> document = Document('/Users/xxx/Documents/xxx/Clauses Sample - Copy v1 - for 
> merge.docx')
> 
> I should work -- it's just that I cannot be sure it will work because I could
> not test it.

I tried your code by replacing the Document portion:

import x

Re: Write tables from Word (.docx) to Excel (.xlsx) using xlsxwriter

2020-05-27 Thread BBT
On Thursday, 28 May 2020 02:40:49 UTC+8, Peter Otten  wrote:
> BBT wrote:
> 
> > On Thursday, 28 May 2020 01:36:26 UTC+8, Peter Otten  wrote:
> >> BBT wrote:
> >> 
> >> > I am trying to parse a word (.docx) for tables, then copy these tables
> >> > over to excel using xlsxwriter. This is my code:
> >> > 
> >> > from docx.api import Document
> >> > import xlsxwriter
> >> >  
> >> > document = Document('/Users/xxx/Documents/xxx/Clauses Sample - Copy v1
> >> > - for merge.docx') tables = document.tables
> >> >  
> >> > wb = xlsxwriter.Workbook('C:/Users/xxx/Documents/xxx/test clause
> >> > retrieval.xlsx') Sheet1 = wb.add_worksheet("Compliance")
> >> > index_row = 0
> >> >  
> >> > print(len(tables))
> >> >  
> >> > for table in document.tables:
> >> > data = []
> >> > keys = None
> >> > for i, row in enumerate(table.rows):
> >> > text = (cell.text for cell in row.cells)
> >> >  
> >> > if i == 0:
> >> > keys = tuple(text)
> >> > continue
> >> > row_data = dict(zip(keys, text))
> >> > data.append(row_data)
> >> > #print (data)
> >> > #big_data.append(data)
> >> > Sheet1.write(index_row,0, str(row_data))
> >> > index_row = index_row + 1
> >> >  
> >> > print(row_data)
> >> >  
> >> > wb.close()
> >> > 
> >> > 
> >> > This is my desired output: https://i.stack.imgur.com/9qnbw.png
> >> > 
> >> > However, here is my actual output: https://i.stack.imgur.com/vpXej.png
> >> > 
> >> > I am aware that my current output produces a list of string instead.
> >> > 
> >> > Is there anyway that I can get my desired output using xlsxwriter?
> >> 
> >> I had to simulate docx.api. With that caveat the following seems to work:
> >> 
> >> import xlsxwriter
> >>  
> >> # begin simulation of
> >> # from docx.api import Document
> >> 
> >> class Cell:
> >> def __init__(self, text):
> >> self.text = text
> >> 
> >> class Row:
> >> def __init__(self, cells):
> >> self.cells = [Cell(c) for c in cells]
> >> 
> >> class Table:
> >> def __init__(self, data):
> >> self.rows = [
> >> Row(row) for row in data
> >> ]
> >> 
> >> class Document:
> >> def __init__(self):
> >> self.tables = [
> >> Table([
> >> ["Hello", "Test"],
> >> ["est", "ing"],
> >> ["gg", "ff"]
> >> ]),
> >> Table([
> >> ["Foo", "Bar", "Baz"],
> >> ["ham", "spam", "jam"]
> >> ])
> >> ]
> >> 
> >> document = Document()
> >> 
> >> # end simulation
> >> 
> >> wb = xlsxwriter.Workbook("tmp.xlsx")
> >> sheet = wb.add_worksheet("Compliance")
> >>  
> >> offset = 0
> >> for table in document.tables:
> >> for y, row in enumerate(table.rows):
> >> for x, cell in enumerate(row.cells):
> >> sheet.write(y + offset, x, cell.text)
> >> offset +=  len(table.rows) + 1  # one empty row between tables
> >> 
> >> wb.close()
> > 
> > 
> > Hi Peter, thank you for your efforts :)
> > 
> > However, what if there are many tables in the word document, it would be
> > tedious to have to code the texts in the tables one by one. Can I instead,
> > call on the word document and let Python do the parsing for tables and its
> > contents?
> 
> I don't understand. You have docx.api available, so you can replace
> 
> the "simulation" part of my example with just these two lines: 
> 
> from docx.api import Document
> document = Document('/Users/xxx/Documents/xxx/Clauses Sample - Copy v1 - for 
> merge.docx')
> 
> I should work -- it's just that I cannot be sure it will work because I could
> not test it.

I tried your code by replacing the Document portion:

impo

Re: Write tables from Word (.docx) to Excel (.xlsx) using xlsxwriter

2020-05-28 Thread BBT
On Thursday, 28 May 2020 03:07:48 UTC+8, Peter Otten  wrote:
> BBT wrote:
> 
> > I tried your code by replacing the Document portion:
> 
> > But I received an error:
> > TypeError: __init__() takes 1 positional argument but 2 were given
> 
> We seem to have different ideas of what replacing means. 
> Here is the suggested script spelt out:
> 
> import xlsxwriter
>  
> from docx.api import Document
> document = Document('/Users/xxx/Documents/xxx/Clauses Sample - Copy v1 - for 
> merge.docx')
> 
> wb = xlsxwriter.Workbook('C:/Users/xxx/Documents/xx/test clause 
> retrieval.xlsx')
> sheet = wb.add_worksheet("Compliance")
>  
> offset = 0
> for table in document.tables:
> for y, row in enumerate(table.rows):
> for x, cell in enumerate(row.cells):
> sheet.write(y + offset, x, cell.text)
> offset +=  len(table.rows) + 1  # one empty row between tables
> 
> wb.close()

Yes! That works well, thank you Peter! And I'm impressed by your 
object-oriented simulation too. Would look into it again :)
-- 
https://mail.python.org/mailman/listinfo/python-list