[ https://issues.apache.org/jira/browse/TIKA-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412016#comment-17412016 ]
Tim Allison commented on TIKA-3544:
-----------------------------------

Y, I just tried bumping 15->16, and we get this output:

<tr> <td>Credit Card Numbers (Source: http://www.getcreditcardnumbers.com/)</td></tr>
<tr> <td>6480195344642780</td></tr>
<tr> <td>30295201231669</td></tr>
<tr> <td>30082494556063</td></tr>
<tr> <td>344850003945824</td></tr>
<tr> <td>3583387923333630</td></tr>
<tr> <td>3587385370593640</td></tr>

> Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3544
>                 URL: https://issues.apache.org/jira/browse/TIKA-3544
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.20
>            Reporter: Jitin Jindal
>            Priority: Major
>         Attachments: Credit Card Numbers.xlsx
>
>
> If an Excel spreadsheet contains a long sequence of digits, such as a credit
> card number, Tika 1.13 will emit the said sequence in scientific notation.
> For example, the credit card number “6011799905775830” is extracted from the
> attached spreadsheet as 6.480195344642784E15, which clearly is not the
> desired output.
> I think the impact of this issue is significant. There’s plenty of
> information that can no longer be reliably extracted from spreadsheets. Think
> credit card numbers, telephone numbers and product identifiers to name a few.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
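
For anyone reading along, here is a minimal standalone sketch of the symptom described in the issue. It assumes the cell value reaches the formatting step as a Java double; the class PlainDigits and the helper toPlainDigits are hypothetical names made up for this illustration, and this is not the code path Tika or POI actually uses.

```java
import java.math.BigDecimal;

public class PlainDigits {
    // Hypothetical helper, not part of Tika or POI: renders a double
    // without an exponent. Only sensible for integer-valued cells, since
    // new BigDecimal(double) expands the exact binary value of fractions.
    static String toPlainDigits(double value) {
        return new BigDecimal(value).toPlainString();
    }

    public static void main(String[] args) {
        // The value behind the scientific-notation string quoted in the issue.
        double cell = 6480195344642784d;

        // Default double-to-string conversion switches to exponent form
        // for magnitudes of 10^7 and above.
        System.out.println(Double.toString(cell)); // 6.480195344642784E15

        // Plain rendering keeps every digit.
        System.out.println(toPlainDigits(cell));   // 6480195344642784
    }
}
```

Note that a double only represents integers exactly up to 2^53 (9007199254740992), so digit strings beyond that may not survive a round trip through a numeric cell, whatever formatting is applied afterwards.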