[ 
https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539007#comment-17539007
 ] 

ASF GitHub Bot commented on TIKA-1735:
--------------------------------------

monkmachine commented on PR #558:
URL: https://github.com/apache/tika/pull/558#issuecomment-1130370795

   If I read one byte at a time (i.e. `byte[] bytes = new byte[1];`), I get the
correct result:
   
![image](https://user-images.githubusercontent.com/36521886/169118333-e9a5509e-8fb4-4b28-9be4-6d326a03059a.png)
   
   If I read with a buffer larger than one byte, I get extra bytes/strings
from some other part of the file:
   
![image](https://user-images.githubusercontent.com/36521886/169118508-6bd9559c-ffe9-4146-a74b-38141c585fbc.png)
   
   It only happens with this one JSON file, but I can reproduce it every time
with that file.
   
   
   ```java
   @Test
   public void jsonConvert() throws FileNotFoundException, IOException {
       try (FileInputStream fis = new FileInputStream("c:\\temp1\\dwgreadout.json");
            FileOutputStream fos = new FileOutputStream("c:\\temp1\\dwgreadoutClean.json")) {
           byte[] bytes = new byte[1000];
           while (fis.read(bytes) != -1) {
               byte[] fixedBytes = new String(bytes, StandardCharsets.UTF_8)
                       //.replaceAll(dwgc.getCleanDwgReadRegexToReplace(), dwgc.getCleanDwgReadReplaceWith())
                       //.replaceAll(" nan ", " 0 ")
                       //.replaceAll(" nan,", " 0,")
                       .getBytes(StandardCharsets.UTF_8);
               String st = new String(fixedBytes, StandardCharsets.UTF_8);
               fos.write(fixedBytes, 0, fixedBytes.length);
           }
       }
   }
   ```
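
One likely explanation for the extra bytes, offered as a guess rather than a confirmed diagnosis: `InputStream.read(byte[])` returns how many bytes it actually filled, and the loop above discards that count. After a short read (typically the last chunk), the rest of the 1000-byte buffer still holds data from the previous read, and the whole buffer is converted and written. With a one-byte buffer there is never a stale tail, which would fit the byte-by-byte run looking correct. The sketch below shows both behaviours on in-memory streams rather than the original files; the class and method names (`ChunkedCopy`, `copyIgnoringCount`, `copyRespectingCount`) are hypothetical and only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedCopy {

    // Mirrors the pattern in the test above: the count returned by read() is
    // discarded, so the whole buffer is written even after a short read.
    static void copyIgnoringCount(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        while (in.read(buffer) != -1) {
            out.write(buffer, 0, buffer.length);
        }
    }

    // Writes only the bytes that the most recent read() actually filled.
    static void copyRespectingCount(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        // 25 bytes of input read through a 10-byte buffer: the last read is short.
        byte[] data = "0123456789abcdefghijklmno".getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream bad = new ByteArrayOutputStream();
        copyIgnoringCount(new ByteArrayInputStream(data), bad, 10);

        ByteArrayOutputStream good = new ByteArrayOutputStream();
        copyRespectingCount(new ByteArrayInputStream(data), good, 10);

        // The stale tail "fghij" from the second chunk is appended to the output.
        System.out.println(new String(bad.toByteArray(), StandardCharsets.UTF_8));
        System.out.println(new String(good.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In the original test, the equivalent change would be to keep the return value of `fis.read(bytes)` in a variable `n` and decode with `new String(bytes, 0, n, StandardCharsets.UTF_8)`. Separately, decoding each chunk on its own can split a multi-byte UTF-8 character or a ` nan ` match across a chunk boundary, so wrapping the stream in an `InputStreamReader`, or reading the whole file into one string before calling `replaceAll`, may be safer if the file is small enough.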




> Unsupported AutoCAD drawing version: AC1027
> -------------------------------------------
>
>                 Key: TIKA-1735
>                 URL: https://issues.apache.org/jira/browse/TIKA-1735
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Luca Perico
>            Priority: Major
>         Attachments: testDWG-AC1027.dwg
>
>
> When trying to index a .dwg file (version AC1027), I get a 500 error response:
> "<?xml version=""1.0"" encoding=""UTF-8""?>
> <response>
> <lst name=""responseHeader""><int name=""status"">500</int><int 
> name=""QTime"">3</int></lst><lst name=""error""><str A1:F378 Unsupported 
> AutoCAD drawing version: AC1027</str><str 
> name=""trace"">org.apache.solr.common.SolrException: 
> org.apache.tika.exception.TikaException: Unsupported AutoCAD drawing version: 
> AC1027
>       at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
>       at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>       at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
>       at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>       at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
>       at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
>       at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>       at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>       at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>       at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>       at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>       at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>       at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>       at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>       at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>       at org.eclipse.jetty.server.Server.handle(Server.java:497)
>       at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>       at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>       at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>       at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tika.exception.TikaException: Unsupported AutoCAD 
> drawing version: AC1027
>       at org.apache.tika.parser.dwg.DWGParser.parse(DWGParser.java:131)
>       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>       at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:256)
>       at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:221)
>       ... 27 more
> </str><int name=""code"">500</int></lst>
> </response>"



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
