[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178402#comment-16178402
 ] 

ASF GitHub Bot commented on TIKA-2400:
--------------------------------------

thammegowda commented on a change in pull request #208: Fix for TIKA-2400 Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140670497
 
 

 ##########
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##########
 @@ -140,29 +133,17 @@ public synchronized void parse(InputStream stream, ContentHandler handler, Metad
             for (RecognisedObject object : objects) {
                 if (object instanceof CaptionObject) {
                     if (xhtmlStartVal == null) xhtmlStartVal = "captions";
-                    LOG.debug("Add {}", object);
-                    String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-                            object.getLabel(), object.getConfidence());
-                    metadata.add(MD_KEY_IMG_CAP, mdValue);
-                    acceptedObjects.add(object);
+                    String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
+                    metadata.add(MD_KEY_IMG_CAP, mdVal);
                     xhtmlIds.add(String.valueOf(count++));
                 } else {
                     if (xhtmlStartVal == null) xhtmlStartVal = "objects";
-                    if (object.getConfidence() >= minConfidence) {
-                        count++;
-                        LOG.info("Add {}", object);
-                        String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-                                object.getLabel(), object.getConfidence());
-                        metadata.add(MD_KEY_OBJ_REC, mdValue);
-                        acceptedObjects.add(object);
-                        xhtmlIds.add(object.getId());
-                        if (count >= topN) {
-                            break;
-                        }
-                    } else {
-                        LOG.warn("Object {} confidence {} less than min {}", object, object.getConfidence(), minConfidence);
-                    }
+                    String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
+                    metadata.add(MD_KEY_OBJ_REC, mdVal);
+                    xhtmlIds.add(object.getId());
                 }
+                LOG.info("Add {}", object);
 
 Review comment:
   > will be great if you can remove String concatenation from 
RecognisedObject.toString to use StringBuffer or String format 
   
   If you suggested this for a performance gain, let's take a deeper look.
`RecognisedObject.toString()` does not run in a loop. It's just one long
concatenation with `+`. I remember reading somewhere that the JDK can easily
optimize such a statement, but I couldn't find the source just now,
so here is a quick test:
   ```java
   class Main {
     
     public static long concat(int n){
       long st = System.nanoTime();
       for (int i = 0; i < n; i++) {
          String s = "a" + "b" + "c" + "d" + "e" + "f" +
                     "g" + "h" + "i" + "j" +"k";
       }
       return System.nanoTime() - st;
     }
     
     public static long builder(int n){
       long st = System.nanoTime();
       for (int i = 0; i < n; i++) {
         String s = new StringBuilder().append("a").append("b")
               .append("c").append("d").append("e").append("f")
               .append("g").append("h").append("i").append("j")
               .append("k").toString();
       }
       return System.nanoTime() - st;
     }
     
     public static void main(String[] args) {
       int n = 1_000_000;
       System.out.printf("Builder Time in ns : %10d\n", builder(n));
       System.out.printf(" Concat Time in ns : %10d\n", concat(n));
     }
   }
   ```
   I ran it on https://repl.it/languages/java and got:
   
   ```
   java version "1.8.0_31"
   Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
   Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
      
   Builder Time in ns :   50614748
    Concat Time in ns :    2500615
   ```
   See, plain `+` concatenation is in fact faster in this run!
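
One caveat is worth checking before relying on those numbers. Every operand in `concat` is a string literal, so the Java compiler folds the whole expression into a single constant at compile time and the loop does almost no work. `toString()` concatenates field values, which cannot be folded. The sketch below is a fairer comparison under stated assumptions: `ConcatCheck` and its `RecognisedObject{...}` layout are hypothetical stand-ins, not the actual Tika code, and the timing loop is a rough check rather than a proper benchmark (a harness such as JMH would be more trustworthy). On Java 8, `javac` compiles a non-constant `+` chain into `StringBuilder` calls, so the two variants would be expected to perform about the same.

```java
final class ConcatCheck {

    // Hypothetical stand-in for RecognisedObject.toString(); the real field layout is assumed.
    public static String viaConcat(String label, String id, double confidence) {
        return "RecognisedObject{label='" + label + "', id='" + id
                + "', confidence=" + confidence + "}";
    }

    // Same output, built with an explicit StringBuilder.
    public static String viaBuilder(String label, String id, double confidence) {
        return new StringBuilder()
                .append("RecognisedObject{label='").append(label)
                .append("', id='").append(id)
                .append("', confidence=").append(confidence)
                .append("}").toString();
    }

    // Times n calls with a changing id so the operands are not compile-time constants.
    public static long time(boolean useBuilder, int n) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            String id = Integer.toString(i);
            String s = useBuilder
                    ? viaBuilder("cat", id, 0.5)
                    : viaConcat("cat", id, 0.5);
            sink += s.length(); // use the result so the loop body is not dead code
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // never true; only keeps sink live
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // Warm-up passes so both variants are JIT-compiled before measuring.
        time(true, n);
        time(false, n);
        System.out.printf("Builder Time in ns : %10d%n", time(true, n));
        System.out.printf(" Concat Time in ns : %10d%n", time(false, n));
    }
}
```

If the two timings come out close, that would support keeping the plain `+` form in `toString()` for readability.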
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Standardizing current Object Recognition REST parsers
> -----------------------------------------------------
>
>                 Key: TIKA-2400
>                 URL: https://issues.apache.org/jira/browse/TIKA-2400
>             Project: Tika
>          Issue Type: Sub-task
>          Components: parser
>            Reporter: Thejan Wijesinghe
>            Priority: Minor
>             Fix For: 1.17
>
>
> # This involves adding apiBaseUris and refactoring current Object Recognition REST parsers.
> # Refactoring dockerfiles related to those parsers.
> # Moving the logic related to checking minimum confidence into servers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
