On 31.7.2012, at 22.46, to...@starbridge.org wrote:

>>> 21500/59363doveadm(c...@spamguard.fr): Error: fts_solr: Invalid XML
>>> input at line 1: mismatched tag
>> No idea. Can you reproduce this? What does it log with this patch?
>> http://hg.dovecot.org/dovecot-2.1/rev/817b69b2b21f
> 
> It happens every time on the same mailboxes (very few) around the same
> uid number (I think I can find the exact uid with strace and send the
> email message to you if it helps)
> 
> catalina.out shows this at that time:
> 
> INFO: {} 0 1
> 31 juil. 2012 21:19:56 org.apache.solr.common.SolrException log
> GRAVE: org.apache.solr.common.SolrException: Illegal character
> ((CTRL-CHAR, code 4))
..
> After a quick Google search, it seems related to an invalid control
> character sent to Solr.

So it seems, but Dovecot already has code to filter out all control characters 
when sending data to Solr. I just looked through the source and did a few tests 
and I couldn't get it to send a control char to Solr.
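
For illustration only, a filter of this kind could look roughly like the
sketch below. This is not the actual fts_solr code: the function names
xml10_byte_allowed() and xml10_sanitize() are made up for this example, and
it assumes the text is UTF-8 and that replacing a forbidden byte with a space
is acceptable. It only covers the C0 control range (XML 1.0 permits just TAB,
LF and CR below 0x20), not other invalid code points such as U+FFFE.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from Dovecot.
 * XML 1.0 allows only TAB (0x09), LF (0x0A) and CR (0x0D) below 0x20.
 * Bytes >= 0x20, including UTF-8 lead/continuation bytes, pass through. */
static int xml10_byte_allowed(unsigned char c)
{
	return c >= 0x20 || c == '\t' || c == '\n' || c == '\r';
}

/* Copies len bytes from src to dst, replacing every byte that XML 1.0
 * forbids with a space, and NUL-terminates dst.
 * dst must have room for len + 1 bytes.
 * Returns the number of bytes that were replaced. */
static size_t xml10_sanitize(const char *src, size_t len, char *dst)
{
	size_t i, replaced = 0;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)src[i];

		if (xml10_byte_allowed(c)) {
			dst[i] = src[i];
		} else {
			dst[i] = ' ';
			replaced++;
		}
	}
	dst[len] = '\0';
	return replaced;
}
```

With a filter like this in front of the request, a byte such as 0x04 (the
"CTRL-CHAR, code 4" in the Solr exception) would never reach Solr.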

> I've applied your last patch and the message is now:
> Error: fts_solr: Invalid XML input at 4:113: mismatched tag (near:
> <html><head><title>Apache Tomcat/6.0.35 - Rapport
> d'erreur</title><style><!--H1
> {font-family:Tahoma,Arial,sans-serif;color:white)

I can't reproduce this either. Instead, if I explicitly change the code to
allow control chars, I get a clean error:

Jul 31 23:41:14 indexer-worker(tss 16345 ): Error: fts_solr: Indexing failed: 
400 Illegal character ((CTRL-CHAR, code 4))  at [row,col {unknown-source}]: 
[858,254]
