PostgreSQL version: 8.3.10 (on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46))
OS: Red Hat Enterprise Linux 5.4
Database encoding: LATIN9
Using the default tsearch configuration for 'english', text containing HTML tags is parsed incorrectly when it is converted to the tsvector type. The failure is shown by the following example, which uses the ts_headline function to highlight the issue.

SELECT ts_headline('english',
  'The annual financial report will shortly be posted on the Company’s web-site at <span lang="EN-GB">http://www.harewoodsolutions.co.uk/press.aspx</span><span lang="EN-GB" style=""></span><span style=""> and a further announcement will be made once the annual financial report is available to be downloaded. </span>',
  to_tsquery(''),
  'MaxWords=101, MinWords=100');

Output:
"The annual financial report will shortly be posted on the Company’s web-site at http://www.harewoodsolutions.co.uk/press.aspx</span><span lang="EN-GB" style=""> and a further announcement will be made once the annual financial report is available to be downloaded. "

Expected output:
"The annual financial report will shortly be posted on the Company’s web-site at http://www.harewoodsolutions.co.uk/press.aspx and a further announcement will be made once the annual financial report is available to be downloaded. "

In the actual output, the tags that immediately follow the URL (</span><span lang="EN-GB" style="">) are not removed, while all the other tags are stripped as expected.

Regards
Donald Fraser