Ah, sorry. Just saw the bit about the free text query too.

I suspect a FieldCache is the answer here: it lets you quickly look up the date 
value of each document that matches an arbitrary query.
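
To make the idea concrete, here is a rough, stand-alone sketch of the counting 
step only. It deliberately uses no Lucene classes. The dateByDoc array stands in 
for what I would expect FieldCache.DEFAULT.getStrings(indexReader, "date") to 
give you (one value per doc id), and matchingDocs stands in for the doc ids a 
HitCollector would see for the free text query. The class name, method name and 
sample data are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Stand-alone sketch: no Lucene classes are used here.
class DailyCounts {

    // dateByDoc[docId] holds the YYYYMMDD value of that document (assumed to be
    // the kind of array a FieldCache lookup on the "date" field hands back).
    // matchingDocs holds the doc ids matched by the free text query (hypothetical input).
    static Map<String, Integer> countByDate(String[] dateByDoc, int[] matchingDocs) {
        Map<String, Integer> counts = new TreeMap<String, Integer>(); // keys sorted by date
        for (int docId : matchingDocs) {
            String date = dateByDoc[docId];
            if (date == null) {
                continue; // document has no date value
            }
            Integer previous = counts.get(date);
            counts.put(date, previous == null ? 1 : previous + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] dateByDoc = {"20080801", "20080801", "20080802", null, "20080802", "20080803"};
        int[] matchingDocs = {0, 2, 3, 4};
        // Print one "date count" line per day, in date order.
        for (Map.Entry<String, Integer> e : countByDate(dateByDoc, matchingDocs).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

In a real collector you would do the same increment inside collect() for each 
hit, so the whole graph costs one query plus one array lookup per matching doc.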



----- Original Message ----
From: mark harwood <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, 10 October, 2008 10:40:32
Subject: Re: Buzz measurement - Aggregate functions

Assuming your date data is held as YYYYMMDD and you want daily totals....

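        // Needs org.apache.lucene.index.Term and org.apache.lucene.index.TermEnum
        Term startTerm = new Term("date", "20080101");
        TermEnum termEnum = indexReader.terms(startTerm);
        try {
            do {
                Term currentTerm = termEnum.term();
                // term() is null once the enum is exhausted; also stop when we leave the "date" field
                if (currentTerm == null || !currentTerm.field().equals(startTerm.field())) {
                    break;
                }
                // docFreq() is the number of docs containing this date term
                System.out.println(currentTerm + " " + termEnum.docFreq());
            } while (termEnum.next());
        } finally {
            termEnum.close();
        }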

Should be plenty fast, but note that docFreq() still counts deleted docs. If you 
need to exclude them, look at using "TermDocs" in this loop (or optimize your 
index in advance).

Cheers,
Mark



----- Original Message ----
From: Marcus Herou <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, 10 October, 2008 10:12:35
Subject: Buzz measurement - Aggregate functions

Hi.

Does anyone have an idea of how I would create a query which finds the data
backing a trend graph where date is on the X axis and num(docs) is on the Y axis?

This is quite a common use case in "buzz" analysis, and currently I'm doing a
stupid loop which iterates over the date range and queries Lucene once for every
date. Not very fast and not very flexible.

More specifically, I want something like the query below, but I need to add a
free text query as well, and then I cannot use MySQL for performance reasons.
Any ideas?

--clip--
mysql> select count(id) as Y,publishDate as X from FeedItem where
publishDate between "2008-08-01" and "2008-08-31" group by DAY(publishDate)
order by publishDate asc;
+-------+---------------------+
| Y     | X                   |
+-------+---------------------+
| 26663 | 2008-08-01 00:00:00 |
| 22478 | 2008-08-02 00:00:00 |
| 25745 | 2008-08-03 00:00:00 |
| 30576 | 2008-08-04 00:00:00 |
| 31351 | 2008-08-05 00:00:00 |
| 31084 | 2008-08-06 00:00:00 |
| 31245 | 2008-08-07 00:00:00 |
| 29518 | 2008-08-08 00:00:00 |
| 26001 | 2008-08-09 00:00:00 |
| 28687 | 2008-08-10 00:00:00 |
| 32957 | 2008-08-11 00:00:00 |
| 33251 | 2008-08-12 00:00:00 |
| 33062 | 2008-08-13 00:00:00 |
| 33960 | 2008-08-14 00:00:00 |
| 31034 | 2008-08-15 00:00:00 |
| 26726 | 2008-08-16 00:00:00 |
| 27543 | 2008-08-17 00:00:00 |
| 36887 | 2008-08-18 00:00:00 |
| 35376 | 2008-08-19 00:00:00 |
| 34573 | 2008-08-20 00:00:00 |
| 33889 | 2008-08-21 00:00:00 |
| 30604 | 2008-08-22 00:00:00 |
| 26875 | 2008-08-23 00:00:00 |
| 27356 | 2008-08-24 00:00:00 |
| 33438 | 2008-08-25 00:00:00 |
| 33102 | 2008-08-26 00:00:00 |
| 31720 | 2008-08-27 00:00:00 |
| 26133 | 2008-08-28 00:00:00 |
| 22781 | 2008-08-29 00:00:00 |
| 20198 | 2008-08-30 00:00:00 |
|    20 | 2008-08-31 00:00:00 |
+-------+---------------------+


-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




