Something I didn't see mentioned: does your data need to be made available in real time? Just because you're sampling at 20/s doesn't mean the data has to be made available at 20 samples per second, or even 1 sample per 1/20th of a second. I mention this because you might find it a little more resource-friendly to stuff many samples into the same transaction and commit them together. As an example, take 60 seconds' (1200 samples) worth of data and send it to the database while continuing to sample. This may allow for more flexibility in your implementation.
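The batching idea above can be sketched roughly like this. This is only an illustration, not anything from the thread: the `BatchWriter` class and `commit_batch` callback are hypothetical names, and the actual commit step would be whatever multi-row INSERT your driver provides.

```python
class BatchWriter:
    """Buffer incoming samples and commit them to the database in one
    transaction per batch, instead of one transaction per sample."""

    def __init__(self, commit_batch, batch_size=1200):
        # commit_batch is whatever actually performs the multi-row
        # INSERT and COMMIT (hypothetical; injected here so the
        # buffering logic stays independent of the database driver).
        self.commit_batch = commit_batch
        self.batch_size = batch_size
        self.buffer = []

    def add(self, sample):
        self.buffer.append(sample)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Call this on shutdown too, so a partial batch isn't lost.
        if self.buffer:
            self.commit_batch(self.buffer)
            self.buffer = []

# Example: at 20 samples/s, a batch of 1200 commits once per minute.
batches = []
writer = BatchWriter(batches.append, batch_size=1200)
for i in range(2400):          # two minutes' worth of samples
    writer.add((i, b"payload"))
print(len(batches))            # 2 batches of 1200 samples each
```

The trade-off is latency: data sits in memory up to a minute before it's queryable, which is only acceptable if, as asked above, the data doesn't need to be visible in real time.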
At any rate, I agree with the other comments. Maintenance issues are probably going to be the gotcha if you're not forward-looking.

Regards,

Greg Copeland

On Sun, 2003-03-02 at 19:33, Mark Jones wrote:
> > The real question is, the data collection is in real-time, but can you
> > have a maintenance window (6 hours a week?) to do things like REINDEX?
>
> The database has to have the last 24 hours of data online and be accessible
> 24 hours a day, 7 days a week, 365 days a year -- basically no downtime. My
> application is a weather-related system, which must be available at any
> moment.
>
> > Will the data purge be spread over time or come in large lumps?
>
> The database that is online would only contain the last 24 hours of data.
> Once 24 hours of data has been collected, the data would begin to be
> purged, say purging one hour's worth of data every hour.
>
> > But if it's 20 packets with parts being inserted into 50 tables (now
> > 1000 inserts / second) things could become a little trickier -- but
> > still easily doable.
>
> No, there will be only one table, with each packet having its own record.
> Each packet contains a few header fields that I would like to be able to
> search on. The rest of the data is purely binary data that goes along with
> the header. I have no interest in searching through it, only in being able
> to retrieve it based on its header. One important way of retrieving data
> would be based on time: since every packet has a timestamp, I would like to
> select a range of records out of the database. There may be cases where I
> want to stream the data back out just the way it was recorded.
>
> > Can you use a parallel loader or does it need to be loaded sequentially?
>
> I am not exactly sure what you mean by parallel loader, but the data would
> be loaded sequentially into the database just as I receive it from my
> external device.
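The hourly purge described above amounts to deleting everything older than the retention window each time the job runs. A minimal sketch, assuming a table named `packets` with a `recorded_at` timestamp column (both names are hypothetical, not from the thread):

```python
from datetime import datetime, timedelta

# Hypothetical schema: one "packets" table with a "recorded_at" column.
# The DELETE is parameterized so the driver handles quoting.
PURGE_SQL = "DELETE FROM packets WHERE recorded_at < %s"

def purge_cutoff(now, retention_hours=24):
    """Return the cutoff timestamp: rows older than this fall outside
    the 24-hour window and can be deleted by the hourly purge job."""
    return now - timedelta(hours=retention_hours)

# Run once an hour (cron, or a timer in the collector). Because it runs
# hourly, each pass deletes roughly one hour's worth of rows, spreading
# the purge over time rather than doing it in one large lump.
cutoff = purge_cutoff(datetime(2003, 3, 3, 12, 0, 0))
print(cutoff)   # 2003-03-02 12:00:00
```

Note that on PostgreSQL a DELETE only marks rows dead; a regular VACUUM is still needed to reclaim the space, which ties back to the maintenance-window question raised in this thread.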
> As mentioned above, I may want to be able to sequentially read records
> from the database to recreate the data stream that I received and
> recorded. I would need to be able to read the records back fast enough to
> replay them at the same rate at which I received them.
>
> Thanks Again
> Mark
>
> On Sunday 02 March 2003 06:35 pm, Rod Taylor wrote:
> > On Sun, 2003-03-02 at 18:52, Mark Jones wrote:
> > > Hello
> > >
> > > I am working on a project that acquires real-time data from an
> > > external device that I need to store and be able to search through and
> > > retrieve quickly. My application receives packets of data ranging in
> > > size from 300 to 5000 bytes every 50 milliseconds for a minimum
> > > duration of 24 hours before the data is purged or archived off disk.
> > > There are several fields in the data that I would like to be able to
> > > search on to retrieve the data at a later time. Using a SQL database
> > > such as PostgreSQL or MySQL seems like it would make this task much
> > > easier. My questions are: is a SQL database such as PostgreSQL able to
> > > handle this kind of activity, saving a record of 5000 bytes at a rate
> > > of 20 times a second, and how well will it perform at searching
> > > through a database which contains nearly two million records at a size
> > > of about 8 - 9 gigabytes of data, assuming that I have adequate
> > > computing hardware? I am trying to determine if a SQL database would
> > > work well for this or if I need to write my own custom database for
> > > this project. If anyone has any experience doing anything similar with
> > > PostgreSQL, I would love to know about your findings.
> >
> > Not such a big load, depending on the complexity of the queries being
> > performed. From the sounds of it, they're pretty simple.
> >
> > The real question is, the data collection is in real-time, but can you
> > have a maintenance window (6 hours a week?) to do things like REINDEX?
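Replaying records at their original rate, as described above, is mostly a pacing problem once the rows come back ordered by timestamp. A sketch (the `replay` function and row shape are illustrative assumptions, not anything from the thread): rather than sleeping a fixed 50 ms per packet, it sleeps until each packet's offset from the first timestamp has elapsed on the wall clock, so small delays don't accumulate into drift.

```python
import time

def replay(rows, send, sleep=time.sleep, clock=time.monotonic):
    """Replay (timestamp_seconds, payload) pairs, ordered by time,
    pacing output so inter-packet spacing matches the recording."""
    start_wall = None
    start_ts = None
    for ts, payload in rows:
        if start_wall is None:
            # First packet goes out immediately; it anchors the clock.
            start_wall, start_ts = clock(), ts
        else:
            # Sleep only for whatever part of the original offset
            # hasn't already elapsed (query time counts against it).
            delay = (ts - start_ts) - (clock() - start_wall)
            if delay > 0:
                sleep(delay)
        send(payload)

sent = []
rows = [(0.00, b"a"), (0.05, b"b"), (0.10, b"c")]   # 20 packets/s spacing
replay(rows, sent.append)
print(sent)   # [b'a', b'b', b'c']
```

For this to keep up at 20 packets/s, the SELECT itself must stream faster than real time, which is where the timestamp index discussed elsewhere in the thread matters.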
> >
> > Can you use a parallel loader or does it need to be loaded sequentially?
> >
> > Will the data purge be spread over time or come in large lumps?
> >
> > Simple queries on 2 million records happen in low double-digit
> > milliseconds on relatively low end hardware with a decent index -- but
> > complex queries can take minutes on 1/10000th the dataset.
> >
> > 20 inserts / second shouldn't be difficult to achieve even on the
> > lowest end hardware -- my laptop can do that -- giving each packet its
> > own transaction.
> >
> > But if it's 20 packets with parts being inserted into 50 tables (now
> > 1000 inserts / second) things could become a little trickier -- but
> > still easily doable.
> >
> > The most important question is the maintenance window.

-- 
Greg Copeland <[EMAIL PROTECTED]>
Copeland Computer Consulting

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org