Madhu Reddy wrote:
> We are trying to load data into Teradata (which is a
> data warehousing system, stores terabytes of data, and which
> is 10 times faster than any other database..)
Data warehousing is always an exciting subject! However, I'd be surprised to see that kind of performance increase. A major factor in database performance is database design, and many database designers do not know how to build data warehouses; they are stuck on normal relational concepts. Anyway, sorry to go off topic... I just can't turn down a database debate! :)

> before loading data into Teradata, we need to do some
> massaging on data.. basically eliminating duplicate
> rows and invalid rows...

I don't know anything about the Teradata database system, but here is how I would do this on other systems:

1. Load the data as-is into a temporary table.
2. Run a SELECT (or a report) returning unique (DISTINCT) rows. The same SELECT can also filter out your invalid rows and massage the data.
3. Load the result of that SELECT into the final database.

If you are really looking to do this with Perl, you can load the rows into a hash and then print out the unique values. I have no idea how long this would take to run on that much data, but the code would be fairly straightforward: use each complete row as a hash key. Since hash keys are unique by definition, duplicate rows collapse automatically as you load them, so there is no need to sort and compare neighboring records by hand; just print the keys when you are done (sorting them first only if you need ordered output). At that point you could also do some formatting, drop invalid rows, etc.

I guess it all just depends on which approach you are more comfortable with.

Hope this helps,
Jared
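P.S. In case it helps, here is a minimal sketch of that hash-based de-dup. The sample rows and the pipe-delimited layout are made up for illustration; in practice you would read from your real input file instead of a hard-coded list:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample rows standing in for the real input file.
my @rows = (
    "101|Smith|NY",
    "102|Jones|CA",
    "101|Smith|NY",    # duplicate row
    "",                # invalid (blank) row
    "103|Brown|TX",
);

my (%seen, @unique);
for my $row (@rows) {
    next if $row =~ /^\s*$/;              # drop invalid (blank) rows
    push @unique, $row unless $seen{$row}++;   # whole row is the hash key
}

# Duplicates are already gone; just write out what survived.
print "$_\n" for @unique;
```

The `unless $seen{$row}++` idiom keeps the first copy of each row and preserves the original input order, so no separate sort pass is needed.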