The thing is that in the text file there is a column that is something like xyz_12 and in the table i just need the integer part of it that's what the query is used for. The problem though is not really in the select part, because running the select part on 8 million lines takes about 3 minutes, but i don't know why the insert is taking so long.
"Merlin Moncure" <[EMAIL PROTECTED]> wrote on 07/18/06 9:39 pm: > On 7/18/06, Christian Rengstl > <[EMAIL PROTECTED]> wrote: >> now finally after a long time i have the query plan for the whole filled >> table. I hope somebody can explain me why it takes so much longer... >> >> QUERY PLAN >> >> -------------------------------------------------------------------------------- >> -------------------------------------------------------------------------------- >> ----- >> Result (cost=0.09..254643.09 rows=8044000 width=39) (actual >> time=0.043..0.043 >> rows=0 loops=1) >> One-Time Filter: ((((split_part(($1)::text, '_'::text, 2))::smallint = 1) >> IS >> NOT TRUE) AND (((split_part(($2)::text, '_'::text, 2))::smallint = 22) IS >> NOT TR >> UE)) >> InitPlan >> -> Limit (cost=0.00..0.02 rows=1 width=10) (never executed) >> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 >> width= >> 10) (never executed) >> -> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.006 >> rows >> =1 loops=1) >> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 >> width= >> 10) (actual time=0.004..0.004 rows=1 loops=1) >> -> Limit (cost=0.00..0.02 rows=1 width=10) (never executed) >> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 >> width= >> 10) (never executed) >> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) >> (nev >> er executed) >> Total runtime: 0.238 ms >> >> Result (cost=0.06..254643.06 rows=8044000 width=39) (actual >> time=0.056..655772 >> .273 rows=8044000 loops=1) >> One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 1) >> InitPlan >> -> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.003..0.003 >> rows >> =1 loops=1) >> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 >> width= >> 10) (actual time=0.002..0.002 rows=1 loops=1) >> -> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.006..0.007 >> rows >> =1 loops=1) >> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 >> width= >> 10) (actual time=0.004..0.004 rows=1 loops=1) >> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) >> (act >> ual time=0.002..191672.344 rows=8044000 loops=1) >> Total runtime: 62259544.896 ms >> >> Result (cost=0.06..254643.06 rows=8044000 width=39) (actual >> time=1.245..1.245 >> rows=0 loops=1) >> One-Time Filter: ((split_part(($1)::text, '_'::text, 2))::smallint = 22) >> InitPlan >> -> Limit (cost=0.00..0.02 rows=1 width=10) (never executed) >> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 >> width= >> 10) (never executed) >> -> Limit (cost=0.00..0.02 rows=1 width=10) (actual time=0.013..0.014 >> rows >> =1 loops=1) >> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 >> width= >> 10) (actual time=0.009..0.009 rows=1 loops=1) >> -> Seq Scan on temp_table (cost=0.00..194313.00 rows=8044000 width=39) >> (nev >> er executed) >> Total runtime: 22.270 ms >> (31 Zeilen) >> >> >> >> >>> "Christian Rengstl" <[EMAIL PROTECTED]> 13.07.06 8.37 Uhr >>> >> Good morning list, >> >> the following query takes about 15 to 20 minutes for around 2 million lines >> in the file myfile.txt, but with 8 million lines it takes around 5 hours and >> i just don't understand why there is such a huge discrepancy in performance. >> >> COPY public.temp_table FROM 'myfile.txt' WITH DELIMITER '\t'; >> >> INSERT INTO public.master(pid,smid, val1, val2, chr) >> SELECT pid, smid, val1, val12, CAST(split_part((SELECT chr from >> public.temp_table LIMIT 1), '_', 2) as int2) >> FROM public.temp_table; >> >> INSERT INTO public.values(smid, pos, chr) >> SELECT DISTINCT smid, pos, CAST(split_part((SELECT chr from >> public.temp_table LIMIT 1), '_', 2) as int2) >> FROM public.temp_table; >> > > what is this phrase doing exactly? > CAST(split_part((SELECT chr from public.temp_table LIMIT 1), '_', 2) > > it looks fishy. > merlin > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match -- Christian Rengstl M.A. Klinik und Poliklinik für Innere Medizin II Kardiologie - Forschung Universitätsklinikum Regensburg B3 1.388 Franz-Josef-Strauss-Allee 11 93053 Regensburg Tel.: +49-941-944-7230 ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster