It really depends on how you are loading the data. If you are inserting line by line, it's going to be very slow. You should load datasets like this with a COPY FROM. If there are any issues with your CSV file, though, that will be a problem; unquoted commas tend to be the most common issue I run into. You could also split the file into smaller chunks for processing, so that if one chunk fails you have a record of where to pick it up.
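As a minimal sketch (assuming a target table, here called my_table, already exists with columns matching the file, and that the path is readable by the server), the bulk load would look something like:

    COPY my_table
    FROM '/path/to/data.csv'
    WITH (FORMAT csv, HEADER true);

If the file sits on the client machine instead, psql's \copy runs the same thing client-side:

    \copy my_table FROM 'data.csv' WITH (FORMAT csv, HEADER true)

Adjust the table name, path, and options to your setup.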
Check out this: https://www.postgresql.org/docs/current/sql-copy.html

Thanks,
Ben

On Sat, Nov 11, 2023, 3:55 PM Vince McMahon <sippingonesandze...@gmail.com> wrote:

> I'm not querying with catch_all at the moment, but other developers may.
>
> I am new. Mind sharing how it matters, esp. how it makes loading and indexing fast?
>
> On Sat, Nov 11, 2023, 3:05 PM Benedict Holland <benedict.m.holl...@gmail.com> wrote:
>
> > Are you using COPY FROM?
> >
> > On Sat, Nov 11, 2023, 2:33 PM Vince McMahon <sippingonesandze...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I have a CSV file with 200 fields and 100 million rows of historical and latest data.
> > >
> > > The current processing is taking 20+ hours.
> > >
> > > The schema is like:
> > > <field name="column1" type="string" indexed="true" stored="true">
> > > ...
> > > <field name="column200" type="string" indexed="true" stored="true">
> > > <copyField source="column1" dest="_text_"/>
> > > <copyField source="column1" dest="_fuzzy_"/>
> > > ...
> > > <copyField source="column50" dest="_text_"/>
> > > <copyField source="column50" dest="_fuzzy_"/>
> > >
> > > In terms of hardware, I have 3 identical servers. One of them is used to load this CSV to create a core.
> > >
> > > What is the fastest way to load and index this large and wide CSV file? It is taking too long, 20+ hours, now.