It really depends on how you are loading the data. If you are going line
by line then it's going to be very slow. You should load datasets like
this with COPY FROM. Any issues in the CSV file itself will still cause
problems, though; unquoted commas inside field values are the most common
one I deal with.
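
Something along these lines, for example (database, table, and file names
are just placeholders; adjust the options to match your file):

  # placeholder names; run from the machine that has the CSV
  psql -d mydb -c "\copy staging_table from 'data.csv' with (format csv, header true)"

Plain COPY ... FROM 'file' does the same thing, but the file then has to
be readable by the database server process; \copy streams it from the
client.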

You could also split the file into smaller chunks and load them one at a
time, so if one load fails you have a record of where to pick it up.
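
For example, with GNU coreutils (the chunk size is arbitrary, the header
row ends up only in the first chunk, and quoted fields containing newlines
can get broken across chunks):

  # 1M-line chunks named chunk_aa, chunk_ab, ...
  split -l 1000000 data.csv chunk_

Then you only have to re-run the chunk that failed.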

Check this out:

https://www.postgresql.org/docs/current/sql-copy.html

Thanks,
Ben

On Sat, Nov 11, 2023, 3:55 PM Vince McMahon <sippingonesandze...@gmail.com>
wrote:

> I'm not querying with catch_all at the moment, but other developers may.
>
> I am new.  Mind sharing how it matters, especially how it makes loading and
> indexing fast?
>
>
>
> On Sat, Nov 11, 2023, 3:05 PM Benedict Holland <benedict.m.holl...@gmail.com>
> wrote:
>
> > Are you using copy from?
> >
> > On Sat, Nov 11, 2023, 2:33 PM Vince McMahon <sippingonesandze...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I have a CSV file with 200 fields and 100 million rows of historical and
> > > latest data.
> > >
> > > The current processing is taking 20+ hours.
> > >
> > > The schema is like:
> > > <field name="column1" type="string" indexed="true" stored="true"/>
> > > ...
> > > <field name="column200" type="string" indexed="true" stored="true"/>
> > > <copyField source="column1" dest="_text_"/>
> > > <copyField source="column1" dest="_fuzzy_"/>
> > > ...
> > > <copyField source="column50" dest="_text_"/>
> > > <copyField source="column50" dest="_fuzzy_"/>
> > >
> > > In terms of hardware, I have 3 identical servers.  One of them is used to
> > > load this CSV to create a core.
> > >
> > > What is the fastest way to load and index this large and wide CSV file?
> > > It is taking too long, 20+ hours, now.
> > >
> >
>
