It really depends on how you are loading the data. If you are going line
by line then it's going to be very slow. You should load datasets like
this with COPY FROM. Any issues in the CSV file itself will still cause
problems, though; unquoted commas inside field values are the most common
one I deal with.
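
Something along these lines, for example (database, table, and file names
are just placeholders; adjust the options to match your file):

  # placeholder names; run from the machine that has the CSV
  psql -d mydb -c "\copy staging_table from 'data.csv' with (format csv, header true)"

Plain COPY ... FROM 'file' does the same thing, but the file then has to
be readable by the database server process; \copy streams it from the
client.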

You could also split the file into smaller chunks and load them one at a
time, so if one load fails you have a record of where to pick it up.
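
For example, with GNU coreutils (the chunk size is arbitrary, the header
row ends up only in the first chunk, and quoted fields containing newlines
can get broken across chunks):

  # 1M-line chunks named chunk_aa, chunk_ab, ...
  split -l 1000000 data.csv chunk_

Then you only have to re-run the chunk that failed.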

Check this out:

https://www.postgresql.org/docs/current/sql-copy.html

Thanks,
Ben

On Sat, Nov 11, 2023, 3:55 PM Vince McMahon <sippingonesandze...@gmail.com>
wrote:

> I'm not querying with catch_all at the moment, but other developers may.
>
> I am new.  Mind sharing how it matters, especially how it makes loading and
> indexing fast?
>
>
>
> On Sat, Nov 11, 2023, 3:05 PM Benedict Holland <benedict.m.holl...@gmail.com>
> wrote:
>
> > Are you using copy from?
> >
> > On Sat, Nov 11, 2023, 2:33 PM Vince McMahon <sippingonesandze...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I have a CSV file with 200 fields and 100 million rows of historical and
> > > latest data.
> > >
> > > The current processing is taking 20+ hours.
> > >
> > > The schema is like:
> > > <field name="column1" type="string" indexed="true" stored="true"/>
> > > ...
> > > <field name="column200" type="string" indexed="true" stored="true"/>
> > > <copyField source="column1" dest="_text_"/>
> > > <copyField source="column1" dest="_fuzzy_"/>
> > > ...
> > > <copyField source="column50" dest="_text_"/>
> > > <copyField source="column50" dest="_fuzzy_"/>
> > >
> > > In terms of hardware, I have 3 identical servers.  One of them is used to
> > > load this CSV to create a core.
> > >
> > > What is the fastest way to load and index this large and wide CSV file?
> > > It is taking too long, 20+ hours, now.
> > >
> >
>
