HI!
We are developing a scoring system for recruitment. A recruiter enters
vacancy requirements, and we score tens of thousands of CVs against those
requirements, returning e.g. the top 10 matches.
We do not use full-text search, and sometimes we don't even filter input
CVs prior to scoring (some vacancies have no mandatory requirements that
can be used as an effective filter).

So we have a scoring function F(CV, VACANCY) that is currently implemented
in SQL and runs on a PostgreSQL cluster. In the worst case F is executed
once on every CV in the database. The VACANCY part is fixed within one
query, but changes between queries, so there is very little we can
precompute.
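For concreteness, here is a minimal plain-Python sketch of the shape of the
problem (the skill-overlap score and the field names are hypothetical
stand-ins; our real F is an SQL function): score every CV against one fixed
vacancy and keep the top k with a bounded heap.

```python
import heapq

# Hypothetical, simplified stand-in for the SQL scoring function
# F(CV, VACANCY): count how many of the vacancy's desired skills
# appear in the CV.
def score(cv, vacancy):
    return sum(1 for skill in vacancy["skills"] if skill in cv["skills"])

def top_matches(cvs, vacancy, k=10):
    # In the worst case score() runs once per CV in the database;
    # a bounded heap keeps memory at O(k) regardless of corpus size.
    return heapq.nlargest(k, cvs, key=lambda cv: score(cv, vacancy))

vacancy = {"skills": {"python", "sql", "spark"}}
cvs = [
    {"id": 1, "skills": {"python", "sql"}},
    {"id": 2, "skills": {"java"}},
    {"id": 3, "skills": {"python", "sql", "spark"}},
]
print([cv["id"] for cv in top_matches(cvs, vacancy, k=2)])  # -> [3, 1]
```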

We expect to have about 100,000,000 CVs within the next year, and we do
not expect our current implementation to offer the desired low-latency
response (<1 s) on 100M CVs. So we are looking for a horizontally
scalable, fault-tolerant in-memory solution.

Will Spark be useful for our task? All the tutorials I could find describe
stream processing or ML applications. What Spark extensions/backends
could be useful here?


With best regards, Segey Melekhin
