Re: [orientdb] Index creation speed is too slow

Suhas Tue, 14 May 2019 00:26:50 -0700

Response inline.

On Thursday, May 9, 2019 at 3:21:47 PM UTC, Jérôme Mainaud wrote:
>
> OK, I'm not surprised by the SB-Tree insert cost increase, as the complexity 
> of adding a key to such a tree is O(log(n)).
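Here is a quick check of what O(log(n)) growth means at this scale, as a minimal Python sketch. It assumes an idealized balanced binary tree with one comparison per level. It does not model OrientDB's SB-Tree page layout, caching, or disk I/O.

```python
import math


def comparisons(n: int) -> int:
    """Comparisons for one insert into an idealized balanced binary tree of n keys (one per level)."""
    return math.ceil(math.log2(n))


for n in (1_000_000, 177_000_000, 500_000_000):
    print(f"{n:>11,} keys -> ~{comparisons(n)} comparisons")
```

Under this idealized model, the per-key comparison count grows from about 20 to about 29 between one million and 500 million keys, which is less than a 1.5x increase. The slowdown reported in this thread (from about 20,000 to under 1,000 items/sec) is much larger. Costs this sketch ignores, such as page reads once the index outgrows the disk cache, may therefore dominate. That is an assumption, not something measured here.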
>
> For your first case, I see no solution other than building an index, but you can 
> do it with a UNIQUE_HASH_INDEX. If the implementation is good, adding a 
> key should take constant time on average (some insertions are occasionally more 
> expensive, when the index storage has to grow).
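The following toy Python hash set illustrates the "constant time on average, occasionally more expensive" behavior described above: it doubles its bucket array when it gets too full. This is a generic textbook technique chosen for illustration. It is not OrientDB's UNIQUE_HASH_INDEX implementation, and the 0.75 load factor and doubling policy are assumptions.

```python
class GrowableHashSet:
    """Toy chained hash set that doubles its bucket array when the load factor exceeds 0.75.

    Illustrates amortized constant-time inserts; this is not OrientDB's hash index.
    """

    def __init__(self, capacity: int = 8) -> None:
        self.buckets: list[list[int]] = [[] for _ in range(capacity)]
        self.size = 0
        self.moves = 0  # keys re-inserted during resizes (the occasional expensive step)

    @staticmethod
    def _bucket(key: int, buckets: list[list[int]]) -> list[int]:
        return buckets[hash(key) % len(buckets)]

    def add(self, key: int) -> bool:
        """Insert key; return False if it is already present (unique-index behavior)."""
        bucket = self._bucket(key, self.buckets)
        if key in bucket:
            return False
        bucket.append(key)
        self.size += 1
        if self.size > 0.75 * len(self.buckets):
            self._grow()
        return True

    def _grow(self) -> None:
        # Rehash every key into a bucket array twice as large.
        new_buckets: list[list[int]] = [[] for _ in range(2 * len(self.buckets))]
        for bucket in self.buckets:
            for key in bucket:
                self._bucket(key, new_buckets).append(key)
                self.moves += 1
        self.buckets = new_buckets


n = 100_000
s = GrowableHashSet()
for k in range(n):
    s.add(k)
print(f"re-insertions per key: {s.moves / n:.2f}")
```

Each resize re-inserts every key, which is the occasional expensive step. Because capacity doubles each time, the total number of re-insertions is bounded by a small constant per key (about two with doubling), so the average cost per insert stays constant. In this toy run with 100,000 keys, the ratio comes out at about 1.97.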
>
Tried it. There is no difference. Indexing started at 20,000 items/sec, and 
after about a day and a half the speed had dropped to 500 items/sec.


> For the other cases, have you tried querying directly from the vertex?
>
> Suppose we have this data:
> create class Person extends V;
> create property Person.name string;
>
> create class Company extends V;
> create property Company.name string;
>
> create class WorkedAt extends E;
>
> /* Add constraints on the edge. */
> create property WorkedAt.out link Person;
> create property WorkedAt.in link Company;
>
> insert into Person (name) values ('jerome');
> insert into Person (name) values ('john doe');
>
> insert into Company (name) values ('Zeenea');
> insert into Company (name) values ('Ippon Technologies');
> insert into Company (name) values ('Klee Group');
> insert into Company (name) values ('World Big Company');
>
> create edge WorkedAt from (select from Person where name = 'jerome') to 
> (select from Company where name = 'Zeenea');
> create edge WorkedAt from (select from Person where name = 'jerome') to 
> (select from Company where name = 'Ippon Technologies');
> create edge WorkedAt from (select from Person where name = 'jerome') to 
> (select from Company where name = 'Klee Group');
> create edge WorkedAt from (select from Person where name = 'john doe') to 
> (select from Company where name = 'World Big Company');
>
> *Use case 2*
> I can count outgoing links from Person with this query:
>
> orientdb {db=tdb}> select name, out('WorkedAt').size() from Person
>
> +----+--------+----------------------+
> |#   |name    |out('WorkedAt').size()|
> +----+--------+----------------------+
> |0   |jerome  |3                     |
> |1   |john doe|1                     |
> +----+--------+----------------------+
>
> This can be further optimized as follows (if the optimizer doesn't already do it):
>
> orientdb {db=tdb}> select name, out_WorkedAt.size() from Person
>
> +----+--------+-------------------+
> |#   |name    |out_WorkedAt.size()|
> +----+--------+-------------------+
> |0   |jerome  |3                  |
> |1   |john doe|1                  |
> +----+--------+-------------------+
>
> These queries use direct links and don't need an index; the last one doesn't 
> even need to touch the edges.
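Here is a toy in-memory Python model of why these queries need no index. The assumption is that each vertex holds references to its own outgoing and incoming edges, playing the role of the `out_WorkedAt` and `in_WorkedAt` fields. The class and function names are hypothetical, and this is not OrientDB's storage format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    name: str
    # Toy stand-ins for the out_WorkedAt / in_WorkedAt fields: edges stored on the vertex itself.
    out_worked_at: list[Edge] = field(default_factory=list, repr=False)
    in_worked_at: list[Edge] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Edge:
    out: Vertex
    in_: Vertex  # "in" is a Python keyword, hence the trailing underscore


def create_edge(src: Vertex, dst: Vertex) -> Edge:
    """Rough analogue of CREATE EDGE WorkedAt: link the new edge from both endpoints."""
    edge = Edge(src, dst)
    src.out_worked_at.append(edge)
    dst.in_worked_at.append(edge)
    return edge


jerome, john = Vertex("jerome"), Vertex("john doe")
companies = {n: Vertex(n) for n in ("Zeenea", "Ippon Technologies", "Klee Group", "World Big Company")}
for name in ("Zeenea", "Ippon Technologies", "Klee Group"):
    create_edge(jerome, companies[name])
create_edge(john, companies["World Big Company"])

# Analogue of: select name, out_WorkedAt.size() from Person
for person in (jerome, john):
    print(person.name, len(person.out_worked_at))  # jerome 3, then john doe 1
```

Counting here is just `len()` of a list the vertex already holds, so the cost does not depend on how many edges exist elsewhere in the graph.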
>
> *Use case 3*
> I can test whether a person worked at a company with this query:
>
> orientdb {db=tdb}> select count() from Person where name = 'jerome' and 
> out('WorkedAt') contains (name = 'Zeenea')
>
> +----+-------+
> |#   |count()|
> +----+-------+
> |0   |1      |
> +----+-------+
>
> If the count is one or more, the vertices are linked.
> This query uses direct links and doesn't need an index.
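The following toy Python sketch models use case 3. It assumes each person's outgoing WorkedAt targets are already available as a list. The dictionary and the `worked_at` helper are hypothetical stand-ins, not OrientDB APIs.

```python
# Hypothetical stand-in for each Person's outgoing WorkedAt targets (company names).
out_worked_at: dict[str, list[str]] = {
    "jerome": ["Zeenea", "Ippon Technologies", "Klee Group"],
    "john doe": ["World Big Company"],
}


def worked_at(person: str, company: str) -> bool:
    """Rough analogue of:
    select count() from Person where name = <person> and out('WorkedAt') contains (name = <company>)
    Returns True when that count would be one or more.
    """
    # Only this person's own edges are scanned: the cost grows with its out-degree,
    # not with the total number of edges in the graph.
    return any(name == company for name in out_worked_at.get(person, []))


print(worked_at("jerome", "Zeenea"))    # True
print(worked_at("john doe", "Zeenea"))  # False
```

Once the person is located, the check scans only that vertex's edges. For a vertex with a very large number of outgoing edges, this linear scan could become expensive. That may be one case where an index on (in, out) still helps, which is an assumption worth testing.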
>
> Of course, this is just to give you the idea. You have to adapt it to 
> your use case.
>
> Last but not least, don't just trust me. Test! 
> I don't have billions of edges.
> Give me some feedback if I'm wrong or if I've missed something. (I am learning 
> while I respond to you.)
>
> my 2 cents,
>
> -- 
> Jérôme Mainaud
> jer...@mainaud.com
>
>
> On Wed, May 8, 2019 at 23:37, Suhas <suhass...@gmail.com> wrote:
>
>> Hey Jerome,
>>
>> Here are a few reasons why I needed an index:
>>
>> 1. Apply a unique constraint on the edge (no more than a single edge 
>> between a pair of vertices).
>> 2. Compute incoming and outgoing edge counts faster.
>> 3. Check whether two vertices are connected.
>>
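The three requirements above can be pictured with a toy Python registry that keeps a set of (out, in) pairs plus per-vertex counters. This is a hypothetical, in-memory analogue of what a UNIQUE index on the edge's (out, in) keys provides. It is not an OrientDB API, and names like `EdgeRegistry` are made up for illustration.

```python
from collections import Counter


class EdgeRegistry:
    """Hypothetical in-memory analogue of a UNIQUE index on an edge class's (out, in) pair."""

    def __init__(self) -> None:
        self.pairs: set[tuple[str, str]] = set()
        self.out_degree: Counter[str] = Counter()
        self.in_degree: Counter[str] = Counter()

    def add_edge(self, out_v: str, in_v: str) -> None:
        # Requirement 1: at most one edge between an ordered pair of vertices.
        if (out_v, in_v) in self.pairs:
            raise ValueError(f"duplicate edge {out_v} -> {in_v}")
        self.pairs.add((out_v, in_v))
        # Requirement 2: keep outgoing/incoming counts up to date.
        self.out_degree[out_v] += 1
        self.in_degree[in_v] += 1

    def connected(self, out_v: str, in_v: str) -> bool:
        # Requirement 3: average constant-time check for a direct edge.
        return (out_v, in_v) in self.pairs


reg = EdgeRegistry()
reg.add_edge("jerome", "Zeenea")
reg.add_edge("jerome", "Klee Group")
try:
    reg.add_edge("jerome", "Zeenea")
except ValueError as err:
    print(err)  # duplicate edge jerome -> Zeenea
print(reg.out_degree["jerome"], reg.connected("jerome", "Klee Group"))  # 2 True
```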
>> Meanwhile, I'm using an SB-Tree index.
>>
>>
>> On Wednesday, May 8, 2019 at 7:15:25 PM UTC, Jérôme Mainaud wrote:
>>>
>>> Hello,
>>>
>>> I don't know the exact implementation used by OrientDB, and it depends 
>>> on the type of index you choose.
>>> But it's no big surprise that the time to insert a key increases with 
>>> the number of entries in the index.
>>> Hash indexes should be less sensitive to this cost increase.
>>>
>>> What is the purpose of indexing the in and out keys of your edge? 
>>> Queries won't benefit from such an index, as they traverse the graph using 
>>> the links from vertex to edge, which is far more efficient.
>>> Tell me if I'm wrong about that.
>>>
>>> -- 
>>> Jérôme Mainaud
>>> jer...@mainaud.com
>>>
>>>
>>> On Wed, May 8, 2019 at 16:04, Suhas <suhass...@gmail.com> wrote:
>>>
>>>> I’m creating indexes on the keys (in, out) of an Edge class containing 
>>>> about 500 million records. Index creation progressed well at the 
>>>> beginning, at about 20,000 items/sec, but after some time the speed 
>>>> dropped to <1000 items/sec.
>>>>
>>>>
>>>> 2019-05-08 08:43:25:885 INFO  {db=cgraph} --> 37.00% progress, 177,405,476 indexed so far (855 items/sec) [OIndexRebuildOutputListener]
>>>> 2019-05-08 08:43:35:899 INFO  {db=cgraph} --> 37.00% progress, 177,415,347 indexed so far (987 items/sec) [OIndexRebuildOutputListener]
>>>> 2019-05-08 08:43:45:902 INFO  {db=cgraph} --> 37.00% progress, 177,427,464 indexed so far (1,211 items/sec) [OIndexRebuildOutputListener]
>>>>
>>>>
>>>> At this speed, it’ll take about 3-4 days!!
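The estimate can be checked against the quoted log lines with a short Python sketch. It assumes a total of 500,000,000 records (the post says "about 500 million") and that the rate stays near 1,000 items/sec. Both are approximations.

```python
from datetime import datetime

# The three progress lines quoted above (OIndexRebuildOutputListener output).
LOG = """\
2019-05-08 08:43:25:885 INFO  {db=cgraph} --> 37.00% progress, 177,405,476 indexed so far (855 items/sec) [OIndexRebuildOutputListener]
2019-05-08 08:43:35:899 INFO  {db=cgraph} --> 37.00% progress, 177,415,347 indexed so far (987 items/sec) [OIndexRebuildOutputListener]
2019-05-08 08:43:45:902 INFO  {db=cgraph} --> 37.00% progress, 177,427,464 indexed so far (1,211 items/sec) [OIndexRebuildOutputListener]
"""

TOTAL_RECORDS = 500_000_000  # assumption: "about 500 million" from the original post
ASSUMED_RATE = 1_000         # assumption: the "<1000 items/sec" regime persists


def parse(line: str) -> tuple[datetime, int]:
    # The timestamp uses ':' before the milliseconds, e.g. 08:43:25:885.
    ts = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S:%f")
    count = int(line.split("progress, ")[1].split(" indexed")[0].replace(",", ""))
    return ts, count


points = [parse(line) for line in LOG.splitlines()]
for (t0, c0), (t1, c1) in zip(points, points[1:]):
    rate = (c1 - c0) / (t1 - t0).total_seconds()
    print(f"{rate:,.0f} items/sec")  # 986, then 1,211

_, last_count = points[-1]
remaining = TOTAL_RECORDS - last_count
print(f"~{remaining / ASSUMED_RATE / 86_400:.1f} days left at {ASSUMED_RATE:,} items/sec")  # ~3.7 days
```

The recomputed rates closely match the logged values, and the remaining ~322.6 million keys at 1,000 items/sec come to roughly 3.7 days, in line with the 3-4 day estimate. Also, 37.00% progress at about 177.4 million keys implies roughly 480 million records in total.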
>>>> Settings used (machine with 16GB RAM and a 300GB SSD):
>>>> java -server -Xms2G -Xmx7G -Dstorage.diskCache.bufferSize=7200
>>>>
>>>>
>>>> [image: Screenshot from 2019-05-08 09-06-47.png]
>>>>
>>>> Any idea why the speed of indexing decreased so drastically? And how 
>>>> can I increase the speed of indexing?
>>>>
>>>> OrientDB 3.0.15
>>>>
>>>> -- 
>>>>
>>>> --- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "OrientDB" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to orient-...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/orient-database/95597c3e-632b-4570-af51-f07227dc1965%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/orient-database/95597c3e-632b-4570-af51-f07227dc1965%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>
>
