from:"Suresh Pillai"

Execution speed question

2008-07-25 Thread Suresh Pillai

I am performing simulations on networks (graphs).  I have a question on 
speed of execution (assuming very ample memory for now).  I simplify the 
details of my simulation below, as the question I ask applies more 
generally than my specific case.  I would greatly appreciate general 
feedback in terms of computing and of course considerations specific to 
implementation in Python.

The nodes in my network may be ON or OFF.  The network starts off with 
all nodes in the OFF state.  I loop through the nodes.  For each node 
that is OFF, I consider some probability of it turning ON based on the 
states of its neighbours.  I MUST GO THROUGH ALL NODES BEFORE DECIDING 
WHICH ONES TO TURN ON.

So my question is whether it is faster to 

1. loop through a list of ALL nodes and check for OFF nodes using ifs 

or to 

2. loop through a container of OFF nodes and remove from this when they  
turn ON

The second would result in looping through less nodes, especially as the 
simulation progresses, but how does the cost of removal compare with 
cheap ifs and would the extra memory usage affect performance.

I an appreciate that the cost of the if check, the number of nodes, and 
the type of container I use will come into the answer.

In my case, the ifs are cheap boolean queries (whether the node is ON or 
OFF).  The number of nodes is very large: millions for sure, maybe tens 
of millions.  If considering (2), take note of my BOLD text above, which 
means I can't remove nodes as I iterate through them in the main loop.

I naturally started coding with (2), but couldn't decide on the best data 
structure for python.  A set seemed ideal for speedy removal, but then I 
can't iterate through them with out popping.  An ordered list?  Some 
creative solution with numpy arrays?

There is also the complication that since we are in interpreted python, 
what is theoretically the best data structure may not in reality be 
optimal unless it is a default system object or coded externally in a 
compiled module.

Of course, I will start experimenting to see what the execution 
difference is, but I would appreciate some suggestions from others re 
which is best and also on best data structure for (2).  

I'm not a newbie, so you can get technical with me python-wise and 
algorithm wise.  I realise it is a 'basic' question, but it is something 
that I have always wondered about (cheap ifs versus extra structure) and 
with the number of nodes I am considering, it actually becomes an issue.

Many Thanks,
Suresh
--
http://mail.python.org/mailman/listinfo/python-list

Re: Execution speed question

2008-07-25 Thread Suresh Pillai

That's a good comparison for the general question I posed.  Thanks.  
Although I do believe lists are less than ideal here and a different data 
structure should be used.

To be more specific to my case:
As mentioned in my original post, I also have the specific condition that 
one does not know which nodes to turn ON until after all the 
probabilities are calculated (lets say we take the top m for example).  
In this case, the second and third will perform worse as the second one 
will require a remove from the list after the fact and the third will 
require another loop through the nodes to build the new list.  
--
http://mail.python.org/mailman/listinfo/python-list

Re: Execution speed question

2008-07-25 Thread Suresh Pillai

On Fri, 25 Jul 2008 16:51:42 +0200, Fredrik Lundh wrote:

> Unless I'm missing something, your example keeps going until it's
> flagged *all* nodes as "on", which, obviously, kills performance for the
> first version as the probability goes down.  The OP's question was about
> a single pass (but he did mention "as the simulation progresses", so I
> guess it's fair to test a complete simulation.)

I was referring to multiple passes as in Iain' test cases.  Although not 
necessarily till all nodes are ON, let's say to to a large proportion at 
least. 
--
http://mail.python.org/mailman/listinfo/python-list

Re: Execution speed question

2008-07-28 Thread Suresh Pillai

On Fri, 25 Jul 2008 08:08:57 -0700, Iain King wrote:

> On Jul 25, 3:39 pm, Suresh Pillai <[EMAIL PROTECTED]> wrote:
>> That's a good comparison for the general question I posed.  Thanks.
>> Although I do believe lists are less than ideal here and a different
>> data structure should be used.
>>
>> To be more specific to my case:
>> As mentioned in my original post, I also have the specific condition
>> that one does not know which nodes to turn ON until after all the
>> probabilities are calculated (lets say we take the top m for example).
>> In this case, the second and third will perform worse as the second one
>> will require a remove from the list after the fact and the third will
>> require another loop through the nodes to build the new list.
> 
> So you need to loops through twice regardless?  i.e. loop once to gather
> data on off nodes, do some calculation to work out what to turn on, then
> loop again to turn on the relevant nodes?  If so, then I think the
> functions above remain the same, becoming the 2nd loop. Every iteration
> you do a first loop over the off_nodes (or them all for (1)) to gather
> the data on them, perform your calculation, and then perform one of the
> above functions (minus the setup code at the begining; basically
> starting at the 'for') as a second loop, with the goes_on function now
> returning a value based on the calculation (rather than the calculation
> itself as I had it).  Performance should be similar.
> 
> Iain

If do I settle on an explicit loop to remove the nodes turned ON, then I 
realised this weekend that I could do this in the next iteration of the 
simulation (first loop above) and save some iteration overhead (the if 
checking will still be there of course).

And thanks for pointing out that constructing a new list, for long lists, 
is faster than simple removal.  It's obvious but I never really thought 
of it; good tip.
--
http://mail.python.org/mailman/listinfo/python-list

Re: Execution speed question

2008-07-28 Thread Suresh Pillai

On Fri, 25 Jul 2008 09:22:06 -0600, Matthew Fitzgibbons wrote:

> As for different data structures, it largely depends on how you need to
> access the data. If you don't need to index the data, just loop through
> it, you might try a linked list. The performance hit in (2) is coming
> from the list del; using a linked list would make the removal constant
> rather than O(n), and may even execute faster than (3) as well.
> 
> -Matt

Yes, this was my first inclination.  So my question, as alluded to in my 
original post, is whether there are C compiled modules for linked lists, 
doubly linked lists, ordered lists ... (the standard data structures) 
somewhere, to get the extra performance out of them.

With python we have all built up creative ways of using the native 
structures for efficiency reasons.  This project was the first time (due 
to its extreme use of resources) that I've had to worry about these 
minute considerations of native vs new structure but also take into 
account native vs python level construct vs compiled module.

[P.S.  The linked list does compare well with (3) as expected.]
--
http://mail.python.org/mailman/listinfo/python-list

Re: Execution speed question

2008-07-28 Thread Suresh Pillai

On Fri, 25 Jul 2008 05:46:56 -0700, Iain King wrote:

> or 3. build a new list every iteration intead of deleting from the old
> one:
> 
> while processing:
> new_off_list = []
> for x in off_list:
> if goes_on(x):
> on_list.append(x)
> else:
> new_off_list.append(x)
> off_list = new_off_list
> generation += 1
> 
> Iain

Or 4, since the order of my nodes doesn't matter:  swap the node to be 
deleted with the last node in the list and then remove the last node of 
the list.  This is the fastest to date, if using native structures, for 
low number nodes being deleted per cycle (def if only deleting one).
--
http://mail.python.org/mailman/listinfo/python-list

Re: Execution speed question

2008-07-28 Thread Suresh Pillai

On Fri, 25 Jul 2008 17:05:27 -0400, Terry Reedy wrote:

> If the nodes do not have to be processed in any particular order, then
> you could keep them either in a dict, with the value being either On or
> Off (True,False)(plus connection data) or a pair of sets, one for On and
> one for Off.  The advantage of the dict is that the items would be fixed
> and only their values would change, but you needlessly scan through On
> items.   The advantage of the set pair is that you only scan through Off
> items but have to move some from Off to On.  I will not guess which
> would be faster over a complete run, or how this will compare with using
> lists.
> 
> tjr

Thanks for the reply.  As mentioned in my original post, sets came to 
mind straight way, doing it the way you suggest.  I alluded to, but 
didn't directly ask:

Since I am doing A LOT of loops over the nodes and the number of nodes is 
also huge, my concern using sets is that in order to iterate over the set 
in each step of my simulation, the set items need to be converted to a 
list every time.  So while removal from a set is much cheaper than say 
from a list, what about this conversion overhead in order to iterate over 
the items.

The dict suggestion is good.  Originally I had my nodes as objects, with 
a networkx object for the graph (which is a dict).  Since efficiency is 
the most important this for this piece of code, I may decide to forget 
about abstract nodes and put all attributes in a dict as you suggest.  
Too many permutations, which is why I made the original post, hoping 
wiser python coders could eliminate a few possibilities. :)
--
http://mail.python.org/mailman/listinfo/python-list

Re: Execution speed question

2008-07-28 Thread Suresh Pillai

On Mon, 28 Jul 2008 10:44:18 +0200, Suresh Pillai wrote:

> Since I am doing A LOT of loops over the nodes and the number of nodes
> is also huge, my concern using sets is that in order to iterate over the
> set in each step of my simulation, the set items need to be converted to
> a list every time.  So while removal from a set is much cheaper than say
> from a list, what about this conversion overhead in order to iterate
> over the items.

I could of course use the old trick of using a dictionary with 'None' 
values and then using iterkeys().  But I thought sets were supposed to 
replace this.  So maybe I should be asking a more basic question: is 
there any way to iterate over the items in a set other than converting to 
a list or using the pop() method.
--
http://mail.python.org/mailman/listinfo/python-list

Re: Execution speed question

2008-07-28 Thread Suresh Pillai

On Mon, 28 Jul 2008 15:04:43 +0200, Suresh Pillai wrote:

> On Mon, 28 Jul 2008 10:44:18 +0200, Suresh Pillai wrote:
> 
>> Since I am doing A LOT of loops over the nodes and the number of nodes
>> is also huge, my concern using sets is that in order to iterate over
>> the set in each step of my simulation, the set items need to be
>> converted to a list every time.  So while removal from a set is much
>> cheaper than say from a list, what about this conversion overhead in
>> order to iterate over the items.
> 
> I could of course use the old trick of using a dictionary with 'None'
> values and then using iterkeys().  But I thought sets were supposed to
> replace this.  So maybe I should be asking a more basic question: is
> there any way to iterate over the items in a set other than converting
> to a list or using the pop() method.

Okay, please consider this my one absolutely stupid post for the year.  
I'd like to pretend it never happened but unfortunately the web doesn't 
allow that.  Having never used sets, I unfort read something that lead to 
it, but ...
--
http://mail.python.org/mailman/listinfo/python-list

Re: Execution speed question

2008-07-29 Thread Suresh Pillai

On Mon, 28 Jul 2008 16:48:28 +0200, Suresh Pillai wrote:

> Okay, please consider this my one absolutely stupid post for the year.
> I'd like to pretend it never happened but unfortunately the web doesn't
> allow that.  Having never used sets, I unfort read something that lead
> to it, but ...

Okay, got some sleep and what I meant to ask, although equally basic, but 
not silly:

For sets, I presume they are built on top of or like dicts, and there is 
nothing crazy in the low level implementation so that I can be guaranteed 
that if I don't alter the set, then the order, although arbitrary, will 
be maintained in successive iterations over the contents?
--
http://mail.python.org/mailman/listinfo/python-list

Execution speed question

Re: Execution speed question

Re: Execution speed question

Re: Execution speed question

Re: Execution speed question

Re: Execution speed question

Re: Execution speed question

Re: Execution speed question

Re: Execution speed question

Re: Execution speed question

10 matches

Site Navigation

Mail list logo

Footer information