Jack wrote:
> "John Nagle" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>> Jack wrote:
>>> I need to process large amount of data. The data structure fits well
>>> in a dictionary but the amount is large - close to or more than the size
>>> of physical memory. I wonder what will happen if I try to load the data
>>> into a dictionary. Will Python use swap memory or will it fail?
>>>
>>> Thanks.
>> What are you trying to do? At one extreme, you're implementing something
>> like a search engine that needs gigabytes of bitmaps to do joins fast as
>> hundreds of thousands of users hit the server, and need to talk seriously
>> about 64-bit address space machines. At the other, you have no idea how
>> to either use a database or do sequential processing. Tell us more.
>>
> I have tens of millions (could be more) of document in files. Each of them
> has other properties in separate files. I need to check if they exist,
> update and merge properties, etc.
> And this is not a one time job. Because of the quantity of the files, I
> think querying and updating a database will take a long time...
>
And I think you are wrong. But of course the only way to find out who's
right and who's wrong is to do some experiments and get some benchmark
timings.
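
A rough sketch of such an experiment, in Python: it times key lookups against an in-memory dict and against a SQLite table holding the same keys. The `docN` ids, the `docs` table and the helper names are all made up for illustration, and the sizes are far smaller than the tens of millions of documents under discussion, so treat whatever it prints as a starting point rather than an answer.

```python
import sqlite3
import time


def build_dict(n):
    """Build an in-memory dict mapping hypothetical document ids to a property string."""
    return {f"doc{i}": f"props-{i}" for i in range(n)}


def build_db(n, path=":memory:"):
    """Build a SQLite key-value table with the same contents as build_dict(n).

    Pass a file path instead of ":memory:" to measure an on-disk database.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, props TEXT)")
    conn.executemany(
        "INSERT OR REPLACE INTO docs VALUES (?, ?)",
        ((f"doc{i}", f"props-{i}") for i in range(n)),
    )
    conn.commit()
    return conn


def db_get(conn, key):
    """Look up one key in the table; return None if it is absent."""
    row = conn.execute("SELECT props FROM docs WHERE id = ?", (key,)).fetchone()
    return row[0] if row else None


def time_lookups(lookup, keys):
    """Return the wall-clock seconds taken to call lookup(key) for every key."""
    start = time.perf_counter()
    for key in keys:
        lookup(key)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 100_000  # illustrative size only
    keys = [f"doc{i}" for i in range(0, n, 10)]
    d = build_dict(n)
    conn = build_db(n)
    print("dict   lookups (s):", time_lookups(d.get, keys))
    print("sqlite lookups (s):", time_lookups(lambda k: db_get(conn, k), keys))
    conn.close()
```

Passing a file path to `build_db` instead of `":memory:"` measures an on-disk database, which is closer to the case being argued about once the data no longer fits in RAM.
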

All I *would* say is that it's unwise to proceed with a memory-only
architecture when you only have assumptions about the limitations of
particular architectures, and your problem might actually grow to exceed
the memory limits of a 32-bit architecture anyway. Swapping might,
depending on access patterns, cause your performance to take a real
nose-dive. Then where do you go? Much better to architect the application
so that you anticipate exceeding memory limits from the start, I'd hazard.

> Let's say, I want to do something a search engine needs to do in terms of
> the amount of data to be processed on a server. I doubt any serious
> search engine would use a database for indexing and searching. A hash
> table is what I need, not powerful queries.
>
You might be surprised. Google, for example, use a widely-distributed and
highly-redundant storage format, but they certainly don't keep the whole
Internet in memory :-)

Perhaps you need to explain the problem in more detail if you still need
help.

regards
 Steve
--
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden