1) Yes, it means fetching more than 1000 entities at a time. How many exactly? It depends (entity size is probably the main factor here), but something in the range of 5000+ is achievable.
2) A task does background work (fetching, creating and modifying records, doing calculations, preparing intermediate results, etc.) and can run for up to 10 minutes. Sure, you can call a URL while or after a task executes. The 30-second limit applies to client-facing requests.
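
To make point 2 concrete, here is a minimal sketch of a background job that works through a large result set in batches and stops when its time budget runs out. It is plain Python, not App Engine code: `fetch_page`, `make_fake_store` and the offset-style cursor are hypothetical stand-ins for a datastore query resumed from a cursor, and the 540-second default is only an assumed safety margin under the 10-minute task limit.

```python
import time


def process_in_batches(fetch_page, handle, batch_size=1000,
                       budget_seconds=540.0, cursor=None,
                       clock=time.monotonic):
    """Drain pages until the data runs out or the time budget is spent.

    fetch_page(cursor, limit) -> (items, next_cursor) is a hypothetical
    stand-in for a query resumed from a cursor; next_cursor is None when
    there are no more results.

    Returns None when everything was processed, otherwise the cursor
    to resume from (e.g. to pass to a follow-up task).
    """
    deadline = clock() + budget_seconds
    while True:
        items, cursor = fetch_page(cursor, batch_size)
        for item in items:
            handle(item)
        if cursor is None:
            return None          # finished
        if clock() >= deadline:
            return cursor        # out of time, resume later from here


def make_fake_store(n):
    """In-memory stand-in for a query; the cursor is just an offset."""
    data = list(range(n))

    def fetch_page(cursor, limit):
        start = cursor or 0
        page = data[start:start + limit]
        end = start + len(page)
        return page, (end if end < n else None)

    return fetch_page
```

In a real task the returned cursor would be handed to a newly enqueued task, so each run stays inside its own limit.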
On Apr 28, 5:35 am, Nischal Shetty <[email protected]> wrote:
> @Nick
>
> 1) the 1000 entities (rows) limit has been lifted long time ago.
>
> I thought by lifting the limit it meant I could go ahead and fetech 1001-2000 using a cursor. So I guess, it means pulling more than 1000 rows at a time, stupid me :)
>
> 2) tasks are not limited by the 30s limit - can run for 10 minutes.
>
> We provide URLs that would be called when the task executes. Those would stop in 30s right? So, what exactly is this 10 minute limit, I haven't been able to wrap my head around the 10 minute thingy.
>
> On 26 April 2011 00:58, nickmilon <[email protected]> wrote:
> > 1) the 1000 entities (rows) limit has been lifted long time ago.
> > 2) tasks are not limited by the 30s limit - can run for 10 minutes.
> >
> > Happy coding ;-)
> > Nick
> >
> > On Apr 25, 9:01 am, Nischal Shetty <[email protected]> wrote:
> > > I will indeed try a few ways to do this. But pulling all rows individually would be an overkill because every query gives us 1000 rows at a time which means I would hit the 30s limit while I'm at it :(
> > >
> > > For searching the IDs that I have at hand, I would not need to deserialize the array of ids. I would be making use of Bloom Filter which I think would speed things up. I would need to deserialize all the ids occasionally for some rare computational purposes.
> > >
> > > So my use case would consist 80% search a bunch of IDs and 20% deserialize all the IDs.
> > >
> > > On 25 April 2011 10:24, David Parks <[email protected]> wrote:
> > > > I did indeed mean pulling back a result set of say 200,000 rows. If I’m following the conversation correctly then what you described was storing all IDs, querying that one field and de-serializing all IDs into an array that you can then search for the ID’s you need.
> > > >
> > > > I like that idea. But I certainly can’t tell you if the overhead of reading all values, and deserializing them will be better or worse than the overhead of scrolling through a large result set and loading the database with hundreds of millions of rows. Of all databases you could be using, googles big table is certainly well designed for large data sets.
> > > >
> > > > It seems that your proposed method makes great sense when you need the entire result set (or close to it) for one or more users. But when you only need 100 results of 150,000, then the deserialization process is going to constitute a measurable overhead. Also, I can’t say for sure how the google datastore will perform when you commit hundreds of millions of rows to it. Of course, if small queries like are rare, then maybe it’s not so important to consider them.
> > > >
> > > > Anyway, I guess you could write, in perhaps a day or less, a very simple test case that populate the datastore with both scenarios and profile them.
> > > >
> > > > Doing the profiling work will probably give you some very useful insight and experience on how things will really perform in reality.
> > > >
> > > > I’d also suggest that you encapsulate this functionality so that you can easily replace one strategy with another without changing code unrelated to the data store (e.g. design your code using proper data access objects to keep this code separate from the rest of your code, and code to interfaces up front).
> > > >
> > > > *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Nischal Shetty
> > > > *Sent:* Monday, April 25, 2011 10:34 AM
> > > > *To:* [email protected]
> > > > *Subject:* Re: [google-appengine] Appropriate way to save hundreds of thousands of ids per user
> > > >
> > > > @David
> > > >
> > > > Querying the whole group would mean having 200,000 results for few of my users. Pulling all that and then searching, wouldn't that be inefficient? or are you talking about sharded ListProperty here?
> > > >
> > > > On 25 April 2011 05:41, David Parks <[email protected]> wrote:
> > > >
> > > > That seems like a reasonable approach. But I think you should do both tests. 1) let google do the work and store a lot of records, 2) query the whole group and parse it into an array and search the array. It wouldn’t be too hard to created a simple test case that populates the data for whatever # of users you need to plan for and profile the lookup and storage speeds of both.
> > > >
> > > > I’d love to know your results if you do test both approaches.
> > > >
> > > > *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Nischal Shetty
> > > > *Sent:* Friday, April 22, 2011 3:10 PM
> > > > *To:* [email protected]
> > > > *Subject:* Re: [google-appengine] Appropriate way to save hundreds of thousands of ids per user
> > > >
> > > > @David
> > > >
> > > > Thanks for the input. Every reply gives me some more insight into how I achieve this. My use case is as below :
> > > >
> > > > 1. At times I would need all the IDs at the same time in memory
> > > > 2. Most of the times I would need to check if a set of IDs as input by the user (say 100 IDs) are present in the datastore
> > > >
> > > > I've been thinking of doing the following :
> > > >
> > > > 1. Persisting all the IDs by putting them into an array (I will probably have shards where each array would hold 50k IDs)
> > > > 2. Implementing a bloom filter to search for the set of IDs if they exist in the datastore.
> > > >
> > > > On 22 April 2011 09:34, David Parks <[email protected]> wrote:
> > > >
> > > > I don’t know your intended use of these ID’s, my thoughts here are limited to assumed use, feel free to ignore thoughts that are off base for your use case.
> > > >
> > > > If, when you query for the IDs you are looking for **all** the IDs, then just serialize them into one field and retrieve them as one record and de-serialize them in a way that doesn’t require they all fit into memory at the same time (a tokenized CSV list is most straight forward example, but you can do more compact serializations).
> > > >
> > > > If you need to query for some subset of these IDs, then storing them in the datastore is indeed the way to go I suspect. You can batch many inserts/updates. You’ll have a large table, but that isn’t likely to be a problem with this data store, but do test it. If lookup times degrade with size you could consider partitioning your users into different groups (simple example: 1 group of users IDs that end in even #’s, another that ends in odd #’s), this can reduce the size of indexes and improve performance on some systems (I don’t have personal experience to tell you whether this is necessary in this system, but it’s a thought to consider).
> > > >
> > > > Again, I just offer this as food for thought. If you describe your intended access patterns it will probably help guide the discussion. Good luck.
> > > >
> > > > *From:* [email protected] [mailto:[email protected]] *On Behalf Of* nischalshetty
> > > > *Sent:* Tuesday, April 19, 2011 1:15 PM
> > > > *To:* [email protected]
> > > > *Subject:* [google-appengine] Appropriate way to save hundreds of thousands of ids per user
> > > >
> > > > Every user in my app would have thousands of ids corresponding to them. I would need to look up these ids often.
> > > >
> > > > Two things I could think of:
> > > >
> > > > 1. Put them into Lists - (drawback is that lists have a maximum capacity of 5000(hope I'm right here) and I have users who would need to save more than 150,000 ids)
> > > > 2. Insert each id as a unique record in the datastore (too much of data? as it would be user * ids of all users). Can I batch put 5000 records at a time? Can I batch get at least 100 - 500 records at a time?
> > > >
> > > > Is there any other way to do this? I hope my question's clear. Your suggestions are greatly appreciated.
> > > >
> > > > --
> > > > You received this message because you are subscribed to the Google Groups "Google App Engine" group.
> > > > To post to this group, send email to [email protected].
> > > > To unsubscribe from this group, send email to [email protected].
> > > > For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
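
For reference, the plan quoted earlier in the thread (IDs stored in shards of about 50k, with a Bloom filter in front of them) could be sketched roughly as below. This is plain Python under stated assumptions, not datastore code: `BloomFilter`, `shard_ids` and `lookup` are hypothetical names, the double-hashing scheme built on SHA-256 is one common choice rather than anything prescribed in the thread, and the shards are ordinary lists standing in for stored entities.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from two 64-bit halves of one digest
        # (double hashing); forcing h2 odd keeps it non-zero.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def shard_ids(ids, shard_size=50000):
    """Split the IDs into lists of at most shard_size elements."""
    ids = list(ids)
    return [ids[i:i + shard_size] for i in range(0, len(ids), shard_size)]


def lookup(candidates, bloom, shards):
    """Return the candidates that are actually stored.

    The filter discards definite misses cheaply; anything it lets
    through is confirmed against the shards, so false positives
    never reach the caller.
    """
    maybe = {c for c in candidates if c in bloom}
    found = set()
    if maybe:
        for shard in shards:
            found.update(maybe.intersection(shard))
    return found
```

With 150,000 IDs and roughly 10 bits per ID, the filter's bit array is 187,500 bytes, so the common case (checking about 100 IDs) touches only the filter unless it reports a possible hit.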
