Comparing sequences with range objects
I am working with a list of data from which I have to weed out duplicates.
At the moment I keep, for each entry, a container with the other entries
that are still possible duplicates.

The problem is that sometimes that container is all the rest of the list.
I thought of using a range object for these cases. Unfortunately I sometimes
want to sort things, and a range object is not comparable with a list or a tuple.

So I have a list of items where each item is itself a list or a range object.
I could of course sort this by using list as a key function, but that
would defeat the purpose of using range objects for these cases.

So what would be a relatively easy way to get the same result without wasting
too much memory on entries that haven't had any weeding done on them?

-- 
Antoon Pardon.
-- 
https://mail.python.org/mailman/listinfo/python-list
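The situation described in this message can be reproduced in a few lines. This is only an illustration with made-up data (the name `candidates` and its contents are assumptions, not the poster's real records); it shows that a mixed list does not sort, and that `key=list` works at the cost of materialising every range.

```python
import sys

# Made-up stand-in data: two weeded lists and one untouched "all the rest" range.
candidates = [[2, 5], range(1_000_000), [1, 3]]

try:
    sorted(candidates)
    mixed_sort_failed = False
except TypeError:
    # '<' is not supported between list and range instances.
    mixed_sort_failed = True

# key=list makes the sort work, but every range is expanded into a
# full list for the duration of the sort.
by_list = sorted(candidates, key=list)

# A range object stays small however many numbers it covers.
range_size = sys.getsizeof(range(1_000_000))
list_size = sys.getsizeof(list(range(1_000_000)))
```

On CPython the range object is a few dozen bytes while the expanded list takes megabytes, which is the memory cost the question wants to avoid.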
Re: Comparing sequences with range objects
On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote:
>
> I am working with a list of data from which I have to weed out duplicates.
> At the moment I keep for each entry a container with the other entries
> that are still possible duplicates.
>
> The problem is sometimes that is all the rest. I thought to use a range
> object for these cases. Unfortunately I sometimes want to sort things
> and a range object is not comparable with a list or a tuple.
>
> So I have a list of items where each item is itself a list or range object.
> I of course could sort this by using list as a key function but that
> would defeat the purpose of using range objects for these cases.
>
> So what would be a relatively easy way to get the same result without wasting
> too much memory on entries that haven't any weeding done on them.
>
> --
> Antoon Pardon.

I'm not sure I understand what you are trying to do, but if your data
has no order, you can use a set to remove the duplicates.

-- 
Joel Goldstick
Re: Sharing part of a function
Cecil Westerhof writes:

> To show why it is often easy, but wrong to use recursive functions I
> wrote the following two Fibonacci functions:
>     def fib_ite(n):
>         if not type(n) is int:
>             raise TypeError(f'Need an integer ({n})')
>         if n < 0:
>             raise ValueError(f'Should not be negative ({n})')
>
>         if n in [0, 1]:
>             return n
>
>         # a is previous Fibonacci (starts with fib(0))
>         # b is current Fibonacci (starts with fib(1))
>         a, b = 0, 1
>         # range goes to n - 1, so after loop b contains fib(n)
>         for i in range(1, n):
>             a, b = b, a + b
>         return b
>
>
>     def fib_rec(n):
>         if not type(n) is int:
>             raise TypeError(f'Need an integer ({n})')
>         if n < 0:
>             raise ValueError(f'Should not be negative ({n})')
>
>         if n in [0, 1]:
>             return n
>
>         return fib_rec(n - 2) + fib_rec(n - 1)
>
> The first eight lines are the same. And I did change the description
> of the errors, which had to be done in both functions. What would be
> the best way to circumvent this?
> Two options are:
> - Call an init function.
> - Call the 'master' function with a lambda.
>
> What is the preferable way, or is there a better way?

I have chosen this implementation with inner functions:

    def fibonacci(n, implementation = 'iterative'):
        def ite(n):
            # a is previous Fibonacci (starts with fib(0))
            # b is current Fibonacci (starts with fib(1))
            a, b = 0, 1
            # range goes to n - 1, so after loop b contains fib(n)
            for i in range(1, n):
                a, b = b, a + b
            return b

        def rec(n):
            if n in [0, 1]:
                return n

            return rec(n - 2) + rec(n - 1)

        if not type(n) is int:
            raise TypeError(f'Need an integer ({n})')
        if n < 0:
            raise ValueError(f'Should not be negative ({n})')

        if n in [0, 1]:
            return n

        if implementation == 'iterative':
            return ite(n)
        elif implementation == 'recursive':
            return rec(n)
        # Report the rejected argument, not the builtin `type`.
        raise ValueError(f'Got a wrong function implementation type: {implementation}')

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof
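For comparison, the first option mentioned in the quoted question ("call an init function") could look roughly like the sketch below. The helper name `check_fib_argument` is invented for this illustration and does not appear in the thread; the two function bodies follow the quoted code.

```python
def check_fib_argument(n):
    """Hypothetical shared helper: the validation both functions need."""
    if type(n) is not int:
        raise TypeError(f'Need an integer ({n})')
    if n < 0:
        raise ValueError(f'Should not be negative ({n})')


def fib_ite(n):
    check_fib_argument(n)
    if n in (0, 1):
        return n
    # a is the previous Fibonacci number, b the current one.
    a, b = 0, 1
    for _ in range(1, n):
        a, b = b, a + b
    return b


def fib_rec(n):
    # Note: the check runs again on every recursive call.
    check_fib_argument(n)
    if n in (0, 1):
        return n
    return fib_rec(n - 2) + fib_rec(n - 1)
```

One trade-off: here the recursive version validates its argument at every level of the recursion, whereas the inner-function layout validates once and then recurses on the unchecked inner function.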
Re: Comparing sequences with range objects
On 7/04/2022 at 16:08, Joel Goldstick wrote:
> On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote:
>> I am working with a list of data from which I have to weed out duplicates.
>> At the moment I keep for each entry a container with the other entries
>> that are still possible duplicates.
>>
>> The problem is sometimes that is all the rest. I thought to use a range
>> object for these cases. Unfortunately I sometimes want to sort things
>> and a range object is not comparable with a list or a tuple.
>>
>> So I have a list of items where each item is itself a list or range object.
>> I of course could sort this by using list as a key function but that
>> would defeat the purpose of using range objects for these cases.
>>
>> So what would be a relatively easy way to get the same result without wasting
>> too much memory on entries that haven't any weeding done on them.
>>
>> --
>> Antoon Pardon.
> I'm not sure I understand what you are trying to do, but if your data
> has no order, you can use set to remove the duplicates

Sorry I wasn't clear. The data contains information about persons, but not
all records need to be complete. So a person can occur multiple times in
the list, while the records are all different because they are missing
different bits.

So all records with the same firstname can be duplicates. But if I have
a record in which the firstname is missing, it can at that point be
a duplicate of all other records.

-- 
Antoon Pardon
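The rule described in this message, that a missing field rules nothing out, can be written as a small predicate. This is a sketch under assumptions: records are taken to be dicts, a missing field is taken to be either absent or `None`, and the name `may_be_duplicates` is invented here.

```python
def may_be_duplicates(a, b):
    """True unless some field filled in on both records has different values."""
    shared = (key for key in a.keys() & b.keys()
              if a[key] is not None and b[key] is not None)
    return all(a[key] == b[key] for key in shared)


ann = {'firstname': 'Ann', 'lastname': 'Lee'}
bob = {'firstname': 'Bob', 'lastname': 'Lee'}
unknown = {'firstname': None, 'lastname': 'Lee'}

same_first_name_conflict = may_be_duplicates(ann, bob)
missing_first_name = may_be_duplicates(ann, unknown)
```

With this rule `ann` and `bob` cannot be duplicates, while `unknown` stays a candidate for both, which is why a record without a first name starts out paired with "all the rest".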
Re: Comparing sequences with range objects
Antoon Pardon wrote at 2022-4-7 17:16 +0200:
> ...
>Sorry I wasn't clear. The data contains information about persons. But not
>all records need to be complete. So a person can occur multiple times in
>the list, while the records are all different because they are missing
>different bits.
>
>So all records with the same firstname can be duplicates. But if I have
>a record in which the firstname is missing, it can at that point be
>a duplicate of all other records.

The description is still not clear enough. In particular, it does
not show where `range` objects come into play.

Answering on a more abstract level:
Apparently, you want to sort a list whose elements can be
other lists or range objects.

List objects are ordered lexicographically, i.e. for lists l1, l2:
l1 <= l2 iff not l1 or (l2 and (l1[0] < l2[0] or (l1[0] == l2[0] and l1[1:] <= l2[1:]))).
If you sort a list containing lists, the elements are compared using this order.

For your case, you would need to use a `key` parameter for `sort` that
implements this order for `range` objects, too.
(Note that Python provides a function, `functools.cmp_to_key`, which transforms
an order definition into an appropriate `key` function.)
A corresponding `sort` call may expand your range objects completely.

An alternative might be to not expand `range` objects but to put them
all at the start or end of the sorted list.
Of course, this would imply that their expansion does not influence
their order in the list -- which may or may not be acceptable
(depending on your use case).
If it is acceptable, it is likely possible to not put range objects
into the list to be sorted in the first place.
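A minimal sketch of the `key` approach described above, using `functools.cmp_to_key` from the standard library. The names `compare_sequences` and `_MISSING` are invented for this example, and the sample data is made up. The comparison walks both sequences in step and stops at the first difference, so a range is never copied into a list.

```python
from functools import cmp_to_key
from itertools import zip_longest

_MISSING = object()  # marks the end of the shorter sequence


def compare_sequences(left, right):
    """Three-way lexicographic comparison for lists, tuples and ranges."""
    for a, b in zip_longest(left, right, fillvalue=_MISSING):
        if a is _MISSING:
            return -1  # left is a proper prefix of right
        if b is _MISSING:
            return 1   # right is a proper prefix of left
        if a != b:
            return -1 if a < b else 1
    return 0


items = [[2, 5], range(1_000_000), [0, 1, 7], [0, 1]]
items.sort(key=cmp_to_key(compare_sequences))
```

After the sort, `items` is `[[0, 1], range(1000000), [0, 1, 7], [2, 5]]`, the same order `key=list` would give. One caveat: comparing two equal, long ranges still iterates over all their elements; if that case is common, a shortcut for range-versus-range comparisons might be worth adding.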
Re: Comparing sequences with range objects
On 2022-04-07 16:16, Antoon Pardon wrote:
> On 7/04/2022 at 16:08, Joel Goldstick wrote:
>> On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote:
>>> I am working with a list of data from which I have to weed out duplicates.
>>> At the moment I keep for each entry a container with the other entries
>>> that are still possible duplicates.
>>>
>>> The problem is sometimes that is all the rest. I thought to use a range
>>> object for these cases. Unfortunately I sometimes want to sort things
>>> and a range object is not comparable with a list or a tuple.
>>>
>>> So I have a list of items where each item is itself a list or range object.
>>> I of course could sort this by using list as a key function but that
>>> would defeat the purpose of using range objects for these cases.
>>>
>>> So what would be a relatively easy way to get the same result without wasting
>>> too much memory on entries that haven't any weeding done on them.
>>>
>>> --
>>> Antoon Pardon.
>> I'm not sure I understand what you are trying to do, but if your data
>> has no order, you can use set to remove the duplicates
>
> Sorry I wasn't clear. The data contains information about persons. But not
> all records need to be complete. So a person can occur multiple times in
> the list, while the records are all different because they are missing
> different bits.
>
> So all records with the same firstname can be duplicates. But if I have
> a record in which the firstname is missing, it can at that point be
> a duplicate of all other records.

This is how I'd approach it:

    from collections import defaultdict

    # Make a list of groups, where each group is a list of potential duplicates.
    # Initially, all of the records are potential duplicates of each other.
    records = [list_of_records]

    # Split the groups into subgroups according to the first name.
    new_records = []

    for group in records:
        subgroups = defaultdict(list)

        for record in group:
            # Assumes a missing first name is stored as None.
            subgroups[record['first_name']].append(record)

        # Records without a first name could belong to any of the subgroups.
        missing = subgroups.pop(None, [])

        # Add the incomplete records to every subgroup exactly once.
        for subgroup in subgroups.values():
            subgroup.extend(missing)

        new_records.extend(subgroups.values())

    records = new_records

    # Now repeat for the last name, etc.
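The grouping approach in this message can be packaged as a reusable function and tried on a few records. This is a sketch, not the poster's code: the function name `split_by_field`, the sample data, and the use of `dict.get` (so an absent key counts as missing) are assumptions made for the illustration. It also keeps a group together when none of its records has the field, a case the outline above does not spell out.

```python
from collections import defaultdict


def split_by_field(groups, field):
    """Split each group of candidate duplicates by the value of one field."""
    result = []
    for group in groups:
        subgroups = defaultdict(list)
        for record in group:
            subgroups[record.get(field)].append(record)

        # Records without the field could belong to any of the subgroups.
        missing = subgroups.pop(None, [])
        if not subgroups:
            # Nobody has the field, so nothing can be ruled out.
            result.append(missing)
            continue

        for subgroup in subgroups.values():
            subgroup.extend(missing)
        result.extend(subgroups.values())
    return result


people = [
    {'id': 1, 'first_name': 'Ann', 'last_name': 'Lee'},
    {'id': 2, 'first_name': 'Bob', 'last_name': 'Lee'},
    {'id': 3, 'first_name': None, 'last_name': 'Lee'},
    {'id': 4, 'first_name': 'Ann', 'last_name': None},
]

groups = split_by_field([people], 'first_name')
group_ids = [[record['id'] for record in group] for group in groups]
```

Calling `split_by_field(groups, 'last_name')` next would repeat the step for the last name, as the final comment in the message suggests.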
Re: 'äÄöÖüÜ' in Unicode (utf-8)
Dennis Lee Bieber writes:

> On Fri, 1 Apr 2022 03:59:32 +1100, Chris Angelico
> declaimed the following:
>
>>That's jmf. Ignore him. He knows nothing about Unicode and is
>>determined to make everyone aware of that fact.
>>
>>He got blocked from the mailing list ages ago, and I don't think
>>anyone's regretted it.
> Ah yes... Unfortunately, when gmane made the mirror read-only, I had to
> revert to comp.lang.python... and all the junk that gets in via that and
> Google Groups...

Hm. I just configured my news reader to send follow-ups to the mailing
list when that happened.
Re: Comparing sequences with range objects
On 2022-04-07 17:16:41 +0200, Antoon Pardon wrote:
> On 7/04/2022 at 16:08, Joel Goldstick wrote:
> > On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote:
> > > I am working with a list of data from which I have to weed out duplicates.
> > > At the moment I keep for each entry a container with the other entries
> > > that are still possible duplicates.
[...]
> Sorry I wasn't clear. The data contains information about persons. But not
> all records need to be complete. So a person can occur multiple times in
> the list, while the records are all different because they are missing
> different bits.
>
> So all records with the same firstname can be duplicates. But if I have
> a record in which the firstname is missing, it can at that point be
> a duplicate of all other records.

There are two problems. The first one is how you establish identity.
The second is how you weed out identical objects. In your first mail
you only asked about the second, but that's easy.

The first is really hard. Not only may information be missing, no single
piece of information is unique or immutable. Two people may have
the same name (I know about several other "Peter Holzer"s), a single
person might change their name (when I was younger I went by my middle
name - how would you know that "Peter Holzer" and "Hansi Holzer" are the
same person?), they will move (= change their address), change jobs,
etc. Unless you have a unique immutable identifier that's enforced by
some authority (like a social security number[1]), I don't think there
is a chance to do that reliably in a program (although with enough data,
a heuristic may be good enough).

        hp

-- 
Peter J. Holzer | h...@hjp.at | http://www.hjp.at/
"Story must make more sense than reality."
    -- Charles Stross, "Creative writing challenge!"
Re: dict.get_deep()
On 2022-04-03 23:17:04 +0200, Marco Sulla wrote:
> On Sun, 3 Apr 2022 at 21:46, Peter J. Holzer wrote:
> > > data.get_deep("users", 0, "address", "street", default="second star")
> >
> > Yep. Did that, too. Plus pass the final result through a function before
> > returning it.
>
> I didn't understand. Have you added a func parameter?

Yes. Look at the code I posted (it's only 9 lines long).

> > I'm not sure whether I considered this when I wrote it, but a function
> > has the advantage of working with every class which can be indexed. A
> > method must be implemented on any class (so at least dict and list to be
> > useful).
>
> You're right, but where to put it? I don't know if an iterableutil package
> exists. If included in the stdlib, I don't know where to put it. In
> collections maybe?

Yes, that seems like the least bad choice.

        hp

-- 
Peter J. Holzer | h...@hjp.at | http://www.hjp.at/
"Story must make more sense than reality."
    -- Charles Stross, "Creative writing challenge!"
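The code referred to in this message is not part of this digest. Purely as an illustration of the idea under discussion, a free function that works on anything indexable might look like the sketch below. The signature, the `func` parameter's exact behaviour, and the choice of exceptions to catch are assumptions, not the posted implementation.

```python
def get_deep(data, *keys, default=None, func=None):
    """Follow keys through nested indexable objects; return default on a miss."""
    for key in keys:
        try:
            data = data[key]
        except (KeyError, IndexError, TypeError):
            # Missing dict key, list index out of range, or a
            # non-indexable value somewhere along the path.
            return default
    # Optionally pass the value that was found through a function.
    return func(data) if func is not None else data


data = {"users": [{"address": {"street": "Main St"}}]}

found = get_deep(data, "users", 0, "address", "street", default="second star")
fallback = get_deep(data, "users", 1, "address", "street", default="second star")
shouted = get_deep(data, "users", 0, "address", "street", func=str.upper)
```

In this sketch `func` is applied only to a value that was actually found, never to `default`; whether the default should also be passed through is a design choice the thread does not settle.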
Re: Comparing sequences with range objects
On Fri, 8 Apr 2022 at 16:26, Peter J. Holzer wrote:
> Unless you have a unique immutable identifier that's enforced by
> some authority (like a social security number[1]), I don't think there
> is a chance to do that reliably in a program (although with enough data,
> a heuristic may be good enough).
>

Not sure what your footnote was supposed to be, but I'll offer two
useful footnotes to that:

[1] [a] Though they're not actually immutable either, just less
frequently changing

[1] [b] Of course, that has privacy implications.

ChrisA