Re: Comparing sequences with range objects

2022-04-11 Thread Antoon Pardon
On 11/04/2022 at 02:31, Dan Stromberg wrote: It sounds a little like you're looking for interval arithmetic. Maybe https://pypi.org/project/python-intervals/1.5.3/ ? Not completely, but it suggested an idea to explore. -- Antoon.

Re: Comparing sequences with range objects

2022-04-11 Thread Antoon Pardon
On 11/04/2022 at 02:01, duncan smith wrote: On 10/04/2022 21:20, Antoon Pardon wrote: On 9/04/2022 at 02:01, duncan smith wrote: On 08/04/2022 22:08, Antoon Pardon wrote: Well my first thought is that a bitset makes it less obvious to calculate the size of the set or to iterate over it

Re: Comparing sequences with range objects

2022-04-10 Thread Dan Stromberg
Two more things: 1) There are also modules that do interval arithmetic with reals, not just integers. EG: https://pyinterval.readthedocs.io/en/latest/ 2) If you don't mind a small amount of inaccuracy you can often get things down to less than one bit per element for presence/absence, using a bloo

Re: Comparing sequences with range objects

2022-04-10 Thread 2QdxY4RzWzUUiLuE
On 2022-04-10 at 22:20:33 +0200, Antoon Pardon wrote: > > > On 9/04/2022 at 02:01, duncan smith wrote: > > On 08/04/2022 22:08, Antoon Pardon wrote: > > > > > > Well my first thought is that a bitset makes it less obvious to calculate > > > the size of the set or to iterate over its elements.

Re: Comparing sequences with range objects

2022-04-10 Thread Dan Stromberg
It sounds a little like you're looking for interval arithmetic. Maybe https://pypi.org/project/python-intervals/1.5.3/ ? On Thu, Apr 7, 2022 at 4:19 AM Antoon Pardon wrote: > I am working with a list of data from which I have to weed out duplicates. > At the moment I keep for each entry a conta

Re: Comparing sequences with range objects

2022-04-10 Thread duncan smith
On 10/04/2022 21:20, Antoon Pardon wrote: On 9/04/2022 at 02:01, duncan smith wrote: On 08/04/2022 22:08, Antoon Pardon wrote: Well my first thought is that a bitset makes it less obvious to calculate the size of the set or to iterate over its elements. But it is an idea worth exploring.

Re: Comparing sequences with range objects

2022-04-10 Thread Antoon Pardon
On 9/04/2022 at 02:01, duncan smith wrote: On 08/04/2022 22:08, Antoon Pardon wrote: Well my first thought is that a bitset makes it less obvious to calculate the size of the set or to iterate over its elements. But it is an idea worth exploring. def popcount(n):     """     Returns the
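
The snippet above is cut off, so the original popcount code is not reproduced here. As a rough sketch of the idea under discussion, assuming the candidate set is stored as a plain Python int used as a bitset, the size and the iteration could look like this (iter_bits and the example value candidates are hypothetical names, not taken from the thread):

```python
def popcount(n: int) -> int:
    """Return the number of set bits in a non-negative integer."""
    return bin(n).count("1")


def iter_bits(n: int):
    """Yield the indices of the set bits of n, lowest index first."""
    while n:
        low = n & -n                  # isolate the lowest set bit
        yield low.bit_length() - 1    # its index within the bitset
        n ^= low                      # clear that bit and continue


# Hypothetical example: entries 0..9 all start as possible duplicates,
# then entry 3 is ruled out.
candidates = (1 << 10) - 1
candidates &= ~(1 << 3)

size = popcount(candidates)
members = list(iter_bits(candidates))
```

With this representation the size is popcount(candidates) and the remaining entries are whatever iter_bits(candidates) yields, so both operations stay available, just less directly than with a set.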

Re: Comparing sequences with range objects

2022-04-09 Thread Ian Hobson
On 09/04/2022 13:14, Christian Gollwitzer wrote: On 08.04.22 at 09:21, Antoon Pardon wrote: The first is really hard. Not only may information be missing, no single piece of information is unique or immutable. Two people may have the same name (I know about several other "Peter Holzer"

Re: Comparing sequences with range objects

2022-04-09 Thread Christian Gollwitzer
On 08.04.22 at 09:21, Antoon Pardon wrote: The first is really hard. Not only may information be missing, no single piece of information is unique or immutable. Two people may have the same name (I know about several other "Peter Holzer"s), a single person might change their name (when I

Re: Comparing sequences with range objects

2022-04-09 Thread duncan smith
On 08/04/2022 22:08, Antoon Pardon wrote: On 8/04/2022 at 16:28, duncan smith wrote: On 08/04/2022 08:21, Antoon Pardon wrote: Yes I know all that. That is why I keep a bucket of possible duplicates per "identifying" field that is examined and use some heuristics at the end of all the compar

Re: Comparing sequences with range objects

2022-04-08 Thread Antoon Pardon
On 8/04/2022 at 16:28, duncan smith wrote: On 08/04/2022 08:21, Antoon Pardon wrote: Yes I know all that. That is why I keep a bucket of possible duplicates per "identifying" field that is examined and use some heuristics at the end of all the comparing instead of starting to weed out the du

Re: Comparing sequences with range objects

2022-04-08 Thread duncan smith
On 08/04/2022 08:21, Antoon Pardon wrote: On 8/04/2022 at 08:24, Peter J. Holzer wrote: On 2022-04-07 17:16:41 +0200, Antoon Pardon wrote: On 7/04/2022 at 16:08, Joel Goldstick wrote: On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote: I am working with a list of data from which I have t

Re: Comparing sequences with range objects

2022-04-08 Thread Antoon Pardon
On 8/04/2022 at 08:24, Peter J. Holzer wrote: On 2022-04-07 17:16:41 +0200, Antoon Pardon wrote: On 7/04/2022 at 16:08, Joel Goldstick wrote: On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote: I am working with a list of data from which I have to weed out duplicates. At the moment I ke

Re: Comparing sequences with range objects

2022-04-07 Thread Chris Angelico
On Fri, 8 Apr 2022 at 16:26, Peter J. Holzer wrote: > Unless you have a unique immutable identifier that's enforced by > some authority (like a social security number[1]), I don't think there > is a chance to do that reliably in a program (although with enough data, > a heuristic may be good enoug

Re: Comparing sequences with range objects

2022-04-07 Thread Peter J. Holzer
On 2022-04-07 17:16:41 +0200, Antoon Pardon wrote: > On 7/04/2022 at 16:08, Joel Goldstick wrote: > > On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote: > > > I am working with a list of data from which I have to weed out duplicates. > > > At the moment I keep for each entry a container with the

Re: Comparing sequences with range objects

2022-04-07 Thread MRAB
On 2022-04-07 16:16, Antoon Pardon wrote: On 7/04/2022 at 16:08, Joel Goldstick wrote: On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote: I am working with a list of data from which I have to weed out duplicates. At the moment I keep for each entry a container with the other entries that are

Re: Comparing sequences with range objects

2022-04-07 Thread Dieter Maurer
Antoon Pardon wrote at 2022-4-7 17:16 +0200: > ... >Sorry I wasn't clear. The data contains information about persons. But not >all records need to be complete. So a person can occur multiple times in >the list, while the records are all different because they are missing >different bits. > >So all

Re: Comparing sequences with range objects

2022-04-07 Thread Antoon Pardon
On 7/04/2022 at 16:08, Joel Goldstick wrote: On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote: I am working with a list of data from which I have to weed out duplicates. At the moment I keep for each entry a container with the other entries that are still possible duplicates. The problem is

Re: Comparing sequences with range objects

2022-04-07 Thread Joel Goldstick
On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote: > > I am working with a list of data from which I have to weed out duplicates. > At the moment I keep for each entry a container with the other entries > that are still possible duplicates. > > The problem is sometimes that is all the rest. I tho

Comparing sequences with range objects

2022-04-07 Thread Antoon Pardon
I am working with a list of data from which I have to weed out duplicates. At the moment I keep for each entry a container with the other entries that are still possible duplicates. The problem is sometimes that is all the rest. I thought to use a range object for these cases. Unfortunately I some
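
The message is cut off before it states the actual difficulty, so the following is only a guess at the kind of comparison the subject line refers to. In Python a range never compares equal to a list, even when both hold the same integers, so a mix of range objects and ordinary containers needs an explicit helper. A minimal sketch, assuming neither container holds duplicates (same_members is a hypothetical name, not from the thread):

```python
def same_members(candidates, others) -> bool:
    """Return True if both containers hold exactly the same elements.

    Assumes neither container has duplicate elements. Membership tests
    against a range of ints take constant time, so the range is never
    expanded into a list or set.
    """
    return len(candidates) == len(others) and all(
        item in candidates for item in others
    )


all_the_rest = range(1, 5)      # stands in for "all other entries"
explicit = [4, 2, 1, 3]         # the same entries kept in a list

equal_by_operator = all_the_rest == [1, 2, 3, 4]
equal_by_members = same_members(all_the_rest, explicit)
```

Here equal_by_operator is False while equal_by_members is True, which is why == alone is not enough once range objects stand in for full containers.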