On 11/04/2022 at 02:31, Dan Stromberg wrote:
It sounds a little like you're looking for interval arithmetic.
Maybe https://pypi.org/project/python-intervals/1.5.3/ ?
Not completely, but it suggested an idea to explore.
--
Antoon.
On 11/04/2022 at 02:01, duncan smith wrote:
On 10/04/2022 21:20, Antoon Pardon wrote:
On 9/04/2022 at 02:01, duncan smith wrote:
On 08/04/2022 22:08, Antoon Pardon wrote:
Well my first thought is that a bitset makes it less obvious to calculate
the size of the set or to iterate over it
Two more things:
1) There are also modules that do interval arithmetic with reals, not just
integers, e.g. https://pyinterval.readthedocs.io/en/latest/
2) If you don't mind a small amount of inaccuracy you can often get things
down to less than one bit per element for presence/absence, using a bloo
On 2022-04-10 at 22:20:33 +0200,
Antoon Pardon wrote:
>
>
> On 9/04/2022 at 02:01, duncan smith wrote:
> > On 08/04/2022 22:08, Antoon Pardon wrote:
> > >
> > > Well my first thought is that a bitset makes it less obvious to calculate
> > > the size of the set or to iterate over its elements.
It sounds a little like you're looking for interval arithmetic.
Maybe https://pypi.org/project/python-intervals/1.5.3/ ?
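To make the interval suggestion concrete, here is a minimal pure-Python
sketch. It does not use the python-intervals package; the names
remove_index and size are hypothetical, and the half-open (start, stop)
representation is an assumption made for illustration only. The idea is
that "nearly all entries" can be stored as a few intervals instead of
one element per entry.

```python
def remove_index(intervals, i):
    """Return a new list of half-open (start, stop) intervals without integer i."""
    result = []
    for start, stop in intervals:
        if start <= i < stop:
            # Split the interval around i, dropping empty pieces.
            if start < i:
                result.append((start, i))
            if i + 1 < stop:
                result.append((i + 1, stop))
        else:
            result.append((start, stop))
    return result


def size(intervals):
    """Number of integers covered by the interval list."""
    return sum(stop - start for start, stop in intervals)


candidates = [(0, 1000)]                  # hypothetical: entries 0..999
candidates = remove_index(candidates, 0)
candidates = remove_index(candidates, 500)
```

Each removal costs time proportional to the number of intervals, so this
only pays off while the set stays close to "everything".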
On Thu, Apr 7, 2022 at 4:19 AM Antoon Pardon wrote:
> I am working with a list of data from which I have to weed out duplicates.
> At the moment I keep for each entry a conta
On 10/04/2022 21:20, Antoon Pardon wrote:
On 9/04/2022 at 02:01, duncan smith wrote:
On 08/04/2022 22:08, Antoon Pardon wrote:
Well my first thought is that a bitset makes it less obvious to calculate
the size of the set or to iterate over its elements. But it is an idea
worth exploring.
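For what it is worth, size and iteration are both short to write for a
bitset stored in a plain Python int. The following is only a sketch
under that assumption; the helper names bitset_from, bitset_size and
bitset_iter are made up for this example and are not taken from the
thread.

```python
def bitset_from(indices):
    """Build an int-backed bitset with one bit set per index."""
    bits = 0
    for i in indices:
        bits |= 1 << i
    return bits


def bitset_size(bits):
    """Number of elements, i.e. the number of set bits."""
    return bin(bits).count("1")


def bitset_iter(bits):
    """Yield the indices of the set bits in ascending order."""
    while bits:
        low = bits & -bits          # isolate the lowest set bit
        yield low.bit_length() - 1  # its position is the element
        bits ^= low                 # clear it and continue


candidates = bitset_from([1, 4, 9])
candidates &= ~(1 << 4)             # discard entry 4
```

On Python 3.10 and later, int.bit_count() can replace the
bin(...).count("1") idiom.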
On 9/04/2022 at 02:01, duncan smith wrote:
On 08/04/2022 22:08, Antoon Pardon wrote:
Well my first thought is that a bitset makes it less obvious to calculate
the size of the set or to iterate over its elements. But it is an idea
worth exploring.
def popcount(n):
"""
Returns the
On 09/04/2022 13:14, Christian Gollwitzer wrote:
On 08.04.22 at 09:21, Antoon Pardon wrote:
The first is really hard. Not only may information be missing, no
single piece of information is unique or immutable. Two people may have
the same name (I know about several other "Peter Holzer"
On 08.04.22 at 09:21, Antoon Pardon wrote:
The first is really hard. Not only may information be missing, no
single piece of information is unique or immutable. Two people may have
the same name (I know about several other "Peter Holzer"s), a single
person might change their name (when I
On 08/04/2022 22:08, Antoon Pardon wrote:
On 8/04/2022 at 16:28, duncan smith wrote:
On 08/04/2022 08:21, Antoon Pardon wrote:
Yes I know all that. That is why I keep a bucket of possible duplicates
per "identifying" field that is examined and use some heuristics at the
end of all the compar
On 8/04/2022 at 16:28, duncan smith wrote:
On 08/04/2022 08:21, Antoon Pardon wrote:
Yes I know all that. That is why I keep a bucket of possible duplicates
per "identifying" field that is examined and use some heuristics at the
end of all the comparing instead of starting to weed out the du
On 08/04/2022 08:21, Antoon Pardon wrote:
On 8/04/2022 at 08:24, Peter J. Holzer wrote:
On 2022-04-07 17:16:41 +0200, Antoon Pardon wrote:
On 7/04/2022 at 16:08, Joel Goldstick wrote:
On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote:
I am working with a list of data from which I have t
On 8/04/2022 at 08:24, Peter J. Holzer wrote:
On 2022-04-07 17:16:41 +0200, Antoon Pardon wrote:
On 7/04/2022 at 16:08, Joel Goldstick wrote:
On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote:
I am working with a list of data from which I have to weed out duplicates.
At the moment I ke
On Fri, 8 Apr 2022 at 16:26, Peter J. Holzer wrote:
> Unless you have a unique immutable identifier that's enforced by
> some authority (like a social security number[1]), I don't think there
> is a chance to do that reliably in a program (although with enough data,
> a heuristic may be good enoug
On 2022-04-07 17:16:41 +0200, Antoon Pardon wrote:
> On 7/04/2022 at 16:08, Joel Goldstick wrote:
> > On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote:
> > > I am working with a list of data from which I have to weed out duplicates.
> > > At the moment I keep for each entry a container with the
On 2022-04-07 16:16, Antoon Pardon wrote:
On 7/04/2022 at 16:08, Joel Goldstick wrote:
On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote:
I am working with a list of data from which I have to weed out duplicates.
At the moment I keep for each entry a container with the other entries
that are
Antoon Pardon wrote at 2022-4-7 17:16 +0200:
> ...
>Sorry I wasn't clear. The data contains information about persons. But not
>all records need to be complete. So a person can occur multiple times in
>the list, while the records are all different because they are missing
>different bits.
>
>So all
On 7/04/2022 at 16:08, Joel Goldstick wrote:
On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote:
I am working with a list of data from which I have to weed out duplicates.
At the moment I keep for each entry a container with the other entries
that are still possible duplicates.
The problem is
On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote:
>
> I am working with a list of data from which I have to weed out duplicates.
> At the moment I keep for each entry a container with the other entries
> that are still possible duplicates.
>
> The problem is sometimes that is all the rest. I tho
I am working with a list of data from which I have to weed out duplicates.
At the moment I keep for each entry a container with the other entries
that are still possible duplicates.
The problem is sometimes that is all the rest. I thought to use a range
object for these cases. Unfortunately I some
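As a rough illustration of the range idea, the sketch below shows what a
range object offers as a stand-in for "all the rest". The values n and
me are hypothetical, and this is only one possible reading of the
approach described in the post, not the poster's actual code.

```python
n = 1_000_000            # hypothetical number of records
me = 42                  # hypothetical index of the current entry

candidates = range(n)    # "all entries", stored in constant memory

total = len(candidates)              # size without materialising anything
is_candidate = (me + 1) in candidates  # constant-time membership for ints

# A range cannot drop single elements, so excluding the entry itself
# has to be done lazily (or by switching to a real set at some point).
others = (i for i in candidates if i != me)
first_three = [next(others) for _ in range(3)]
```

The trade-off is that a range stays cheap only while nothing has been
removed from it.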
20 matches