Comparing sequences with range objects

2022-04-07 Thread Antoon Pardon

I am working with a list of data from which I have to weed out duplicates.
At the moment I keep for each entry a container with the other entries
that are still possible duplicates.

The problem is that sometimes that is all the rest. I thought to use a range
object for these cases. Unfortunately I sometimes want to sort things
and a range object is not comparable with a list or a tuple.
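
A minimal sketch of the problem described above, assuming a small made-up list named `candidates` (not from the original data). It shows that sorting a mix of lists and range objects fails, and that `key=list` works only by expanding every range:

```python
# Hypothetical sample: each item is a list or a range of entry indexes.
candidates = [[3, 7], range(5), [0, 2]]

try:
    sorted(candidates)
    comparable = True
except TypeError:
    # '<' is not supported between a range and a list.
    comparable = False

# This works, but builds a full list for every range while sorting.
by_list = sorted(candidates, key=list)
```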

So I have a list of items where each item is itself a list or range object.
I of course could sort this by using list as a key function but that
would defeat the purpose of using range objects for these cases.

So what would be a relatively easy way to get the same result without wasting
too much memory on entries that haven't had any weeding done on them?

--
Antoon Pardon.
--
https://mail.python.org/mailman/listinfo/python-list


Re: Comparing sequences with range objects

2022-04-07 Thread Joel Goldstick
On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon  wrote:
>
> I am working with a list of data from which I have to weed out duplicates.
> At the moment I keep for each entry a container with the other entries
> that are still possible duplicates.
>
> The problem is that sometimes that is all the rest. I thought to use a range
> object for these cases. Unfortunately I sometimes want to sort things
> and a range object is not comparable with a list or a tuple.
>
> So I have a list of items where each item is itself a list or range object.
> I of course could sort this by using list as a key function but that
> would defeat the purpose of using range objects for these cases.
>
> So what would be a relatively easy way to get the same result without wasting
> too much memory on entries that haven't any weeding done on them.
>
> --
> Antoon Pardon.

I'm not sure I understand what you are trying to do, but if your data
has no order, you can use a set to remove the duplicates.
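
A short sketch of the set suggestion, assuming the entries are hashable (the sample `names` list is made up). Records stored as dicts would first have to be converted to something hashable, such as tuples:

```python
# Hypothetical sample data; any hashable values would do.
names = ["alice", "bob", "alice", "carol", "bob"]

# A set drops exact duplicates but does not preserve order.
unique_unordered = set(names)

# dict.fromkeys keeps the first occurrence of each value, in order.
unique_ordered = list(dict.fromkeys(names))
```

Note that this only removes exact duplicates; it does not help with records that differ because fields are missing.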

-- 
Joel Goldstick


Re: Sharing part of a function

2022-04-07 Thread Cecil Westerhof via Python-list
Cecil Westerhof  writes:

> To show why it is often easy, but wrong to use recursive functions I
> wrote the following two Fibonacci functions:
> def fib_ite(n):
>     if type(n) is not int:
>         raise TypeError(f'Need an integer ({n})')
>     if n < 0:
>         raise ValueError(f'Should not be negative ({n})')
>
>     if n in [0, 1]:
>         return n
>
>     # a is the previous Fibonacci number (starts with fib(0))
>     # b is the current Fibonacci number (starts with fib(1))
>     a, b = 0, 1
>     # range goes to n - 1, so after the loop b contains fib(n)
>     for i in range(1, n):
>         a, b = b, a + b
>     return b
>
>
> def fib_rec(n):
>     if type(n) is not int:
>         raise TypeError(f'Need an integer ({n})')
>     if n < 0:
>         raise ValueError(f'Should not be negative ({n})')
>
>     if n in [0, 1]:
>         return n
>
>     return fib_rec(n - 2) + fib_rec(n - 1)
>
> The first eight lines are the same. And I did change the description
> of the errors, which had to be done in both functions. What would be
> the best way to circumvent this?
> Two options are:
> - Call an init function.
> - Call the 'master' function with a lambda.
>
> What is the preferable way, or is there a better way?

I have chosen this implementation with inner functions:
def fibonacci(n, implementation='iterative'):
    def ite(n):
        # a is the previous Fibonacci number (starts with fib(0))
        # b is the current Fibonacci number (starts with fib(1))
        a, b = 0, 1
        # range goes to n - 1, so after the loop b contains fib(n)
        for i in range(1, n):
            a, b = b, a + b
        return b


    def rec(n):
        if n in [0, 1]:
            return n

        return rec(n - 2) + rec(n - 1)


    if type(n) is not int:
        raise TypeError(f'Need an integer ({n})')
    if n < 0:
        raise ValueError(f'Should not be negative ({n})')

    if n in [0, 1]:
        return n

    if implementation == 'iterative':
        return ite(n)
    elif implementation == 'recursive':
        return rec(n)
    # Report the offending argument (the original formatted the builtin `type`).
    raise ValueError(
        f'Got a wrong function implementation type: {implementation}')
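
One other way to share the argument checks, sketched here as an alternative and not as the approach chosen above, is a decorator. The name `validate_index` is hypothetical; the error messages are copied from the functions in this thread:

```python
import functools


def validate_index(func):
    """Hypothetical helper: run the shared argument checks, then call func."""
    @functools.wraps(func)
    def wrapper(n):
        if type(n) is not int:
            raise TypeError(f'Need an integer ({n})')
        if n < 0:
            raise ValueError(f'Should not be negative ({n})')
        return func(n)
    return wrapper


@validate_index
def fib_ite(n):
    # After k iterations a holds fib(k), so after n iterations a is fib(n).
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


@validate_index
def fib_rec(n):
    # Recurse on an undecorated helper so the checks run only once.
    def rec(k):
        return k if k < 2 else rec(k - 2) + rec(k - 1)
    return rec(n)
```

With this layout a change to an error message is made in one place, and each implementation keeps its own name.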

-- 
Cecil Westerhof
Senior Software Engineer
LinkedIn: http://www.linkedin.com/in/cecilwesterhof


Re: Comparing sequences with range objects

2022-04-07 Thread Antoon Pardon

On 7/04/2022 at 16:08, Joel Goldstick wrote:
> On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote:
> > I am working with a list of data from which I have to weed out duplicates.
> > At the moment I keep for each entry a container with the other entries
> > that are still possible duplicates.
> >
> > The problem is that sometimes that is all the rest. I thought to use a range
> > object for these cases. Unfortunately I sometimes want to sort things
> > and a range object is not comparable with a list or a tuple.
> >
> > So I have a list of items where each item is itself a list or range object.
> > I of course could sort this by using list as a key function but that
> > would defeat the purpose of using range objects for these cases.
> >
> > So what would be a relatively easy way to get the same result without wasting
> > too much memory on entries that haven't any weeding done on them.
> >
> > --
> > Antoon Pardon.
>
> I'm not sure I understand what you are trying to do, but if your data
> has no order, you can use set to remove the duplicates


Sorry I wasn't clear. The data contains information about persons. But not
all records need to be complete. So a person can occur multiple times in
the list, while the records are all different because they are missing
different bits.

So all records with the same firstname can be duplicates. But if I have
a record in which the firstname is missing, it can at that point be
a duplicate of all other records.

--
Antoon Pardon



Re: Comparing sequences with range objects

2022-04-07 Thread Dieter Maurer
Antoon Pardon wrote at 2022-4-7 17:16 +0200:
> ...
>Sorry I wasn't clear. The data contains information about persons. But not
>all records need to be complete. So a person can occur multiple times in
>the list, while the records are all different because they are missing
>different bits.
>
>So all records with the same firstname can be duplicates. But if I have
>a record in which the firstname is missing, it can at that point be
>a duplicate of all other records.

The description is still not clear enough. Especially, it does
not show where `range` objects come into play.


Answering on a more abstract level:
Apparently, you want to sort a list the elements of which can
be other lists or range objects.

List objects are ordered lexicographically, i.e. for lists l1, l2:
l1 <= l2 iff not l1 or (l2 and (l1[0] < l2[0] or
                                (l1[0] == l2[0] and l1[1:] <= l2[1:]))).
If you sort a list whose elements are lists, they are compared using
this order.

For your case, you would need to use a `key` parameter for `sort` that
implements this order for `range` objects, too.
(Note that Python provides a function, `functools.cmp_to_key`, which
transforms an order definition into an appropriate `key` function).
A corresponding `sort` call may expand your range objects completely.

An alternative might be to not expand `range` objects but to put them
all at the start or end of the sorted list.
Of course, this would imply that their expansion does not influence
their order in the list -- which may or may not be acceptable
(depending on your use case).
If it is acceptable, it is likely possible to not put range objects
into the list to be sorted in the first place.
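
The `key` approach described above might look like the following sketch. It assumes the elements inside the lists and ranges are mutually comparable; `compare_sequences` and `_END` are names made up for this example, while `functools.cmp_to_key` and `itertools.zip_longest` are standard library functions:

```python
from functools import cmp_to_key
from itertools import zip_longest

_END = object()  # sentinel marking the end of the shorter sequence


def compare_sequences(left, right):
    """Three-way lexicographic comparison that never builds a list."""
    for a, b in zip_longest(left, right, fillvalue=_END):
        if a is _END:
            return -1  # left is a proper prefix of right
        if b is _END:
            return 1   # right is a proper prefix of left
        if a != b:
            return -1 if a < b else 1
    return 0


items = [[3, 4], range(5), [0, 1, 2], range(2, 10**9)]
items.sort(key=cmp_to_key(compare_sequences))
```

The comparison stops at the first differing element, so the memory use stays small. The time cost is another matter: two long sequences with a long common prefix are still walked element by element.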


Re: Comparing sequences with range objects

2022-04-07 Thread MRAB

On 2022-04-07 16:16, Antoon Pardon wrote:
> On 7/04/2022 at 16:08, Joel Goldstick wrote:
> > On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon wrote:
> > > I am working with a list of data from which I have to weed out duplicates.
> > > At the moment I keep for each entry a container with the other entries
> > > that are still possible duplicates.
> > >
> > > The problem is that sometimes that is all the rest. I thought to use a range
> > > object for these cases. Unfortunately I sometimes want to sort things
> > > and a range object is not comparable with a list or a tuple.
> > >
> > > So I have a list of items where each item is itself a list or range object.
> > > I of course could sort this by using list as a key function but that
> > > would defeat the purpose of using range objects for these cases.
> > >
> > > So what would be a relatively easy way to get the same result without wasting
> > > too much memory on entries that haven't any weeding done on them.
> > >
> > > --
> > > Antoon Pardon.
> >
> > I'm not sure I understand what you are trying to do, but if your data
> > has no order, you can use set to remove the duplicates
>
> Sorry I wasn't clear. The data contains information about persons. But not
> all records need to be complete. So a person can occur multiple times in
> the list, while the records are all different because they are missing
> different bits.
>
> So all records with the same firstname can be duplicates. But if I have
> a record in which the firstname is missing, it can at that point be
> a duplicate of all other records.


This is how I'd approach it:

from collections import defaultdict

# Make a list of groups, where each group is a list of potential duplicates.
# Initially, all of the records are potential duplicates of each other.
records = [list_of_records]

# Split the groups into subgroups according to the first name.
new_records = []

for group in records:
    subgroups = defaultdict(list)

    for record in group:
        subgroups[record['first_name']].append(record)

    # Records without a first name could belong to any of the subgroups.
    missing = subgroups.pop(None, [])

    # Add the incomplete records to every subgroup exactly once.
    for subgroup in subgroups.values():
        subgroup.extend(missing)

    new_records.extend(subgroups.values())

records = new_records

# Now repeat for the last name, etc.
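
The "repeat for the last name" step could be sketched by wrapping the loop in a function. This is one possible reading of the approach, not code from the thread: `split_groups`, the `people` sample and the `last_name` field are assumptions, and the handling of a group in which nobody has the field is an addition:

```python
from collections import defaultdict


def split_groups(groups, field):
    """Split each group by `field`; records lacking it join every subgroup."""
    new_groups = []
    for group in groups:
        subgroups = defaultdict(list)
        for record in group:
            subgroups[record.get(field)].append(record)

        missing = subgroups.pop(None, [])
        if not subgroups:
            # Nobody in this group has the field, so it cannot be split.
            if missing:
                new_groups.append(missing)
            continue

        for subgroup in subgroups.values():
            subgroup.extend(missing)
        new_groups.extend(subgroups.values())
    return new_groups


# Hypothetical sample records; None marks a missing field.
people = [
    {"first_name": "Ann", "last_name": "Lee"},
    {"first_name": "Ann", "last_name": None},
    {"first_name": None, "last_name": "Lee"},
    {"first_name": "Bob", "last_name": "Ray"},
]

groups = [people]
for field in ("first_name", "last_name"):
    groups = split_groups(groups, field)

# A group of one record has no remaining duplicate candidates.
candidates = [group for group in groups if len(group) > 1]
```

Because incomplete records are copied into every subgroup, the same record can appear in several groups; that is the memory cost the original question was trying to avoid.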


Re: 'äÄöÖüÜ' in Unicode (utf-8)

2022-04-07 Thread Anssi Saari
Dennis Lee Bieber  writes:

> On Fri, 1 Apr 2022 03:59:32 +1100, Chris Angelico 
> declaimed the following:
>
>
>>That's jmf. Ignore him. He knows nothing about Unicode and is
>>determined to make everyone aware of that fact.
>>
>>He got blocked from the mailing list ages ago, and I don't think
>>anyone's regretted it.

>   Ah yes... Unfortunately, when gmane made the mirror read-only, I had to
> revert to comp.lang.python... and all the junk that gets in via that and
> Google Groups...

Hm. I just configured my news reader to send follow-ups to the mailing
list when that happened.


Re: Comparing sequences with range objects

2022-04-07 Thread Peter J. Holzer
On 2022-04-07 17:16:41 +0200, Antoon Pardon wrote:
> Op 7/04/2022 om 16:08 schreef Joel Goldstick:
> > On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon  wrote:
> > > I am working with a list of data from which I have to weed out duplicates.
> > > At the moment I keep for each entry a container with the other entries
> > > that are still possible duplicates.
[...]
> Sorry I wasn't clear. The data contains information about persons. But not
> all records need to be complete. So a person can occur multiple times in
> the list, while the records are all different because they are missing
> different bits.
> 
> So all records with the same firstname can be duplicates. But if I have
> a record in which the firstname is missing, it can at that point be
> a duplicate of all other records.

There are two problems. The first one is how you establish identity.
The second is how you weed out identical objects. In your first mail
you only asked about the second, but that's easy.

The first is really hard. Not only may information be missing, no
single piece of information is unique or immutable. Two people may have
the same name (I know about several other "Peter Holzer"s), a single
person might change their name (when I was younger I went by my middle
name - how would you know that "Peter Holzer" and "Hansi Holzer" are the
same person?), they will move (= change their address), change jobs,
etc. Unless you have a unique immutable identifier that's enforced by
some authority (like a social security number[1]), I don't think there
is a chance to do that reliably in a program (although with enough data,
a heuristic may be good enough).
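
To illustrate why identity is the hard part, here is a deliberately naive heuristic, sketched under the assumption that records are dicts with `None` for missing fields. The function name and field names are hypothetical:

```python
def may_be_same_person(a, b, fields=("first_name", "last_name")):
    """True unless some field is known in both records and differs."""
    for field in fields:
        x, y = a.get(field), b.get(field)
        if x is not None and y is not None and x != y:
            return False
    return True


peter = {"first_name": "Peter", "last_name": "Holzer"}
hansi = {"first_name": "Hansi", "last_name": "Holzer"}
unknown = {"first_name": None, "last_name": "Holzer"}

# The incomplete record matches both complete ones, which conflict with
# each other, so "possible duplicate" is not a transitive relation.
matches = (
    may_be_same_person(peter, unknown),
    may_be_same_person(hansi, unknown),
    may_be_same_person(peter, hansi),
)
```

Such a check treats two namesakes as one person and a person who changed their name as two, which is exactly the limitation described above.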

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: dict.get_deep()

2022-04-07 Thread Peter J. Holzer
On 2022-04-03 23:17:04 +0200, Marco Sulla wrote:
> On Sun, 3 Apr 2022 at 21:46, Peter J. Holzer  wrote:
> > > > data.get_deep("users", 0, "address", "street", default="second star")
> >
> > Yep. Did that, too. Plus pass the final result through a function before
> > returning it.
> 
> I didn't understand. Have you added a func parameter?

Yes. Look at the code I posted (it's only 9 lines long).

> > I'm not sure whether I considered this when I wrote it, but a function
> > has the advantage of working with every class which can be indexed. A
> > method must be implemented on any class (so at least dict and list to be
> > useful).
> 
> You're right, but where to put it? I don't know if an iterableutil package
> exists. If included in the stdlib, I don't know where to put it. In
> collections maybe?

Yes, that seems like the least bad choice.
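
The 9-line function referred to above is not quoted in this message. The following is an independent sketch of what a stand-alone `get_deep` function might look like, based only on the call shown in the quote; the `func` parameter name and the choice not to apply `func` to the default are assumptions:

```python
def get_deep(data, *keys, default=None, func=None):
    """Follow `keys` through nested indexable objects, or return `default`."""
    try:
        for key in keys:
            data = data[key]
    except (KeyError, IndexError, TypeError):
        # Missing dict key, list index out of range, or a non-indexable value.
        return default
    return func(data) if func is not None else data


# Hypothetical sample data.
data = {"users": [{"address": {"street": "Main St"}}]}

found = get_deep(data, "users", 0, "address", "street", default="second star")
fallback = get_deep(data, "users", 1, "address", "street", default="second star")
shouted = get_deep(data, "users", 0, "address", "street", func=str.upper)
```

As a plain function it works with anything that supports indexing, dicts and lists alike, which is the advantage mentioned in the quoted text.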

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"




Re: Comparing sequences with range objects

2022-04-07 Thread Chris Angelico
On Fri, 8 Apr 2022 at 16:26, Peter J. Holzer  wrote:
> Unless you have a unique immutable identifier that's enforced by
> some authority (like a social security number[1]), I don't think there
> is a chance to do that reliably in a program (although with enough data,
> a heuristic may be good enough).
>

Not sure what your footnote was supposed to be, but I'll offer two
useful footnotes to that:

[1] [a] Though they're not actually immutable either, just less
frequently changing

[1] [b] Of course, that has privacy implications.

ChrisA