It is part of a larger project that aims to process data according to a given algorithm. Do you have any comments or recommendations for improving the code?
Thanks.

On Sun, Jan 21, 2024 at 18:28, Thomas Passin via Python-list <python-list@python.org> wrote:

> On 1/21/2024 11:54 AM, marc nicole wrote:
> > Thanks for the reply,
> >
> > I think using a Pandas (or a Numpy) approach would optimize the
> > execution of the program.
> >
> > Target cells could be up to 10% the size of the dataset, a good example
> > to start with would have from 10 to 100 values.
>
> Thanks for the reformatted code. It's much easier to read and think about.
>
> For say 100 points, it doesn't seem that "optimization" would be much of
> an issue. On my laptop machine and Python 3.12, your example takes
> around 5 seconds to run and print(). OTOH if you think you will go to
> much larger datasets, certainly execution time could become a factor.
>
> I would think that NumPy arrays and/or matrices would have good potential.
>
> Is this some kind of a cellular automaton, or an image filtering process?
>
> > Let me know your thoughts, here's a reproducible example which I
> > formatted:
> >
> >
> > from numpy import random
> > import pandas as pd
> > import numpy as np
> > import operator
> > import math
> > from collections import deque
> > from queue import *
> > from queue import Queue
> > from itertools import product
> >
> >
> > def select_target_values(dataframe, number_of_target_values):
> >     target_cells = []
> >     for _ in range(number_of_target_values):
> >         row_x = random.randint(0, len(dataframe.columns) - 1)
> >         col_y = random.randint(0, len(dataframe) - 1)
> >         target_cells.append((row_x, col_y))
> >     return target_cells
> >
> >
> > def select_contours(target_cells):
> >     contour_coordinates = [(0, 1), (1, 0), (0, -1), (-1, 0)]
> >     contour_cells = []
> >     for target_cell in target_cells:
> >         # random contour count for each cell
> >         contour_cells_count = random.randint(1, 4)
> >         try:
> >             contour_cells.append(
> >                 [
> >                     tuple(
> >                         map(
> >                             lambda i, j: i + j,
> >                             (target_cell[0], target_cell[1]),
> >                             contour_coordinates[iteration_],
> >                         )
> >                     )
> >                     for iteration_ in range(contour_cells_count)
> >                 ]
> >             )
> >         except IndexError:
> >             continue
> >     return contour_cells
> >
> >
> > def create_zipf_distribution():
> >     zipf_dist = random.zipf(2, size=(50, 5)).reshape((50, 5))
> >
> >     zipf_distribution_dataset = pd.DataFrame(zipf_dist).round(3)
> >
> >     return zipf_distribution_dataset
> >
> >
> > def apply_contours(target_cells, contour_cells):
> >     target_cells_with_contour = []
> >     # create one single list of cells
> >     for idx, target_cell in enumerate(target_cells):
> >         target_cell_with_contour = [target_cell]
> >         target_cell_with_contour.extend(contour_cells[idx])
> >         target_cells_with_contour.append(target_cell_with_contour)
> >     return target_cells_with_contour
> >
> >
> > def create_possible_datasets(dataframe, target_cells_with_contour):
> >     all_datasets_final = []
> >     dataframe_original = dataframe.copy()
> >
> >     list_tuples_idx_cells_all_datasets = list(
> >         filter(
> >             lambda x: x,
> >             [list(tuples) for tuples in
> >              list(product(*target_cells_with_contour))],
> >         )
> >     )
> >     target_original_cells_coordinates = list(
> >         map(
> >             lambda x: x[0],
> >             [
> >                 target_and_contour_cell
> >                 for target_and_contour_cell in target_cells_with_contour
> >             ],
> >         )
> >     )
> >     for dataset_index_values in list_tuples_idx_cells_all_datasets:
> >         all_datasets = []
> >         for idx_cell in range(len(dataset_index_values)):
> >             dataframe_cpy = dataframe.copy()
> >             dataframe_cpy.iat[
> >                 target_original_cells_coordinates[idx_cell][1],
> >                 target_original_cells_coordinates[idx_cell][0],
> >             ] = dataframe_original.iloc[
> >                 dataset_index_values[idx_cell][1],
> >                 dataset_index_values[idx_cell][0]
> >             ]
> >             all_datasets.append(dataframe_cpy)
> >         all_datasets_final.append(all_datasets)
> >     return all_datasets_final
> >
> >
> > def main():
> >     zipf_dataset = create_zipf_distribution()
> >
> >     target_cells = select_target_values(zipf_dataset, 5)
> >     print(target_cells)
> >     contour_cells = select_contours(target_cells)
> >     print(contour_cells)
> >     target_cells_with_contour = apply_contours(target_cells, contour_cells)
> >     datasets = create_possible_datasets(zipf_dataset,
> >                                         target_cells_with_contour)
> >     print(datasets)
> >
> >
> > main()
> >
> > On Sun, Jan 21, 2024 at 16:33, Thomas Passin via Python-list
> > <python-list@python.org> wrote:
> >
> > > On 1/21/2024 7:37 AM, marc nicole via Python-list wrote:
> > > > Hello,
> > > >
> > > > I have an initial dataframe with a random list of target cells (each cell
> > > > being identified with a couple (x,y)).
> > > > I want to yield four different dataframes each containing the value of one
> > > > of the contour (surrounding) cells of each specified target cell.
> > > >
> > > > the surrounding cells to consider for a specific target cell are: (x-1,y),
> > > > (x,y-1), (x+1,y), (x,y+1), specifically I randomly choose 1 to 4 cells from
> > > > these and consider for replacement to the target cell.
> > > >
> > > > I want to do that through a pandas-specific approach without having to
> > > > define the contour cells separately and then apply the changes on the
> > > > dataframe
> > >
> > > 1. Why do you want a Pandas-specific approach? Many people would rather
> > > keep code independent of special libraries if possible;
> > >
> > > 2. How big can these collections of target cells be, roughly speaking?
> > > The size could make a big difference in picking a design;
> > >
> > > 3. You really should work on formatting code for this list. Your code
> > > below is very complex and would take a lot of work to reformat to the
> > > point where it is readable, especially with the nearly impenetrable
> > > arguments in some places. Probably all that is needed is to replace all
> > > tabs by (say) three spaces, and to make sure you intentionally break
> > > lines well before they might get word-wrapped.
> > > Here is one example I have reformatted (I hope I got this right):
> > >
> > > list_tuples_idx_cells_all_datasets = list(filter(
> > >     lambda x: utils_tuple_list_not_contain_nan(x),
> > >     [list(tuples) for tuples in list(
> > >         itertools.product(*target_cells_with_contour))
> > >     ]))
> > >
> > > 4. As an aside, it doesn't look like you need to convert all those
> > > sequences and iterators to lists all over the place;
> > >
> > > > (but rather using an all in one approach):
> > > > for now I have written this example which I think is not Pandas specific:
> > > [snip]
> > >
> > > --
> > > https://mail.python.org/mailman/listinfo/python-list
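
The NumPy suggestion made in the thread could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code from the thread: the names OFFSETS, neighbours_in_bounds and replaced_copies are hypothetical, cells are written as (row, col) rather than the (x, y) order used in the posted example, and every in-bounds neighbour is used instead of a random choice of 1 to 4 of them.

```python
import numpy as np

# Offsets of the four contour (neighbouring) cells as (row, col) steps:
# right, down, left, up. Hypothetical name, not from the thread.
OFFSETS = np.array([(0, 1), (1, 0), (0, -1), (-1, 0)])


def neighbours_in_bounds(cell, shape):
    """Return the (row, col) neighbours of `cell` that lie inside `shape`."""
    candidates = np.asarray(cell) + OFFSETS
    # Keep only rows whose two coordinates are both within the array bounds.
    inside = np.all((candidates >= 0) & (candidates < np.asarray(shape)), axis=1)
    return candidates[inside]


def replaced_copies(values, cell):
    """Yield one copy of `values` per in-bounds neighbour of `cell`.

    In each copy the value at `cell` is overwritten by that neighbour's
    value; `values` itself is left untouched. `cell` must be a tuple.
    """
    for row, col in neighbours_in_bounds(cell, values.shape):
        copy = values.copy()
        copy[cell] = values[row, col]
        yield copy


if __name__ == "__main__":
    grid = np.arange(12).reshape(3, 4)
    for variant in replaced_copies(grid, (1, 1)):
        print(variant)
```

Because replaced_copies is a generator, the copies are produced one at a time, which is in the spirit of point 4 above about not converting every iterator to a list. A pandas DataFrame could be passed through this via dataframe.to_numpy().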