new sorting algorithm

2022-05-01 Thread Nas Bayedil
*Dear, Sir/Madam*


Let me first tell you briefly who we are and where we are from, what we do.

My name is Nas (full name Nasipa Bayedil) from Kazakhstan.

In December 2020, we registered a company online in Dover, Delaware, the
United States, because all major corporations are based in the United
States. Direction of the company: research and development technologies of
sorting and searching in databases and in Big Data. Our research and
developments should be of interest to a rapidly growing tech market.

My father Nurgali has a mathematics education, he has been interested in
mathematics' and physics all his life, now he is retired and continues to
study his favorite mathematics, when he became a pensioner to this love an
interest in programming was added. And this new interest in programming led
him to invent his own method of developing sorting algorithms.

*In a nutshell about what we are doing.*

We have developed several variants of sorting algorithms based on new ideas
and improvements to well-known ideas.

Our development is carried out in the application of the idea of the
article Peter McIlroy's 1993 paper "Optimistic Sorting and Information
Theoretic Complexity", which was implemented by *Tim Peters TimSort in a
sorting algorithm in 2002 for use in Python.*

The variant implemented by Tim Peters is improved by us as it does not take
full advantage of Peter McIlroy's idea.

This has been achieved through the development of new methods:

  1. data recognition

  2. data fusion



We also used the research work Sorting N-Elements Using Natural Order: A
New Adaptive Sorting Approach** but added some new ideas of our own. As a
result, we got a hybrid NDsort algorithm, which uses the above particular
algorithms.



Existing sorting algorithms that are used in practice, very fast, I will
not list their advantages; for improvement let's just indicate that some of
them start to work slowly on certain types of data. The author tried to
deal with this from a general, *mathematical position*, and not from the
point of view of programming techniques, *and achieved the goal*.

Fragments of the general theory of sorting by comparison have been
developed, *several facts have been discovered that are subject to
patenting *and which allow developing various variants of sorting
algorithms.

We believe that using this method to develop completely new, fast
algorithms, approaching the speed of the famous *QuickSort*, the speed of
which cannot be surpassed, but its drawback can be circumvented, in the
sense of stack overflow, on some data.

Additionally our general sorting algorithm can be applied to design data
sorting chips. This is, of course, the business of specialists in this
field, but we believe that our algorithm opens up a new approach to this
problem. The fact is that there are currently two ways to develop data
sorting chips. The first is that data is entered in parallel; the second is
that data is entered sequentially. Our sorting algorithm allows
parallel-sequential data input, which would speed up the entire sorting
process.

I hope our research will pique your interest. The distinctive features of
our sorting algorithms are the minimization of the number of comparisons, a
new approach to the sorting problem, the maximum use of common sense, and
the use of a principle from philosophy "Occam's razor".

We use C# (Visual Studio 2019 demo version) for research and write
fragments of algorithms. We would like to ask for help from Python.org
engineers in implementation of our idea for the Python language together.

We will be glad to hear from you any comments and questions about our idea.





Kind regards,

Nas Bayedil



Twitter: @NasBayedil

Website: www.bazony.com.kz



**June 2010 Journal of Computer Science 6(2) Project: Algorithm Analysis,

Authors: Shamim Akhter, International University of Business Agriculture
and Technology M. Tanveer Hasan.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-01 Thread Marco Sulla
Something like this is OK?

import os

def tail(f):
chunk_size = 100
size = os.stat(f.fileno()).st_size

positions = iter(range(size, -1, -chunk_size))
next(positions)

chunk_line_pos = -1
pos = 0

for pos in positions:
f.seek(pos)
chars = f.read(chunk_size)
chunk_line_pos = chars.rfind(b"\n")

if chunk_line_pos != -1:
break

if chunk_line_pos == -1:
nbytes = pos
pos = 0
f.seek(pos)
chars = f.read(nbytes)
chunk_line_pos = chars.rfind(b"\n")

if chunk_line_pos == -1:
line_pos = pos
else:
line_pos = pos + chunk_line_pos + 1

f.seek(line_pos)

return f.readline()

This is simply for one line and for utf8.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: new sorting algorithm

2022-05-01 Thread Marco Sulla
I suppose you should write to python-...@python.org , or in
https://discuss.python.org/ under the section Core development
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: new sorting algorithm

2022-05-01 Thread Dan Stromberg
This probably should start out as a module on Pypi.

Is the sorting stable? Python guarantees that.

On Sun, May 1, 2022 at 8:53 AM Nas Bayedil  wrote:

> *Dear, Sir/Madam*
>
>
> Let me first tell you briefly who we are and where we are from, what we do.
>
> My name is Nas (full name Nasipa Bayedil) from Kazakhstan.
>
> In December 2020, we registered a company online in Dover, Delaware, the
> United States, because all major corporations are based in the United
> States. Direction of the company: research and development technologies of
> sorting and searching in databases and in Big Data. Our research and
> developments should be of interest to a rapidly growing tech market.
>
> My father Nurgali has a mathematics education, he has been interested in
> mathematics' and physics all his life, now he is retired and continues to
> study his favorite mathematics, when he became a pensioner to this love an
> interest in programming was added. And this new interest in programming led
> him to invent his own method of developing sorting algorithms.
>
> *In a nutshell about what we are doing.*
>
> We have developed several variants of sorting algorithms based on new ideas
> and improvements to well-known ideas.
>
> Our development is carried out in the application of the idea of the
> article Peter McIlroy's 1993 paper "Optimistic Sorting and Information
> Theoretic Complexity", which was implemented by *Tim Peters TimSort in a
> sorting algorithm in 2002 for use in Python.*
>
> The variant implemented by Tim Peters is improved by us as it does not take
> full advantage of Peter McIlroy's idea.
>
> This has been achieved through the development of new methods:
>
>   1. data recognition
>
>   2. data fusion
>
>
>
> We also used the research work Sorting N-Elements Using Natural Order: A
> New Adaptive Sorting Approach** but added some new ideas of our own. As a
> result, we got a hybrid NDsort algorithm, which uses the above particular
> algorithms.
>
>
>
> Existing sorting algorithms that are used in practice, very fast, I will
> not list their advantages; for improvement let's just indicate that some of
> them start to work slowly on certain types of data. The author tried to
> deal with this from a general, *mathematical position*, and not from the
> point of view of programming techniques, *and achieved the goal*.
>
> Fragments of the general theory of sorting by comparison have been
> developed, *several facts have been discovered that are subject to
> patenting *and which allow developing various variants of sorting
> algorithms.
>
> We believe that using this method to develop completely new, fast
> algorithms, approaching the speed of the famous *QuickSort*, the speed of
> which cannot be surpassed, but its drawback can be circumvented, in the
> sense of stack overflow, on some data.
>
> Additionally our general sorting algorithm can be applied to design data
> sorting chips. This is, of course, the business of specialists in this
> field, but we believe that our algorithm opens up a new approach to this
> problem. The fact is that there are currently two ways to develop data
> sorting chips. The first is that data is entered in parallel; the second is
> that data is entered sequentially. Our sorting algorithm allows
> parallel-sequential data input, which would speed up the entire sorting
> process.
>
> I hope our research will pique your interest. The distinctive features of
> our sorting algorithms are the minimization of the number of comparisons, a
> new approach to the sorting problem, the maximum use of common sense, and
> the use of a principle from philosophy "Occam's razor".
>
> We use C# (Visual Studio 2019 demo version) for research and write
> fragments of algorithms. We would like to ask for help from Python.org
> engineers in implementation of our idea for the Python language together.
>
> We will be glad to hear from you any comments and questions about our idea.
>
>
>
>
>
> Kind regards,
>
> Nas Bayedil
>
>
>
> Twitter: @NasBayedil
>
> Website: www.bazony.com.kz
>
>
>
> **June 2010 Journal of Computer Science 6(2) Project: Algorithm Analysis,
>
> Authors: Shamim Akhter, International University of Business Agriculture
> and Technology M. Tanveer Hasan.
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: new sorting algorithm

2022-05-01 Thread Chris Angelico
On Mon, 2 May 2022 at 01:53, Nas Bayedil  wrote:
> We believe that using this method to develop completely new, fast
> algorithms, approaching the speed of the famous *QuickSort*, the speed of
> which cannot be surpassed, but its drawback can be circumvented, in the
> sense of stack overflow, on some data.

Hmm, actually TimSort *does* exceed the speed of quicksort for a lot
of real-world data. For instance, if you take a large sorted list,
append a handful of (unsorted) items to it, and then sort the list,
TimSort can take advantage of the fact that the bulk of the list is
sorted. It ends up significantly faster than re-sorting the entire
list.

How well does your new variant handle this case?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-01 Thread Cameron Simpson
On 01May2022 18:55, Marco Sulla  wrote:
>Something like this is OK?
[...]
>def tail(f):
>chunk_size = 100
>size = os.stat(f.fileno()).st_size

I think you want os.fstat().

>positions = iter(range(size, -1, -chunk_size))
>next(positions)

I was wondering about the iter, but this makes sense. Alternatively you 
could put a range check in the for-loop.

>chunk_line_pos = -1
>pos = 0
>
>for pos in positions:
>f.seek(pos)
>chars = f.read(chunk_size)
>chunk_line_pos = chars.rfind(b"\n")
>
>if chunk_line_pos != -1:
>break

Normal text file _end_ in a newline. I'd expect this to stop immediately 
at the end of the file.

>if chunk_line_pos == -1:
>nbytes = pos
>pos = 0
>f.seek(pos)
>chars = f.read(nbytes)
>chunk_line_pos = chars.rfind(b"\n")

I presume this is because unless you're very lucky, 0 will not be a 
position in the range(). I'd be inclined to avoid duplicating this code 
and special case and instead maybe make the range unbounded and do 
something like this:

if pos < 0:
pos = 0
... seek/read/etc ...
if pos == 0:
break

around the for-loop body.

>if chunk_line_pos == -1:
>line_pos = pos
>else:
>line_pos = pos + chunk_line_pos + 1
>f.seek(line_pos)
>return f.readline()
>
>This is simply for one line and for utf8.

And anything else where a newline is just an ASCII newline byte (10) and 
can't be mistaken otherwise. So also ASCII and all the ISO8859-x single 
byte encodings. But as Chris has mentioned, not for other encodings.

Seems sane. I haven't tried to run it.

Cheers,
Cameron Simpson 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-01 Thread Dan Stromberg
On Sun, May 1, 2022 at 3:19 PM Cameron Simpson  wrote:

> On 01May2022 18:55, Marco Sulla  wrote:
> >Something like this is OK?
>

Scanning backward for a byte == 10 in ASCII or ISO-8859 seems fine.

But what about Unicode?  Are all 10 bytes newlines in Unicode encodings?

If not, and you have a huge file to reverse, it might be better to use a
temporary file.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: new sorting algorithm

2022-05-01 Thread Dan Stromberg
On Sun, May 1, 2022 at 1:44 PM Chris Angelico  wrote:

> On Mon, 2 May 2022 at 06:43, Dan Stromberg  wrote:
> > On Sun, May 1, 2022 at 11:10 AM Chris Angelico  wrote:
> >>
> >> On Mon, 2 May 2022 at 01:53, Nas Bayedil  wrote:
> >> > We believe that using this method to develop completely new, fast
> >> > algorithms, approaching the speed of the famous *QuickSort*, the
> speed of
> >> > which cannot be surpassed, but its drawback can be circumvented, in
> the
> >> > sense of stack overflow, on some data.
> >>
> >> Hmm, actually TimSort *does* exceed the speed of quicksort for a lot
> >> of real-world data. For instance, if you take a large sorted list,
> >> append a handful of (unsorted) items to it, and then sort the list,
> >> TimSort can take advantage of the fact that the bulk of the list is
> >> sorted. It ends up significantly faster than re-sorting the entire
> >> list.
> >
> >
> > In fact, Timsort is O(n) for already-sorted data, while many quicksorts
> are O(n^2) for already-sorted data.
> >
> > Quicksort can be salvaged by using a median-of-3 partitioning, but it's
> still O(n^2) in the (less common) worst case.
> >
>
> This is true, but purely sorted data isn't a very practical case. The
> case of mostly-sorted data IS practical, though, so it's a quite big
> win that it can be close to O(n), and still faster than inserting each
> item individually.
>

You seem to be of the impression that nearly-sorted data isn't an uphill
battle with a straightforward quicksort.

I'm having a hard time convincing myself of that.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-01 Thread Chris Angelico
On Mon, 2 May 2022 at 09:19, Dan Stromberg  wrote:
>
> On Sun, May 1, 2022 at 3:19 PM Cameron Simpson  wrote:
>
> > On 01May2022 18:55, Marco Sulla  wrote:
> > >Something like this is OK?
> >
>
> Scanning backward for a byte == 10 in ASCII or ISO-8859 seems fine.
>
> But what about Unicode?  Are all 10 bytes newlines in Unicode encodings?

Most absolutely not. "Unicode" isn't an encoding, but of the Unicode
Transformation Formats and Universal Character Set encodings, most
don't make that guarantee:

* UTF-8 does, as mentioned. It sacrifices some efficiency and
consistency for a guarantee that ASCII characters are represented by
ASCII bytes, and ASCII bytes only ever represent ASCII characters.
* UCS-2 and UTF-16 will both represent BMP characters with two bytes.
Any character U+xx0A or U+0Axx will include an 0x0A in its
representation.
* UTF-16 will also encode anything U+000xxx0A with an 0x0A. (And I
don't think any codepoints have been allocated that would trigger
this, but UTF-16 can also use 0x0A in the high surrogate.)
* UTF-32 and UCS-4 will use 0x0A for any character U+xx0A, U+0Axx, and
U+A (though that plane has no characters on it either)

So, of all the available Unicode standard encodings, only UTF-8 makes
this guarantee.

Of course, if you look at documents available on the internet, UTF-8
the encoding used by the vast majority of them (especially if you
include seven-bit files, which can equally be considered ASCII,
ISO-8859-x, and UTF-8), so while it might only be one encoding out of
many, it's probably the most important :)

In general, you can *only* make this parsing assumption IF you know
for sure that your file's encoding is UTF-8, ISO-8859-x, some OEM
eight-bit encoding (eg Windows-125x), or one of a handful of other
compatible encodings. But it probably will be.

> If not, and you have a huge file to reverse, it might be better to use a
> temporary file.

Yeah, or an in-memory deque if you know how many lines you want.
Either way, you can read the file forwards, guaranteeing correct
decoding even of a shifted character set (where a byte value can
change in meaning based on arbitrarily distant context).

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: new sorting algorithm

2022-05-01 Thread Chris Angelico
On Mon, 2 May 2022 at 09:20, Dan Stromberg  wrote:
>
>
> On Sun, May 1, 2022 at 1:44 PM Chris Angelico  wrote:
>>
>> On Mon, 2 May 2022 at 06:43, Dan Stromberg  wrote:
>> > On Sun, May 1, 2022 at 11:10 AM Chris Angelico  wrote:
>> >>
>> >> On Mon, 2 May 2022 at 01:53, Nas Bayedil  wrote:
>> >> > We believe that using this method to develop completely new, fast
>> >> > algorithms, approaching the speed of the famous *QuickSort*, the speed 
>> >> > of
>> >> > which cannot be surpassed, but its drawback can be circumvented, in the
>> >> > sense of stack overflow, on some data.
>> >>
>> >> Hmm, actually TimSort *does* exceed the speed of quicksort for a lot
>> >> of real-world data. For instance, if you take a large sorted list,
>> >> append a handful of (unsorted) items to it, and then sort the list,
>> >> TimSort can take advantage of the fact that the bulk of the list is
>> >> sorted. It ends up significantly faster than re-sorting the entire
>> >> list.
>> >
>> >
>> > In fact, Timsort is O(n) for already-sorted data, while many quicksorts 
>> > are O(n^2) for already-sorted data.
>> >
>> > Quicksort can be salvaged by using a median-of-3 partitioning, but it's 
>> > still O(n^2) in the (less common) worst case.
>> >
>>
>> This is true, but purely sorted data isn't a very practical case. The
>> case of mostly-sorted data IS practical, though, so it's a quite big
>> win that it can be close to O(n), and still faster than inserting each
>> item individually.
>
>
> You seem to be of the impression that nearly-sorted data isn't an uphill 
> battle with a straightforward quicksort.
>
> I'm having a hard time convincing myself of that.
>

The median-of-three partitioning technique makes that work reasonably
well, so it won't be pathologically slow. It's hardly Quicksort's best
feature, but it could easily be a lot worse. I'd have to check, but I
think it still manages to be O(n log n). Merge sort, of course, is a
lot more consistent, but the asymptotic cost is still broadly the
same.

But Timsort manages to be close to O(n) for sorted data, reversed
data, nearly-sorted or nearly-reversed data, etc. Makes it very handy
for jobs like "click a heading to sort by it", where you might add
multiple sort keys.

(Plus, Python's implementation has some cool tricks for small
collections that make it quite efficient.)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-01 Thread Cameron Simpson
On 01May2022 23:30, Stefan Ram  wrote:
>Dan Stromberg  writes:
>>But what about Unicode?  Are all 10 bytes newlines in Unicode encodings?
>  It seems in UTF-8, when a value is above U+007F, it will be
>  encoded with bytes that always have their high bit set.

Aye. Design festure enabling easy resync-to-char-boundary at an 
arbitrary point in the file.

>  But Unicode has NEL "Next Line" U+0085 and other values that
>  conforming applications should recognize as line terminators.

I disagree. Maybe for printing things. But textual data records? I would 
hope to end them with NL, and only NL (code 10).

Cheers,
Cameron Simpson 
-- 
https://mail.python.org/mailman/listinfo/python-list


Difference in Setup Between Windows 10 Running Python 3.9 and Windows 11 Running Python 3.10

2022-05-01 Thread Brent Hunter
Hello,



I was recently running a Windows 10 machine Python 3.9.  I simply created a 
batch file titled "Start-AIG.bat" which simply contained the following: 
"pythonw AIG.py".  It started a python program titled "AIG.py" and the Python 
dialog box was displayed on my screen, running all day and night.  I set up 
Windows to run this batch file upon startup and it worked fine.  I remember 
having to do a bunch of research before I learned that I needed to put "pythonw 
AIG.py" in the batch file as opposed to "python AIG.py".



However, my old windows 10 desktop crashed and I have a new Windows 11 machine. 
 I installed the latest stable version of Python, which is 3.10.  Now, when I 
try to execute the same batch file, it doesn't launch the app.  It could be 
because I'm using Windows 11 or could be because I now have a newer version of 
Python.



Does anyone have any ideas what I should do to get the Python script running on 
my new machine?



Thank you!



Brent

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Difference in Setup Between Windows 10 Running Python 3.9 and Windows 11 Running Python 3.10

2022-05-01 Thread MRAB

On 2022-05-02 02:56, Brent Hunter wrote:

Hello,

I was recently running a Windows 10 machine Python 3.9.  I simply created a batch file titled "Start-AIG.bat" which 
simply contained the following: "pythonw AIG.py".  It started a python program titled "AIG.py" and the Python 
dialog box was displayed on my screen, running all day and night.  I set up Windows to run this batch file upon startup and it 
worked fine.  I remember having to do a bunch of research before I learned that I needed to put "pythonw AIG.py" in the 
batch file as opposed to "python AIG.py".

However, my old windows 10 desktop crashed and I have a new Windows 11 machine. 
 I installed the latest stable version of Python, which is 3.10.  Now, when I 
try to execute the same batch file, it doesn't launch the app.  It could be 
because I'm using Windows 11 or could be because I now have a newer version of 
Python.

Does anyone have any ideas what I should do to get the Python script running on 
my new machine?

The problem is probably that the Python folder isn't in Windows' search 
path, but the recommended thing to do nowadays on Windows is to use the 
Python Launcher "py.exe" (or "pyw.exe" for no console window) instead:


pyw AIG.py
--
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-01 Thread Chris Angelico
On Mon, 2 May 2022 at 11:54, Cameron Simpson  wrote:
>
> On 01May2022 23:30, Stefan Ram  wrote:
> >Dan Stromberg  writes:
> >>But what about Unicode?  Are all 10 bytes newlines in Unicode encodings?
> >  It seems in UTF-8, when a value is above U+007F, it will be
> >  encoded with bytes that always have their high bit set.
>
> Aye. Design festure enabling easy resync-to-char-boundary at an
> arbitrary point in the file.

Yep - and there's also a distinction between "first byte of multi-byte
character" and "continuation byte, keep scanning backwards". So you're
guaranteed to be able to resynchronize.

(If you know whether it's little-endian or big-endian, UTF-16 can also
resync like that, since "high surrogate" and "low surrogate" look
different.)

> >  But Unicode has NEL "Next Line" U+0085 and other values that
> >  conforming applications should recognize as line terminators.
>
> I disagree. Maybe for printing things. But textual data records? I would
> hope to end them with NL, and only NL (code 10).
>

I'm with you on that - textual data records should end with 0x0A only.
But if there are text entities in there, they should be allowed to
include any Unicode characters, potentially including other types of
whitespace.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list