====== Optimal Sort for any number of 16-bit elements ======
by Mats Rosengren
====== General Discussion ======
The extension of the "[[optimal_sort_8-bit_elements|Optimal Sort]]" algorithm to "an unlimited number" (i.e. more then 255 elements) of 16 bit elements is straightforward using standard 6502 technique.
Consider first the loop using the Y register to sequentially access the elements starting with the second last and ending with the first in a sequence of up to 255 elements (note that the last access here is with 1 in the Y register, i.e. the "ZPADD" should point to the byte before the first element):
L1 DEY
BEQ L3
LDA (ZPADD),Y
"some code"
BRA L1
L3
To access a longer sequence one could use nested loops with the X register for the outer and the Y register for the inner loop as follows:
L1 TYA
BNE L2
TXA
BEQ L3
DEX
DEC ZPADD+1
L2 DEY
LDA (ZPADD),Y
"some code"
BRA L1
L3
If initially the registers X and Y are given the values x and y and the value y is added to high part of the pointer ZPADD pointing to the first element of the sequence to be sorted the loop will be run through y + 256 * x times starting with the second last and ending with the first in a sequence of 1 + y + 256 * x elements. And when the loop is finished the pointer ZPADD will again point to the first element of the sequence while the X and the Y registers will take the value zero. Note that as here the last access is with 0 in the Y register the "ZPADD" should indeed point to the first element, not to the byte before the first element
To work with 16 bit values one has to handle the 8 less significant bits and the 8 more significant bits separately. The comparison is then made such that first the 8 most significant bits are compared. The 8 less significant bits only need to be compared if the 8 more significant bits were equal. The complete routine is then as follows:
SORT CLC
LDA ZPADL+1 ;INITIALISE WORK4L
ADC NVALH
STA WORK4L
BCS LER ;BAD INPUT VALUES (TOO FAR TO BRANCH DIRECTLY)
LDA ZPADH+1 ;INITIALISE WORK4H
ADC NVALH
STA WORK4H
LER BCS L3B ;BAD INPUT VALUES
L0 LDY NVALL ;NUMBER OF DECREMENTS, LOW
LDX NVALH ;NUMBER OF DECREMENTS, HIGH
LDA WORK4L
STA ZPADL+1
LDA (ZPADL),Y ;LAST VALUE IN (WHAT IS LEFT OF) SEQUENCE TO BE SORTED
STA WORK3L ;SAVE LEAST SIGNIFICANT PART OF VALUE. WILL BE OVER-WRITTEN BY LARGEST NUMBER
LDA WORK4H
STA ZPADH+1
LDA (ZPADH),Y ;LAST VALUE IN (WHAT IS LEFT OF) SEQUENCE TO BE SORTED
STA WORK3H ;SAVE MOST SIGNIFICANT PART OF VALUE. WILL BE OVER-WRITTEN BY LARGEST NUMBER
BRA L2
L1 TYA ;START INNER LOOP
BNE L1A
TXA
BEQ L3 ;EXIT INNER LOOP
DEX
DEC ZPADL+1
DEC ZPADH+1
L1A DEY
LDA (ZPADH),Y
CMP WORK2H
BNE L1B
LDA (ZPADL),Y
CMP WORK2L
L1B BCC L1
L2 LDA (ZPADL),Y
STA WORK2L ;POTENTIALLY LARGEST VALUE, LEAST SIGNIFICANT PART
LDA (ZPADH),Y
STA WORK2H ;POTENTIALLY LARGEST VALUE, MOST SIGNIFICANT PART
STY WORK1L ;INDEX OF POTENTIALLY LARGEST VALUE, LOW
LDA ZPADL+1
STA WORK1LH ;INDEX HIGH OF POTENTIALLY LARGEST VALUE, LEAST SIGNIFICANT PART
LDA ZPADH+1
STA WORK1HH ;INDEX HIGH OF POTENTIALLY LARGEST VALUE, MOST SIGNIFICANT PART
BRA L1 ;END INNER LOOP
L3 LDY NVALL ;WHERE THE LARGEST VALUE SHALL BE PUT
LDA WORK4L
STA ZPADL+1
LDA WORK2L ;THE LARGEST VALUE, LEAST SIGNIFICANT PART
STA (ZPADL),Y
LDA WORK4H
STA ZPADH+1
LDA WORK2H ;THE LARGEST VALUE, MOST SIGNIFICANT PART
STA (ZPADH),Y
LDY WORK1L ;INDEX LOW OF FREE SPACE
LDA WORK1LH ;INDEX HIGH OF FREE SPACE, LEAST SIGNIFICANT PART
STA ZPADL+1
LDA WORK3L ;THE OVER-WRITTEN VALUE, LEAST SIGNIFICANT PART
STA (ZPADL),Y ;PUT THE OVER-WRITTEN VALUE, LEAST SIGNIFICANT PART, IN THE FREE SPACE
LDA WORK1HH ;INDEX HIGH OF FREE SPACE, MOST SIGNIFICANT PART
STA ZPADH+1
LDA WORK3H ;THE OVER-WRITTEN VALUE, MOST SIGNIFICANT PART
STA (ZPADH),Y ;PUT THE OVER-WRITTEN VALUE, MOST SIGNIFICANT PART, IN THE FREE SPACE
LDA NVALL
BNE L3A
LDA NVALH
BEQ L3B ;EXIT OUTER LOOP
DEC NVALH
DEC WORK4L
DEC WORK4H
L3A DEC NVALL ;END OF THE SHORTER SEQUENCE STILL LEFT
BRA L0 ;START WORKING WITH THE SHORTER SEQUENCE
L3B RTS
The calling program has to set the pointers ZPADL and ZPADH pointing to the start of the (separately stored) sequences of the 8 less and the 8 more significant bits of the values. The total number of values are 1 + y + 256 * x where the value x has to be given to variable NVALL and the value y has to be given to variable NVALH. At the end these variables will both be zero.
In addition the following 9 bytes are needed internally
WORK1L
WORK1LH
WORK1HH
WORK2L
WORK2H
WORK3L
WORK3H
WORK4L
WORK4H
The following is a template for a program calling the routine:
;
ZPADL = $30 ;2 BYTE POINTER IN PAGE ZERO TO THE "LESS SIGNIFICANT" SEQUENCE. SET BY CALLING PROGRAM
ZPADH = $32 ;2 BYTE POINTER IN PAGE ZERO TO THE "MORE SIGNIFICANT" SEQUENCE. SET BY CALLING PROGRAM
NVALL = $34 ;< "START ADDRESS" - "END ADDRESS" OF SEQUENCE. SET BY CALLING PROGRAM
NVALH = $35 ;> "START ADDRESS" - "END ADDRESS" OF SEQUENCE. SET BY CALLING PROGRAM
;
;9 BYTES USED AS WORKING AREA
;
WORK1L = $36 ;LOW INDEX (Y) FOR POTENTIALLY LARGEST VALUE
WORK1LH = $37 ;HIGH INDEX (ADDL+1) FOR POTENTIALLY LARGEST VALUE (LEAST SIGNIFICANT PART)
WORK1HH = $38 ;HIGH INDEX (ADDL+1) FOR POTENTIALLY LARGEST VALUE (MOST SIGNIFICANT PART)
WORK2L = $39 ;POTENTIALLY LARGEST VALUE (LEAST SIGNIFICANT PART)
WORK2H = $3A ;POTENTIALLY LARGEST VALUE (MOST SIGNIFICANT PART)
WORK3L = $3B ;COPY OF LAST VALUE IN NOT YET SORTED SEQUENCE (LEAST SIGNIFICANT PART)
WORK3H = $3C ;COPY OF LAST VALUE IN NOT YET SORTED SEQUENCE (MOST SIGNIFICANT PART)
WORK4L = $3D ;HIGH ADDRESS FOR LAST ELEMENT IN NOT YET SORTED SEQUENCE (LEAST SIGNIFICANT PART)
WORK4H = $3E ;HIGH ADDRESS FOR LAST ELEMENT IN NOT YET SORTED SEQUENCE (MOST SIGNIFICANT PART)
*=$1000 ;CODE ANYWHERE IN RAM OR ROM
LDA #< SEQL
STA ZPADL
LDA #> SEQL
STA ZPADL+1
LDA #< SEQH
STA ZPADH
LDA #> SEQH
STA ZPADH+1
LDA #$09
STA NVALL
LDA #$00
STA NVALH
JSR SORT
BRK
SEQL .BYTE $30
.BYTE $20
.BYTE $DF
.BYTE $5A
.BYTE $81
.BYTE $C7
.BYTE $62
.BYTE $6D
.BYTE $6A
.BYTE $76
SEQH .BYTE $05
.BYTE $1F
.BYTE $16
.BYTE $0D
.BYTE $1E
.BYTE $17
.BYTE $18
.BYTE $10
.BYTE $26
.BYTE $10
"Insert the code for subroutine SORT here"
To test the efficiency (and correctness) of this algorithm I generated a totally random permutation of the numbers 0 - 9999 using a random number generator. This sequence was successfully sorted using subroutine SORT above. I measured the number of T1H counts of the W65C22S timer 1 during this sorting, it was 4324613. As T1H decrements every 256 PHI2 pulses and my W6502 runs at 8 megahertz this corresponds to 138.4 seconds.