====== Cycle Exact Measuring of Execution Times ======
In most cases you measure how long a subroutine takes to execute by changing the border color while it runs. This is usually enough to see how many raster lines are wasted, but sometimes you want to know the exact number of cycles spent, or the routine in question takes more than a frame to execute, so that the color changes overlap and it becomes difficult to see where execution starts and ends. For these cases the CIA timers come in handy:
;ZEN-TIMER 64, 6502tass v1.31 version. Original idea by M. Abrash. Usage:
; jsr measure or +0 if precompiled to start cycle counting
; jsr evaluate or +3 if precompiled to stop counting & print result
;Note: max cycle count range is limited to about 65,500 cycles (=roughly 3 frames)
overhead = 19 ;cycles wasted by the timer itself during measurement
irqs_allowed = 0 ;1 to allow them (less accurate results)
dma_off = 1 ;0 to allow badlines (ditto)
sprites_off = 1 ;0 to allow sprites (ditto)
printout = $400 ;0 to print via $bdcd, else the address to write the digits to: the screen
;(or some other location to look it up via ml-mon)
;* = $1000 ;uncomment to precompile to wanted address
jmp measure
evaluate sei
lda #0
sta $dc0f
lda vald011
sta $d011
lda vald015
sta $d015
cld
sec
lda #<($ffff-overhead)
sbc $dc06
sta locycles
lda #>($ffff-overhead)
sbc $dc07
.if !printout
ldx locycles
jsr $bdcd
lda #13
jsr $ffd2
lda statusreg ;restore (most of) st
pha
plp
rts
.else
ldy locycles ;lame hex to petscii conversion
ldx #$30-1
stx ten1000s
stx ten1000s+1
stx ten1000s+2
stx ten1000s+3
stx ten1000s+4
sec
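hploop sta temp ;remember hi byte before the trial subtraction
inc ten1000s-$30+1,x ;bump current digit (first pass: $2f -> "0")
tya
sbc lo,x
tay
lda temp
sbc hi,x
bcs hploop ;no borrow: the subtraction fit, count it
tya ;borrow: undo the lo byte (old hi byte is still in temp)
adc lo,x
tay
inx ;next lower power of ten
cpx #$34
sec
bne hploop+3 ;re-enter past "sta temp" to keep the old hi byte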
ldx #4
print lda ten1000s,x
sta printout,x
lda $d021
eor #8
sta (printout//$400)+$d800,x
dex
bpl print
lda statusreg ;restore (most of) st
pha
plp
rts
temp .byte 0 ;needed for hb
ten1000s .byte 0,0,0,0,0
lo = *-$30+1
.byte <10000,<1000,<100,<10,<1
hi = *-$30+1
.byte >10000,>1000,>100,>10,>1
.fi
locycles .byte 0
vald015 .byte 0
vald011 .byte 0
statusreg .byte 0
measure php ;save st, just in case
sei
pla
sta statusreg
lda $d011
sta vald011
lda $d015
sta vald015
ldx #$00
stx $dc0f ;stop timer b (not really necessary, but still)
.if dma_off
stx $d011
.fi
.if sprites_off
stx $d015
.fi
dex
cpx $d012
bne *-3 ;wait for vblank area
stx $dc06 ;set to $ffff
stx $dc07
lda #$19
.if irqs_allowed
cli
.fi
sta $dc0f ;start timer b, one shot mode
rts
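The quoted limit of roughly 65,500 cycles, or about 3 frames, follows from the 16-bit range of timer B. As a worked check, and assuming PAL timing (63 cycles per raster line, 312 lines per frame; these figures are not stated in the listing itself):

```latex
\begin{align*}
N_{\max} &= (2^{16}-1) - \text{overhead} = 65535 - 19 = 65516 \text{ cycles} \\
\text{cycles per PAL frame} &= 63 \times 312 = 19656 \\
\frac{N_{\max}}{19656} &\approx 3.33 \text{ frames}
\end{align*}
```

On an NTSC machine with 65 cycles per line and 263 lines per frame the same count covers a little under 4 frames. The figures above also assume badlines and sprites are switched off, as they are with the default settings.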
So for example, if you had to find out how many cycles your latest uberbrilliant sprite-sorting algo takes, you could do that like this:
jsr initdata ;prepare test case for your sorting algo
jsr measure ;start cycle counting
jsr sortalgo
jsr evaluate ;stop count & print out cycle count
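A quick way to check the `overhead` constant on your own setup is to time a stretch of code whose length you already know. The following is only a sanity-check sketch, not part of the original timer: it times ten `nop`s at 2 cycles each, so with the default settings (IRQs, badlines and sprites all off) the readout should come out at 00020 or very close to it. Any difference is the amount by which `overhead` would need adjusting.

```
jsr measure ;start cycle counting
nop ;10 x nop = 20 cycles of known work
nop
nop
nop
nop
nop
nop
nop
nop
nop
jsr evaluate ;stop count & print, expect about 00020
```

If the result is stable across several runs but off by a constant, correct `overhead` by that constant; if it varies from run to run, something (an interrupt, badlines, sprites) is still stealing cycles.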
Note that the zen-timer can't be used for really slow routines, as it can only count up to about 65,500 cycles. For those routines you should use the extended timer below, which chains both CIA1 timers together into one 32-bit counter. The price is that it doesn't behave well in an environment that uses the Timer A IRQ (e.g. the KERNAL); you might want to change the program to use CIA2 in that case:
;LNG-TIMER 64, 6502tass version. Original idea by M. Abrash. Extended
;version for extra-slow routine evaluation. Doesn't like timer interrupts
;& output is in hex for simplicity's sake. Usage:
; jsr measure or +0 if precompiled to start cycle counting
; jsr evaluate or +3 if precompiled to stop counting & print result
overhead = 19 ;cycles wasted by the timer itself during measurement
dma_off = 1 ;0 to allow badlines (less accurate results)
sprites_off = 1 ;0 to allow sprites (less accurate results)
printout = $400 ;where to write the result
;* = $1000 ;uncomment to precompile to wanted address
jmp measure
evaluate sei
lda #0
sta $dc0e
sta $dc0f
lda vald011
sta $d011
lda vald015
sta $d015
cld
sec
lda #<($ffff-overhead)
sbc $dc04
sta cycles
lda #>($ffff-overhead)
sbc $dc05
sta cycles+1
lda #$ff
sbc $dc06
sta cycles+2
lda #$ff
sbc $dc07
sta cycles+3
ldx #3
ldy #0
showresult lda cycles,x
lsr
lsr
lsr
lsr
jsr toscreen
lda cycles,x
and #$0f
jsr toscreen
dex
bpl showresult
lda statusreg ;restore (most of) st
pha
plp
rts
toscreen sed ;simple hex to hexpetscii conversion,
cmp #$0a ;courtesy of Frank Kontros
adc #$30
cld
sta printout,y
lda $d021
eor #$08
sta (printout//$400)+$d800,y
iny
rts
cycles .byte 0,0,0,0,0
vald015 .byte 0
vald011 .byte 0
statusreg .byte 0
measure php ;save st, just in case
sei
pla
sta statusreg
lda $d011
sta vald011
lda $d015
sta vald015
ldx #$00
stx $dc0e ;stop timers
stx $dc0f
.if dma_off
stx $d011
.fi
.if sprites_off
stx $d015
.fi
dex
cpx $d012
bne *-3 ;wait for vblank area
stx $dc04 ;set timers to $ffffffff
stx $dc05
stx $dc06
stx $dc07
lda #$59
sta $dc0f ;reload and set timer b to count timer a underflow
lda #$11
sta $dc0e ;reload and start timer a, continuous mode
rts
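The text suggests moving the extended timer to CIA2 when the KERNAL's Timer A IRQ is in the way. As a rough sketch of what that involves: the CIA2 registers sit at the same offsets from $dd00 as the CIA1 registers do from $dc00, so only the addresses change. The symbol `cia` is a name introduced here for illustration, and the sketch assumes nothing else (such as the KERNAL RS-232 routines) is using the CIA2 timers. Shown are the timer accesses of `measure` rewritten against that symbol:

```
cia = $dd00 ;hypothetical symbol: $dc00 selects CIA1, $dd00 selects CIA2
ldx #$00
stx cia+$0e ;stop timers
stx cia+$0f
dex
stx cia+$04 ;set timers to $ffffffff
stx cia+$05
stx cia+$06
stx cia+$07
lda #$59
sta cia+$0f ;reload and set timer b to count timer a underflow
lda #$11
sta cia+$0e ;reload and start timer a, continuous mode
```

The two stop writes and the four timer reads in `evaluate` need the same substitution. Keep in mind that CIA2 signals its interrupts as NMIs, which `sei` does not mask; the sketch assumes the CIA2 timer interrupts are left disabled, so none are generated.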