profkelly wrote:
Many disassemblers follow the instructions to decide what is code and what is data. Begin at the starting address and disassemble the code. When a branch instruction is found follow both possible paths. Keep a table of memory accessed by the code. These memory areas are probably variables.
Thank you for your advice. That is quite a challenging proposition for me.
I believe that I would have to effectively interpret the intent of the code (nearly a full simulation, as far as I can tell) in order to properly follow all of the branch instructions.
For instance, in the disassembly that I have so far I have seen two ways of accessing a particular subroutine:
Code:
LEA ($532).L,A2 ;load address into A2
....
JSR (A2) ;jump to the address pointed to by A2
and
Code:
0x000680: BSR.W $feb0 ;branches to 0x000532: $feb0 is the two's complement displacement -$150, applied to PC = 0x000682
One could conceivably use this as well:
Code:
JSR ($532).L
In order to properly follow the first version, I would need to keep track of what is in each address register as I walk through the code. I would also need to handle the case where some other instruction populates the register used by the JSR.
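As a rough illustration of what that register tracking involves (a sketch only, not the disassembler discussed in this thread), the Python below resolves the call target for all three forms. It works on pre-decoded tuples rather than real 68k bytes. The mnemonic tags (LEA_ABS, JSR_IND, JSR_ABS, BSR_W, CLOBBER), the function names, and every instruction address except 0x000680 are made up for the example. Note that following registers in a straight line like this is only valid inside a run of code with no branches into it.

```python
def sign_extend_16(value):
    """Interpret a 16-bit value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def bsr_w_target(instr_addr, disp16):
    """BSR.W target: the displacement is relative to the instruction address + 2."""
    return (instr_addr + 2 + sign_extend_16(disp16)) & 0xFFFFFFFF


def resolve_calls(instructions):
    """Walk decoded instructions in order, tracking known address registers.

    Each instruction is (address, tag, operand). Returns (address, target)
    pairs; target is None when the register contents are unknown.
    """
    regs = {}   # e.g. {"A2": 0x532}; a missing key means "unknown"
    calls = []
    for addr, tag, arg in instructions:
        if tag == "LEA_ABS":        # arg = (absolute address, register)
            value, reg = arg
            regs[reg] = value
        elif tag == "JSR_IND":      # arg = register name, i.e. JSR (An)
            calls.append((addr, regs.get(arg)))
        elif tag == "JSR_ABS":      # arg = absolute address
            calls.append((addr, arg))
        elif tag == "BSR_W":        # arg = raw 16-bit displacement word
            calls.append((addr, bsr_w_target(addr, arg)))
        elif tag == "CLOBBER":      # arg = register written by something untracked
            regs.pop(arg, None)
    return calls


# Hypothetical program: only the BSR.W at 0x000680 comes from the real listing.
program = [
    (0x000400, "LEA_ABS", (0x532, "A2")),
    (0x000410, "JSR_IND", "A2"),
    (0x000680, "BSR_W", 0xFEB0),
    (0x000700, "JSR_ABS", 0x532),
    (0x000710, "CLOBBER", "A2"),
    (0x000720, "JSR_IND", "A2"),
]

for addr, target in resolve_calls(program):
    shown = "unknown" if target is None else f"0x{target:06x}"
    print(f"0x{addr:06x} -> {shown}")
```

The last call shows the failure mode: once something the tracker does not model writes to A2, the JSR (A2) target has to be reported as unknown rather than guessed.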
As a first step towards fixing my code's shortcomings, I left the disassembler stepping linearly through the code without regard to program flow, but I tried to account for jump/branch operations and possible data storage points. The code now tracks jump landing points that lie at a higher address than the instruction currently being decoded (lower ones are erased, because the disassembler only steps forward and will not visit lower addresses again). When an instruction would be bisected by a landing point, the disassembler rejects that interpretation. This increased my code's run time from about 4 minutes to about 10 minutes (much of it spent reading and writing the text file that stores the landing points). It still cannot capture the indirect JSR or JMP effective addressing modes, though, because I'm not tracking what is in the registers.
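The landing-point bookkeeping could probably be kept in memory rather than in a text file, which may remove most of that extra run time. The Python below is a minimal sketch under that assumption. The class name LandingPoints and its methods are hypothetical, and it models only the three operations described above: record a landing point, erase the ones already passed, and test whether a candidate instruction would be bisected.

```python
import bisect


class LandingPoints:
    """In-memory sorted list of addresses that jumps or branches land on."""

    def __init__(self):
        self._addrs = []

    def add(self, addr):
        """Record a landing point, ignoring duplicates."""
        i = bisect.bisect_left(self._addrs, addr)
        if i == len(self._addrs) or self._addrs[i] != addr:
            self._addrs.insert(i, addr)

    def discard_below(self, addr):
        """Drop landing points the forward sweep has already passed."""
        i = bisect.bisect_left(self._addrs, addr)
        del self._addrs[:i]

    def bisects(self, start, length):
        """True if a landing point falls strictly inside the instruction.

        A landing point exactly at `start` is fine; one anywhere in
        start+1 .. start+length-1 would split the instruction.
        """
        i = bisect.bisect_right(self._addrs, start)
        return i < len(self._addrs) and self._addrs[i] < start + length


points = LandingPoints()
points.add(0x532)
print(points.bisects(0x530, 4))   # 4-byte instruction at 0x530 covers 0x532
print(points.bisects(0x532, 4))   # landing exactly on the first byte is allowed
```

Each lookup is a binary search, so the check stays cheap even with many thousands of landing points.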
I did a similar accounting for non-jump instructions whose effective addresses use a ($XXXX) type mode. These may be data addresses, so the disassembler rejects instructions that would occupy them. Unfortunately, I had to ignore the LEA instruction, because I have found at least one instance where it is used to set up a jump rather than to load a pointer to data. Another issue: this increased the run time of the disassembler to about an hour and a half.
The changes I made are rather daft, and I think that I may have to try to implement the more complicated version which follows the code's execution paths. In doing so, I fear that the disassembly will miss some code which is not normally reached, for instance the exception handling code. I have also seen infinite-loop code in places the PC should never reach (I assume it was placed there as a fail-safe).
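For reference, the path-following approach from the quoted advice can be sketched as a work-list traversal. The Python below is only an outline under stated assumptions. The `decoded` table stands in for a real 68k instruction decoder, its kinds ("seq", "branch", "call", "jump", "return") are made-up labels, and all addresses other than 0x532 are invented. It also shows why unreferenced code is missed: anything not reachable from a seed address is never visited, so exception handlers have to be supplied as extra seeds.

```python
def trace_code(decoded, seeds):
    """Return the set of instruction addresses reachable from the seeds.

    decoded maps address -> (length, kind, target); a hypothetical
    stand-in for decoding real instruction bytes.
    """
    visited = set()
    worklist = list(seeds)
    while worklist:
        addr = worklist.pop()
        if addr in visited or addr not in decoded:
            continue
        visited.add(addr)
        length, kind, target = decoded[addr]
        # Follow the taken path when the target is statically known.
        if kind in ("branch", "call", "jump") and target is not None:
            worklist.append(target)
        # Follow the fall-through path unless control never continues.
        if kind in ("seq", "branch", "call"):
            worklist.append(addr + length)
    return visited


decoded = {
    0x400: (2, "seq", None),
    0x402: (4, "call", 0x532),
    0x406: (2, "branch", 0x400),   # conditional: both paths are followed
    0x408: (2, "return", None),
    0x532: (2, "return", None),
    0x500: (2, "jump", 0x500),     # infinite-loop handler, only reached via a vector
    0x600: (2, "return", None),    # never referenced by anything
}

print(sorted(hex(a) for a in trace_code(decoded, [0x400])))
print(sorted(hex(a) for a in trace_code(decoded, [0x400, 0x500])))
```

An indirect call with an unknown register simply has `target` set to None here, so the traversal carries on past it without following it.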
On the note of the exception handling code, the image begins at 0x000000 with the first vector table (256 long words; in my disassembly the VBR is sometimes set to another location in memory), which gives the address of the handler code for each exception type. It is unfortunately uninteresting: all addresses in the vector table (apart from the SP and PC reset values, and the Trace exception) point to the same location. This means that the code which I have failed to interpret in the disassembly (even where the operation code begins with 1111 or 1010) is not likely some programmer-defined special function. I may be in over my head.
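Even a dull vector table is still a source of entry points for a flow-following disassembler. The Python below is a small sketch of reading the 256 big-endian long words and grouping the vectors by handler address. The function names are hypothetical, and the image is synthetic: the values 0x00FFFE00, 0x400, 0x500 and 0x520 are invented to mimic the layout described above, not taken from the real binary.

```python
import struct
from collections import defaultdict


def read_vector_table(image, base=0):
    """Return the 256 big-endian long words starting at `base`."""
    return list(struct.unpack_from(">256L", image, base))


def group_handlers(vectors):
    """Map handler address -> vector numbers, skipping vector 0 (initial SP)."""
    groups = defaultdict(list)
    for number, addr in enumerate(vectors):
        if number == 0:
            continue   # vector 0 is a stack pointer value, not code
        groups[addr].append(number)
    return dict(groups)


# Synthetic table: every vector shares one handler except reset SP/PC and Trace.
vectors = [0x500] * 256
vectors[0] = 0x00FFFE00   # initial SP (made-up value)
vectors[1] = 0x400        # initial PC (made-up value)
vectors[9] = 0x520        # Trace is vector 9 on the 68000 family
image = struct.pack(">256L", *vectors)

groups = group_handlers(read_vector_table(image))
for addr in sorted(groups):
    print(f"0x{addr:06x}: {len(groups[addr])} vector(s)")
```

The sorted keys of `groups` are the distinct code addresses worth using as starting points. If the VBR is moved at run time, the same reader can be pointed at the new table with a different `base`.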