But I have spoken with, and learned from, someone who understands decompilation. Since I’m no expert on the subject, I will let him speak about it and paste his reply here.
Last time you told me personally that you haven’t tried to decompile it yourself, yet you’re telling me I should try and that I will for sure find it matches. If you aren’t sure yourself, you are falling into a logical fallacy. Nobody should tell someone to try something, and tell him what the result will be, if he hasn’t done it himself.
And indeed he shouldn’t have just taken that liberty, because it leads to further inaccuracy; maybe he should have posted the raw data as well?
That is exactly my problem with this as well: he only reconstructed what we see and got to run, but there must be more. Those strings are there, unreferenced (maybe, only maybe), but the original code must have had something that used them.
Below I am adding what zeur said about it:
[Disclaimer: me’s not an expert on mess-dos programming, at all (in fact
me’s a UNIX guy), but me did grow up w/ mess-dos and me does
understand the complexities me’s about to outline.]
Deconstruction of an executable program on mess-dos is not often an easy
task, but let’s assume Carmack was already using DJGPP at the time, and
that the 386 part of the executable is in some vaguely UNIX-like format
(such as COFF). That makes the job a whole lot easier.
Let’s recap what happens, classically, when creating a program from
source:
- preprocessing
- compilation
- assembly
- linkage
Some of these passes can be (and are often) condensed, but basically,
that’s what happens. At each step, the level of complexity is reduced,
and thus our reconstruction is an exercise in managing subsequent
increases of complexity.
First of all, we take the executable apart into separate objects based
on the symbol table (the symbol table is however often stripped from
production software, making this (and debugging) difficult). Then, we
disassemble each object; if the resulting assembler code is garbage, we
know that the object most likely consists of data (if not already so
specified through the symbol table); otherwise, it’s more likely to be
actual code (as an experienced programmer will realize: in the end,
“code” is a special case of “data”, which means the results of this
separation cannot be definitive).
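To make the code-versus-data guess above a bit more concrete, here is a
rough sketch in Python. It is only one crude heuristic (the share of
printable ASCII bytes), not how any particular disassembler works; the
function names, the 0.9 threshold, and the sample string are all
assumptions made up for illustration.

```python
import string

# Byte values that count as printable ASCII (letters, digits, punctuation, whitespace).
PRINTABLE = set(string.printable.encode("ascii"))

def printable_ratio(blob: bytes) -> float:
    """Fraction of bytes that are printable ASCII (0.0 for an empty blob)."""
    if not blob:
        return 0.0
    return sum(b in PRINTABLE for b in blob) / len(blob)

def guess_kind(blob: bytes, threshold: float = 0.9) -> str:
    """Crude guess only: mostly-printable regions are probably string data."""
    if printable_ratio(blob) >= threshold:
        return "data?"
    return "code-or-binary-data?"

# Hypothetical sample regions: a message string and a small 386 function
# (push ebp; mov ebp,esp; sub esp,8; leave; ret).
text_like = b"Insufficient memory\r\n\x00"
code_like = bytes([0x55, 0x89, 0xE5, 0x83, 0xEC, 0x08, 0xC9, 0xC3])

print(guess_kind(text_like))
print(guess_kind(code_like))
```

The question marks in the return values are deliberate: a lookup table
of small integers would also land in the second bucket, which is the
same "cannot be definitive" caveat as above.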
Then, we can attempt to decompile the code “back” into C code (assuming
it was written in C, but that’s a more than reasonable assumption here).
Unfortunately, that in particular is very difficult to pull off with any
exactitude (especially if we don’t know the habits of the compiler used
to produce it), and indeed most automatic decompilers will produce
rather comical results. This is because of the simple fact that the same
sequence of machine instructions can map to many different expressions
(let alone statements) in the C language (and that’s discarding compiler
{opt,pess}imization). Straight decompilation is thus unlikely to yield
clean C code, let alone code that is, in appearance, anything like the
original.
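The many-to-one mapping can be shown with a toy model. The Python sketch
below is not a real compiler: `lower` and the `("shl", var, 1)`
instruction tuple are hypothetical, and the three rewrites merely mimic
the kind of strength reduction a C compiler may apply. The point is only
that three different source expressions end up as one and the same
instruction, so going backwards means picking one of them arbitrarily.

```python
import ast

def lower(expr: str) -> tuple:
    """Lower one tiny expression to a single toy instruction (made-up ISA)."""
    node = ast.parse(expr, mode="eval").body
    if isinstance(node, ast.BinOp) and isinstance(node.left, ast.Name):
        var, op, right = node.left.id, node.op, node.right
        is_const = isinstance(right, ast.Constant)
        # x * 2  ->  shift left by one
        if isinstance(op, ast.Mult) and is_const and right.value == 2:
            return ("shl", var, 1)
        # x << 1 ->  shift left by one
        if isinstance(op, ast.LShift) and is_const and right.value == 1:
            return ("shl", var, 1)
        # x + x  ->  shift left by one
        if isinstance(op, ast.Add) and isinstance(right, ast.Name) and right.id == var:
            return ("shl", var, 1)
    raise ValueError(f"toy compiler cannot lower: {expr!r}")

sources = ["x * 2", "x << 1", "x + x"]
lowered = {src: lower(src) for src in sources}

for src, instr in lowered.items():
    print(f"{src:8} -> {instr}")
```

A decompiler looking at the single `shl` has no way to tell which of the
three the programmer typed, and a real instruction set offers far more
such collisions than this toy does.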
Usually, there are very few traces of preprocessing in the compiled
code, so generally we can forget about reproducing that part altogether.
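A small sketch of why preprocessing leaves so few traces: the toy
expander below (Python, handling only object-like `#define` lines; the
function `preprocess` and the macro name `BUFFER_SIZE` are invented for
this example) produces identical output for a source that used a macro
and one that used a bare literal. Whatever the compiler sees afterwards
cannot distinguish the two.

```python
import re

def preprocess(source: str) -> str:
    """Expand object-like #define macros; a toy, not a real C preprocessor."""
    macros = {}
    out = []
    for line in source.splitlines():
        m = re.match(r"\s*#define\s+(\w+)\s+(.+)", line)
        if m:
            macros[m.group(1)] = m.group(2).strip()
            continue  # the directive itself vanishes from the output
        for name, value in macros.items():
            # Replace whole-word occurrences of the macro name by its body.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v, line)
        out.append(line)
    return "\n".join(out)

with_macro = "#define BUFFER_SIZE 64\nchar buffer[BUFFER_SIZE];"
without_macro = "char buffer[64];"

print(preprocess(with_macro))
print(preprocess(with_macro) == preprocess(without_macro))
```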
Of course, it’s possible to take a cowboy attitude and skip large chunks
of the process (this can be handy in a hurry, but is useless for
historically-accurate reconstruction), but the above is about how it
goes.
Now, one of the points you folks seem to gloss over: data objects are
objects, too! Why aren’t they included in your little “decompilation”?
Though hogsy’s comment about compilers not being so advanced is a good
one: code not being referenced does not necessarily mean it’s not there!
And (especially since this shit runs on mess-dos), there’s no
guarantee it isn’t referenced in any way.
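As a rough illustration of what "referenced" can and cannot tell us,
here is a naive Python scan for direct pointers to a data object. It
assumes a flat image where a reference shows up as the object's offset
stored as a 32-bit little-endian word; `find_direct_refs`, the sample
image, and both strings are made up for the example.

```python
import struct

def find_direct_refs(image: bytes, target_offset: int, base: int = 0) -> list:
    """Offsets where base+target_offset appears as a 32-bit little-endian word."""
    needle = struct.pack("<I", base + target_offset)
    hits = []
    pos = image.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = image.find(needle, pos + 1)
    return hits

# Hypothetical image: "mov eax, 16 ; ret", NOP padding up to offset 16,
# then two NUL-terminated strings at offsets 16 and 27.
code = b"\xB8" + struct.pack("<I", 16) + b"\xC3"
image = code.ljust(16, b"\x90") + b"referenced\x00" + b"orphan?\x00"

print(find_direct_refs(image, 16))  # direct pointer inside the mov
print(find_direct_refs(image, 27))  # no direct pointer found
```

An empty result for the second string only means no such literal
pointer exists in the image. The address could still be computed at run
time, reached by walking from a neighbouring object, or built from
segment arithmetic, so this kind of scan can suggest that something is
unreferenced but cannot prove it.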
Oh, and could you folks please define “garbage”? Random data? Stuff you
can’t make sense of (a common problem in reverse engineering)? Or
perhaps stuff that you were unable to “decompile”?
Technical considerations don’t go away with bluff, bluster, and personal
attacks. They stay. As many people in history have already had to find
out the hard way.