A Non-Professional Comparison of Various Open Source Assembly/Disassembly Engines

For both personal interest and work, I have studied and used various popular open source x86/64 assembly and disassembly engines. Analyzing and operating on assembly instructions requires either studying the Intel instruction set and writing an engine yourself, or using an existing open source engine. Because writing one from scratch is time-consuming, laborious, and error-prone, using an existing engine is preferable.

Here is a comparison of some of the more popular disassembly engines that I have used:

1. Ollydbg’s ODDisassm

ODDisassm, a component of Ollydbg, was the first open source disassembly engine I used. In 2007, with few options available, I used this library to write a very simple virtual machine, as described in my article, “Encryption and Decryption (3)”. My requirements for a disassembly library were modest at the time: it only had to use assembly text as the intermediate representation for encoding and decoding.

The advantage of this disassembly library is that it includes an assembler interface; that is, it can parse text strings and encode them into binary. This feature alone made it unique at the time, since few people in the open source community were doing this work. In recent years, the newer debugger x64dbg has also developed an open source assembly library, XEDParse. It is similar in function to OD’s text parsing, but it supports a more complete instruction set, has fewer bugs, supports X64, and is actively maintained.

However, ODDisassm also has many shortcomings, such as:

Incomplete instruction set support. Since Ollydbg has been unmaintained for a long time, even its support for the MMX instruction set is incomplete. The newer INTEL/AMD extended instruction sets, such as SSE5/AVX/AES/XOP, cannot be parsed at all.
The decoded structure is not detailed enough. For example, instruction prefixes are poorly supported, as the Ollydbg disassembly window shows: apart from string instructions such as movs/cmps, a prefix such as repcc is displayed separately from the instruction it is attached to. As another example, the register field cannot represent the high 8-bit registers ah/bh/ch/dh.
The author released the source once and has not maintained the open source version since, so disassembly bugs are hard to get fixed in a timely manner.

However, these are understandable because the author’s development goal at the time was text assembly/disassembly. No structure or interface was established for the decoded information. In general, this disassembly engine is outdated today.
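To make the point about high 8-bit registers concrete, here is a minimal sketch in C of why a plain register index is not enough for 8-bit operands. The function name and parameters are hypothetical and do not come from ODDisassm or any other engine discussed here; only the encoding table itself is architectural.

```c
/* Hypothetical helper: name the 8-bit register selected by a 3-bit
 * register field of an x86 instruction.
 *   has_rex : non-zero if the instruction carries any REX prefix
 *   rex_ext : the REX extension bit (REX.R or REX.B) that applies to
 *             this field; ignored when has_rex is zero
 * Without REX, encodings 4..7 mean ah/ch/dh/bh. With REX, the same
 * encodings mean spl/bpl/sil/dil, so a decoder has to record more than
 * the raw index to tell the two cases apart. */
const char *reg8_name(unsigned field, int has_rex, int rex_ext)
{
    static const char *const legacy[8] =
        { "al", "cl", "dl", "bl", "ah", "ch", "dh", "bh" };
    static const char *const rex_low[8] =
        { "al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil" };
    static const char *const rex_high[8] =
        { "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b" };

    field &= 7u;
    if (!has_rex)
        return legacy[field];
    return rex_ext ? rex_high[field] : rex_low[field];
}
```

For instance, reg8_name(4, 0, 0) yields "ah", while the same field value with a REX prefix, reg8_name(4, 1, 0), yields "spl".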

2. BeaEngine

BeaEngine is the second library I used. By then, the OD library could no longer meet my needs. I was writing a decompiler and needed a library that could decode as much information as possible, so I found BeaEngine. As I recall, earlier versions of this library did not recognize the high 8-bit registers, but the current version does. I haven’t found any obvious shortcomings while using it, and it also implements many newer, less commonly used extended instruction sets.

The extended instruction sets currently implemented are:

FPU, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, VMX, CLMUL, AES, MPX

It also classifies instructions by type, which is very convenient when you need to tell different kinds of instructions apart. Another feature is that it can decode the registers used and affected by each instruction, including the flag register, down to the individual bits of the flag register. This feature is ideal for building optimizers and obfuscators.
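As a rough illustration of why per-bit flag information matters to an optimizer or obfuscator, the sketch below computes which status flags are still needed after a given instruction in a straight-line sequence. The insn_flags structure and the function name are hypothetical and are not BeaEngine’s actual data layout; only the EFLAGS bit positions are architectural facts.

```c
#include <stddef.h>
#include <stdint.h>

/* EFLAGS status-flag bit positions. */
enum {
    FLAG_CF = 1u << 0,
    FLAG_PF = 1u << 2,
    FLAG_AF = 1u << 4,
    FLAG_ZF = 1u << 6,
    FLAG_SF = 1u << 7,
    FLAG_OF = 1u << 11,
    FLAG_STATUS_ALL = FLAG_CF | FLAG_PF | FLAG_AF | FLAG_ZF | FLAG_SF | FLAG_OF
};

/* Hypothetical per-instruction summary, for illustration only. */
typedef struct {
    uint32_t flags_read;    /* flags the instruction tests */
    uint32_t flags_written; /* flags the instruction overwrites */
} insn_flags;

/* Return the flags that are live after instruction `pos`: those read by a
 * later instruction before being overwritten, plus whatever part of
 * `live_out` (flags assumed needed after the whole sequence) survives.
 * Handles straight-line code only; branches need a real data-flow pass. */
uint32_t flags_live_after(const insn_flags *seq, size_t n, size_t pos,
                          uint32_t live_out)
{
    uint32_t live = live_out;
    /* Walk backward from the last instruction down to pos + 1. */
    for (size_t i = n; i > pos + 1; --i) {
        const insn_flags *in = &seq[i - 1];
        live = (live & ~in->flags_written) | in->flags_read;
    }
    return live;
}
```

With a sequence such as add / mov / jz, only ZF is live after the add, so an obfuscator could insert junk code there that clobbers any other status flag. Passing FLAG_STATUS_ALL as live_out is the conservative choice when nothing is known about the code that follows.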

Personally, however, I find BeaEngine’s coding style poor: it is full of forced type casts and inconsistent naming styles, and the overall impression is messy. For someone as obsessive about clean code as I am, that is unbearable, so I switched to other libraries. If you don’t mind these things, BeaEngine’s performance is still relatively good. Be aware, though, that BeaEngine is known to have occasional bugs.

3. udis86

udis86 has emerged as my preferred disassembly engine. The udis86 codebase is notable for its clean and concise style. Functions and variables are named descriptively, making the code easy to read and understand. The interface is well-defined and flexible. Even maintaining a personal branch is straightforward, as understanding the overall architecture takes very little time.

udis86 supports these X86 extended instruction sets:

MMX, FPU (x87), AMD 3DNow, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AES, AMD-V, INTEL-VMX, SMX

The advantage of udis86 is that its interface is very flexible. You can call ud_decode to decode an instruction and then call ud_translate_intel on the decoded structure to convert it into assembly text, or you can call ud_disassemble to do both in one step. Each of these calls fits on a single line.

Due to the modular design concept of udis86, it can adapt to various scenarios. If you want to develop a disassembler like IDA, it can do it; if you want to develop an instruction simulator, analyzer, optimizer, or obfuscator, it can do it.

This design gives udis86 strong adaptability without sacrificing performance. In my performance tests, udis86 had the fastest decoding speed among engines that decode a comparable level of detail.

As for shortcomings, I haven’t found any yet, although udis86 has no equivalent of BeaEngine’s register analysis, which is somewhat regrettable.

4. Capstone

Capstone should be regarded as the culmination of all disassembly engines. I will spend somewhat more time on it because I have a love-hate relationship with it. Capstone is ported from the MC component of the LLVM framework, so the CPU architectures supported by LLVM are also supported by Capstone.

The CPU architectures it supports are: Arm, Arm64 (Armv8), M68K, Mips, PowerPC, Sparc, SystemZ, XCore & X86 (including X86_64).

Capstone boasts the most comprehensive X86 instruction set support among these engines, including:

3dnow, 3dnowa, x86_64, adx, aes, atom, avx, avx2, avx512cd, avx512er, avx512f, avx512pf, bmi, bmi2, fma, fma4, fsgsbase, lzcnt, mmx, sha, slm, sse, sse2, sse3, sse4.1, sse4.2, sse4a, ssse3, tbm, xop.

This robust support makes it a top contender.

Mobile devices are very popular now, yet very few disassembly libraries support ARM. If you want to develop compilers for both X86 and ARM, a unified interface is preferable. In addition, Capstone’s next branch (the master branch does not have this interface) can also analyze the registers used and affected by an instruction while decoding, as BeaEngine does. With such a base library, you can afford to be somewhat lazy.

Judged on the X86/64 platform alone, Capstone comprehensively surpasses BeaEngine in both decoding ability and instruction set support.

Having sung its praises, it’s time to talk about the shortcomings.

Capstone is a C project, while LLVM, from which it is ported, is a C++ project. A great deal of adaptation work was therefore done during porting, and the result is bloated.

For example, MCInst in LLVM is a class that describes a single low-level machine instruction. Because Capstone is a C project, such classes were turned into structures during porting, and their member functions were turned into standalone C functions such as MCInst_Init and MCInst_setOpcode. Because the LLVM framework is complex and aims for broad compatibility, all of its concepts are highly abstracted, and Capstone adds its own adaptation layer to convert them to its own architecture. This results in too many intermediate layers during decoding and degrades performance. The important intermediate structures used while decoding one instruction are, in order:

MCInst => InternalInstruction => cs_insn

The most basic decoding work relies on the LLVM architecture to decode into Capstone’s InternalInstruction, an internal structure that holds every detail of the decoding process. After decoding is completed, update_pub_insn is called to copy the content that should be publicly exposed into cs_insn. Other disassembly engines decode directly into the target structure in one step.

Such a complicated decoding process naturally affects performance. In a not very rigorous performance test, Capstone took about five to six times as long as udis86. (Incidentally, I submitted a small Pull Request to Capstone, here and here; it comes with a benchmark that shows a performance improvement of nearly 20%.) Measured another way, with udis86 calling only ud_decode and with Capstone, which has no separate decode-only interface, hacked so that it does not generate assembly text, Capstone takes about twice as long as udis86. This shows that Capstone’s text generation is much slower than that of udis86.

Second, Capstone consumes a lot of memory. The cs_insn instruction structure passed in for decoding must be dynamically allocated, and in two separate allocations: one for cs_insn and one for cs_detail. This causes heavy memory fragmentation. The structures are also very large. I don’t remember the exact figure, but sizeof(cs_insn)+sizeof(cs_detail) seems to be at least 2 KB. Being forced to use dynamic memory is what sets Capstone apart from the other disassembly engines. If you want to analyze a large number of instructions with Capstone, you should give it a fixed-size object allocator, which somewhat reduces memory fragmentation and improves performance.
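The fixed-size object allocator mentioned above can be sketched as a free list carved out of one contiguous slab. This is a minimal illustration in C and not code from Capstone or from my branch: the names pool, pool_alloc and so on are hypothetical, SLOT_SIZE is an assumed value based only on the rough 2 KB estimate, and hooking the pool into a particular engine is left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed slot size; the real record size must be measured with sizeof. */
#define SLOT_SIZE 2048

typedef union slot {
    union slot *next;               /* link while the slot is free   */
    unsigned char bytes[SLOT_SIZE]; /* payload while the slot is used */
} slot;

typedef struct {
    slot *slab;      /* single allocation that holds every slot */
    slot *free_list; /* head of the list of unused slots        */
} pool;

/* Reserve `count` slots in one malloc call. Returns 0 on success. */
int pool_init(pool *p, size_t count)
{
    p->slab = NULL;
    p->free_list = NULL;
    if (count == 0)
        return -1;
    p->slab = malloc(count * sizeof(slot));
    if (p->slab == NULL)
        return -1;
    for (size_t i = 0; i + 1 < count; ++i)
        p->slab[i].next = &p->slab[i + 1];
    p->slab[count - 1].next = NULL;
    p->free_list = p->slab;
    return 0;
}

/* Pop one slot in O(1); returns NULL when the pool is exhausted. */
void *pool_alloc(pool *p)
{
    slot *s = p->free_list;
    if (s != NULL)
        p->free_list = s->next;
    return s;
}

/* Push a slot back in O(1); `ptr` must come from pool_alloc on `p`. */
void pool_free(pool *p, void *ptr)
{
    slot *s = ptr;
    s->next = p->free_list;
    p->free_list = s;
}

/* Release the whole slab at once. */
void pool_destroy(pool *p)
{
    free(p->slab);
    p->slab = NULL;
    p->free_list = NULL;
}
```

Because every slot has the same size and lives in one slab, allocating and freeing records never splits or merges heap blocks, which is where the reduction in fragmentation comes from. A freed slot is handed out again by the next pool_alloc call.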

Perhaps for these reasons, the x64dbg community originally used BeaEngine as its underlying engine. However, BeaEngine kept exhibiting bugs, so it was later replaced by Capstone. Even so, they use Capstone only for the text disassembly in the GUI, because although its decoding speed is not great, it has few bugs (after all, LLVM is backed by large companies such as Apple). The flow graph and instruction analysis (not yet perfected) still use BeaEngine, which is unavoidable; after all, performance is also very important.

On a related note, if you need a disassembly engine with strong decoding capabilities, I recommend comparing the decoded structures of each engine before choosing, to check that they contain the fields you need.

Capstone has a frustrating issue. Although its decoding ability is actually very strong, Capstone encapsulates the middle layer and only exposes the fields that it thinks need to be exposed. Its maintainer is somewhat stubborn (or perhaps, “rigorous”) and insists that less commonly used fields do not need to be exposed and that a simple interface is best.

For example, the offset of the Immediate within the instruction and the offset of the Displacement of a memory operand both exist in the internal structure InternalInstruction, but they are discarded when it is copied to the public structure cs_insn. The REP and REPE prefixes are another case. Although they are represented by the same constant, they behave differently depending on the instruction they are combined with. Capstone has an internal valid_repe function that can distinguish them, but the result is not exposed in the public structure, where the prefix is simply reported as REP. These details are very specialized, but they are still very useful for instruction analysis and transformation.
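The REP/REPE distinction described above can be illustrated with a small classifier over one-byte opcodes. This is a sketch in C written from the x86 opcode map, not a copy of Capstone’s valid_repe; the type and function names are hypothetical.

```c
/* Hypothetical classification of the F3/F2 prefix bytes. */
typedef enum { PFX_OTHER, PFX_REP, PFX_REPE, PFX_REPNE } rep_kind;

/* Classify `prefix` in front of a one-byte `opcode`.
 * F3 means REP before ins/outs/movs/stos/lods, but REPE before
 * cmps/scas, because only the comparing forms terminate on ZF.
 * F2 means REPNE before cmps/scas. Every other combination, such as
 * F3 90 (pause) or F2 before movs (reserved), is reported as PFX_OTHER. */
rep_kind classify_rep(unsigned char prefix, unsigned char opcode)
{
    /* cmps: A6/A7, scas: AE/AF */
    int compares = opcode == 0xA6 || opcode == 0xA7 ||
                   opcode == 0xAE || opcode == 0xAF;
    /* ins: 6C/6D, outs: 6E/6F, movs: A4/A5, stos: AA/AB, lods: AC/AD */
    int moves = (opcode >= 0x6C && opcode <= 0x6F) ||
                opcode == 0xA4 || opcode == 0xA5 ||
                (opcode >= 0xAA && opcode <= 0xAD);

    if (prefix == 0xF3) {
        if (compares)
            return PFX_REPE;
        return moves ? PFX_REP : PFX_OTHER;
    }
    if (prefix == 0xF2)
        return compares ? PFX_REPNE : PFX_OTHER;
    return PFX_OTHER;
}
```

So F3 A4 (rep movsb) and F3 A6 (repe cmpsb) share the same prefix byte yet classify differently, which is the information a transformation tool loses when the prefix is always reported as REP.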

I personally think that the interface of Capstone is really frustrating to use, but its function is powerful. If you study the internal structure of its source code, you will find that many interfaces are not provided but are internally available. I maintain a branch myself, using it with joy and pain.

Others

Actually, there is also XDE, but I haven’t used it, so I won’t comment on it.

Additionally, the length disassembler in blackbone, called ldasm, is worth mentioning. It is not really an engine, because it does only one thing: calculate the length of an instruction. This is very useful when relocating jump instructions during hooking.
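To show what relocating a jump involves once the instruction length is known, here is a minimal sketch in C that moves a 5-byte jmp rel32 or call rel32 to a new address and rewrites its displacement. It does not use ldasm, and the function name and parameters are hypothetical; a real hook also has to handle short jumps, conditional jumps, and RIP-relative operands.

```c
#include <stdint.h>

/* Copy a 5-byte `jmp rel32` (E9) or `call rel32` (E8) that lives at
 * old_addr into dst so that it can be executed at new_addr and still
 * reach the same target. Returns 0 on success, or -1 if the opcode is
 * not E8/E9 or the target is out of rel32 range from new_addr. */
int relocate_rel32(const uint8_t src[5], uint64_t old_addr,
                   uint8_t dst[5], uint64_t new_addr)
{
    if (src[0] != 0xE9 && src[0] != 0xE8)
        return -1;

    /* Read the little-endian displacement byte by byte. */
    uint32_t raw = (uint32_t)src[1] | ((uint32_t)src[2] << 8) |
                   ((uint32_t)src[3] << 16) | ((uint32_t)src[4] << 24);
    int32_t old_rel = (int32_t)raw;

    /* rel32 is relative to the end of the 5-byte instruction. */
    uint64_t target = old_addr + 5 + (uint64_t)(int64_t)old_rel;
    int64_t new_rel = (int64_t)(target - (new_addr + 5));
    if (new_rel < INT32_MIN || new_rel > INT32_MAX)
        return -1;

    uint32_t out = (uint32_t)(int32_t)new_rel;
    dst[0] = src[0];
    dst[1] = (uint8_t)(out & 0xFF);
    dst[2] = (uint8_t)((out >> 8) & 0xFF);
    dst[3] = (uint8_t)((out >> 16) & 0xFF);
    dst[4] = (uint8_t)((out >> 24) & 0xFF);
    return 0;
}
```

A length disassembler supplies the missing piece in this picture: it tells the hooking code how many whole instructions to copy from the function prologue, after which each relative branch among them needs a fix-up like the one above.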

Summary

Each of these disassembly engines has its strengths (except for OD), but each has some minor flaws. There is no such thing as perfection in this world. They are open source, so it’s great to use them. But you have to contribute yourself, right? Pick a good library, find bugs during use, submit an Issue to the community, or make a solution and then send a Pull Request. That is a way to pay for using it.

Feature	Comparison
Performance	udis86 > BeaEngine > Capstone
Detail in decoded output	Capstone > BeaEngine > udis86 (udis86 does not support register analysis; the remaining decoding capabilities are similar)
Platform support	Capstone > (udis86 = BeaEngine)
X86 extended instruction sets	Capstone > (udis86 ≈ BeaEngine)

If you need an X86/64 disassembly engine with good performance and strong decoding ability but don’t need stunts like register analysis, then udis86 is right for you. If you also need register analysis functions, then BeaEngine and Capstone are right for you. If you also need ARM architecture support, then Capstone should suit you better.

Ultimately, the best engine for a given task depends on the specific needs and priorities of the project.