Solidity Compiler

While the EVM is agnostic to the high-level language used, as long as it compiles to EVM bytecode, Solidity is still the most recognized and widely used language. It is a fundamental pillar of the Ethereum ecosystem, and as such, the pillar must be inspected for cracks often. But where do you look for vulnerabilities when investigating a programming language for bugs that could affect the ecosystem as a whole?
Using the previous disclosures as a reference, the issues with the highest severities were bugs that resulted in incorrect bytecode generation, either due to faulty logic in the compiler that affected memory usage or due to over-optimization that removed critical bytecode. Incorrect usage of memory when compiling to bytecode may be straightforward to understand, but what is the optimizer, and why can issues in the optimizer be so critical? The optimizer is a step during compilation where the compiler tries to simplify complicated expressions. This reduces both code size and execution cost, which lowers the gas needed for contract deployment as well as for external calls made to the contract.
The optimizer works at three different levels, each of which helps reduce bytecode size and execution cost. The simplest level, optimization at the opcode level, combines equal code and removes unused code; it is important that these simplification rules do not accidentally remove code that will be used or is critical. The second level is optimization of the Yul IR code, which is much more powerful because it operates across function calls. At this level, function calls can be reordered or even removed completely, for example when the result of a side-effect-free function is simply multiplied by zero. The third and final level is optimization during code generation, based on direct analysis of the Solidity code. This codegen-level optimizer affects the initial low-level code produced from the Solidity input. In the legacy pipeline, bytecode is generated directly from Solidity code, whereas in the IR-based pipeline, Yul code is generated in between to bridge the gap between high-level Solidity code and low-level opcodes, which allows much more powerful optimization rules to be implemented at the Yul level. As a result, in the IR-based pipeline codegen-level optimization is done only in limited cases that are straightforward to handle based on the abstract syntax tree of the Solidity code.
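To make the multiply-by-zero example concrete, here is a minimal toy sketch in Python. It is not how the Solidity compiler implements its rules: the `Call` node type, the `SIDE_EFFECTING` set, and the hypothetical `transfer_funds` function are all assumptions made for illustration. It only shows why a rule of the form "anything times zero is zero" has to check for side effects before it removes a call.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Assumption for this toy model: calls with these names change state.
# "transfer_funds" is a hypothetical user-defined function.
SIDE_EFFECTING = {"call", "create", "transfer_funds"}


@dataclass(frozen=True)
class Call:
    """A function call node: a name plus argument expressions."""
    name: str
    args: Tuple["Expr", ...] = ()


Expr = Union[int, str, Call]  # integer literal, variable name, or call


def has_side_effects(expr: Expr) -> bool:
    """True if evaluating expr could change state in this toy model."""
    if isinstance(expr, Call):
        return expr.name in SIDE_EFFECTING or any(
            has_side_effects(arg) for arg in expr.args
        )
    return False


def simplify(expr: Expr) -> Expr:
    """Rewrite mul(x, 0) to 0, but only when x is side-effect free."""
    if not isinstance(expr, Call):
        return expr
    args = tuple(simplify(arg) for arg in expr.args)
    if expr.name == "mul" and len(args) == 2 and 0 in args:
        other = args[1] if args[0] == 0 else args[0]
        if not has_side_effects(other):
            return 0  # the whole product, including the call, is dropped
    return Call(expr.name, args)


safe = Call("mul", (Call("add", ("x", 1)), 0))
unsafe = Call("mul", (Call("transfer_funds", ("x",)), 0))
print(simplify(safe))    # the add call is removed entirely
print(simplify(unsafe))  # the call is kept because it has side effects
```

If the `has_side_effects` check were missing or wrong, the second expression would also collapse to `0` and the transfer would silently disappear from the output. That is the general shape of an over-optimization bug.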
You can learn more about the Solidity optimizer in the “Solidity Compiler” section of the learning materials, under “Understanding the Solidity Optimizer”. Before diving into the optimizer, you’ll want to know the basics of the Solidity Language, what EVM Opcodes are, and how to analyze compilation output of the Solidity compiler.
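One way to start analyzing compilation output is to compile the same contract with the optimizer off and on, then compare the resulting opcodes. The Python sketch below assumes a `solc` binary is installed and on your `PATH`, and uses the compiler's Standard JSON interface. The helper names, the `Example.sol` file name, and the sample contract are placeholders chosen for this sketch.

```python
import json
import shutil
import subprocess

# Placeholder contract used only for this sketch.
SOURCE = """
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract Example {
    function f(uint256 x) external pure returns (uint256) {
        return x * 2;
    }
}
"""


def build_standard_json(source: str, optimize: bool,
                        via_ir: bool = False, runs: int = 200) -> dict:
    """Build a Standard JSON input for a single in-memory source file."""
    return {
        "language": "Solidity",
        # "Example.sol" is an arbitrary name for the in-memory source.
        "sources": {"Example.sol": {"content": source}},
        "settings": {
            "optimizer": {"enabled": optimize, "runs": runs},
            "viaIR": via_ir,  # True selects the IR-based (Yul) pipeline
            "outputSelection": {
                "*": {"*": ["evm.bytecode.object", "evm.bytecode.opcodes"]}
            },
        },
    }


def compile_with_solc(std_input: dict) -> dict:
    """Pipe the input to `solc --standard-json` and parse the JSON reply."""
    if shutil.which("solc") is None:
        raise RuntimeError("solc not found on PATH")
    result = subprocess.run(
        ["solc", "--standard-json"],
        input=json.dumps(std_input),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

With `solc` available, calling `compile_with_solc(build_standard_json(SOURCE, optimize=False))` and again with `optimize=True` returns two outputs whose `["contracts"]["Example.sol"]["Example"]["evm"]["bytecode"]["opcodes"]` strings can be compared side by side. Setting `via_ir=True` repeats the comparison for the IR-based pipeline.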