All Articles

đŸ„Š Cracking JavaScript Series: JavaScript Engine

javascript engine execution steps

History

  • The first JavaScript Engine was created by Brendan Eich in 1995 for the Netscape Navigator web browser. It called “SpiderMonkey” which is still used by the Firefox browser now.
  • V8 Engine (Wrote in C++) was developed by Google in 2008 because of building faster browser engine in order to control people to use their search engine. However, there is not only one JavaScript Engine in the world.

AST (Abstract Syntax Tree)

The word “Parser” mentioned above actually could break down in three steps below.

Fortunately, we don’t need to go through all its phases, converting HHL (High-level language) code to bits. We are interested in Lexical and Syntax Analysis only. These two steps play the main role in generating AST from code.

  • Lexical analyzer (a.k.a. scanner)

    First step. Lexical analyzer, also called scanner, it reads a stream of characters (our code) and combines them into tokens using defined rules. Also, it will remove white space characters, comments, etc. In the end, the entire string of code will be split into a list of tokens.

    var a = 42
    
    // -------\<lexical analyzer\>------->
    
    [
        {type:'identifier',value:'var'},
        {type:'whitespace',value:' '},    
        {type:'identifier',value:'a'},
        {type:'whitespace',value:' '},
        {type:'operator',value:'='},
        {type:'whitespace',value:' '},
        {type:'num',value:'42'},
        {type:'sep',value:';'}
    ]

    When the lexical analyzer read the source-code, it scans the code letter by letter; and when it encounters a whitespace, operator symbol, or special symbols, it decides that a word is completed.

  • Syntax analyzer parser

    Second step. Syntax analyzer, also called parser, will take a plain list of tokens after Lexical Analysis and turn it into a tree representation, validating language syntax and throwing syntax errors, if such happened.

    While generating a tree, some parsers omit unnecessary tokens (like redundant brackets for example) so they create ‘Abstract Syntax Tree’ — it is not 100% of code match, but enough to know how to deal with it. On another hand, parsers which fully cover all code structure generate tree called ‘Concrete Syntax Tree’.

    You can use AST explorer to see how a parser converts JavaScript into a syntax tree.

Interpreter & Compiler

  • Interpreter (From Wiki)

    In computer science, an interpreter is a computer program that directly executes, i.e. performs, instructions written in a programming or scripting language, without previously compiling them into a machine language program.

    • Pros (Take JavaScript as an example.)

      An interpreter is able to interpret JS directly and run as fast as it can.

    • Cons

      The more JS code is written, the slower running speed it can be. Without compiler, when you are running redundant code like the same code in a for loop, it will execute them over and over again, even though the result of the loop gives the same.

      // With interpreter
      function sumCalulation(x, y){
          return x + y;
      }
      
      for (let i = 0; i < 1000; i++) {
          /*** 
           * With only interpreter, the code will be running 1000 times,
           * though the result would always be 9.
          ***/
          sumCalulation(5,4);
      }
  • Compiler (From Wiki)

    The name compiler is primarily used for programs that translate source code from a high-level programming language to a lower level language (e.g., assembly language, object code, or machine code) to create an executable program.

    • Pros

      The compiler will do optimization with the source code.

    • Cons

      The compiler takes a little bit longer to get up and to run.

      // With compiler
        function sumCalulation(x, y){
            return x + y;
        }
    
        for (let i = 0; i < 1000; i++) {
            // sumCalulation(5,4); it can be replace with 9
            9
        }

đŸ€” Self-question
Is there a way we can get the best of both?

  • Jit Compiler (Just-in-time compilation)

    In 2017-06-05, Google launched the Ignition+TurboFan pipeline in Chrome 59

    • Ignition (Interpreter)

      V8 features an interpreter called Ignition. Ignition is a fast low-level register-based interpreter written using the backend of TurboFan, and it’s designed for reducing memory cost in a mobile device. Because in the past, Full-Codegen doesn’t really apply any serious optimizations (it was supposed to generate code as quickly as possible), it was used to taking 30% of the managed memory in typical web applications was used by Code objects.

    • TurboFan (Compilers)

      Replace Crankshaft with TurboFan which is based on sea-of-nodes architecture. It’s designed for solving Performance Cliffs, and become more adaptable to add new ECMAScript function in the future.

đŸ€” Self-question
Is JavaScript an interpreted language?

Optimized Code

  • Hidden Class

    1 function Point(x,y) {
    2    this.x = x;
    3    this.y = y;
    4  }
    5
    6  var obj1 = new Point(1,2);
    7  var obj2 = new Point(3,4);
    8
    9  obj1.a = 5;
    10 obj1.b = 10;
    11
    12 obj2.b = 10;
    13 obj2.a = 5;

Up until line 8, obj1 and obj2 shared the same hidden class. However, since properties a and b were added in opposite orders, obj1 and obj2 end up with different hidden classes as a result of following separate transition paths.

ps: Because JavaScript is a dynamic language, we are able to add new properties without define them in constructure previously, and the engine will create hidden class to deal with it during run time.

If you’ve been following along closely, your first instinct might be to think that obj1 and obj2 having two different hidden classes isn’t a big deal. As long as each of their hidden classes stores the appropriate offsets, accessing their properties should be just as fast as if they shared a hidden class right?

In order to understand why this isn’t true, we need to take a look at another optimization technique employed by V8 called inline caching.

  • Inline Cache

    Whenever a method is called on a specific object, the V8 engine has to perform a lookup to that objects hidden class to determine the offset for accessing a specific property. After two successful calls of the same method to the same hidden class, V8 omits the hidden class lookup and simply adds the offset of the property to the object pointer itself. For all future calls of that method, the V8 engine assumes that the hidden class hasn’t changed, and jumps directly into the memory address for a specific property using the offsets stored from previous lookups; this greatly increases execution speed.

đŸ€” Self-question
Why not just use machine code from the beginning? So they don’t have to worry about interpreter, compiler, JIT compliler stuff

WebAssembly

Nowadays, browsers still have different ways of doing things like JS engine, but what if there is a standard binary executable format? Here comes “WebAssembly”.

Call Stack & Memory Heap

  • Call Stack (First in last out mode)

    Call Stack mainly is to keep track of the point to which each active subroutine should return control when it finishes executing.

  • Memory Heap

    As a place to allocate memory, use memory, and release memory

    Call stack now has the anonymous function which was the global execution context. When running the calculate() function, it’s just added on the top of the stack.

  • Stack Overflow

    It’s caused when a program writes more data to a buffer located on the stack than what is actually allocated for that buffer. It usually happened in infinite recursion function, or just because of too many stacks are put.

  • Memory Leaks

    It’s a type of resource leak that occurs when a computer program incorrectly manages memory allocations[1] in such a way that memory which is no longer needed is not released.

    The garbage collection wasn’t really working because we had this array we were using this array over and over until we crashed our program.

    Three typically reasons cause memory leaks:

    • Global variable

      var a = 1;
      var b = 1;
      var c = 1;
      ...
    • Event listeners

      var element = document.getElementById('button');
      element.addEventListener('click', onclick);
      // You have better unregister the event listener when things are done.
    • SetInterval

      setInterval(() => {
      //...
      })
      // You have better unregister the event listener when things are done.
  • Garbage Collection

    • Advantages of Mark and Sweep Algorithm

      • It handles the case with cyclic references, even in case of a cycle, this algorithm never ends up in an infinite loop.
      • There are no additional overheads incurred during the execution of the algorithm.
    • Disadvantages of Mark and Sweep Algorithm

      • The main disadvantage of the mark-and-sweep approach is the fact that that normal program execution is suspended while the garbage collection algorithm runs.
      • Other disadvantage is that, after the Mark and Sweep Algorithm is run several times on a program, reachable objects end up being separated by many, small unused memory regions. Look at the below figure for better understanding.

đŸ€” Self-question
How to deal with memory fragmentation?
Take a look for Memory paging.