Go AST to Haxe AST walkthrough

The entry point for the compiler comes from running the command via haxelib.

haxelib run go2hx ./finderrors.go

This command runs Run.hx, which clones the necessary repos for the Go part of the compiler, runs go build . on the user's behalf, and then chooses a target to build the Haxe part of the compiler. The target is chosen based on what is available or on explicit arguments.

go build .

The go build command builds according to the root file go.mod and the main file export.go. go.mod can be thought of as a build configuration file for Go, and it lists the required dependencies. There is a special replace directive in go.mod that allows the compiler to use a custom fork of tools found here (not very important, but it may come up when it comes to default sizes for int and uint).

export.go

  1. Main entry for the Go part of the compiler (also called go4hx).
  2. Facilitates calling the Go compiler and pulling out all of the typed AST.
  3. Called from the Haxe side of the compiler.
  4. Launched by the Haxe side with specific arguments and waits to connect to the TCP server (Haxe side).
  5. Once connected, sends over the data in JSON form.
  6. After sending all of the data, stays connected in case another package needs to be compiled.
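The send-and-stay-connected behaviour in steps 4 to 6 can be sketched roughly as below. This is only an illustration: packageData, sendPackage, receivePackage and roundTrip are hypothetical names, newline-delimited JSON is an assumed framing (the real wire format is not described here), and an in-memory net.Pipe stands in for the TCP connection to the Haxe side.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"net"
)

// packageData is a hypothetical payload; the real JSON schema is not shown here.
type packageData struct {
	Path  string   `json:"path"`
	Files []string `json:"files"`
}

// sendPackage writes one JSON document followed by a newline.
// Newline framing is an assumption made for this sketch.
func sendPackage(conn net.Conn, pkg packageData) error {
	return json.NewEncoder(conn).Encode(pkg)
}

// receivePackage reads one newline-terminated JSON document.
func receivePackage(r *bufio.Reader) (packageData, error) {
	var pkg packageData
	line, err := r.ReadBytes('\n')
	if err != nil {
		return pkg, err
	}
	err = json.Unmarshal(line, &pkg)
	return pkg, err
}

// roundTrip sends every package over one connection, which stays open
// between packages, and returns what the receiving side decoded.
func roundTrip(pkgs []packageData) ([]packageData, error) {
	client, server := net.Pipe() // in-memory stand-in for the TCP link
	defer server.Close()

	errc := make(chan error, 1)
	go func() {
		defer client.Close()
		for _, p := range pkgs {
			if err := sendPackage(client, p); err != nil {
				errc <- err
				return
			}
		}
		errc <- nil
	}()

	reader := bufio.NewReader(server)
	var got []packageData
	for range pkgs {
		p, err := receivePackage(reader)
		if err != nil {
			return nil, err
		}
		got = append(got, p)
	}
	return got, <-errc
}

func main() {
	got, err := roundTrip([]packageData{
		{Path: "main", Files: []string{"finderrors.go"}},
		{Path: "os", Files: []string{"file.go"}},
	})
	fmt.Println(got, err)
}
```

Keeping a single connection open means a second package can be sent without launching the Go process again, which is the point of step 6.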

export.go also calls other Go files, such as those in the analysis package/folder. These files perform passes over the Go AST, for example to add pointer variables for each pointer initiated, or to flatten control flow when goto jumps are detected.

The Haxe part of the compiler receives the type information and the AST. The AST in Haxe is held in ./src/Ast.hx and the types are held in ./src/Types.hx, so not too complicated!

src/Main.hx

  1. Entry point is function main.
  2. Coordinates go4hx and processes all of the command line arguments.
  3. Calls src/Typer.hx for the typing process.
  4. After the typing takes place, passes the Haxe AST to src/Gen.hx.
  5. src/Gen.hx is the generator and handles calling src/Printer.hx, creating Haxe files and interop layer files.

src/Typer.hx

  1. Entry point is function main.
  2. Does multiple passes over the Go AST and handles all of the AST transformations.
  3. Most complex and biggest file in the codebase.
  4. Mostly takes in Go AST and returns Haxe AST.
  5. Naming convention follows Go AST.
  6. Uses a lot of the macro keyword.

The Go AST is File->Decl. A Decl is either a FuncDecl or a GenDecl. FuncDecl is a modular function which holds args, params (for generics), return, body (stmts->exprs), and whether it is variadic (the final argument is a rest argument). GenDecl->Array->Spec: a Spec can be an import, type, or variable, and the Array means the Specs are all grouped together. For example, an Array of type specs would be:

type (
    X struct{}
    Y int
)

As 2 separate Specs it would be:

type X struct{}
type Y int
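The difference between the grouped and the separate form can be checked with Go's standard go/parser and go/ast packages. The sketch below is not taken from export.go; specCounts is a hypothetical helper that just reports how many Specs each GenDecl holds.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// specCounts parses src and returns the number of Specs held by each
// GenDecl, in source order.
func specCounts(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var counts []int
	for _, decl := range file.Decls {
		// FuncDecls are skipped; only GenDecls carry Specs.
		if gen, ok := decl.(*ast.GenDecl); ok {
			counts = append(counts, len(gen.Specs))
		}
	}
	return counts, nil
}

func main() {
	grouped := "package p\ntype (\n\tX struct{}\n\tY int\n)\n"
	separate := "package p\ntype X struct{}\ntype Y int\n"
	fmt.Println(specCounts(grouped))
	fmt.Println(specCounts(separate))
}
```

The grouped form yields one GenDecl holding two Specs, while the separate form yields two GenDecls holding one Spec each.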

Stmts (Statements) are a special category of Exprs. In Haxe everything is an expr, so every Go stmt (for loop, if stmt, etc.) becomes an expr. In Haxe terms these are Exprs that always return the Void type.

Stmt also has 2 special kinds, ExprStmt and DeclStmt. These allow a Stmt to go up or down the abstraction layers of the AST: DeclStmt allows having a Decl where a Stmt would be, and likewise ExprStmt allows having an Expr where a Stmt would be.

DeclStmt is the special case, because it allows declaring types inside of a function body (a function body is an Array of Stmts).

For example:

func main() {
    type X struct{
        x int
        y int
    }
}

func foo() {
    {
        type X struct{}
    }
    {
        type X int
    }
}

This allows scoped types. The types must be inferable by the compiler at compile time, so, for instance, no runtime types chosen by an if else are allowed.
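To see where scoped types sit in the tree, the sketch below walks a file with ast.Inspect and collects the type names that appear under a DeclStmt. localTypes is a hypothetical helper written for this walkthrough, not a function from the compiler.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const example = `package p

type T int

func foo() {
	{
		type X struct{}
	}
	{
		type X int
	}
}
`

// localTypes returns the names of types declared inside function bodies,
// found by looking for DeclStmt nodes that wrap a type GenDecl.
func localTypes(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	ast.Inspect(file, func(n ast.Node) bool {
		stmt, ok := n.(*ast.DeclStmt)
		if !ok {
			return true // keep descending
		}
		gen, ok := stmt.Decl.(*ast.GenDecl)
		if !ok || gen.Tok != token.TYPE {
			return true
		}
		for _, spec := range gen.Specs {
			if ts, ok := spec.(*ast.TypeSpec); ok {
				names = append(names, ts.Name.Name)
			}
		}
		return true
	})
	return names, nil
}

func main() {
	fmt.Println(localTypes(example))
}
```

The top-level type T is not reported because it is a plain GenDecl in File.Decls, not a DeclStmt. The two X types are both reported, one per block scope.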

Next part (advanced notes)

Now onto the interesting part.

We will now go over a real world Go program, source code here

I wrote the program to find the most frequently repeated errors in the compiler from the test logs, in order to prioritize what to fix.

It uses some file operations and text processing, and the go2hx compiler is able to compile it correctly.

The code is not very clean and could do with lots of optimization, but given that its job is one-off analysis, that is not too important. It still meets the goal of going over something real and gaining insight into how the compiler handles it.

To start off, the compiler is run (details above). For our purposes the starting point is inside ./export.go, where the AST is obtained from:

initial, err := packages.Load(cfg, &types.StdSizes{WordSize: 4, MaxAlign: 8}, args...)

initial now holds all of the packages and parsePkgList will now be called.

parsePkgList goes over every package in a loop and calls mergePackage to merge the package into a single File. The packages are then looped over again and parseLocalPackage is called. parseLocalPackage runs all of the analyzers from the analysis folder/package. Then parsePkg is called, which in turn calls parseFile. parseFile loops over the decls and the specs, and parseData is then called. parseData gets called recursively on nested nodes.
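The types.StdSizes value passed to packages.Load is what decides the default sizes for int and uint. The standard library go/types package can show the effect in isolation; intSize is a hypothetical helper for this sketch.

```go
package main

import (
	"fmt"
	"go/types"
)

// intSize reports the size in bytes of Go's int under the given word size.
func intSize(wordSize int64) int64 {
	sizes := &types.StdSizes{WordSize: wordSize, MaxAlign: 8}
	return sizes.Sizeof(types.Typ[types.Int])
}

// int64Size reports the size in bytes of int64, which does not depend
// on the word size.
func int64Size(wordSize int64) int64 {
	sizes := &types.StdSizes{WordSize: wordSize, MaxAlign: 8}
	return sizes.Sizeof(types.Typ[types.Int64])
}

func main() {
	fmt.Println(intSize(4), intSize(8))     // word-sized types follow WordSize
	fmt.Println(int64Size(4), int64Size(8)) // fixed-size types do not
}
```

With WordSize set to 4, int is treated as 4 bytes wide, while explicitly sized types such as int64 are unaffected.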

If new information needs to be sent to the Haxe part of the compiler, the modification can be made by finding the node type in the very large type switch stmt and adding a map key for the newly exposed data, for example:

case *ast.CompositeLit:
    data["exprType"] = parseData(node.Type)
    data["type"] = parseType(checker.TypeOf(node), map[string]bool{})
    data["test"] = node.NewExposedThing // not valid field access for this type

and then have it be exposed in Haxe by adding it in ./src/Ast.hx:

typedef CompositeLit = {
	// > Node,
	type:Expr,
	exprType:Expr,
	lbrace:Pos,
	elts:Array<Expr>,
	rbrace:Pos,
	incomplete:Bool,
	test:String, // this is the new field
};

Whenever a type needs to be parsed, parseType is called.
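The general shape of such a type switch can be sketched as follows. nodeToMap and exprToMap are hypothetical, much smaller stand-ins for parseData, and every key other than "exprType" ("id", "elts", "elt", "name", "value") is made up for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"go/ast"
	"go/parser"
)

// nodeToMap converts a few AST node kinds into JSON-friendly maps,
// calling itself for nested nodes.
func nodeToMap(node ast.Node) map[string]interface{} {
	data := map[string]interface{}{}
	switch n := node.(type) {
	case *ast.CompositeLit:
		data["id"] = "CompositeLit"
		data["exprType"] = nodeToMap(n.Type)
		elts := make([]interface{}, 0, len(n.Elts))
		for _, e := range n.Elts {
			elts = append(elts, nodeToMap(e))
		}
		data["elts"] = elts
	case *ast.ArrayType:
		data["id"] = "ArrayType"
		data["elt"] = nodeToMap(n.Elt)
	case *ast.Ident:
		data["id"] = "Ident"
		data["name"] = n.Name
	case *ast.BasicLit:
		data["id"] = "BasicLit"
		data["value"] = n.Value
	default:
		// Node kinds this sketch does not cover only record their Go type.
		data["id"] = fmt.Sprintf("%T", node)
	}
	return data
}

// exprToMap parses a single Go expression and converts it.
func exprToMap(src string) (map[string]interface{}, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return nil, err
	}
	return nodeToMap(expr), nil
}

func main() {
	data, err := exprToMap(`[]string{",", "."}`)
	if err != nil {
		panic(err)
	}
	out, err := json.Marshal(data)
	fmt.Println(string(out), err)
}
```

Exposing a new field then amounts to adding one more data[...] assignment in the matching case and a matching field in the Haxe typedef.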

The interesting part

The Go typed AST is all passed into ./src/Typer.hx, which is where the most important things happen.

package main

This sets what the package is, and the value is stored in info.global.path. info is passed to most functions in the codebase. info.global is always the same across an entire package, whereas the other info fields are passed by value to each function, and some of those fields are reset.

info holds a lot of important context information used when transpiling the AST.

import (
	"os"
	"path/filepath"
	"sort"
	"strings"
)

Each import is processed by typeImport, which in most cases sets info.renameIdents, a map that looks up a given identifier and renames it.

var removeStrings = []string{
	",",
	".",
	":",
	"0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
}

typeValue is run and returns an array of TypeDefinition. []string is passed to typeExprType, which takes a Go AST expr and turns it into a ComplexType: []string -> stdgo.Slice<stdgo.GoString>.
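The real typeExprType lives on the Haxe side in src/Typer.hx. Purely as an illustration of the mapping, the Go sketch below walks a type expr and prints the Haxe type name for the one case given above. haxeType and haxeTypeOf are hypothetical, and anything beyond string and slices is deliberately rejected instead of guessed.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
)

// haxeType maps a Go type expr to a Haxe type name. Only the mapping
// []string -> stdgo.Slice<stdgo.GoString> comes from this walkthrough.
func haxeType(expr ast.Expr) (string, error) {
	switch t := expr.(type) {
	case *ast.Ident:
		if t.Name == "string" {
			return "stdgo.GoString", nil
		}
		return "", fmt.Errorf("identifier %q is not covered by this sketch", t.Name)
	case *ast.ArrayType:
		if t.Len != nil {
			return "", fmt.Errorf("fixed-size arrays are not covered by this sketch")
		}
		elem, err := haxeType(t.Elt)
		if err != nil {
			return "", err
		}
		return "stdgo.Slice<" + elem + ">", nil
	default:
		return "", fmt.Errorf("type expression %T is not covered by this sketch", expr)
	}
}

// haxeTypeOf parses a Go type expression from source text and maps it.
func haxeTypeOf(src string) (string, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return "", err
	}
	return haxeType(expr)
}

func main() {
	fmt.Println(haxeTypeOf("[]string"))
}
```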

The first check is whether it is a destructure. In this case it is not.

Each name in value.names is run through nameIdent.

All values exist, so defaultValue is not used. Instead, every value expr is passed to typeExpr:

expr = typeExpr(value.values[i], info);

The GoType variable nameType is turned into a haxe.macro.ComplexType via the function toComplexType. GoType is how Go type information is held for Haxe. It is an enum found in ./src/Types.hx.

In the end this creates a Haxe TDField of FVar.

Inside of typeExpr the expr is a CompositeLit, so typeCompositeLit is called. It turns the type into a ComplexType and runs over an array of Expr. The exprs are all BasicLit, so each one runs through typeBasicLit. Each is a string literal and is turned into a Haxe string expr.
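What typeCompositeLit and typeBasicLit see for removeStrings can be reproduced on the Go side with the standard parser. The sketch below uses a hypothetical helper, stringElements, to pull the string values out of the BasicLit elements of a composite literal.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// stringElements parses a composite literal and returns the unquoted
// value of each element, requiring every element to be a string BasicLit.
func stringElements(src string) ([]string, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return nil, err
	}
	lit, ok := expr.(*ast.CompositeLit)
	if !ok {
		return nil, fmt.Errorf("not a composite literal: %T", expr)
	}
	var out []string
	for _, elt := range lit.Elts {
		basic, ok := elt.(*ast.BasicLit)
		if !ok || basic.Kind != token.STRING {
			return nil, fmt.Errorf("element is not a string literal: %T", elt)
		}
		// BasicLit.Value keeps the quotes from the source text.
		value, err := strconv.Unquote(basic.Value)
		if err != nil {
			return nil, err
		}
		out = append(out, value)
	}
	return out, nil
}

func main() {
	fmt.Println(stringElements(`[]string{",", ".", ":"}`))
}
```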

func main() {}

This is transformed inside the function typeFunction. typeBlockStmt is called on decl.body, and it goes over every stmt with typeStmt.

The first stmt is a DeclStmt, which leads to typeValue being called. Because it is a constant, the analysis package has already handled it, so a macro {} is returned to denote nothing.

The second stmt is an AssignStmt, which is transformed by the function typeAssignStmt. This is the destructure version, and it runs the lhs (left hand side) and rhs (right hand side) exprs through typeExpr.
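How an AssignStmt might be recognised as a destructure can be sketched in Go. The rule used here (more than one lhs expr fed by a single rhs expr) is an assumption made for illustration, and isDestructure is a hypothetical helper. The actual check in typeAssignStmt may differ.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// isDestructure parses a single statement and reports whether it is an
// assignment with several lhs exprs and exactly one rhs expr.
func isDestructure(stmt string) (bool, error) {
	// The parser has no statement entry point, so wrap stmt in a function.
	src := "package p\nfunc _() {\n" + stmt + "\n}\n"
	file, err := parser.ParseFile(token.NewFileSet(), "example.go", src, 0)
	if err != nil {
		return false, err
	}
	if len(file.Decls) == 0 {
		return false, fmt.Errorf("no declaration found")
	}
	fn, ok := file.Decls[0].(*ast.FuncDecl)
	if !ok || fn.Body == nil || len(fn.Body.List) == 0 {
		return false, fmt.Errorf("no statement found")
	}
	assign, ok := fn.Body.List[0].(*ast.AssignStmt)
	if !ok {
		return false, fmt.Errorf("not an assignment: %T", fn.Body.List[0])
	}
	return len(assign.Lhs) > 1 && len(assign.Rhs) == 1, nil
}

func main() {
	fmt.Println(isDestructure("a, b := f()"))
	fmt.Println(isDestructure("a := 1"))
}
```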