--- /dev/null
+// Copyright 2013 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+/*
+
+Package pointer implements Andersen's analysis, an inclusion-based
+pointer analysis algorithm first described in (Andersen, 1994).
+
+A pointer analysis relates every pointer expression in a whole program
+to the set of memory locations to which it might point. This
+information can be used to construct a call graph of the program that
+precisely represents the destinations of dynamic function and method
+calls. It can also be used to determine, for example, which pairs of
+channel operations operate on the same channel.
+
+The package allows the client to request a set of expressions of
+interest for which the points-to information will be returned once the
+analysis is complete. In addition, the client may request that a
+callgraph is constructed. The example program in example_test.go
+demonstrates both of these features. Clients should not request more
+information than they need since it may increase the cost of the
+analysis significantly.
+
+
+CLASSIFICATION
+
+Our algorithm is INCLUSION-BASED: the points-to sets for x and y will
+be related by pts(y) ⊇ pts(x) if the program contains the statement
+y = x.
+
+It is FLOW-INSENSITIVE: it ignores all control flow constructs and the
+order of statements in a program. It is therefore a "MAY ALIAS"
+analysis: its facts are of the form "P may/may not point to L",
+not "P must point to L".
+
+It is FIELD-SENSITIVE: it builds separate points-to sets for distinct
+fields, such as x and y in struct { x, y *int }.
+
+It is mostly CONTEXT-INSENSITIVE: most functions are analyzed once,
+so values can flow in at one call to the function and return out at
+another. Only some smaller functions are analyzed with consideration
+of their calling context.
+
+It has a CONTEXT-SENSITIVE HEAP: objects are named by both allocation
+site and context, so the objects returned by two distinct calls to f:
+ func f() *T { return new(T) }
+are distinguished up to the limits of the calling context.
+
+It is a WHOLE PROGRAM analysis: it requires SSA-form IR for the
+complete Go program and summaries for native code.
+
+See the (Hind, PASTE'01) survey paper for an explanation of these terms.
+
+
+SOUNDNESS
+
+The analysis is fully sound when invoked on pure Go programs that do not
+use reflection or unsafe.Pointer conversions. In other words, if there
+is any possible execution of the program in which pointer P may point to
+object O, the analysis will report that fact.
+
+
+REFLECTION
+
+By default, the "reflect" library is ignored by the analysis, as if all
+its functions were no-ops, but if the client enables the Reflection flag,
+the analysis will make a reasonable attempt to model the effects of
+calls into this library. However, this comes at a significant
+performance cost, and not all features of that library are yet
+implemented. In addition, some simplifying approximations must be made
+to ensure that the analysis terminates; for example, reflection can be
+used to construct an infinite set of types and values of those types,
+but the analysis arbitrarily bounds the depth of such types.
+
+Most but not all reflection operations are supported.
+In particular, addressable reflect.Values are not yet implemented, so
+operations such as (reflect.Value).Set have no analytic effect.
+
+
+UNSAFE POINTER CONVERSIONS
+
+The pointer analysis makes no attempt to understand aliasing between the
+operand x and result y of an unsafe.Pointer conversion:
+ y = (*T)(unsafe.Pointer(x))
+It is as if the conversion allocated an entirely new object:
+ y = new(T)
+
+
+NATIVE CODE
+
+The analysis cannot model the aliasing effects of functions written in
+languages other than Go, such as runtime intrinsics in C or assembly, or
+code accessed via cgo. The result is as if such functions are no-ops.
+However, various important intrinsics are understood by the analysis,
+along with built-ins such as append.
+
+The analysis currently provides no way for users to specify the aliasing
+effects of native code.
+
+------------------------------------------------------------------------
+
+IMPLEMENTATION
+
+The remaining documentation is intended for package maintainers and
+pointer analysis specialists. Maintainers should have a solid
+understanding of the referenced papers (especially those by H&L and PKH)
+before making making significant changes.
+
+The implementation is similar to that described in (Pearce et al,
+PASTE'04). Unlike many algorithms which interleave constraint
+generation and solving, constructing the callgraph as they go, this
+implementation for the most part observes a phase ordering (generation
+before solving), with only simple (copy) constraints being generated
+during solving. (The exception is reflection, which creates various
+constraints during solving as new types flow to reflect.Value
+operations.) This improves the traction of presolver optimisations,
+but imposes certain restrictions, e.g. potential context sensitivity
+is limited since all variants must be created a priori.
+
+
+TERMINOLOGY
+
+A type is said to be "pointer-like" if it is a reference to an object.
+Pointer-like types include pointers and also interfaces, maps, channels,
+functions and slices.
+
+We occasionally use C's x->f notation to distinguish the case where x
+is a struct pointer from x.f where is a struct value.
+
+Pointer analysis literature (and our comments) often uses the notation
+dst=*src+offset to mean something different than what it means in Go.
+It means: for each node index p in pts(src), the node index p+offset is
+in pts(dst). Similarly *dst+offset=src is used for store constraints
+and dst=src+offset for offset-address constraints.
+
+
+NODES
+
+Nodes are the key datastructure of the analysis, and have a dual role:
+they represent both constraint variables (equivalence classes of
+pointers) and members of points-to sets (things that can be pointed
+at, i.e. "labels").
+
+Nodes are naturally numbered. The numbering enables compact
+representations of sets of nodes such as bitvectors (or BDDs); and the
+ordering enables a very cheap way to group related nodes together. For
+example, passing n parameters consists of generating n parallel
+constraints from caller+i to callee+i for 0<=i<n.
+
+The zero nodeid means "not a pointer". For simplicity, we generate flow
+constraints even for non-pointer types such as int. The pointer
+equivalence (PE) presolver optimization detects which variables cannot
+point to anything; this includes not only all variables of non-pointer
+types (such as int) but also variables of pointer-like types if they are
+always nil, or are parameters to a function that is never called.
+
+Each node represents a scalar part of a value or object.
+Aggregate types (structs, tuples, arrays) are recursively flattened
+out into a sequential list of scalar component types, and all the
+elements of an array are represented by a single node. (The
+flattening of a basic type is a list containing a single node.)
+
+Nodes are connected into a graph with various kinds of labelled edges:
+simple edges (or copy constraints) represent value flow. Complex
+edges (load, store, etc) trigger the creation of new simple edges
+during the solving phase.
+
+
+OBJECTS
+
+Conceptually, an "object" is a contiguous sequence of nodes denoting
+an addressable location: something that a pointer can point to. The
+first node of an object has a non-nil obj field containing information
+about the allocation: its size, context, and ssa.Value.
+
+Objects include:
+ - functions and globals;
+ - variable allocations in the stack frame or heap;
+ - maps, channels and slices created by calls to make();
+ - allocations to construct an interface;
+ - allocations caused by conversions, e.g. []byte(str).
+ - arrays allocated by calls to append();
+
+Many objects have no Go types. For example, the func, map and chan type
+kinds in Go are all varieties of pointers, but their respective objects
+are actual functions (executable code), maps (hash tables), and channels
+(synchronized queues). Given the way we model interfaces, they too are
+pointers to "tagged" objects with no Go type. And an *ssa.Global denotes
+the address of a global variable, but the object for a Global is the
+actual data. So, the types of an ssa.Value that creates an object is
+"off by one indirection": a pointer to the object.
+
+The individual nodes of an object are sometimes referred to as "labels".
+
+For uniformity, all objects have a non-zero number of fields, even those
+of the empty type struct{}. (All arrays are treated as if of length 1,
+so there are no empty arrays. The empty tuple is never address-taken,
+so is never an object.)
+
+
+TAGGED OBJECTS
+
+An tagged object has the following layout:
+
+ T -- obj.flags ⊇ {otTagged}
+ v
+ ...
+
+The T node's typ field is the dynamic type of the "payload": the value
+v which follows, flattened out. The T node's obj has the otTagged
+flag.
+
+Tagged objects are needed when generalizing across types: interfaces,
+reflect.Values, reflect.Types. Each of these three types is modelled
+as a pointer that exclusively points to tagged objects.
+
+Tagged objects may be indirect (obj.flags ⊇ {otIndirect}) meaning that
+the value v is not of type T but *T; this is used only for
+reflect.Values that represent lvalues. (These are not implemented yet.)
+
+
+ANALYSIS ABSTRACTION OF EACH TYPE
+
+Variables of the following "scalar" types may be represented by a
+single node: basic types, pointers, channels, maps, slices, 'func'
+pointers, interfaces.
+
+Pointers
+ Nothing to say here, oddly.
+
+Basic types (bool, string, numbers, unsafe.Pointer)
+ Currently all fields in the flattening of a type, including
+ non-pointer basic types such as int, are represented in objects and
+ values. Though non-pointer nodes within values are uninteresting,
+ non-pointer nodes in objects may be useful (if address-taken)
+ because they permit the analysis to deduce, in this example,
+
+ var s struct{ ...; x int; ... }
+ p := &s.x
+
+ that p points to s.x. If we ignored such object fields, we could only
+ say that p points somewhere within s.
+
+ All other basic types are ignored. Expressions of these types have
+ zero nodeid, and fields of these types within aggregate other types
+ are omitted.
+
+ unsafe.Pointers are not modelled as pointers, so a conversion of an
+ unsafe.Pointer to *T is (unsoundly) treated equivalent to new(T).
+
+Channels
+ An expression of type 'chan T' is a kind of pointer that points
+ exclusively to channel objects, i.e. objects created by MakeChan (or
+ reflection).
+
+ 'chan T' is treated like *T.
+ *ssa.MakeChan is treated as equivalent to new(T).
+ *ssa.Send and receive (*ssa.UnOp(ARROW)) and are equivalent to store
+ and load.
+
+Maps
+ An expression of type 'map[K]V' is a kind of pointer that points
+ exclusively to map objects, i.e. objects created by MakeMap (or
+ reflection).
+
+ map K[V] is treated like *M where M = struct{k K; v V}.
+ *ssa.MakeMap is equivalent to new(M).
+ *ssa.MapUpdate is equivalent to *y=x where *y and x have type M.
+ *ssa.Lookup is equivalent to y=x.v where x has type *M.
+
+Slices
+ A slice []T, which dynamically resembles a struct{array *T, len, cap int},
+ is treated as if it were just a *T pointer; the len and cap fields are
+ ignored.
+
+ *ssa.MakeSlice is treated like new([1]T): an allocation of a
+ singleton array.
+ *ssa.Index on a slice is equivalent to a load.
+ *ssa.IndexAddr on a slice returns the address of the sole element of the
+ slice, i.e. the same address.
+ *ssa.Slice is treated as a simple copy.
+
+Functions
+ An expression of type 'func...' is a kind of pointer that points
+ exclusively to function objects.
+
+ A function object has the following layout:
+
+ identity -- typ:*types.Signature; obj.flags ⊇ {otFunction}
+ params_0 -- (the receiver, if a method)
+ ...
+ params_n-1
+ results_0
+ ...
+ results_m-1
+
+ There may be multiple function objects for the same *ssa.Function
+ due to context-sensitive treatment of some functions.
+
+ The first node is the function's identity node.
+ Associated with every callsite is a special "targets" variable,
+ whose pts() contains the identity node of each function to which
+ the call may dispatch. Identity words are not otherwise used during
+ the analysis, but we construct the call graph from the pts()
+ solution for such nodes.
+
+ The following block of contiguous nodes represents the flattened-out
+ types of the parameters ("P-block") and results ("R-block") of the
+ function object.
+
+ The treatment of free variables of closures (*ssa.FreeVar) is like
+ that of global variables; it is not context-sensitive.
+ *ssa.MakeClosure instructions create copy edges to Captures.
+
+ A Go value of type 'func' (i.e. a pointer to one or more functions)
+ is a pointer whose pts() contains function objects. The valueNode()
+ for an *ssa.Function returns a singleton for that function.
+
+Interfaces
+ An expression of type 'interface{...}' is a kind of pointer that
+ points exclusively to tagged objects. All tagged objects pointed to
+ by an interface are direct (the otIndirect flag is clear) and
+ concrete (the tag type T is not itself an interface type). The
+ associated ssa.Value for an interface's tagged objects may be an
+ *ssa.MakeInterface instruction, or nil if the tagged object was
+ created by an instrinsic (e.g. reflection).
+
+ Constructing an interface value causes generation of constraints for
+ all of the concrete type's methods; we can't tell a priori which
+ ones may be called.
+
+ TypeAssert y = x.(T) is implemented by a dynamic constraint
+ triggered by each tagged object O added to pts(x): a typeFilter
+ constraint if T is an interface type, or an untag constraint if T is
+ a concrete type. A typeFilter tests whether O.typ implements T; if
+ so, O is added to pts(y). An untagFilter tests whether O.typ is
+ assignable to T,and if so, a copy edge O.v -> y is added.
+
+ ChangeInterface is a simple copy because the representation of
+ tagged objects is independent of the interface type (in contrast
+ to the "method tables" approach used by the gc runtime).
+
+ y := Invoke x.m(...) is implemented by allocating contiguous P/R
+ blocks for the callsite and adding a dynamic rule triggered by each
+ tagged object added to pts(x). The rule adds param/results copy
+ edges to/from each discovered concrete method.
+
+ (Q. Why do we model an interface as a pointer to a pair of type and
+ value, rather than as a pair of a pointer to type and a pointer to
+ value?
+ A. Control-flow joins would merge interfaces ({T1}, {V1}) and ({T2},
+ {V2}) to make ({T1,T2}, {V1,V2}), leading to the infeasible and
+ type-unsafe combination (T1,V2). Treating the value and its concrete
+ type as inseparable makes the analysis type-safe.)
+
+reflect.Value
+ A reflect.Value is modelled very similar to an interface{}, i.e. as
+ a pointer exclusively to tagged objects, but with two generalizations.
+
+ 1) a reflect.Value that represents an lvalue points to an indirect
+ (obj.flags ⊇ {otIndirect}) tagged object, which has a similar
+ layout to an tagged object except that the value is a pointer to
+ the dynamic type. Indirect tagged objects preserve the correct
+ aliasing so that mutations made by (reflect.Value).Set can be
+ observed.
+
+ Indirect objects only arise when an lvalue is derived from an
+ rvalue by indirection, e.g. the following code:
+
+ type S struct { X T }
+ var s S
+ var i interface{} = &s // i points to a *S-tagged object (from MakeInterface)
+ v1 := reflect.ValueOf(i) // v1 points to same *S-tagged object as i
+ v2 := v1.Elem() // v2 points to an indirect S-tagged object, pointing to s
+ v3 := v2.FieldByName("X") // v3 points to an indirect int-tagged object, pointing to s.X
+ v3.Set(y) // pts(s.X) ⊇ pts(y)
+
+ Whether indirect or not, the concrete type of the tagged object
+ corresponds to the user-visible dynamic type, and the existence
+ of a pointer is an implementation detail.
+
+ (NB: indirect tagged objects are not yet implemented)
+
+ 2) The dynamic type tag of a tagged object pointed to by a
+ reflect.Value may be an interface type; it need not be concrete.
+
+ This arises in code such as this:
+ tEface := reflect.TypeOf(new(interface{}).Elem() // interface{}
+ eface := reflect.Zero(tEface)
+ pts(eface) is a singleton containing an interface{}-tagged
+ object. That tagged object's payload is an interface{} value,
+ i.e. the pts of the payload contains only concrete-tagged
+ objects, although in this example it's the zero interface{} value,
+ so its pts is empty.
+
+reflect.Type
+ Just as in the real "reflect" library, we represent a reflect.Type
+ as an interface whose sole implementation is the concrete type,
+ *reflect.rtype. (This choice is forced on us by go/types: clients
+ cannot fabricate types with arbitrary method sets.)
+
+ rtype instances are canonical: there is at most one per dynamic
+ type. (rtypes are in fact large structs but since identity is all
+ that matters, we represent them by a single node.)
+
+ The payload of each *rtype-tagged object is an *rtype pointer that
+ points to exactly one such canonical rtype object. We exploit this
+ by setting the node.typ of the payload to the dynamic type, not
+ '*rtype'. This saves us an indirection in each resolution rule. As
+ an optimisation, *rtype-tagged objects are canonicalized too.
+
+
+Aggregate types:
+
+Aggregate types are treated as if all directly contained
+aggregates are recursively flattened out.
+
+Structs
+ *ssa.Field y = x.f creates a simple edge to y from x's node at f's offset.
+
+ *ssa.FieldAddr y = &x->f requires a dynamic closure rule to create
+ simple edges for each struct discovered in pts(x).
+
+ The nodes of a struct consist of a special 'identity' node (whose
+ type is that of the struct itself), followed by the nodes for all
+ the struct's fields, recursively flattened out. A pointer to the
+ struct is a pointer to its identity node. That node allows us to
+ distinguish a pointer to a struct from a pointer to its first field.
+
+ Field offsets are logical field offsets (plus one for the identity
+ node), so the sizes of the fields can be ignored by the analysis.
+
+ (The identity node is non-traditional but enables the distinction
+ described above, which is valuable for code comprehension tools.
+ Typical pointer analyses for C, whose purpose is compiler
+ optimization, must soundly model unsafe.Pointer (void*) conversions,
+ and this requires fidelity to the actual memory layout using physical
+ field offsets.)
+
+ *ssa.Field y = x.f creates a simple edge to y from x's node at f's offset.
+
+ *ssa.FieldAddr y = &x->f requires a dynamic closure rule to create
+ simple edges for each struct discovered in pts(x).
+
+Arrays
+ We model an array by an identity node (whose type is that of the
+ array itself) followed by a node representing all the elements of
+ the array; the analysis does not distinguish elements with different
+ indices. Effectively, an array is treated like struct{elem T}, a
+ load y=x[i] like y=x.elem, and a store x[i]=y like x.elem=y; the
+ index i is ignored.
+
+ A pointer to an array is pointer to its identity node. (A slice is
+ also a pointer to an array's identity node.) The identity node
+ allows us to distinguish a pointer to an array from a pointer to one
+ of its elements, but it is rather costly because it introduces more
+ offset constraints into the system. Furthermore, sound treatment of
+ unsafe.Pointer would require us to dispense with this node.
+
+ Arrays may be allocated by Alloc, by make([]T), by calls to append,
+ and via reflection.
+
+Tuples (T, ...)
+ Tuples are treated like structs with naturally numbered fields.
+ *ssa.Extract is analogous to *ssa.Field.
+
+ However, tuples have no identity field since by construction, they
+ cannot be address-taken.
+
+
+FUNCTION CALLS
+
+ There are three kinds of function call:
+ (1) static "call"-mode calls of functions.
+ (2) dynamic "call"-mode calls of functions.
+ (3) dynamic "invoke"-mode calls of interface methods.
+ Cases 1 and 2 apply equally to methods and standalone functions.
+
+ Static calls.
+ A static call consists three steps:
+ - finding the function object of the callee;
+ - creating copy edges from the actual parameter value nodes to the
+ P-block in the function object (this includes the receiver if
+ the callee is a method);
+ - creating copy edges from the R-block in the function object to
+ the value nodes for the result of the call.
+
+ A static function call is little more than two struct value copies
+ between the P/R blocks of caller and callee:
+
+ callee.P = caller.P
+ caller.R = callee.R
+
+ Context sensitivity
+
+ Static calls (alone) may be treated context sensitively,
+ i.e. each callsite may cause a distinct re-analysis of the
+ callee, improving precision. Our current context-sensitivity
+ policy treats all intrinsics and getter/setter methods in this
+ manner since such functions are small and seem like an obvious
+ source of spurious confluences, though this has not yet been
+ evaluated.
+
+ Dynamic function calls
+
+ Dynamic calls work in a similar manner except that the creation of
+ copy edges occurs dynamically, in a similar fashion to a pair of
+ struct copies in which the callee is indirect:
+
+ callee->P = caller.P
+ caller.R = callee->R
+
+ (Recall that the function object's P- and R-blocks are contiguous.)
+
+ Interface method invocation
+
+ For invoke-mode calls, we create a params/results block for the
+ callsite and attach a dynamic closure rule to the interface. For
+ each new tagged object that flows to the interface, we look up
+ the concrete method, find its function object, and connect its P/R
+ blocks to the callsite's P/R blocks, adding copy edges to the graph
+ during solving.
+
+ Recording call targets
+
+ The analysis notifies its clients of each callsite it encounters,
+ passing a CallSite interface. Among other things, the CallSite
+ contains a synthetic constraint variable ("targets") whose
+ points-to solution includes the set of all function objects to
+ which the call may dispatch.
+
+ It is via this mechanism that the callgraph is made available.
+ Clients may also elect to be notified of callgraph edges directly;
+ internally this just iterates all "targets" variables' pts(·)s.
+
+
+PRESOLVER
+
+We implement Hash-Value Numbering (HVN), a pre-solver constraint
+optimization described in Hardekopf & Lin, SAS'07. This is documented
+in more detail in hvn.go. We intend to add its cousins HR and HU in
+future.
+
+
+SOLVER
+
+The solver is currently a naive Andersen-style implementation; it does
+not perform online cycle detection, though we plan to add solver
+optimisations such as Hybrid- and Lazy- Cycle Detection from (Hardekopf
+& Lin, PLDI'07).
+
+It uses difference propagation (Pearce et al, SQC'04) to avoid
+redundant re-triggering of closure rules for values already seen.
+
+Points-to sets are represented using sparse bit vectors (similar to
+those used in LLVM and gcc), which are more space- and time-efficient
+than sets based on Go's built-in map type or dense bit vectors.
+
+Nodes are permuted prior to solving so that object nodes (which may
+appear in points-to sets) are lower numbered than non-object (var)
+nodes. This improves the density of the set over which the PTSs
+range, and thus the efficiency of the representation.
+
+Partly thanks to avoiding map iteration, the execution of the solver is
+100% deterministic, a great help during debugging.
+
+
+FURTHER READING
+
+Andersen, L. O. 1994. Program analysis and specialization for the C
+programming language. Ph.D. dissertation. DIKU, University of
+Copenhagen.
+
+David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Efficient
+field-sensitive pointer analysis for C. In Proceedings of the 5th ACM
+SIGPLAN-SIGSOFT workshop on Program analysis for software tools and
+engineering (PASTE '04). ACM, New York, NY, USA, 37-42.
+http://doi.acm.org/10.1145/996821.996835
+
+David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Online
+Cycle Detection and Difference Propagation: Applications to Pointer
+Analysis. Software Quality Control 12, 4 (December 2004), 311-337.
+http://dx.doi.org/10.1023/B:SQJO.0000039791.93071.a2
+
+David Grove and Craig Chambers. 2001. A framework for call graph
+construction algorithms. ACM Trans. Program. Lang. Syst. 23, 6
+(November 2001), 685-746.
+http://doi.acm.org/10.1145/506315.506316
+
+Ben Hardekopf and Calvin Lin. 2007. The ant and the grasshopper: fast
+and accurate pointer analysis for millions of lines of code. In
+Proceedings of the 2007 ACM SIGPLAN conference on Programming language
+design and implementation (PLDI '07). ACM, New York, NY, USA, 290-299.
+http://doi.acm.org/10.1145/1250734.1250767
+
+Ben Hardekopf and Calvin Lin. 2007. Exploiting pointer and location
+equivalence to optimize pointer analysis. In Proceedings of the 14th
+international conference on Static Analysis (SAS'07), Hanne Riis
+Nielson and Gilberto Filé (Eds.). Springer-Verlag, Berlin, Heidelberg,
+265-280.
+
+Atanas Rountev and Satish Chandra. 2000. Off-line variable substitution
+for scaling points-to analysis. In Proceedings of the ACM SIGPLAN 2000
+conference on Programming language design and implementation (PLDI '00).
+ACM, New York, NY, USA, 47-56. DOI=10.1145/349299.349310
+http://doi.acm.org/10.1145/349299.349310
+
+*/
+package pointer // import "golang.org/x/tools/go/pointer"