+++ /dev/null
-// Copyright 2013 The Go Authors. All rights reserved.
-// Use of this source code is governed by a BSD-style
-// license that can be found in the LICENSE file.
-
-/*
-
-Package pointer implements Andersen's analysis, an inclusion-based
-pointer analysis algorithm first described in (Andersen, 1994).
-
-A pointer analysis relates every pointer expression in a whole program
-to the set of memory locations to which it might point. This
-information can be used to construct a call graph of the program that
-precisely represents the destinations of dynamic function and method
-calls. It can also be used to determine, for example, which pairs of
-channel operations operate on the same channel.
-
-The package allows the client to request a set of expressions of
-interest for which the points-to information will be returned once the
-analysis is complete. In addition, the client may request that a
-call graph be constructed. The example program in example_test.go
-demonstrates both of these features. Clients should not request more
-information than they need since it may increase the cost of the
-analysis significantly.
-
-
-CLASSIFICATION
-
-Our algorithm is INCLUSION-BASED: the points-to sets for x and y will
-be related by pts(y) ⊇ pts(x) if the program contains the statement
-y = x.
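
The inclusion relation can be illustrated with a small fixed-point loop. This is a minimal sketch only, not the package's solver: the names copyEdge and solve are hypothetical, and points-to sets are modelled as plain Go maps keyed by variable name.

```go
package main

import "fmt"

// copyEdge is a hypothetical stand-in for the constraint generated by
// the statement "dst = src".
type copyEdge struct{ dst, src string }

// solve propagates labels along copy edges until nothing changes,
// so that pts(dst) ⊇ pts(src) holds for every edge.
func solve(pts map[string]map[string]bool, edges []copyEdge) {
	for changed := true; changed; {
		changed = false
		for _, e := range edges {
			for label := range pts[e.src] {
				if pts[e.dst] == nil {
					pts[e.dst] = make(map[string]bool)
				}
				if !pts[e.dst][label] {
					pts[e.dst][label] = true
					changed = true
				}
			}
		}
	}
}

func main() {
	// Models: x = &a; y = &b; y = x; z = y
	pts := map[string]map[string]bool{
		"x": {"a": true},
		"y": {"b": true},
	}
	solve(pts, []copyEdge{{"z", "y"}, {"y", "x"}})
	fmt.Println(len(pts["x"]), len(pts["y"]), len(pts["z"])) // prints "1 2 2"
}
```

Because the analysis is flow-insensitive, the order of the edges does not affect the result; z ends up with both a and b even though "z = y" is listed first.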
-
-It is FLOW-INSENSITIVE: it ignores all control flow constructs and the
-order of statements in a program. It is therefore a "MAY ALIAS"
-analysis: its facts are of the form "P may/may not point to L",
-not "P must point to L".
-
-It is FIELD-SENSITIVE: it builds separate points-to sets for distinct
-fields, such as x and y in struct { x, y *int }.
-
-It is mostly CONTEXT-INSENSITIVE: most functions are analyzed once,
-so values can flow in at one call to the function and return out at
-another. Only some smaller functions are analyzed with consideration
-of their calling context.
-
-It has a CONTEXT-SENSITIVE HEAP: objects are named by both allocation
-site and context, so the objects returned by two distinct calls to f:
- func f() *T { return new(T) }
-are distinguished up to the limits of the calling context.
-
-It is a WHOLE PROGRAM analysis: it requires SSA-form IR for the
-complete Go program and summaries for native code.
-
-See the (Hind, PASTE'01) survey paper for an explanation of these terms.
-
-
-SOUNDNESS
-
-The analysis is fully sound when invoked on pure Go programs that do not
-use reflection or unsafe.Pointer conversions. In other words, if there
-is any possible execution of the program in which pointer P may point to
-object O, the analysis will report that fact.
-
-
-REFLECTION
-
-By default, the "reflect" library is ignored by the analysis, as if all
-its functions were no-ops, but if the client enables the Reflection flag,
-the analysis will make a reasonable attempt to model the effects of
-calls into this library. However, this comes at a significant
-performance cost, and not all features of that library are yet
-implemented. In addition, some simplifying approximations must be made
-to ensure that the analysis terminates; for example, reflection can be
-used to construct an infinite set of types and values of those types,
-but the analysis arbitrarily bounds the depth of such types.
-
-Most but not all reflection operations are supported.
-In particular, addressable reflect.Values are not yet implemented, so
-operations such as (reflect.Value).Set have no analytic effect.
-
-
-UNSAFE POINTER CONVERSIONS
-
-The pointer analysis makes no attempt to understand aliasing between the
-operand x and result y of an unsafe.Pointer conversion:
- y = (*T)(unsafe.Pointer(x))
-It is as if the conversion allocated an entirely new object:
- y = new(T)
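
A short program shows the kind of run-time aliasing that, per the description above, the analysis would not report. It is an illustrative sketch; the type T and the function viaUnsafe are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// T has the same size and layout as int32, so the conversion below is
// a valid use of unsafe.Pointer.
type T struct{ n int32 }

// viaUnsafe returns a *T that aliases x at run time. According to the
// text above, the analysis instead treats the result as if it were new(T).
func viaUnsafe(x *int32) *T {
	return (*T)(unsafe.Pointer(x))
}

func main() {
	x := new(int32)
	y := viaUnsafe(x)
	y.n = 42
	fmt.Println(*x) // prints 42: the write through y is visible through x
}
```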
-
-
-NATIVE CODE
-
-The analysis cannot model the aliasing effects of functions written in
-languages other than Go, such as runtime intrinsics in C or assembly, or
-code accessed via cgo. The result is as if such functions are no-ops.
-However, various important intrinsics are understood by the analysis,
-along with built-ins such as append.
-
-The analysis currently provides no way for users to specify the aliasing
-effects of native code.
-
-------------------------------------------------------------------------
-
-IMPLEMENTATION
-
-The remaining documentation is intended for package maintainers and
-pointer analysis specialists. Maintainers should have a solid
-understanding of the referenced papers (especially those by H&L and PKH)
-before making significant changes.
-
-The implementation is similar to that described in (Pearce et al,
-PASTE'04). Unlike many algorithms which interleave constraint
-generation and solving, constructing the callgraph as they go, this
-implementation for the most part observes a phase ordering (generation
-before solving), with only simple (copy) constraints being generated
-during solving. (The exception is reflection, which creates various
-constraints during solving as new types flow to reflect.Value
-operations.) This improves the traction of presolver optimisations,
-but imposes certain restrictions, e.g. potential context sensitivity
-is limited since all variants must be created a priori.
-
-
-TERMINOLOGY
-
-A type is said to be "pointer-like" if it is a reference to an object.
-Pointer-like types include pointers and also interfaces, maps, channels,
-functions and slices.
-
-We occasionally use C's x->f notation to distinguish the case where x
-is a struct pointer from x.f where x is a struct value.
-
-Pointer analysis literature (and our comments) often uses the notation
-dst=*src+offset to mean something different than what it means in Go.
-It means: for each node index p in pts(src), the node index p+offset is
-in pts(dst). Similarly *dst+offset=src is used for store constraints
-and dst=src+offset for offset-address constraints.
-
-
-NODES
-
-Nodes are the key datastructure of the analysis, and have a dual role:
-they represent both constraint variables (equivalence classes of
-pointers) and members of points-to sets (things that can be pointed
-at, i.e. "labels").
-
-Nodes are naturally numbered. The numbering enables compact
-representations of sets of nodes such as bitvectors (or BDDs); and the
-ordering enables a very cheap way to group related nodes together. For
-example, passing n parameters consists of generating n parallel
-constraints from caller+i to callee+i for 0<=i<n.
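
The parameter-passing example can be sketched as follows. This is a hedged illustration: nodeid here is a local type, and copyConstraint and passParams are hypothetical names, not the package's internals.

```go
package main

import "fmt"

type nodeid uint32

// copyConstraint is a hypothetical record of "pts(dst) ⊇ pts(src)".
type copyConstraint struct{ dst, src nodeid }

// passParams generates n parallel copy constraints from caller+i to
// callee+i for 0 <= i < n. It relies on the nodes of each block being
// numbered consecutively.
func passParams(callee, caller nodeid, n int) []copyConstraint {
	cs := make([]copyConstraint, 0, n)
	for i := 0; i < n; i++ {
		cs = append(cs, copyConstraint{
			dst: callee + nodeid(i),
			src: caller + nodeid(i),
		})
	}
	return cs
}

func main() {
	// Three arguments held in nodes 10..12 flow to parameters in nodes 100..102.
	for _, c := range passParams(100, 10, 3) {
		fmt.Printf("n%d <- n%d\n", c.dst, c.src)
	}
}
```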
-
-The zero nodeid means "not a pointer". For simplicity, we generate flow
-constraints even for non-pointer types such as int. The pointer
-equivalence (PE) presolver optimization detects which variables cannot
-point to anything; this includes not only all variables of non-pointer
-types (such as int) but also variables of pointer-like types if they are
-always nil, or are parameters to a function that is never called.
-
-Each node represents a scalar part of a value or object.
-Aggregate types (structs, tuples, arrays) are recursively flattened
-out into a sequential list of scalar component types, and all the
-elements of an array are represented by a single node. (The
-flattening of a basic type is a list containing a single node.)
-
-Nodes are connected into a graph with various kinds of labelled edges:
-simple edges (or copy constraints) represent value flow. Complex
-edges (load, store, etc) trigger the creation of new simple edges
-during the solving phase.
-
-
-OBJECTS
-
-Conceptually, an "object" is a contiguous sequence of nodes denoting
-an addressable location: something that a pointer can point to. The
-first node of an object has a non-nil obj field containing information
-about the allocation: its size, context, and ssa.Value.
-
-Objects include:
- - functions and globals;
- - variable allocations in the stack frame or heap;
- - maps, channels and slices created by calls to make();
- - allocations to construct an interface;
- - allocations caused by conversions, e.g. []byte(str);
- - arrays allocated by calls to append().
-
-Many objects have no Go types. For example, the func, map and chan type
-kinds in Go are all varieties of pointers, but their respective objects
-are actual functions (executable code), maps (hash tables), and channels
-(synchronized queues). Given the way we model interfaces, they too are
-pointers to "tagged" objects with no Go type. And an *ssa.Global denotes
-the address of a global variable, but the object for a Global is the
-actual data. So, the type of an ssa.Value that creates an object is
-"off by one indirection": a pointer to the object.
-
-The individual nodes of an object are sometimes referred to as "labels".
-
-For uniformity, all objects have a non-zero number of fields, even those
-of the empty type struct{}. (All arrays are treated as if of length 1,
-so there are no empty arrays. The empty tuple is never address-taken,
-so is never an object.)
-
-
-TAGGED OBJECTS
-
-A tagged object has the following layout:
-
- T -- obj.flags ⊇ {otTagged}
- v
- ...
-
-The T node's typ field is the dynamic type of the "payload": the value
-v which follows, flattened out. The T node's obj has the otTagged
-flag.
-
-Tagged objects are needed when generalizing across types: interfaces,
-reflect.Values, reflect.Types. Each of these three types is modelled
-as a pointer that exclusively points to tagged objects.
-
-Tagged objects may be indirect (obj.flags ⊇ {otIndirect}) meaning that
-the value v is not of type T but *T; this is used only for
-reflect.Values that represent lvalues. (These are not implemented yet.)
-
-
-ANALYSIS ABSTRACTION OF EACH TYPE
-
-Variables of the following "scalar" types may be represented by a
-single node: basic types, pointers, channels, maps, slices, 'func'
-pointers, interfaces.
-
-Pointers
- Nothing to say here, oddly.
-
-Basic types (bool, string, numbers, unsafe.Pointer)
- Currently all fields in the flattening of a type, including
- non-pointer basic types such as int, are represented in objects and
- values. Though non-pointer nodes within values are uninteresting,
- non-pointer nodes in objects may be useful (if address-taken)
- because they permit the analysis to deduce, in this example,
-
- var s struct{ ...; x int; ... }
- p := &s.x
-
- that p points to s.x. If we ignored such object fields, we could only
- say that p points somewhere within s.
-
- All other basic types are ignored. Expressions of these types have
- zero nodeid, and fields of these types within other aggregate types
- are omitted.
-
- unsafe.Pointers are not modelled as pointers, so a conversion of an
- unsafe.Pointer to *T is (unsoundly) treated equivalent to new(T).
-
-Channels
- An expression of type 'chan T' is a kind of pointer that points
- exclusively to channel objects, i.e. objects created by MakeChan (or
- reflection).
-
- 'chan T' is treated like *T.
- *ssa.MakeChan is treated as equivalent to new(T).
- *ssa.Send and receive (*ssa.UnOp(ARROW)) are equivalent to store
- and load.
-
-Maps
- An expression of type 'map[K]V' is a kind of pointer that points
- exclusively to map objects, i.e. objects created by MakeMap (or
- reflection).
-
- map[K]V is treated like *M where M = struct{k K; v V}.
- *ssa.MakeMap is equivalent to new(M).
- *ssa.MapUpdate is equivalent to *y=x where *y and x have type M.
- *ssa.Lookup is equivalent to y=x.v where x has type *M.
-
-Slices
- A slice []T, which dynamically resembles a struct{array *T, len, cap int},
- is treated as if it were just a *T pointer; the len and cap fields are
- ignored.
-
- *ssa.MakeSlice is treated like new([1]T): an allocation of a
- singleton array.
- *ssa.Index on a slice is equivalent to a load.
- *ssa.IndexAddr on a slice returns the address of the sole element of the
- slice, i.e. the same address.
- *ssa.Slice is treated as a simple copy.
-
-Functions
- An expression of type 'func...' is a kind of pointer that points
- exclusively to function objects.
-
- A function object has the following layout:
-
- identity -- typ:*types.Signature; obj.flags ⊇ {otFunction}
- params_0 -- (the receiver, if a method)
- ...
- params_n-1
- results_0
- ...
- results_m-1
-
- There may be multiple function objects for the same *ssa.Function
- due to context-sensitive treatment of some functions.
-
- The first node is the function's identity node.
- Associated with every callsite is a special "targets" variable,
- whose pts() contains the identity node of each function to which
- the call may dispatch. Identity nodes are not otherwise used during
- the analysis, but we construct the call graph from the pts()
- solution for such nodes.
-
- The following block of contiguous nodes represents the flattened-out
- types of the parameters ("P-block") and results ("R-block") of the
- function object.
-
- The treatment of free variables of closures (*ssa.FreeVar) is like
- that of global variables; it is not context-sensitive.
- *ssa.MakeClosure instructions create copy edges to Captures.
-
- A Go value of type 'func' (i.e. a pointer to one or more functions)
- is a pointer whose pts() contains function objects. The valueNode()
- for an *ssa.Function returns a singleton for that function.
-
-Interfaces
- An expression of type 'interface{...}' is a kind of pointer that
- points exclusively to tagged objects. All tagged objects pointed to
- by an interface are direct (the otIndirect flag is clear) and
- concrete (the tag type T is not itself an interface type). The
- associated ssa.Value for an interface's tagged objects may be an
- *ssa.MakeInterface instruction, or nil if the tagged object was
- created by an intrinsic (e.g. reflection).
-
- Constructing an interface value causes generation of constraints for
- all of the concrete type's methods; we can't tell a priori which
- ones may be called.
-
- TypeAssert y = x.(T) is implemented by a dynamic constraint
- triggered by each tagged object O added to pts(x): a typeFilter
- constraint if T is an interface type, or an untag constraint if T is
- a concrete type. A typeFilter tests whether O.typ implements T; if
- so, O is added to pts(y). An untag constraint tests whether O.typ is
- assignable to T, and if so, a copy edge O.v -> y is added.
-
- ChangeInterface is a simple copy because the representation of
- tagged objects is independent of the interface type (in contrast
- to the "method tables" approach used by the gc runtime).
-
- y := Invoke x.m(...) is implemented by allocating contiguous P/R
- blocks for the callsite and adding a dynamic rule triggered by each
- tagged object added to pts(x). The rule adds param/results copy
- edges to/from each discovered concrete method.
-
- (Q. Why do we model an interface as a pointer to a pair of type and
- value, rather than as a pair of a pointer to type and a pointer to
- value?
- A. Control-flow joins would merge interfaces ({T1}, {V1}) and ({T2},
- {V2}) to make ({T1,T2}, {V1,V2}), leading to the infeasible and
- type-unsafe combination (T1,V2). Treating the value and its concrete
- type as inseparable makes the analysis type-safe.)
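
The answer above can be made concrete by counting the (type, value) combinations each representation admits after a join. This is an illustrative sketch with hypothetical names (tagged, combos), using strings in place of real types and values.

```go
package main

import "fmt"

// tagged pairs a dynamic type with its payload, as in a tagged object.
type tagged struct{ typ, val string }

// combos returns every (type, value) combination admitted when an
// interface is modelled as two independent sets, one of types and one
// of values.
func combos(types, vals []string) []tagged {
	var out []tagged
	for _, t := range types {
		for _, v := range vals {
			out = append(out, tagged{t, v})
		}
	}
	return out
}

func main() {
	// Join of two interface values, (T1, V1) and (T2, V2).
	split := combos([]string{"T1", "T2"}, []string{"V1", "V2"})
	joined := []tagged{{"T1", "V1"}, {"T2", "V2"}} // set of tagged objects

	fmt.Println(len(split), len(joined)) // prints "4 2"
	fmt.Println(split)                   // includes the infeasible {T1 V2}
}
```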
-
-reflect.Value
- A reflect.Value is modelled very similarly to an interface{}, i.e. as
- a pointer exclusively to tagged objects, but with two generalizations.
-
- 1) a reflect.Value that represents an lvalue points to an indirect
- (obj.flags ⊇ {otIndirect}) tagged object, which has a similar
- layout to a tagged object except that the value is a pointer to
- the dynamic type. Indirect tagged objects preserve the correct
- aliasing so that mutations made by (reflect.Value).Set can be
- observed.
-
- Indirect objects only arise when an lvalue is derived from an
- rvalue by indirection, e.g. the following code:
-
- type S struct { X T }
- var s S
- var i interface{} = &s // i points to a *S-tagged object (from MakeInterface)
- v1 := reflect.ValueOf(i) // v1 points to same *S-tagged object as i
- v2 := v1.Elem() // v2 points to an indirect S-tagged object, pointing to s
- v3 := v2.FieldByName("X") // v3 points to an indirect T-tagged object, pointing to s.X
- v3.Set(y) // pts(s.X) ⊇ pts(y)
-
- Whether indirect or not, the concrete type of the tagged object
- corresponds to the user-visible dynamic type, and the existence
- of a pointer is an implementation detail.
-
- (NB: indirect tagged objects are not yet implemented)
-
- 2) The dynamic type tag of a tagged object pointed to by a
- reflect.Value may be an interface type; it need not be concrete.
-
- This arises in code such as this:
- tEface := reflect.TypeOf(new(interface{})).Elem() // interface{}
- eface := reflect.Zero(tEface)
- pts(eface) is a singleton containing an interface{}-tagged
- object. That tagged object's payload is an interface{} value,
- i.e. the pts of the payload contains only concrete-tagged
- objects, although in this example it's the zero interface{} value,
- so its pts is empty.
-
-reflect.Type
- Just as in the real "reflect" library, we represent a reflect.Type
- as an interface whose sole implementation is the concrete type,
- *reflect.rtype. (This choice is forced on us by go/types: clients
- cannot fabricate types with arbitrary method sets.)
-
- rtype instances are canonical: there is at most one per dynamic
- type. (rtypes are in fact large structs but since identity is all
- that matters, we represent them by a single node.)
-
- The payload of each *rtype-tagged object is an *rtype pointer that
- points to exactly one such canonical rtype object. We exploit this
- by setting the node.typ of the payload to the dynamic type, not
- '*rtype'. This saves us an indirection in each resolution rule. As
- an optimisation, *rtype-tagged objects are canonicalized too.
-
-
-Aggregate types:
-
-Aggregate types are treated as if all directly contained
-aggregates are recursively flattened out.
-
-Structs
- *ssa.Field y = x.f creates a simple edge to y from x's node at f's offset.
-
- *ssa.FieldAddr y = &x->f requires a dynamic closure rule to create
- simple edges for each struct discovered in pts(x).
-
- The nodes of a struct consist of a special 'identity' node (whose
- type is that of the struct itself), followed by the nodes for all
- the struct's fields, recursively flattened out. A pointer to the
- struct is a pointer to its identity node. That node allows us to
- distinguish a pointer to a struct from a pointer to its first field.
-
- Field offsets are logical field offsets (plus one for the identity
- node), so the sizes of the fields can be ignored by the analysis.
-
- (The identity node is non-traditional but enables the distinction
- described above, which is valuable for code comprehension tools.
- Typical pointer analyses for C, whose purpose is compiler
- optimization, must soundly model unsafe.Pointer (void*) conversions,
- and this requires fidelity to the actual memory layout using physical
- field offsets.)
-
-Arrays
- We model an array by an identity node (whose type is that of the
- array itself) followed by a node representing all the elements of
- the array; the analysis does not distinguish elements with different
- indices. Effectively, an array is treated like struct{elem T}, a
- load y=x[i] like y=x.elem, and a store x[i]=y like x.elem=y; the
- index i is ignored.
-
- A pointer to an array is a pointer to its identity node. (A slice is
- also a pointer to an array's identity node.) The identity node
- allows us to distinguish a pointer to an array from a pointer to one
- of its elements, but it is rather costly because it introduces more
- offset constraints into the system. Furthermore, sound treatment of
- unsafe.Pointer would require us to dispense with this node.
-
- Arrays may be allocated by Alloc, by make([]T), by calls to append,
- and via reflection.
-
-Tuples (T, ...)
- Tuples are treated like structs with naturally numbered fields.
- *ssa.Extract is analogous to *ssa.Field.
-
- However, tuples have no identity field since by construction, they
- cannot be address-taken.
-
-
-FUNCTION CALLS
-
- There are three kinds of function call:
- (1) static "call"-mode calls of functions.
- (2) dynamic "call"-mode calls of functions.
- (3) dynamic "invoke"-mode calls of interface methods.
- Cases 1 and 2 apply equally to methods and standalone functions.
-
- Static calls.
- A static call consists of three steps:
- - finding the function object of the callee;
- - creating copy edges from the actual parameter value nodes to the
- P-block in the function object (this includes the receiver if
- the callee is a method);
- - creating copy edges from the R-block in the function object to
- the value nodes for the result of the call.
-
- A static function call is little more than two struct value copies
- between the P/R blocks of caller and callee:
-
- callee.P = caller.P
- caller.R = callee.R
-
- Context sensitivity
-
- Static calls (alone) may be treated context sensitively,
- i.e. each callsite may cause a distinct re-analysis of the
- callee, improving precision. Our current context-sensitivity
- policy treats all intrinsics and getter/setter methods in this
- manner since such functions are small and seem like an obvious
- source of spurious confluences, though this has not yet been
- evaluated.
-
- Dynamic function calls
-
- Dynamic calls work in a similar manner except that the creation of
- copy edges occurs dynamically, in a similar fashion to a pair of
- struct copies in which the callee is indirect:
-
- callee->P = caller.P
- caller.R = callee->R
-
- (Recall that the function object's P- and R-blocks are contiguous.)
-
- Interface method invocation
-
- For invoke-mode calls, we create a params/results block for the
- callsite and attach a dynamic closure rule to the interface. For
- each new tagged object that flows to the interface, we look up
- the concrete method, find its function object, and connect its P/R
- blocks to the callsite's P/R blocks, adding copy edges to the graph
- during solving.
-
- Recording call targets
-
- The analysis notifies its clients of each callsite it encounters,
- passing a CallSite interface. Among other things, the CallSite
- contains a synthetic constraint variable ("targets") whose
- points-to solution includes the set of all function objects to
- which the call may dispatch.
-
- It is via this mechanism that the callgraph is made available.
- Clients may also elect to be notified of callgraph edges directly;
- internally this just iterates all "targets" variables' pts(·)s.
-
-
-PRESOLVER
-
-We implement Hash-Value Numbering (HVN), a pre-solver constraint
-optimization described in Hardekopf & Lin, SAS'07. This is documented
-in more detail in hvn.go. We intend to add its cousins HR and HU in
-future.
-
-
-SOLVER
-
-The solver is currently a naive Andersen-style implementation; it does
-not perform online cycle detection, though we plan to add solver
-optimisations such as Hybrid- and Lazy- Cycle Detection from (Hardekopf
-& Lin, PLDI'07).
-
-It uses difference propagation (Pearce et al, SQC'04) to avoid
-redundant re-triggering of closure rules for values already seen.
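
The idea of difference propagation can be sketched as a worklist loop in which each node forwards only the labels it has gained since it was last visited. This is a minimal sketch handling copy edges only; node, newNode, seed and solve are hypothetical names, and Go maps stand in for the sparse bit vectors described below.

```go
package main

import "fmt"

// node holds a points-to set, the labels added since the node was last
// processed (delta), and outgoing copy edges: pts(succ) ⊇ pts(node).
type node struct {
	pts   map[int]bool
	delta map[int]bool
	succs []int
}

func newNode(succs ...int) *node {
	return &node{pts: map[int]bool{}, succs: succs}
}

// seed adds an initial label to the node's solution and to its delta.
func (n *node) seed(label int) {
	if n.delta == nil {
		n.delta = map[int]bool{}
	}
	n.pts[label] = true
	n.delta[label] = true
}

// solve processes the worklist until it is empty. A node is on the
// worklist exactly when its delta is non-nil, and only the delta is
// pushed to successors, never the whole points-to set.
func solve(nodes []*node, work []int) {
	for len(work) > 0 {
		id := work[len(work)-1]
		work = work[:len(work)-1]
		n := nodes[id]
		delta := n.delta
		n.delta = nil
		for _, s := range n.succs {
			succ := nodes[s]
			for label := range delta {
				if succ.pts[label] {
					continue // already seen; nothing to re-trigger
				}
				succ.pts[label] = true
				if succ.delta == nil {
					succ.delta = map[int]bool{}
					work = append(work, s)
				}
				succ.delta[label] = true
			}
		}
	}
}

func main() {
	// Copy edges 0 -> 1 -> 2 -> 0 form a cycle; label 7 starts at node 0.
	nodes := []*node{newNode(1), newNode(2), newNode(0)}
	nodes[0].seed(7)
	solve(nodes, []int{0})
	fmt.Println(nodes[2].pts[7]) // prints "true"
}
```

The loop terminates on the cycle because node 0 already holds label 7 when node 2 forwards it, so no new delta is created.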
-
-Points-to sets are represented using sparse bit vectors (similar to
-those used in LLVM and gcc), which are more space- and time-efficient
-than sets based on Go's built-in map type or dense bit vectors.
-
-Nodes are permuted prior to solving so that object nodes (which may
-appear in points-to sets) are lower numbered than non-object (var)
-nodes. This improves the density of the set over which the PTSs
-range, and thus the efficiency of the representation.
-
-Partly thanks to avoiding map iteration, the execution of the solver is
-100% deterministic, a great help during debugging.
-
-
-FURTHER READING
-
-Andersen, L. O. 1994. Program analysis and specialization for the C
-programming language. Ph.D. dissertation. DIKU, University of
-Copenhagen.
-
-David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Efficient
-field-sensitive pointer analysis for C. In Proceedings of the 5th ACM
-SIGPLAN-SIGSOFT workshop on Program analysis for software tools and
-engineering (PASTE '04). ACM, New York, NY, USA, 37-42.
-http://doi.acm.org/10.1145/996821.996835
-
-David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Online
-Cycle Detection and Difference Propagation: Applications to Pointer
-Analysis. Software Quality Control 12, 4 (December 2004), 311-337.
-http://dx.doi.org/10.1023/B:SQJO.0000039791.93071.a2
-
-David Grove and Craig Chambers. 2001. A framework for call graph
-construction algorithms. ACM Trans. Program. Lang. Syst. 23, 6
-(November 2001), 685-746.
-http://doi.acm.org/10.1145/506315.506316
-
-Ben Hardekopf and Calvin Lin. 2007. The ant and the grasshopper: fast
-and accurate pointer analysis for millions of lines of code. In
-Proceedings of the 2007 ACM SIGPLAN conference on Programming language
-design and implementation (PLDI '07). ACM, New York, NY, USA, 290-299.
-http://doi.acm.org/10.1145/1250734.1250767
-
-Ben Hardekopf and Calvin Lin. 2007. Exploiting pointer and location
-equivalence to optimize pointer analysis. In Proceedings of the 14th
-international conference on Static Analysis (SAS'07), Hanne Riis
-Nielson and Gilberto Filé (Eds.). Springer-Verlag, Berlin, Heidelberg,
-265-280.
-
-Atanas Rountev and Satish Chandra. 2000. Off-line variable substitution
-for scaling points-to analysis. In Proceedings of the ACM SIGPLAN 2000
-conference on Programming language design and implementation (PLDI '00).
-ACM, New York, NY, USA, 47-56. DOI=10.1145/349299.349310
-http://doi.acm.org/10.1145/349299.349310
-
-*/
-package pointer // import "golang.org/x/tools/go/pointer"