Wednesday, February 27, 2002

Here is a benchmark for compiler interfaces
http://www.cs.utoronto.ca/~simsuz/cascon2001/bm.html
It is a benchmark for what information should be expected from the compilers.
Very interesting.

Example XML Code from GCC_XML
http://public.kitware.com/GCC_XML/HTML/example1.xml
Blogger does not support XML, so you will have to check it out directly.

/* type information for enumerations */
class __enum_type_info
  : public std::type_info
{
  /* abi defined member functions */
public:
  virtual ~__enum_type_info ();
public:
  explicit __enum_type_info (const char *__n)
    : std::type_info (__n)
  { }
};

Table of demangling code characters
The following special characters are used in mangling:
`A'
Indicates a C++ array type.
`b'
Encodes the C++ bool type, and the Java boolean type.
`c'
Encodes the C++ char type, and the Java byte type.
`C'
A modifier to indicate a const type. Also used to indicate a const member function (in which case it precedes the encoding of the method's class).
`d'
Encodes the C++ and Java double types.
`e'
Indicates extra unknown arguments `...'.
`f'
Encodes the C++ and Java float types.
`F'
Used to indicate a function type.
`H'
Used to indicate a template function.
`i'
Encodes the C++ and Java int types.
`J'
Indicates a complex type.
`l'
Encodes the C++ long type.
`P'
Indicates a pointer type. Followed by the type pointed to.
`Q'
Used to mangle qualified names, which arise from nested classes. Should also be used for namespaces (?). In Java used to mangle package-qualified names, and inner classes.
`r'
Encodes the GNU C++ long double type.
`R'
Indicates a reference type. Followed by the referenced type.
`s'
Encodes the C++ and Java short types.
`S'
A modifier that indicates that the following integer type is signed. Only used with char. Also used as a modifier to indicate a static member function.
`t'
Indicates a tem
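
To see how these codes compose, here is a minimal C++ sketch that decodes a few of them. It only handles the modifiers P, R and C followed by one of the builtin type codes from the table; arrays, functions, templates, qualified names and the S modifier are left out. The function name describe_type is made up for this note and is not part of GCC or of any real demangler.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not GCC's demangler. Decodes a type encoding built
// only from the simple codes in the table above (P, R, C and the builtin
// types). Returns an empty string for anything it does not handle.
std::string describe_type(const std::string& code, std::size_t pos = 0)
{
    if (pos >= code.size()) return "";
    const char c = code[pos];

    // Modifiers are followed by the type they apply to.
    if (c == 'P' || c == 'R' || c == 'C') {
        const std::string rest = describe_type(code, pos + 1);
        if (rest.empty()) return "";
        if (c == 'P') return "pointer to " + rest;
        if (c == 'R') return "reference to " + rest;
        return "const " + rest;
    }

    // A builtin type must be the last character of the encoding.
    if (pos + 1 != code.size()) return "";
    switch (c) {
        case 'b': return "bool";
        case 'c': return "char";
        case 'd': return "double";
        case 'f': return "float";
        case 'i': return "int";
        case 'l': return "long";
        case 'r': return "long double";
        case 's': return "short";
        default:  return "";
    }
}
```

With this sketch, describe_type("PCc") gives "pointer to const char", and an encoding that uses a code outside the handled subset gives an empty string.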

GOTTA LOOK INTO THIS
Using and Porting the GNU Compiler Collection (GCC) - 2 GCC Command Options
-fsquangle
-fno-squangle
`-fsquangle' will enable a compressed form of name mangling for identifiers. In particular, it helps to shorten very long names by recognizing types and class names which occur more than once, replacing them with special short ID codes. This option also requires any C++ libraries being used to be compiled with this option as well. The compiler has this disabled (the equivalent of `-fno-squangle') by default. Like all options that change the ABI, all C++ code, including libgcc.a, must be built with the same setting of this option.

Overview of Stab Format
There are three overall formats for stab assembler directives, differentiated by the first word of the stab. The name of the directive describes which combination of four possible data fields follows. It is either .stabs (string), .stabn (number), or .stabd (dot). IBM's XCOFF assembler uses .stabx (and some other directives such as .file and .bi) instead of .stabs, .stabn or .stabd.
The overall format of each class of stab is:
.stabs "string",type,other,desc,value
.stabn type,other,desc,value
.stabd type,other,desc
.stabx "string",value,type,sdb-type

For .stabn and .stabd, there is no string (the n_strx field is zero; see section Symbol Information in Symbol Tables). For .stabd, the value field is implicit and has the value of the current file location. For .stabx, the sdb-type field is unused for stabs and can always be set to zero. The other field is almost always unused and can be set to zero.
The number in the type field gives some basic information about which type of stab this is (or whether it is a stab, as opposed to an ordinary symbol). Each valid type number defines a different stab type; further, the stab type defines the exact interpretation of, and possible values for, any remaining string, desc, or value fields present in the stab. See section Table of Stab Types, for a list in numeric order of the valid type field values for stab directives.
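
To make the three main directive layouts concrete, here is a small C++ sketch that builds the text of a .stabs, .stabn or .stabd line from its fields. The function names are made up for this note, the value field is passed as text because it can be a number or a symbol, and the sketch assumes the string field needs no escaping. The field values in the usage note are only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers that follow the layouts quoted above:
//   .stabs "string",type,other,desc,value
//   .stabn type,other,desc,value
//   .stabd type,other,desc
// Assumption: `str` contains no characters that would need escaping.

std::string format_stabs(const std::string& str, int type, int other,
                         int desc, const std::string& value)
{
    return ".stabs \"" + str + "\"," + std::to_string(type) + "," +
           std::to_string(other) + "," + std::to_string(desc) + "," + value;
}

std::string format_stabn(int type, int other, int desc,
                         const std::string& value)
{
    // No string field; the four remaining fields follow the directive name.
    return ".stabn " + std::to_string(type) + "," + std::to_string(other) +
           "," + std::to_string(desc) + "," + value;
}

std::string format_stabd(int type, int other, int desc)
{
    // The value field is implicit (the current file location), so it is omitted.
    return ".stabd " + std::to_string(type) + "," + std::to_string(other) +
           "," + std::to_string(desc);
}
```

For example, format_stabs("main:F1", 36, 0, 0, "_main") produces the line .stabs "main:F1",36,0,0,_main.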
The String Field
digit
(
Type reference; see section The String Field.
-
Reference to builtin type; see section Negative Type Numbers.
#
Method (C++); see section The `#' Type Descriptor.
*
Pointer; see section Miscellaneous Types.
&
Reference (C++).
@
Type Attributes (AIX); see section The String Field. Member (class and variable) type (GNU C++); see section The `@' Type Descriptor.
a
Array; see section Array Types.
A
Open array; see section Array Types.
b
Pascal space type (AIX); see section Miscellaneous Types. Builtin integer type (Sun); see section Defining Builtin Types Using Builtin Type Descriptors. Const and volatile qualified type (OS9000).
B
Volatile-qualified type; see section Miscellaneous Types.
c
Complex builtin type (AIX); see section Defining Builtin Types Using Builtin Type Descriptors. Const-qualified type (OS9000).
C
COBOL Picture type. See AIX documentation for details.
d
File type; see section Miscellaneous Types.
D
N-dimensional dynamic array; see section Array Types.
e
Enumeration type; see section Enumerations.
E
N-dimensional subarray; see section Array Types.
f
Function type; see section Function Types.
F
Pascal function parameter; see section Function Types.
g
Builtin floating point type; see section Defining Builtin Types Using Builtin Type Descriptors.
G
COBOL Group. See

STABS Overview of Stabs
Stabs refers to a format for information that describes a program to a debugger. This format was apparently invented by Peter Kessler at the University of California at Berkeley, for the pdx Pascal debugger; the format has spread widely since then.
This document is one of the few published sources of documentation on stabs. It is believed to be comprehensive for stabs used by C. The lists of symbol descriptors (see section Table of Symbol Descriptors) and type descriptors (see section Table of Type Descriptors) are believed to be completely comprehensive. Stabs for COBOL-specific features and for variant records (used by Pascal and Modula-2) are poorly documented here.
Other sources of information on stabs are Dbx and Dbxtool Interfaces, 2nd edition, by Sun, 1988, and AIX Version 3.2 Files Reference, Fourth Edition, September 1992, "dbx Stabstring Grammar" in the a.out section, page 2-31. This document is believed to incorporate the information from those two sources except where it explicitly directs you to them for more information.

This could be really useful for producing a name for types and such.
Zack Weinberg - C++ error printer cleanup, phase 1
The C++ front end has its own set of error-printing routines which
permit it to print several different language constructs in error
messages. For instance,

cp_error ("type `%T' is not a base type for type `%T'", basetype, type);

This naturally duplicates a lot of machinery. It is also severely
type-unsafe. For instance, it feeds inaccurate type information to
va_arg, and it calls function pointers with the wrong signature. I'm
somewhat surprised it worked at all. This is the first of a series of
three patches which aim to clean up the mess.

This is really interesting

(C++) patch to tweak name formatting
decl_to_string (decl, verbose)


MARC: Mailing list ARChives at AIMS
Search the gcc mailing list at MARC

What does search_tree do and how is it used?
Re: sorry, not implemented: initializer contains unrecognized tree code
t = search_tree (t, no_linkage_helper);

What does this opname_tab do?
Re: sorry, not implemented: initializer contains unrecognized tree code
diff -c -p -r1.127 lex.c
*** lex.c 1999/08/03 10:18:13 1.127
--- lex.c 1999/08/06 15:01:25
*************** init_parse (filename)
*** 761,766 ****
--- 761,767 ----
opname_tab[(int) CEIL_MOD_EXPR] = "(ceiling %)";
opname_tab[(int) FLOOR_MOD_EXPR] = "(floor %)";
opname_tab[(int) ROUND_MOD_EXPR] = "(round %)";
opname_tab[(int) EXACT_DIV_EXPR] = "/";

We need to look into symbol tables
METHOD 2 - SYMBOL TABLE OF LISTS/STACKS
Idea: Maintain a single symtab. Individual stacks, one for each name,
replace stack(list) of symbol tables. When parsing block b, stack
associated w/ each name x includes info about decl of x in b or in
scopes that enclose b. Level # attribute is used to determine whether
a decl was made in b's scope or in an enclosing scope.

(1) On block entry: increment level # (after entering proc name into symbol
    table)
(2) See decl of x: look up x in symtab;
      if x is there, fetch level # from the top of stack entry
        if that level# = current level # then multi decl'd
        else /* x is there, top of stack level # not equal current */
          push (current level # and other attributes)
      else /* x not in symbol table */
        add x to symtab;
        push (current level # and other attributes) onto x's stack
(3) See use of x: lookup x in symtab if not there or stack is empty then error.
(4) On scope exit:
    (a) Scan entries in symtab popping every top-of-stack entry that has
        current level #
    (b) decrement level #
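
The four steps above can be sketched in C++ roughly like this. It is only a sketch under my own assumptions: the names SymbolTable, Entry, declare and lookup are made up, and a single string stands in for the "other attributes" of a declaration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// One declaration of a name. `info` stands in for "other attributes".
struct Entry {
    int level;
    std::string info;
};

// Sketch of method 2: a single table with one stack of declarations per name.
class SymbolTable {
public:
    // (1) Block entry: increment the level #.
    void enter_scope() { ++level_; }

    // (2) Declaration of x. Returns false if x is already declared at the
    // current level (multiply declared), otherwise pushes a new entry.
    bool declare(const std::string& name, const std::string& info) {
        std::vector<Entry>& stack = table_[name];  // adds the name if absent
        if (!stack.empty() && stack.back().level == level_) return false;
        stack.push_back({level_, info});
        return true;
    }

    // (3) Use of x. Returns the innermost visible declaration, or nullptr if
    // x is not in the table or its stack is empty (an error for the caller).
    const Entry* lookup(const std::string& name) const {
        auto it = table_.find(name);
        if (it == table_.end() || it->second.empty()) return nullptr;
        return &it->second.back();
    }

    // (4) Scope exit: pop every top-of-stack entry that has the current
    // level #, then decrement the level #.
    void exit_scope() {
        for (auto& kv : table_) {
            std::vector<Entry>& stack = kv.second;
            if (!stack.empty() && stack.back().level == level_) stack.pop_back();
        }
        --level_;
    }

private:
    int level_ = 0;
    std::unordered_map<std::string, std::vector<Entry>> table_;
};
```

Declaring x in the outer scope and again in an inner scope makes lookup return the inner entry until exit_scope is called, after which the outer one is visible again. Scope exit scans every name in the table, as in the notes, so its cost grows with the number of names rather than with the size of the block being left.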

Source File datatype.h
/* A data structure to contain the information for a typedef. */
struct _Typedef
{
 char* comment; /* The comment for the type definition. */

 char* name; /* The name of the defined type. */

 char* type; /* The type of the definition. */
 StructUnion sutype; /* The type of the definition if it is a locally declared struct / union. */
 Typedef typexref; /* The type of the definition if it is not locally declared or a repeat definition. */

 int lineno; /* The line number that this type definition appears on. */

 Typedef next; /* A pointer to the next item. */
};

The Cxref Homepage
C Cross Referencing & Documenting tool
Cxref is a program that will produce documentation (in LaTeX, HTML, RTF or SGML) including cross-references from C program source code.
It has been designed to work with ANSI C, incorporating K&R, and most popular GNU extensions.
(The cxref program only works for C, not C++; I have no plans to produce a C++ version.)
The documentation for the program is produced from comments in the code that are appropriately formatted. The cross referencing comes from the code itself and requires no extra work.
The documentation is produced for each of the following:
Files
A comment that applies to the whole file.
Functions
A comment for the function, including a description of each of the arguments and the return value.
Variables
A comment for each of a group of variables and/or individual variables.
#include
A comment for each included file.
#define
A comment for each pre-processor symbol definition, and for macro arguments.
Type definitions
A comment for each defined type and for each element of a structure or union type.

Linux Applications and Utilities Page - Programming
Programming and Development Tools

Dear Brad and All on the gccxml list,

I have been working in parallel with Brad on a C interface to the compiler,
with an XML dumper as well.
The "Introspector" project has been set up to host an extension to the
GCC compiler to allow it to
reflect upon itself, introspect about itself, and even have a meta-object
protocol for creating new code on the fly.

This project is not in direct competition with gccxml for the
following reasons:
a: it does not interface with the C++ compiler, but with the C compiler.
b: it does not dump out the structure of the project directly, but
the structure of the internal representation.
c: I am hoping to use what Brad has done to help bootstrap the
gcc compiler.

After long thought, I have decided to place the project under sourceforge to
give it a proper platform for growth.
The project is located in sourceforge under the URL :
https://sourceforge.net/projects/introspector/

I would like to get your feedback on this project
Here is a link to the survey
https://sourceforge.net/survey/survey.php?group_id=19878&survey_id=11340

I am sorry if I keep on polluting the gccxml list with news from my project,
and I will not do so any more.

I would be happy to have you try out the code when it is ready to run,
and will be sure to try and merge our projects when the time is right.

Thanks,