module CGenerator

The CGenerator module is a framework for dynamically generating C extensions. It is a bit like Perl's Inline but intended for a different purpose: managing incremental, structured additions to C source files, and compiling the code and loading the library just in time for execution. Whereas Inline helps you write a C extension, CGenerator helps you write a Ruby program that generates C extensions. To put it another way, this is a Ruby interface to the Ruby C API.

The original use of CGenerator was as the back end of a compiler for mathematical expressions in C-like syntax involving limited Ruby subexpressions. In that case, CGenerator allowed the compiler to think about the syntax and semantics of the input expressions without having to worry about the high-level structure of the generated .c and .h files.

One potential use is quick-turnaround development and testing of C code, possibly using Ruby as a driver environment; the library under construction needn't be Ruby-specific. If SWIG didn't support Ruby, this framework could be the starting point for a program that generates wrapper code for existing libraries. Finally, a Ruby package that includes C extensions could benefit from being able to use Ruby code to dynamically specify the contents and control the build process during installation.

The CGenerator framework consists of two base classes, Accumulator and Template. Think of accumulators as blanks in a form and templates as the form around the blanks, except that accumulators and templates can nest within each other. The base classes have subclasses which hierarchically decompose the information managed by the framework. This hierarchy is achieved by inheritance along the parent attribute, which is secondary to subclass inheritance.

Templates

The main template in the CGenerator module is Library. It has accumulators for such constructs as including header files, declaring variables, declaring Ruby symbols, declaring classes, defining functions, and defining structs. Some accumulators, such as those for adding function and struct definitions, return a new template each time they are called. Those templates, in turn, have their own accumulators for structure members, function arguments, declarations, initialization, scope, etc.

Library templates

A Library corresponds to one main C source file and one shared C library (.so or .dll). It manages the Init_library code (including registration of methods), as well as user-specified declaration and initialization in the scope of the .c file and its corresponding .h file. All files generated in the process of building the library are kept in a directory with the same name as the library. Additional C files in this directory will be compiled and linked to the library.

Each library is the root of a template containment hierarchy, and it alone has a commit method. After client code has sent all desired fragments to the accumulators, calling the commit method uses the structure imposed by the sub-templates of the library to join the fragments into two strings, one for the .h file and one for the .c file. Then each string is written to the corresponding file (only if the string is different from the current file contents), and the library is compiled (if necessary) and loaded.

Function templates

Function templates are used to define the functions in a Library. The base class, CGenerator::Function, generates a function (by default, static) in the library without registering it with Ruby in any way.

The CGenerator::RubyFunction templates define the function as above and also register the function as an instance method, module function, or singleton method of a specified class or module, or as a global function (a private method of Kernel).

Client code does not instantiate these templates directly, but instead uses the library's define accumulator methods, which return the new template.

The function template for the library's initialization function can be accessed using the library's init_library_function method, although direct access to this template is typically not needed. (Use the library's setup method to write code to the init_library_function.)

Struct templates

A struct template generates a typedef for a C struct. It can be external, in which case it is written to the .h file. It has a declare accumulator for adding data members.

Accumulators

Accumulators are a way of defining a hierarchical structure and populating it with data in such a way that the data can be serialized to a string at any point during the process without side effects. Templates are Accumulators which contain other accumulators and have convenience methods for accessing them from client code.

Accumulators can be fairly unstructured--they just accumulate in sequence whatever is sent to them, possibly with some filtering, which may include other accumulators. Templates are usually more structured. In general, only Templates can be parents; other accumulators set the parent of each accumulated item to be the accumulator's parent, simplifying the parent hierarchy.

Accumulators are responsible for the format of each accumulated item, for joining the items to form a string when requested to do so, and for doing any necessary preprocessing on the items (e.g., discarding duplicates).

From the point of view of client code, accumulators are methods for "filling in the blanks" in templates. Client code doesn't access the accumulator object directly, only through a method on the template. For example:

lib.declare :global_int_array =>
              'int global_int_array[100]',
            :name =>
              'char *name'

is used to access the "declare" accumulator of the library (which is actually delegated to a file template).

Providing a key for each declaration (in the example, the keys are symbols, but they can be any hash keys) helps CGenerator reject repeated declarations. (Redundancy checking by simple string comparison is inadequate, because it would allow two declarations of different types, but the same name, or two declarations with insignificant whitespace differences.)
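The effect of keying can be illustrated with a small stand-alone sketch. The class KeyedAccumulatorSketch and its add method are hypothetical and are not part of cgen. The sketch silently keeps the first fragment seen for each key; how cgen itself treats a repeated key is not specified here.

```ruby
# Hypothetical sketch; not cgen's actual class.
class KeyedAccumulatorSketch
  def initialize
    @items = {}   # key => fragment
  end

  # Accepts a hash of key => fragment. A fragment whose key has
  # already been seen is ignored, whatever its text.
  def add(pairs)
    pairs.each do |key, fragment|
      @items[key] = fragment unless @items.key?(key)
    end
    self
  end

  # Joins the kept fragments, one declaration per line.
  def to_s
    @items.values.map { |fragment| "#{fragment};" }.join("\n")
  end
end

acc = KeyedAccumulatorSketch.new
acc.add(:name => "char *name")
acc.add(:name => "char  *name")   # same key, whitespace differs: ignored
acc.add(:name => "int name")      # same key, different type: ignored
acc.add(:count => "int count")
puts acc
```

A plain string comparison would have accepted all three :name fragments, since their text differs.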

The basic Accumulator class adds fragments to an array in sequence. When converted to a string with to_s, it joins the fragments with newline separators. These behaviors change as needed in the subclasses. Note that the accumulated items need not all be strings, they need only respond to to_s.
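A minimal sketch of this behavior, assuming nothing beyond what the paragraph above says, might look as follows. AccumulatorSketch is a hypothetical name and is not cgen's Accumulator class.

```ruby
# Hypothetical sketch of the behavior described above; not cgen's code.
class AccumulatorSketch
  def initialize
    @fragments = []
  end

  # Appends fragments in the order they are sent.
  def add(*fragments)
    @fragments.concat(fragments)
    self
  end

  # Serializing has no side effects, so it can be done at any time.
  def to_s
    @fragments.map(&:to_s).join("\n")
  end
end

inner = AccumulatorSketch.new.add("int x;", "int y;")
outer = AccumulatorSketch.new.add("/* globals */", inner, 42)
puts outer
```

Because items only need to respond to to_s, one accumulator can hold another, which is how nesting falls out of the design.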

Return values of accumulators are not very consistent: in general, an accumulator returns whatever is needed for the caller to continue working with the thing that was just accumulated. It might be a template which supports some other accumulators, or it might be a string which can be inserted in C code.

Some accumulators take existing Ruby objects as an argument. These accumulators typically return, as a Ruby symbol, the C identifier that has been defined or declared to refer to that Ruby object. This can be interpolated into C code to refer to the Ruby object from C.

Note about argument order: Since hashes are unordered, passing a hash of key-value pairs to declare or similar methods will not preserve the textual ordering. Internally, cgen sorts this hash into an array of pairs so that at least the result is deterministic, reducing recompilation. One can force an argument order by using an array of pairs.

lib.declare [[:global_int_array,
               'int global_int_array[100]'],
             [:name,
               'char *name']]

Alternately, simply break the declaration into multiple declares.
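The normalization described in this note can be sketched as below. The helper name normalize_pairs and the choice of sorting by the string form of the key are assumptions made for illustration; cgen's internal sort may differ.

```ruby
# Hypothetical helper; the name and the sort key are assumptions.
def normalize_pairs(arg)
  case arg
  when Hash
    # Sort by the string form of the key so symbols and strings can mix.
    arg.sort_by { |key, _| key.to_s }
  when Array
    arg            # already an ordered list of [key, value] pairs
  else
    raise ArgumentError, "expected a Hash or an Array of pairs"
  end
end

p normalize_pairs(:name => "char *name", :count => "int count")
p normalize_pairs([[:name, "char *name"], [:count, "int count"]])
```

The hash form comes out sorted by key, while the array form keeps the order in which it was written.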

C code output

Format

Some effort is made to generate readable code. Relative tabbing within code fragments is preserved. One goal of CGenerator is producing Ruby extensions that can be saved and distributed with little or no modification (as opposed to just created and loaded on the fly).

Use of C identifiers

CGenerator attempts to generate C identifiers in non-conflicting ways... (prove some nice property)

Usage

Create a library with:

lib = CGenerator::Library.new "my_lib_name"

The name must be an identifier: /[A-Za-z0-9_]*/.

It is useful to keep a reference to lib around to send define and declare messages to.

Templates

All templates respond to library and file methods, which return the library or file object which contains the template. (The library itself does not respond to file.) They also respond to name and parent.

Library

Library#use_work_dir dir_name

Changes into dir_name, creating it first if necessary. Does nothing if already in a directory of that name. Often used with "tmp".

Library#commit

Writes the files to disk, and makes and loads the library.

Note that commit must be called after all C code definitions for the library, but before instantiation of any objects that use those definitions. If a definition occurs after commit, or if instantiation occurs before commit, then a CGenerator::Library::CommitError is raised, with an appropriate message. Sometimes, this forces you to use many small libraries, each committed just in time for use. See examples/fixed-array.rb.
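The ordering rule for definitions can be pictured with a small guard. CommitGuardSketch and its define method are hypothetical stand-ins, not cgen classes, and the error message is invented for the sketch.

```ruby
# Hypothetical sketch of the ordering rule; not cgen's implementation.
class CommitGuardSketch
  class CommitError < RuntimeError; end

  def initialize
    @definitions = []
    @committed = false
  end

  def committed?
    @committed
  end

  # Definitions are only accepted while the library is uncommitted.
  def define(fragment)
    raise CommitError, "cannot add definitions after commit" if committed?
    @definitions << fragment
  end

  def commit
    @committed = true
  end
end

guard = CommitGuardSketch.new
guard.define "int x"
guard.commit
begin
  guard.define "int y"
rescue CommitGuardSketch::CommitError => e
  puts "rejected: #{e.message}"
end
```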

Library#committed?

True if the library has been committed.

Library#before_commit(&block)
Library#after_commit(&block)

Schedules block to run before or after Library#commit. The before blocks are run in the same order in which they were scheduled; the after blocks run in the reverse order (analogously with BEGIN/END). Each block is evaluated in the context in which it was created (instance_eval is not used), and it is passed the library as an argument.
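The ordering of the two kinds of hook can be sketched as follows. HookedLibrarySketch is a hypothetical class written only to show the order of calls; its commit does no real work.

```ruby
# Hypothetical sketch of the hook ordering; not cgen's implementation.
class HookedLibrarySketch
  def initialize
    @before = []
    @after = []
  end

  def before_commit(&block)
    @before << block
  end

  def after_commit(&block)
    @after << block
  end

  def commit
    @before.each { |blk| blk.call(self) }          # scheduling order
    # ... writing, compiling and loading would happen here ...
    @after.reverse_each { |blk| blk.call(self) }   # reverse order
  end
end

log = []
hooked = HookedLibrarySketch.new
hooked.before_commit { |_lib| log << "before 1" }
hooked.before_commit { |_lib| log << "before 2" }
hooked.after_commit  { |_lib| log << "after 1" }
hooked.after_commit  { |_lib| log << "after 2" }
hooked.commit
p log
```

The blocks are invoked with call rather than instance_eval, so they keep the context in which they were written and receive the library as their argument.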

Library#empty?

True if no content has been added to the library.

Library#add_file name

Creates templates for two files, a source (.c) file and an include (.h) file that will be generated in the same dir as the library. The base file name is taken from the argument. Returns an array containing the include file template and the source file template, in that order.

Functions can be added to the source file by calling define_method and similar methods on the source file template. Their rb_init calls are done in init_library_function in the main library source file. The new source file automatically #includes the library's main header file, as well as its own header file, and the library's main source file also #includes the new header file. Declarations can be added to the header file by calling declare on it, but in many cases this is taken care of automatically.

Library#extconf

Override extconf if you want to do more than just create_makefile. Note that create_makefile recognizes all .c files in the library directory, and generates a makefile that compiles them and links them into the dynamic library.

Library#write
Library#makedepend
Library#mkmf
Library#make arg = nil

Internal methods called, in sequence, by commit.

These methods can be overridden, but are more typically called directly. The argument to make is interpolated into the system call as a command line argument to the make program. If the argument is 'clean' or 'distclean' then the make log is deleted; if the argument is 'distclean' then all .c and .h files generated by write are deleted (additional user-supplied .c and .h files in the library dir are not affected).

Library#update_file f, template

Called by write on each .c and .h file to actually write template to the open file f. The default behavior is to compare the existing data with the generated data, and leave the file untouched if nothing changed. Subclasses may have more efficient ways of doing this. (For instance, check a version indicator in the file on disk, perhaps stored using the file's preamble accumulator. It is even possible to defer some entries in the template until after this check has been made: code that only needs to be regenerated if some specification has changed)
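The default compare-then-write behavior can be approximated by the sketch below. The helper write_if_changed is hypothetical and takes a path, whereas update_file receives an open file. It is meant only to show why an unchanged file keeps its modification time and so does not trigger recompilation.

```ruby
require 'tmpdir'

# Hypothetical helper: rewrites path only when the content differs.
# Returns true if the file was written, false if it was left alone.
def write_if_changed(path, content)
  return false if File.exist?(path) && File.read(path) == content
  File.write(path, content)
  true
end

results = Dir.mktmpdir do |dir|
  path = File.join(dir, "sample_lib.c")
  [
    write_if_changed(path, "int x;\n"),   # file is new
    write_if_changed(path, "int x;\n"),   # same text: untouched
    write_if_changed(path, "int y;\n")    # text changed: rewritten
  ]
end
p results
```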

Library#purge_source_dir
Library#purge_source_dir= flag

Access the purge_source_dir attribute of a library, which controls what happens to .c, .h, and .o files in the source dir of the library that are not among those generated as part of the library. If this is set to :delete, then those files are deleted. Other true values cause the .c, .h, and .o files to be renamed with the .hide extension. (Note that this makes it difficult to keep manually written C files in the same dir.) False flag values (the default) cause CGen to leave the files untouched.

Note that, regardless of this setting, mkmf will construct a Makefile which lists all .c files that are in the source dir. If you do not delete obsolete files, they will be compiled into your library!

Library#init_library_function

Returns a Function template object; see below. This function is called when the library is loaded. Method definitions put stuff here to register methods with Ruby. Usually, there is no need to bother this guy directly. Use Library#setup instead.

Library#setup key => "statements", ...

Inserts code in the init_library_function, which is called when the library is loaded. The key is used for redundancy checking, as in the declare accumulators. Note that hashes are unordered, so constructs like

setup :x => "...", :y => "..."

can result in unpredictable order. To avoid this, use several setup calls.

Library#source_file
Library#include_file

Returns the template for the main source or include file of the library. Usually, there is no need to access these directly.

Library#define_c_function name, type

Defines a plain ol' C function. Returns a Function template (see below), or a template of the specified type, if given.

Library#define_c_method mod, name, subclass
Library#define_c_module_function mod, name, subclass
Library#define_c_global_function name, subclass
Library#define_c_singleton_method mod, name, subclass
Library#define_c_class_method mod, name, subclass

Defines a function of the specified name and type in the given class/module (or in the global scope), and returns the function template (often used with instance_eval to add arguments, code, etc.). The subclass argument is optional and allows the template to belong to a subclass of the function template it would normally belong to.

For example,

define_c_method String, "reverse"

The arguments accepted by the method automatically include self. By default, arguments are passed as individual C arguments, but they can be passed in a Ruby or C array. The latter has the advantage of argument parsing (based on rb_scan_args), defaults, and typechecking. See Method#c_array_args. define_c_class_method is just an alias for define_c_singleton_method.

Library#include "file1.h", "<file2.h>", ...

Insert the include statement(s) at the top of the library's main .c file. For convenience, <ruby.h> is included automatically, as is the header file of the library itself.

Library#declare :x => "int x", ...
Library#declare_extern :x => "int x", ...

Puts the string in the declaration area of the .c or .h file, respectively. The declaration area is before the function definitions, and after the structure declarations.

Library#declare_struct name, attributes=nil
Library#declare_extern_struct name, attributes=nil

Returns a Structure template, which generates a typedef'd C struct in the .c or .h file. The declare method of this template is used to add members.

Library#declare_class cl
Library#declare_module mod
Library#declare_symbol sym

Define a C variable which will be initialized to refer to the class, module, or symbol. These accumulators return the name of the C variable which will be generated and initialized to the ID of the symbol, and this return value can be interpolated into C calls to the Ruby API. (The arguments are the actual Ruby objects.) This is very useful in rb_ivar_get/set calls, and it avoids doing the lookup more than once:

...
declare :my_ivar => "VALUE my_ivar"
body %{
  my_ivar = rb_ivar_get(shadow->self, #{declare_symbol :@my_ivar});
  rb_ivar_set(shadow->self, #{declare_symbol :@my_ivar}, Qnil);
}

The second declaration notices that the library already has a variable that will be initialized to the ID of the symbol, and uses it.

Library#literal_symbol sym

Like Library#declare_symbol, but converts the ID to a VALUE at library initialization time. Useful for looking up hash values keyed by symbol objects, for example. sym is a string or symbol.

Library#show_times message

If the attribute show_times_flag is set to true, print the user and system times (and child user and child system on some platforms) and real time for each major step of the commit process. Displays message.

File

File templates are managed by the Library, and most users do not need to interact with them directly. They are structured into four sections: includes, structure declarations, variable and function declarations, and function definitions. Each source file automatically includes its corresponding header file and the main header file for the library (which includes ruby.h). The main source file for the library includes each additional header file.

File#define_c_method
File#define_c_module_function
File#define_c_global_function
File#define_c_singleton_method

As for the Library, but can be used on any source file within the library. Used to break large projects up into many files.

File#preamble

An accumulator that wraps its input in C comments and places it at the head of the source file.

Function

Function#scope :static
Function#scope :extern
Function#arguments 'int x', 'double y', 'VALUE obj', ...
Function#return_type 'void'

These accumulators affect the prototype of the function, which will be placed in the declaration section of either the .h or the .c file, depending on the scope setting. The default scope is static. The default return type is 'void'.

For the Method subclasses of Function, argument and return types can be omitted, in which case they default to 'VALUE'.

Function#declare :x => "static double x", ...
Function#init "x = 0", ...
Function#setup 'x' => "x += 1", ...
Function#body 'y = sin(x); printf("%f\n", y)', ...

These four accumulators determine the contents of the function between the opening and closing braces. The init code is executed once when the function first runs; it's useful for initializing static data. The setup code runs each time the function is called, as does the body. Distinguishing setup from body is useful for two reasons: first, setup is guaranteed to execute before body, and, second, one can avoid setting up the same variable twice, because of the key.

Function#returns "2*x"

Specifies the string used in the final return statement of the function. Subsequent uses of this method clobber the previous value. Alternately, one can simply insert a "return" manually in the body.

Method

ModuleFunction

GlobalFunction

SingletonMethod

These subclasses of the Function template are designed for coding Ruby methods. The necessary registration (rb_define_method, etc.) is handled automatically. Defaults are different from Function: 'VALUE self' is automatically an argument, and argument and return types are assumed to be 'VALUE' and can be omitted by the caller. The return value is nil by default.

Method#arguments :arg1, :arg2, ...

The default way of specifying arguments. Allows a fixed number of VALUE arguments.

Method#c_array_args argc_name = 'argc', argv_name = 'argv', &block
Method#rb_array_args args_name = 'args'

Specifies that arguments are to be collected and passed in a C or Ruby array, instead of individually (which is the default). In each case, the array of actual arguments will be bound to a C parameter with the name specified. See the Ruby API documentation for details.

If a block is given to Method#c_array_args, it will be used to specify a call to the API function rb_scan_args and to declare the associated variables. For example:

c_array_args('argc', 'argv') {
  required :arg0, :arg1
  optional :arg2, :arg3, :arg4
  rest     :rest
  block    :block
}

declares all the listed symbols as variables of type VALUE in function scope, and arranges for the following to be called in the setup clause (i.e., before the body):

rb_scan_args(argc, argv, "23*&", &arg0, &arg1, &arg2, &arg3, &arg4, &rest, &block);

The 'argc', 'argv' are the default values and are usually omitted.
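The format string in the call above follows the rb_scan_args convention: the number of required arguments, the number of optional arguments, then "*" for a rest array and "&" for a block. A hypothetical helper, not taken from cgen, shows how such a string could be assembled from the contents of the block:

```ruby
# Hypothetical helper; cgen builds this string internally in its own way.
def scan_args_format(required: 0, optional: 0, rest: false, block: false)
  fmt = "#{required}#{optional}"
  fmt << "*" if rest    # collect remaining arguments in an array
  fmt << "&" if block   # capture the block, if any
  fmt
end

puts scan_args_format(required: 2, optional: 3, rest: true, block: true)
```

With two required and three optional arguments plus rest and block, this yields the "23*&" string seen in the generated call.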

The lines in the block can occur in any order, and any line can be omitted. However, only one line of each kind should be used. In addition, each optional argument can be associated with a fragment of C code that will be executed to assign it a default value, if needed. For example, one can add the following lines to the above block:

default   :arg3 => "INT2NUM(7)",
          :arg4 => "INT2NUM(NUM2INT(arg2) + NUM2INT(arg3))"

In this case, if arg4 is not provided by argv, then it is initialized using the code given. If, in addition, arg3 is not provided, then it too is initialized. These initializations happen in the setup clause of the Function template and are executed in the same order as the arguments are given in the optional line. Optional arguments without a default are assigned nil.

Finally, argument types can be checked automatically:

typecheck :arg2 => Numeric, :arg3 => Numeric

The value passed to the function must either be nil or match the type. Note that type checking happens before default assignment, so that default calculation code can assume types are correct. No typechecking code is generated if the type is Object.

Structure

Structure#declare :x => "int x"

Adds the specified string to define a structure member.

Utility functions

CGenerator.make_c_name s

Generates a unique C identifier from the given Ruby identifier, which may include /[@$?!]/, '::', and even '.'. (Some special globals are not yet supported: $: and $-I, for example.)

It is unique in the sense that distinct Ruby identifiers map to distinct C identifiers. (Not completely checked. Might fail for some really obscure cases.)
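One way to obtain this kind of uniqueness is a fixed-width escape, sketched below. This is a hypothetical encoding chosen for illustration and is not the mapping that make_c_name actually uses. Letters and digits pass through, and every other byte, including the underscore itself, becomes an underscore followed by two hex digits, so the result can always be decoded back to exactly one input.

```ruby
# Hypothetical encoding; cgen's actual make_c_name may differ.
def c_name_sketch(ruby_name)
  ruby_name.to_s.each_byte.map { |byte|
    char = byte.chr
    # Keep ASCII letters and digits; escape everything else as _hh.
    char =~ /\A[A-Za-z0-9]\z/ ? char : format("_%02x", byte)
  }.join
end

puts c_name_sketch(:@my_ivar)
puts c_name_sketch("Foo::Bar")
puts c_name_sketch("empty?")
```

Escaping the underscore is what keeps the mapping one-to-one: without it, a name that already contained "_3f" would collide with one that ended in "?".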

String.tab n

Tabs left or right by n chars, using spaces.

String.tabto n

The first non-empty line is adjusted to have n spaces before the first nonspace. Additional lines are changed to preserve relative tabbing.

String.taballto n

Aligns each line to have n spaces before the first non-space.

(These routines probably don't work well, if at all, with "hard" tabs.)
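The behavior described for tabto can be sketched as a stand-alone function. The name tabto_sketch is hypothetical, and the handling of blank lines and of lines with less indentation than the amount being removed are assumptions of this sketch. Like the routines above, it only considers spaces.

```ruby
# Hypothetical stand-alone version; cgen adds a method to String instead.
def tabto_sketch(text, n)
  lines = text.lines
  first = lines.find { |line| line =~ /\S/ }
  return text unless first
  # How far the first non-empty line has to move, left or right.
  shift = n - first[/\A */].length
  lines.map { |line|
    if line !~ /\S/
      line                                   # leave blank lines alone
    elsif shift >= 0
      (" " * shift) + line
    else
      line.sub(/\A {0,#{-shift}}/, "")       # remove at most -shift spaces
    end
  }.join
end

code = "    if (x) {\n      y = 1;\n    }\n"
puts tabto_sketch(code, 0)
```

Every line is shifted by the same amount, which is what preserves the relative indentation of the fragment.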

Example

require 'cgen'

lib = CGenerator::Library.new "sample_lib"

class Point; end

lib.declare_extern_struct(:point).instance_eval {
  # make it extern so we can see it from another lib
  declare :x => "double x"
  declare :y => "double y"
}

lib.define_c_global_function(:new_point).instance_eval {
  arguments "x", "y"        # 'VALUE' is assumed
  declare :p => "point *p"
  declare :result => "VALUE result"
      # semicolons are added automatically
  body %{
    result = Data_Make_Struct(#{lib.declare_class Point}, point, 0, free, p);
    p->x = NUM2DBL(x);
    p->y = NUM2DBL(y);

//  might want to do something like this, too:
//  rb_funcall(result, #{lib.declare_symbol :initialize}, 0);
  }
  returns "result"
      # can put a return statement in the body, if preferred
}

for var in [:x, :y]   # metaprogramming in C!
  lib.define_c_method(Point, var).instance_eval {
    declare :p => "point *p"
    body %{
      Data_Get_Struct(self, point, p);
    }
    returns "rb_float_new(p->#{var})"
  }
end

# A utility function, available to other C files
lib.define_c_function("distance").instance_eval {
  arguments "point *p1", "point *p2"
  return_type "double"
  scope :extern
  returns "sqrt(pow(p1->x - p2->x, 2) + pow(p1->y - p2->y, 2))"
  include "<math.h>"
  # The include accumulator call propagates up the parent
  # hierarchy until something handles it. In this case,
  # the Library lib handles it by adding an include
  # directive to the .c file. This allows related, but
  # separate aspects of the C source to be handled in
  # the same place in the Ruby code. We could also have
  # called include directly on lib.
}

lib.define_c_method(Point, :distance).instance_eval {
  # no name conflict between this "distance" and the previous one,
  # because "method" and "Point" are both part of the C identifier
  # for this method
  arguments "other"
  declare :p => "point *p"
  declare :q => "point *q"
  body %{
    Data_Get_Struct(self, point, p);
    Data_Get_Struct(other, point, q);
  }
  returns "rb_float_new(distance(p, q))"
}

lib.commit # now you can use the new definitions

p1 = new_point(1, 2)
puts "p1: x is #{p1.x}, y is #{p1.y}"

p2 = new_point(5, 8)
puts "p2: x is #{p2.x}, y is #{p2.y}"

puts "distance from p1 to p2 is #{p1.distance p2}"

Output is:

p1: x is 1.0, y is 2.0
p2: x is 5.0, y is 8.0
distance from p1 to p2 is 7.211102551

That's a lot of code to do a simple operation, compared with an Inline-style construct. CGenerator's value shows up with more complex tasks. The sample.rb file extends this example.

Notes

To do

version

CGenerator 0.14

The current version of this software can be found at http://redshift.sourceforge.net/cgen .

license

This software is distributed under the Ruby license. See http://www.ruby-lang.org.

author

Joel VanderWerf, vjoel@users.sourceforge.net