module CGenerator

The CGenerator module is a framework for dynamically generating C extensions. It is a bit like Perl's Inline but intended for a different purpose: managing incremental, structured additions to C source files, and compiling the code and loading the library just in time for execution. Whereas Inline helps you write a C extension, CGenerator helps you write a Ruby program that generates C extensions. To put it another way, this is a Ruby interface to the Ruby C API.

The original use of CGenerator was as the back end of a compiler for mathematical expressions in C-like syntax involving limited Ruby subexpressions. In that case, CGenerator allowed the compiler to think about the syntax and semantics of the input expressions without having to worry about the high-level structure of the generated .c and .h files.

One potential use is quick-turnaround development and testing of C code, possibly using Ruby as a driver environment; the library under construction needn't be Ruby-specific. If SWIG didn't support Ruby, this framework could be the starting point for a program that generates wrapper code for existing libraries. Finally, a Ruby package that includes C extensions could benefit from being able to use Ruby code to dynamically specify the contents and control the build process during installation.

The CGenerator framework consists of two base classes, Accumulator and Template. Think of accumulators as blanks in a form and templates as the form around the blanks, except that accumulators and templates can nest within each other. The base classes have subclasses which hierarchically decompose the information managed by the framework. This hierarchy is achieved by inheritance along the parent attribute, which is secondary to subclass inheritance.
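The idea of inheritance along the parent attribute can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: SketchNode, SketchLibrary, and add_include are hypothetical names invented here, not CGenerator classes or methods, and the real library may implement the lookup differently.

```ruby
# Hypothetical classes for illustration only; not CGenerator's own.
class SketchNode
  attr_reader :name, :parent

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
  end

  # Forward unknown messages to the parent, so a request travels up
  # the containment hierarchy until some node handles it.
  def method_missing(msg, *args, &block)
    return super unless parent
    parent.send(msg, *args, &block)
  end

  def respond_to_missing?(msg, include_private = false)
    (parent && parent.respond_to?(msg, include_private)) || super
  end
end

# The root of the hierarchy is the only node that knows about includes.
class SketchLibrary < SketchNode
  attr_reader :includes

  def initialize(name)
    super(name)
    @includes = []
  end

  def add_include(header)
    @includes << header unless @includes.include?(header)
    self
  end
end

lib = SketchLibrary.new("sample_lib")
fn  = SketchNode.new("distance", lib)
fn.add_include("<math.h>")   # not defined on fn, so it is handled by lib
puts lib.includes.inspect    # prints ["<math.h>"]
```

A message that no ancestor handles still raises NoMethodError at the root, so typos are not silently swallowed.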

Templates

The main template in the CGenerator module is Library. It has accumulators for such constructs as including header files, declaring variables, declaring Ruby symbols, defining classes, defining functions, and defining structs. Some accumulators, such as those for adding function and struct definitions, return a new template each time they are called. Those templates, in turn, have their own accumulators for structure members, function arguments, declarations, initialization, scope, etc.

Library templates

A Library corresponds to one main C source file and one shared C library (.so or .dll). It manages the Init_library code (including registration of methods), as well as user-specified declaration and initialization in the scope of the .c file and its corresponding .h file. All files generated in the process of building the library are kept in a directory with the same name as the library. Additional C files in this directory will be compiled and linked to the library.

Each library is the root of a template containment hierarchy, and it alone has a commit method. After client code has sent all desired fragments to the accumulators, calling the commit method uses the structure imposed by the sub-templates of the library to join the fragments into two strings, one for the .h file and one for the .c file. Then each string is written to the corresponding file (only if the string is different from the current file contents), and the library is compiled (if necessary) and loaded.
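The "write only if different" step is what keeps an unchanged library from being recompiled on every run. A minimal sketch of that check, assuming a hypothetical helper named write_if_changed (this is not a CGenerator method, and the library's own logic may differ):

```ruby
require 'tmpdir'

# Hypothetical helper, not part of CGenerator's API.
# Returns true if the file was written, false if it was left alone.
def write_if_changed(path, content)
  return false if File.exist?(path) && File.read(path) == content
  File.write(path, content)
  true
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "sample_lib.c")
  first  = write_if_changed(path, "int x;\n")   # file is new, so it is written
  second = write_if_changed(path, "int x;\n")   # identical contents, so skipped
  third  = write_if_changed(path, "int y;\n")   # contents differ, so rewritten
  puts [first, second, third].inspect           # prints [true, false, true]
end
```

Leaving an unchanged file untouched preserves its modification time, which lets a make-style build decide that nothing needs recompiling.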

Function templates

Function templates are used to define the functions in a Library. The base class, CGenerator::Function, generates a function (by default, static) in the library without registering it with Ruby in any way.

The CGenerator::Method template defines the function as above and also registers the function as an instance method, module function, or singleton method of a specified class or module, or as a global function (a private method of Kernel).

Client code does not instantiate these templates directly, but instead uses the library's define accumulator methods, which return the new template.

The function template for the library's initialization function can be accessed using the library's init_library_function method, although direct access to this template is typically not needed. (Use the library's setup method to write code to the init_library_function.)

Struct templates

A struct template generates a typedef for a C struct. It can be external, in which case it is written to the .h file. It has a declare accumulator for adding data members.

Accumulators

Accumulators are a way of defining a hierarchical structure and populating it with data in such a way that the data can be serialized to a string at any point during the process without side effects. Templates are Accumulators which contain other accumulators and have convenience methods for accessing them from client code.

Accumulators can be fairly unstructured--they just accumulate in sequence whatever is sent to them, possibly with some filtering, which may include other accumulators. Templates are usually more structured. In general, only Templates can be parents; other accumulators set the parent of each accumulated item to be the accumulator's parent, simplifying the parent hierarchy.

Accumulators are responsible for the format of each accumulated item, for joining the items to form a string when requested to do so, and for doing any necessary preprocessing on the items (e.g., discarding duplicates).

From the point of view of client code, accumulators are methods for "filling in the blanks" in templates. Client code doesn't access the accumulator object directly, only through a method on the template. For example:

lib.declare :global_int_array =>
              'int global_int_array[100]',
            :name =>
              'char *name'

is used to access the "declare" accumulator of the library (which is actually delegated to a file template).

Providing a key for each declaration (in the example, the keys are symbols, but they can be any hash keys) helps CGenerator reject repeated declarations. (Redundancy checking by simple string comparison is inadequate, because it would allow two declarations of different types, but the same name, or two declarations with insignificant whitespace differences.)
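The effect of keying declarations can be sketched with an ordinary hash. SketchKeyedAccumulator is a hypothetical class written for this illustration. It assumes that a repeated key is silently dropped and the first declaration kept; the real library might instead raise an error or apply some other policy.

```ruby
# Hypothetical class for illustration; not CGenerator's implementation.
class SketchKeyedAccumulator
  def initialize
    @items = {}   # Ruby hashes preserve insertion order
  end

  # Accepts a hash of key => declaration text. A key that has already
  # been seen is ignored (an assumption of this sketch).
  def add(declarations)
    declarations.each do |key, text|
      @items[key] = text unless @items.key?(key)
    end
    self
  end

  def to_s
    @items.values.map { |text| "#{text};" }.join("\n")
  end
end

decls = SketchKeyedAccumulator.new
decls.add(:name => "char *name")
decls.add(:name => "char  *name")   # same key, different whitespace: dropped
decls.add(:global_int_array => "int global_int_array[100]")
puts decls.to_s
# prints:
#   char *name;
#   int global_int_array[100];
```

Comparing the strings alone would have accepted both versions of the name declaration, which is the failure the keys are there to prevent.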

The basic Accumulator class adds fragments to an array in sequence. When converted to a string with to_s, it joins the fragments with newline separators. These behaviors change as needed in the subclasses. Note that the accumulated items need not all be strings; they need only respond to to_s.
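The behavior just described is small enough to sketch directly. SketchAccumulator and Decl are hypothetical names used only here; the sketch mirrors the description (append in sequence, join with newlines, accept anything that responds to to_s) and is not the library's source.

```ruby
# Illustrative stand-in, not CGenerator's actual Accumulator class.
class SketchAccumulator
  def initialize
    @items = []
  end

  # Append fragments in the order they arrive.
  def add(*fragments)
    @items.concat(fragments)
    self
  end

  # Serialize without modifying the accumulated items, so this can be
  # called at any point while the accumulator is still being filled.
  def to_s
    @items.map(&:to_s).join("\n")
  end
end

# A non-string item: it only needs to respond to to_s.
Decl = Struct.new(:type, :name) do
  def to_s
    "#{type} #{name};"
  end
end

acc = SketchAccumulator.new
acc.add("/* globals */", Decl.new("int", "count"))
acc.add(Decl.new("double", "scale"))
puts acc.to_s
# prints:
#   /* globals */
#   int count;
#   double scale;
```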

Return values of accumulators are not very consistent: in general, an accumulator returns whatever is needed for the caller to continue working with the thing that was just accumulated. It might be a template which supports some other accumulators, or it might be a string which can be inserted in C code.

Some accumulators take existing Ruby objects as an argument. These accumulators typically return, as a Ruby symbol, the C identifier that has been defined or declared to refer to that Ruby object. This can be interpolated into C code to refer to the Ruby object from C.

C code output

Format

Some effort is made to generate readable code. Relative tabbing within code fragments is preserved. One goal of CGenerator is producing Ruby extensions that can be saved and distributed with little or no modification (as opposed to just created and loaded on the fly).

Use of C identifiers

CGenerator attempts to generate C identifiers in non-conflicting ways... (prove some nice property)

Usage

Create a library with:

lib = CGenerator::Library.new "my_lib_name"

The name must be a valid identifier, made up only of characters matching /[A-Za-z0-9_]/.

Keep a reference to the lib around to send define and declare messages to.

Templates

All templates respond to library and file methods, which return the library or file object which contains the template. (The library itself does not respond to file.) They also respond to name and parent.

Library

Library#commit

Writes the files to disk, and makes and loads the library.

Library#committed?

True if the library has been committed.

Library#empty?

True if no content has been added to the library.

Library#extconf

Override this if you want to do more than just create_makefile.

Library#init_library_function

Returns a Function template object; see below. This function is called when the library is loaded. Method definitions put stuff here to register methods with Ruby. Usually, there is no need to bother this guy directly. Use Library#setup instead.

Library#setup key => "statements", ...

Inserts code in the init_library_function, which is called when the library is loaded. The key is used for redundancy checking, as in the declare accumulators.

Library#define name

Defines a plain ol' C function. Returns a Function template (see below).

Library#define_method mod, name
Library#define_module_function mod, name
Library#define_global_function name
Library#define_singleton_method mod, name

Defines a function of the specified name and type in the given class/module (or in the global scope). For example,

define_method String, "reverse"

Library#include "file1.h", "<file2.h>", ...

Insert the include statement(s) at the top of the library's main .c file. For convenience, <ruby.h> is included automatically, as is the header file of the library itself.

Library#declare :x => "int x", ...
Library#declare_extern :x => "int x", ...

Puts the string in the declaration area of the .c or .h file, respectively. The declaration area is before the function definitions, and after the structure declarations.

Library#declare_struct name
Library#declare_extern_struct name

Returns a Structure template, which generates a typedefed C struct in the .c or .h file. The declare method of this template is used to add members.

Library#declare_class cl
Library#declare_module mod
Library#declare_symbol sym

Define a C variable which will be initialized to refer to the class, module, or symbol. These accumulators return the c-name, which can be interpolated into C calls to the Ruby API. (The arguments are the actual Ruby objects.)

File

File templates are managed by the Library, and most users do not need to interact with them directly. They are structured into four sections: includes, structure declarations, variable and function declarations, and function definitions.
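The point of the four sections is that fragments can arrive in any order and still be emitted in a fixed order. A rough sketch of that idea, using a hypothetical SketchFile class and section names chosen for this example (the real File template has its own internal structure and naming):

```ruby
# Hypothetical section names, listed in the order they are emitted.
SECTION_ORDER = [:includes, :struct_declarations, :declarations, :definitions]

# Illustrative stand-in, not CGenerator's File template.
class SketchFile
  def initialize
    @sections = SECTION_ORDER.map { |s| [s, []] }.to_h
  end

  # Raises KeyError for a section name that is not in SECTION_ORDER.
  def add(section, fragment)
    @sections.fetch(section) << fragment
    self
  end

  # Join each section's fragments, skip empty sections, and separate
  # the remaining sections with a blank line.
  def to_s
    SECTION_ORDER.map { |s| @sections[s].join("\n") }
                 .reject(&:empty?)
                 .join("\n\n")
  end
end

file = SketchFile.new
file.add(:definitions,  "double twice(double x) { return 2 * x; }")
file.add(:includes,     "#include <math.h>")
file.add(:declarations, "double twice(double x);")
puts file.to_s
# prints:
#   #include <math.h>
#
#   double twice(double x);
#
#   double twice(double x) { return 2 * x; }
```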

Function

Function#scope :static
Function#scope :extern
Function#arguments 'int x', 'double y', 'VALUE obj', ...
Function#return_type 'void'

These accumulators affect the prototype of the function, which will be placed in the declaration section of either the .h or the .c file, depending on the scope setting. The default scope is static. The default return type is 'void'.
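How these three settings combine into a prototype can be sketched as a single string-building function. sketch_prototype is a hypothetical helper written for this illustration. Its defaults follow the text above (static scope, void return type), but the exact text CGenerator emits, including how an empty argument list is written, is an assumption.

```ruby
# Hypothetical helper; the real library builds prototypes internally.
def sketch_prototype(name, arguments: [], return_type: "void", scope: :static)
  # Writing an empty argument list as "void" is an assumption of this sketch.
  args   = arguments.empty? ? "void" : arguments.join(", ")
  prefix = scope == :static ? "static " : ""
  "#{prefix}#{return_type} #{name}(#{args});"
end

puts sketch_prototype("helper")
# prints: static void helper(void);

puts sketch_prototype("distance",
                      arguments:   ["point *p1", "point *p2"],
                      return_type: "double",
                      scope:       :extern)
# prints: double distance(point *p1, point *p2);
```

Under the scope rule described above, the first prototype would belong in the .c file and the second in the .h file.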

For the Method subclasses of Function, argument and return types can be omitted, in which case they default to 'VALUE'.

Function#declare :x => "static double x", ...
Function#init "x = 0", ...
Function#setup 'x' => "x += 1", ...
Function#body 'y = sin(x); printf("%f\n", y)', ...

These four accumulators determine the contents of the function between the opening and closing braces. The init code is executed once when the function first runs; it's useful for initializing static data. The setup code runs each time the function is called, as does the body. Distinguishing setup from body is useful for two reasons: first, setup is guaranteed to execute before body, and, second, one can avoid setting up the same variable twice, because of the key.

Function#returns "2*x"

Specifies the string used in the final return statement of the function. Subsequent uses of this method clobber the previous value. Alternately, one can simply insert a "return" manually in the body.

Method

ModuleFunction

GlobalFunction

SingletonMethod

These subclasses of the Function template are designed for coding Ruby methods. The necessary registration (rb_define_method, etc.) is handled automatically. Defaults are different from Function: 'VALUE self' is automatically an argument, and argument and return types are assumed to be 'VALUE' and can be omitted by the caller.

Method#c_array_args
Method#rb_array_args

Specifies that arguments are to be collected and passed in a C or Ruby array, instead of individually. See the Ruby API documentation for details.

Structure

Structure#declare :x => "int x"

Adds the specified string to define a structure member.

Utility functions

CGenerator.make_c_name s

Generates a unique C identifier from the given Ruby identifier, which may include /[@$?!]/, '::', and even '.'. (Some special globals are not yet supported: $: and $-I, for example.)

It is unique in the sense that distinct Ruby identifiers map to distinct C identifiers. (Not completely checked. Might fail for some really obscure cases.)

String.tab n

Tabs left or right by n chars, using spaces.

String.tabto n

The first non-empty line is adjusted to have n spaces before the first nonspace. Additional lines are changed to preserve relative tabbing.

String.taballto n

Aligns each line to have n spaces before the first non-space.

(These routines probably don't work well, if at all, with "hard" tabs.)
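To make the tabto description concrete, here is a standalone approximation. sketch_tabto is a hypothetical free function written for this illustration, not the String method supplied by the library, and it shares the stated limitation of handling only leading spaces, not hard tabs.

```ruby
# Hypothetical approximation of the tabto behavior described above.
def sketch_tabto(text, n)
  lines = text.lines
  first = lines.find { |line| line =~ /\S/ }   # first non-empty line
  return text unless first

  # How far every line must move so the first line ends up at column n.
  shift = n - first[/\A */].size

  lines.map { |line|
    next line unless line =~ /\S/              # leave blank lines alone
    indent = line[/\A */].size
    " " * [indent + shift, 0].max + line.sub(/\A */, "")
  }.join
end

fragment = "    if (x) {\n      y = 1;\n    }\n"
puts sketch_tabto(fragment, 2)
# prints:
#   if (x) {
#     y = 1;
#   }
# with the first line indented by 2 spaces and the second by 4
```

Because every line is shifted by the same amount, the nesting inside the fragment survives the move, which is what "relative tabbing is preserved" means in practice.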

Example

require 'cgen'

lib = CGenerator::Library.new "sample_lib"

class Point
  def initialize
    # Not necessary, but you could do something here.
  end
end

lib.declare_extern_struct(:point).instance_eval {
  declare :x => "double x"
  declare :y => "double y"
}

C_name_Point = lib.declare_class Point
C_name_initialize = lib.declare_symbol :initialize

lib.define_global_function(:new_point).instance_eval {
  arguments "x", "y"        # 'VALUE' is assumed
  declare :p => "point *p"
  declare :result => "VALUE result"
      # semicolons are added automatically
  body %{
    p = ALLOC(point);
    p->x = NUM2DBL(x);
    p->y = NUM2DBL(y);
    result = Data_Wrap_Struct(#{C_name_Point}, 0, free, p);
    rb_funcall(result, #{C_name_initialize}, 0);
  }
  returns "result"
      # can put a return statement in the body, if preferred
}

for var in [:x, :y]   # metaprogramming in C!
  lib.define_method(Point, var).instance_eval {
    declare :p => "point *p"
    body %{
      Data_Get_Struct(self, point, p);
    }
    returns "rb_float_new(p->#{var})"
  }
end

# A utility function, available to other C files
lib.define("distance").instance_eval {
  arguments "point *p1", "point *p2"
  return_type "double"
  scope :extern
  returns "sqrt(pow(p1->x - p2->x, 2) + pow(p1->y - p2->y, 2))"
  include "<math.h>"
  # The include accumulator call propagates up the parent
  # hierarchy until something handles it. In this case,
  # the Library lib handles it by adding an include
  # directive to the .c file. This allows related, but
  # separate aspects of the C source to be handled in
  # the same place in the Ruby code. We could also have
  # called include directly on lib.
}

lib.define_method(Point, :distance).instance_eval {
  arguments "other"
  declare :p => "point *p"
  declare :q => "point *q"
  body %{
    Data_Get_Struct(self, point, p);
    Data_Get_Struct(other, point, q);
  }
  returns "rb_float_new(distance(p, q))"
}

lib.commit # now you can use the new definitions

p1 = new_point(1, 2)
puts "p1: x is #{p1.x}, y is #{p1.y}"

p2 = new_point(5, 8)
puts "p2: x is #{p2.x}, y is #{p2.y}"

puts "distance from p1 to p2 is #{p1.distance p2}"

Output is:

p1: x is 1.0, y is 2.0
p2: x is 5.0, y is 8.0
distance from p1 to p2 is 7.211102551  

That's a lot of code to do a simple operation, compared with an Inline-style construct. CGenerator's value shows up with more complex tasks.

Notes

To do

version

CGenerator 0.1

The current version of this software can be found at http://redshift.sourceforge.net/cgen .

license

This software is distributed under the Ruby license. See http://www.ruby-lang.org.

author

Joel VanderWerf, vjoel@sourceforge.net