
Foreign-Function Interface

Jan Wielemaker [1] wrote a very useful Prolog pack for foreign-function interfacing. It works well, cleverly parsing pre-processed C header files for function prototypes.

It has only one limitation when it comes to embedded systems: it relies on gcc and on the availability of header files. Embedded systems rarely carry a compiler. They are cross-compiled instead, so the compiler exists only on the developer’s host machine, together with compile-time dependencies such as the source headers. Bare-bones embedded systems with limited memory and other resources strip away all compile- and link-time artifacts, leaving only executable artifacts such as libraries and binaries for the target platform. In this use case, the FFI machinery cannot run the compiler’s pre-processor over the headers to extract C prototypes. Neither the headers nor the compiler exist, so the requirements and assumptions subtly change.

Buildroot-based systems [2] provide an example. To simplify and minimise the target distribution, the build system cross-compiles the kernel and its modules, along with user-land libraries and programs. Only the compiled results appear in the deployed run-time environment, without headers or other compile-time dependencies, and without the build tools that created them.

Problem

Can the FFI developer use ‘Programming in Logic’ to dynamically access C libraries without headers or a compiler? The solution below focuses on Linux and compatible Unix-like deployments.

Solution

Find the full module here.

Firstly, find a library. The logic requires a way to locate a dynamically linked shared object library file from an abstract library specification, such as libc for the standard C library or libm for the standard maths library. One solution is to co-opt the user:file_search_path/2 predicate.

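:- multifile user:file_search_path/2.

% Conventional library directory.
user:file_search_path(ld_library_path, '/usr/lib').
% Debian-style multi-arch directory, e.g. /usr/lib/x86_64-linux-gnu. This
% clause contributes only when HOSTTYPE appears in the process environment.
user:file_search_path(ld_library_path, Path) :-
    getenv('HOSTTYPE', HostType),
    atomic_list_concat([HostType, linux, gnu], -, Dir),
    atomic_list_concat(['', usr, lib, Dir], /, Path).
% Each directory listed in the colon-separated LD_LIBRARY_PATH variable.
user:file_search_path(ld_library_path, Path) :-
    getenv('LD_LIBRARY_PATH', LdLibraryPath),
    atomic_list_concat(Components, :, LdLibraryPath),
    member(Path, Components).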
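The HOSTTYPE clause contributes only when that variable is in Prolog’s environment. Bash usually sets HOSTTYPE as an unexported shell variable, so getenv/2 may fail and the multi-arch directory silently drops out of the search. A minimal sketch of an alternative clause follows, assuming that the GNU multi-arch triplet equals SWI-Prolog’s arch flag (typically something like 'x86_64-linux') plus a -gnu suffix. That assumption does not hold for every target; hard-float ARM, for instance, uses arm-linux-gnueabihf. The helper multiarch_lib_dir/2 is hypothetical.

```prolog
:- multifile user:file_search_path/2.

%% multiarch_lib_dir(+Arch, -Path) is det.
%% Hypothetical helper: maps an arch flag value such as 'x86_64-linux' to
%% a Debian-style multi-arch directory such as '/usr/lib/x86_64-linux-gnu'.
%% Assumes the GNU triplet is the arch flag plus a "-gnu" suffix, which does
%% not hold for every target (hard-float ARM, for instance).
multiarch_lib_dir(Arch, Path) :-
    atomic_list_concat([Arch, gnu], -, Dir),
    atomic_list_concat(['', usr, lib, Dir], /, Path).

% Alternative to the HOSTTYPE clause: derive the directory from the running
% Prolog's architecture flag rather than from the environment.
user:file_search_path(ld_library_path, Path) :-
    current_prolog_flag(arch, Arch),
    multiarch_lib_dir(Arch, Path).
```

Keeping the mapping in a separate predicate makes the triplet assumption easy to replace with a per-target table where the simple suffix rule breaks down.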

The c_library/2 predicate below loads a dynamic library using the loader library path. Notice that the first goal in its body resolves the library from its \(Alias\) to its absolute file path at \(Abs\).

%% Finds a C shared-object library. Loads a found library if not already loaded.
%% Uses default loading options.
%%
%% @arg Alias is the shared object alias. It must resolve to an existing,
%% readable shared object. Shared objects may have more than one alias as
%% symbolic links referencing the actual library file.
%%
%% @arg Library is a C library blob on successful unification.
c_library(Alias, Library) :-
    absolute_file_name(ld_library_path(Alias), Abs, [access(read)]),
    c_library_(Abs, Library).

:- dynamic library/2.
:- volatile library/2.

c_library_(Abs, Library) :- library(Abs, Library), !.
c_library_(Abs, Library) :-
    ffi:ffi_library_create(Abs, Library, []),
    asserta(library(Abs, Library)).
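Because absolute_file_name/3 defaults to file_errors(error), c_library/2 raises an existence error, rather than failing, when no readable file matches the alias. When several directories hold a copy, it loads only the first match. The small sketch below inspects the search instead, assuming the ld_library_path clauses above are loaded; library_candidates/2 is a hypothetical helper, not part of the ffi pack.

```prolog
%% library_candidates(+Alias, -Paths) is det.
%% Hypothetical helper: collects, in search order, every readable file that
%% ld_library_path(Alias) resolves to. Assumes the ld_library_path clauses
%% above are loaded. file_errors(fail) yields an empty list when nothing
%% matches, where c_library/2 would raise an existence error instead.
library_candidates(Alias, Paths) :-
    findall(Abs,
            absolute_file_name(ld_library_path(Alias), Abs,
                               [ access(read),
                                 solutions(all),
                                 file_errors(fail)
                               ]),
            Paths).
```

For example, ?- library_candidates('libm.so.6', Paths). lists every readable copy of the maths library that the search path can reach, with the one c_library/2 would load at the head of the list.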

Next, given a library, look up a C symbol by \(Name\).

%% Looks up a Symbol by Name within a library using the library's Alias.
%% Caches the symbol.
c_symbol(Alias, Name, Symbol) :-
    c_library(Alias, Library),
    c_symbol_(Library, Name, Symbol).

:- dynamic symbol/3.
:- volatile symbol/3.

c_symbol_(Library, Name, Symbol) :- symbol(Library, Name, Symbol), !.
c_symbol_(Library, Name, Symbol) :-
    ffi:ffi_lookup_symbol(Library, Name, Symbol),
    asserta(symbol(Library, Name, Symbol)).

From a library \(Alias\) and a C symbol \(Name\), build an abstract function prototype using an expected return type at \(RetType\) and a list of argument types at \(ArgTypes\).

%% Builds a callable Function. Looks up the function Name within a shared
%% library at Alias and builds a prototype using RetType as the return type and
%% ArgTypes as a list of function argument types. Applies the default ABI.
c_prototype(Alias, Name, RetType, ArgTypes, Function) :-
    c_symbol(Alias, Name, Symbol),
    c_prototype(Symbol, RetType, ArgTypes, Function).

:- dynamic prototype/4.
:- volatile prototype/4.

c_prototype(Symbol, RetType, ArgTypes, Function) :-
    prototype(Symbol, RetType, ArgTypes, Function), !.
c_prototype(Symbol, RetType, ArgTypes, Function) :-
    ffi:ffi_prototype_create(Symbol, default, RetType, ArgTypes, Function),
    asserta(prototype(Symbol, RetType, ArgTypes, Function)).

Finally, call a C function using a library \(Alias\), a return type at \(RetType\), a list of argument types at \(ArgTypes\) and a \(Goal\) whose name identifies the C function.

%% The Goal's compound name provides the symbol look-up key. Use of
%% compound_name_arguments/3 allows for zero-arity goals, that is, for C
%% calls with no arguments.
c_call(Alias, RetType, ArgTypes, Goal) :-
    compound_name_arguments(Goal, Name, _),
    c_prototype(Alias, Name, RetType, ArgTypes, Function),
    c_call(Function, Goal).

c_call(Function, Goal) :- ffi:ffi_call(Function, Goal).
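The usage examples rely on a convention in which the Goal’s arguments are the C inputs followed by one trailing slot for the return value. A sketch of that decomposition, assuming the convention holds for every call, follows; goal_parts/4 is hypothetical and only illustrates how a goal such as sin(1.0, A) splits, not how ffi_call/2 handles it internally.

```prolog
:- use_module(library(lists)).

%% goal_parts(+Goal, -Name, -Inputs, -Return) is semidet.
%% Hypothetical helper: splits a c_call/4 goal into the C function name,
%% its input arguments and the trailing slot that receives the return
%% value, e.g. sin(1.0, A) gives sin, [1.0] and A. Fails for a goal with no
%% arguments at all, which has no return slot under this convention.
goal_parts(Goal, Name, Inputs, Return) :-
    compound_name_arguments(Goal, Name, Args),
    once(append(Inputs, [Return], Args)).
```

Under this reading, rand(A) is a call with no C inputs, so Inputs is the empty list and A alone receives the result.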

Usage

As a basic example, call the C maths library’s double sin(double) function. Note that the last argument of \(Goal\), left unbound, receives the return value.

?- c_call('libm.so.6', double, [double], sin(1.0, A)).
A = 0.8414709848078965.

Asking for sin() in libc.so.6 fails, because sin() is defined in the maths library rather than the C library.

Calling rand() generates a pseudo-random number.

?- c_call('libc.so.6', int, [], rand(A)).
A = 1681692777.

Explanations

Underneath the bonnet¹, the FFI involves four core actions:

  1. Dynamically loading a shared object library.
  2. Finding a C symbol within a loaded library.
  3. Building a prototype from a C symbol using a return type and argument types.
  4. Calling a function through its prototype, which involves:
    1. Marshalling outbound arguments by prototype.
    2. Calling a C function with marshalled arguments.
    3. Un-marshalling inbound return values by prototype.

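The helpers c_library_/2, c_symbol_/3 and c_prototype/4 ‘memoise’ their results in dynamic predicates, so each library is loaded, each symbol looked up and each prototype built only once. Declaring those predicates volatile keeps the cached handles out of saved states, where they would no longer be valid.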
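The three caching helpers share one pattern: return a cached result when one exists, otherwise compute it once and record it with asserta/1. The generic sketch below shows that pattern in isolation; memo/3, memo_table/2 and square/2 are hypothetical names introduced for illustration.

```prolog
:- dynamic memo_table/2.
:- volatile memo_table/2.
:- meta_predicate memo(+, 1, ?).

%% memo(+Key, :Goal, ?Value) is semidet.
%% Hypothetical generic form of the caching in c_library_/2, c_symbol_/3
%% and c_prototype/4: reuse the Value cached under Key if there is one,
%% otherwise call Goal with one extra argument to compute it, then record
%% the result with asserta/1 so later calls skip the computation.
memo(Key, _, Value) :-
    memo_table(Key, Cached), !,
    Value = Cached.
memo(Key, Goal, Value) :-
    call(Goal, Computed),
    asserta(memo_table(Key, Computed)),
    Value = Computed.

%% Hypothetical example computation used to exercise memo/3.
square(X, Y) :- Y is X * X.

% ?- memo(sq(3), square(3), V).   % computes and caches V = 9
% ?- memo(sq(3), square(4), V).   % reuses the cached V = 9
```

The second query returns 9 even though square(4) would compute 16, which shows that the goal runs only once per key. Unlike the original helpers, this sketch cuts before unifying the cached value, so a caller that passes a bound but mismatching Value simply fails instead of triggering a second computation.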

Loader Library Path

The entire edifice starts with finding the required library. Libraries do not necessarily reside in the same place from one Linux system to the next, nor does a library’s file name necessarily derive directly from its basic name. The standard C library, for example, is not always called libc.so and located in /usr/lib; on most glibc systems the run-time file is libc.so.6, often in an architecture-specific directory. On most fully equipped distributions, ldconfig maintains the dynamic loader’s library cache, and ldconfig -p prints its current contents.
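Where a target does carry ldconfig, its cache offers another way to find a library. A minimal sketch that parses ldconfig -p output follows. It assumes the usual line layout, name (flags) => path, and assumes that ldconfig is on PATH, although it often lives in /sbin and may be missing entirely from stripped-down images. ldconfig_entry/3 and ldconfig_library/2 are hypothetical helpers; library(process) is the standard SWI-Prolog process library.

```prolog
:- use_module(library(process)).

%% ldconfig_entry(+Line, -Name, -Path) is semidet.
%% Hypothetical helper: parses one line of ldconfig -p output, e.g.
%% "\tlibm.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libm.so.6".
%% Header lines without " => " fail.
ldconfig_entry(Line, Name, Path) :-
    split_string(Line, "", " \t", [Trimmed]),          % strip outer blanks
    once(sub_string(Trimmed, Before, _, After, " => ")),
    sub_string(Trimmed, 0, Before, _, Left),           % "name (flags)"
    sub_string(Trimmed, _, After, 0, PathString),      % text after " => "
    split_string(Left, " ", "", [NameString|_]),       % first word is name
    atom_string(Name, NameString),
    atom_string(Path, PathString).

%% ldconfig_library(?Name, -Path) is nondet.
%% Hypothetical helper: runs ldconfig -p (assumed to be on PATH) and
%% enumerates the cached libraries, optionally filtered by Name.
ldconfig_library(Name, Path) :-
    process_create(path(ldconfig), ['-p'],
                   [stdout(pipe(Out)), process(PID)]),
    call_cleanup(read_string(Out, _, Text), close(Out)),
    process_wait(PID, _),
    split_string(Text, "\n", "", Lines),
    member(Line, Lines),
    ldconfig_entry(Line, Name, Path).
```

For example, ?- ldconfig_library('libm.so.6', Path). enumerates each cached path for the maths library. Keeping the parser separate from the process call lets it be checked against sample lines on a host that has no ldconfig at all.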

The user:file_search_path/2 clauses add an ld_library_path alias to the user file search path, so that terms of the form ld_library_path(File) expand to candidate paths for a library given its base name with extension. c_library/2 uses this loader library path to locate a library somewhere within the Linux root file system, e.g. ld_library_path('libm.so.6') finds the maths library shared object.

C Library Loader

Predicate c_library(Alias, Library) loads a \(Library\) using an \(Alias\). The absolute file name of the library must resolve to an existing, readable file, hence the access(read) option, but not necessarily to an executable one. Some Linux systems make their shared objects executable, but not all do. Requiring only read access still ensures that the library exists while allowing either file mode.

Symbols, Prototypes, Functions

One symbol may have multiple “prototypes”, in the strict sense that a prototype describes one precise combination of return and argument types. The C calling interface, and therefore the FFI, permits any prototype to be applied to any function. It may not work; it may even crash when called. The caller is responsible for complying with the calling convention, the explicitly declared prototype and any other implicit calling requirements.

Conclusions

Seen as a whole, a Linux system amounts to a large set of compiled C functions; other languages exist as well, of course, but embedded systems are mostly C. Ultimately, descending through the layers of software, the machine performs function calls with arguments. Different calling conventions exist², but the C style nevertheless proves to be the general-purpose lingua franca of machine-code bindings.

The C function is a flexible interface. It involves arguments pushed on a stack and a return value; architectures can optimise away the stack pushes by passing the initial arguments in registers. The interface ranges from fixed prototypes such as double(double), one floating-point number in and another out as with sin(), all the way to the variable-length, variable-type argument lists supplied to the printf() family of C functions. Flexible as it is, the interface also proves easy to break. The caller must provide the correct arguments in the correct order and interpret the return value correctly. C applies minimal type checking while making it easy to cast between types. C arguably encodes a low-level language, effectively a shorthand for machine language.

[1] J. Wielemaker, “Dynamically calling C from Prolog.” Accessed: Dec. 16, 2023. [Online]. Available: https://www.swi-prolog.org/pack/list?p=ffi
[2] Buildroot, “Making Embedded Linux Easy.” Accessed: Oct. 07, 2023. [Online]. Available: https://buildroot.org/

  1. or “hood” in American

  2. Pascal and C++ for instance