NAME

perlxstypemap - Perl XS C/Perl type mapping

DESCRIPTION

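The more you think about interfacing between two languages, the more you will realize that the majority of programmer effort goes into converting between the data structures that are native to either language. This outweighs other matters such as differing calling conventions, because the problem space is so much larger: there are simply more ways to lay data out in memory than there are ways to implement a function call.

Perl XS's attempt at a solution to this is the concept of typemaps. At an abstract level, a Perl XS typemap is nothing but a recipe for converting from a certain Perl data structure to a certain C data structure and vice versa. Since some C types are sufficiently similar to one another to warrant conversion with the same logic, XS typemaps are represented by a unique identifier, henceforth called an "XS type" in this document. You can then tell the XS compiler that multiple C types are to be mapped with the same XS typemap.

In your XS code, when you define an argument with a C type, or when you use a CODE: and an OUTPUT: section together with a C return type for your XSUB, it is the typemap mechanism that makes this easy.

Anatomy of a typemap

In more practical terms, the typemap is a collection of code fragments which are used by the xsubpp compiler to map C function parameters and values to Perl values. The typemap file consists of three sections labelled TYPEMAP, INPUT, and OUTPUT. An unlabelled initial section is assumed to be a TYPEMAP section. The INPUT section tells the compiler how to translate Perl values into variables of certain C types. The OUTPUT section tells the compiler how to translate the values of certain C types into values Perl can understand. The TYPEMAP section tells the compiler which of the INPUT and OUTPUT code fragments should be used to map a given C type to a Perl value. The section labels TYPEMAP, INPUT, and OUTPUT must be placed at the beginning of a line and must be in uppercase.

Now for a more complex example. Suppose you want struct netconfig to be blessed into the class Net::Config. One way to do this is to use underscores (_) to separate package names, as follows: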

  typedef struct netconfig * Net_Config;

Then provide a typemap entry T_PTROBJ_SPECIAL that maps underscores to double colons (::), and declare Net_Config to be of that type:

  TYPEMAP
  Net_Config      T_PTROBJ_SPECIAL
  
  INPUT
  T_PTROBJ_SPECIAL
    if (sv_derived_from($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\")){
      IV tmp = SvIV((SV*)SvRV($arg));
      $var = INT2PTR($type, tmp);
    }
    else
      croak(\"$var is not of type ${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\")
  
  OUTPUT
  T_PTROBJ_SPECIAL
    sv_setref_pv($arg, \"${(my $ntt=$ntype)=~s/_/::/g;\$ntt}\",
                 (void*)$var);

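The INPUT and OUTPUT sections substitute double colons for underscores on the fly, giving the desired effect. This example demonstrates some of the power and versatility of the typemap facility.

The INT2PTR macro (defined in perl.h) casts an integer to a pointer of a given type, taking into account that integers and pointers may differ in size. There are also the PTR2IV, PTR2UV, and PTR2NV macros, which map the other way and may be useful in OUTPUT sections.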
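
As a rough illustration of what INT2PTR and PTR2IV are for, the following standalone C sketch round-trips a pointer through an integer. It deliberately does not include perl.h: intptr_t stands in for Perl's IV, and the names netconfig_model, model_ptr2iv, and model_int2ptr are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a typemapped C struct. */
struct netconfig_model {
    int id;
};

/* Models PTR2IV: convert a pointer to an integer wide enough to hold it.
 * intptr_t is an assumption of this sketch, used in place of Perl's IV. */
static intptr_t
model_ptr2iv(const void *p)
{
    /* The size concern mentioned in the text: the integer type must not
     * be narrower than a pointer. */
    assert(sizeof(intptr_t) >= sizeof(void *));
    return (intptr_t)p;
}

/* Models INT2PTR(struct netconfig_model *, iv): convert the integer back
 * into a typed pointer. */
static struct netconfig_model *
model_int2ptr(intptr_t iv)
{
    return (struct netconfig_model *)iv;
}
```

Storing the result of model_ptr2iv and later passing it to model_int2ptr yields the original pointer, which is the same round trip that the OUTPUT and INPUT sections of T_PTROBJ_SPECIAL perform through the referenced scalar.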

The Role of the typemap File in Your Distribution

The default typemap in the lib/ExtUtils directory of the Perl source contains many useful types which can be used by Perl extensions. Some extensions define additional typemaps which they keep in their own directory. These additional typemaps may reference INPUT and OUTPUT maps in the main typemap. The xsubpp compiler allows the extension's own typemap to override any mappings which are in the default typemap. Instead of using an additional typemap file, typemaps may be embedded verbatim in XS with a heredoc-like syntax. See the documentation on the TYPEMAP: XS keyword.

Starting with ExtUtils::ParseXS version 3.13_01 (shipped with perl 5.16 and later), it is rather easy to share typemap code between multiple CPAN distributions. The general idea is to share it as a module that offers a certain API, have the dependent modules declare that module as a build-time requirement, and import the typemap into the XS. An example of such a typemap-sharing module on CPAN is ExtUtils::Typemaps::Basic. Two steps are needed to make that module's typemaps available in your code:

Declare ExtUtils::Typemaps::Basic as a build-time dependency in Makefile.PL (use BUILD_REQUIRES) or in Build.PL (use build_requires).

Include the following line in the XS section of your XS file (do not break the line):

  INCLUDE_COMMAND: $^X -MExtUtils::Typemaps::Cmd
                   -e "print embeddable_typemap(q{Basic})"

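Writing typemap Entries

Each INPUT or OUTPUT typemap entry is a double-quoted Perl string that is evaluated in the presence of certain variables to produce the final C code for mapping a certain C type.

This means that you can embed Perl code in your typemap (C) code using constructs such as ${ perl code that evaluates to a scalar reference here }. A common use case is to generate error messages that refer to the true function name even when using the ALIAS XS feature: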

  ${ $ALIAS ? \q[GvNAME(CvGV(cv))] : \qq[\"$pname\"] }

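For many typemap examples, refer to the core typemap file, which can be found in the perl source tree at lib/ExtUtils/typemap.

The Perl variables that are available for interpolation into typemaps are the following:

$var - the name of the input or output variable, e.g. RETVAL for return values.
$type - the raw C type of the parameter, with any : replaced by _ (so Foo::Bar becomes Foo__Bar).
$ntype - the supplied type, with * replaced by Ptr. For example, for a type of Foo::Bar, $ntype is Foo::Bar.
$arg - the stack entry that the parameter is input from or output to, e.g. ST(0).
$argoff - the argument stack offset of the argument, i.e. 0 for the first argument, and so on.
$pname - the full name of the XSUB, including the PACKAGE name, with any PREFIX stripped. This is the non-ALIAS name.
$Package - the package specified by the most recent PACKAGE keyword.
$ALIAS - non-zero if the current XSUB has any aliases declared with ALIAS.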
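
To make the * to Ptr rule for $ntype concrete, here is a minimal standalone C sketch of the normalization. It is only a model of the rule as described above, not xsubpp's code; the function name model_ntype is hypothetical, and dropping the whitespace that precedes a '*' is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Writes a normalized copy of the C type name `ctype` into `out`:
 * every '*' becomes "Ptr", and whitespace directly before a '*' is
 * dropped (an assumption of this sketch).
 * Returns 0 on success, -1 if `out` (of size `outlen`) is too small. */
static int
model_ntype(const char *ctype, char *out, size_t outlen)
{
    size_t used = 0;

    assert(ctype != NULL && out != NULL);
    if (outlen == 0)
        return -1;
    for (; *ctype != '\0'; ctype++) {
        if (*ctype == '*') {
            /* Remove whitespace already copied just before the '*'. */
            while (used > 0 && isspace((unsigned char)out[used - 1]))
                used--;
            if (used + 3 + 1 > outlen)
                return -1;
            memcpy(out + used, "Ptr", 3);
            used += 3;
        } else {
            if (used + 1 + 1 > outlen)
                return -1;
            out[used++] = *ctype;
        }
    }
    out[used] = '\0';
    return 0;
}
```

Under these assumptions, "foo_t *" normalizes to foo_tPtr and "foo_t **" to foo_tPtrPtr, which matches the function names XS_pack_foo_tPtr and XS_pack_foo_tPtrPtr used in the T_PACKED and T_PACKEDARRAY sections.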

Full Listing of Core Typemaps

Each C type is represented by an entry in the typemap file that is responsible for converting perl variables (SV, AV, HV, CV, etc.) to and from that type. The following sections list all XS types that come with perl by default.

T_SV

This simply passes the C representation of the Perl variable (an SV*) in and out of the XS layer. This can be used if the C code wants to deal directly with the Perl variable.

T_SVREF

Used to pass in and return a reference to an SV.

Note that this typemap does not decrement the reference count when returning the reference to an SV*. See also: T_SVREF_REFCOUNT_FIXED

T_SVREF_REFCOUNT_FIXED

Used to pass in and return a reference to an SV. This is a fixed variant of T_SVREF that decrements the refcount appropriately when returning a reference to an SV*. Introduced in perl 5.15.4.

T_AVREF

From the perl level this is a reference to a perl array. From the C level this is a pointer to an AV.

Note that this typemap does not decrement the reference count when returning an AV*. See also: T_AVREF_REFCOUNT_FIXED

T_AVREF_REFCOUNT_FIXED

From the perl level this is a reference to a perl array. From the C level this is a pointer to an AV. This is a fixed variant of T_AVREF that decrements the refcount appropriately when returning an AV*. Introduced in perl 5.15.4.

T_HVREF

From the perl level this is a reference to a perl hash. From the C level this is a pointer to an HV.

Note that this typemap does not decrement the reference count when returning an HV*. See also: T_HVREF_REFCOUNT_FIXED

T_HVREF_REFCOUNT_FIXED

From the perl level this is a reference to a perl hash. From the C level this is a pointer to an HV. This is a fixed variant of T_HVREF that decrements the refcount appropriately when returning an HV*. Introduced in perl 5.15.4.

T_CVREF

From the perl level this is a reference to a perl subroutine (e.g. $sub = sub { 1 };). From the C level this is a pointer to a CV.

Note that this typemap does not decrement the reference count when returning a CV*. See also: T_CVREF_REFCOUNT_FIXED

T_CVREF_REFCOUNT_FIXED

From the perl level this is a reference to a perl subroutine (e.g. $sub = sub { 1 };). From the C level this is a pointer to a CV.

This is a fixed variant of T_CVREF that decrements the refcount appropriately when returning a CV*. Introduced in perl 5.15.4.

T_SYSRET

The T_SYSRET typemap is used to process return values from system calls. It is only meaningful when passing values from C to perl (there is no concept of passing a system return value from Perl to C).

System calls return -1 on error (setting errno with the reason) and (usually) 0 on success. If the return value is -1, this typemap returns undef. If the return value is not -1, this typemap translates a 0 (perl false) to "0 but true" (which is perl true) or returns the value itself, to indicate that the command succeeded.

The POSIX module makes extensive use of this type.
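
The three-way rule described above can be sketched in plain C. This is only a model of the decision logic, written without perl.h; the enum sysret_kind and the function model_sysret are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* What the Perl caller would see, in this model. */
enum sysret_kind {
    SYSRET_UNDEF,          /* undef: the call failed               */
    SYSRET_ZERO_BUT_TRUE,  /* the string "0 but true": success, 0  */
    SYSRET_VALUE           /* the numeric return value itself      */
};

/* Classifies a system call return code `rc` following the rule in the
 * text. For SYSRET_VALUE, the number is stored in *value. */
static enum sysret_kind
model_sysret(int rc, int *value)
{
    assert(value != NULL);
    if (rc == -1)
        return SYSRET_UNDEF;
    if (rc == 0)
        return SYSRET_ZERO_BUT_TRUE;
    *value = rc;
    return SYSRET_VALUE;
}
```

The point of the middle case is that a successful call returning 0 still tests as true in Perl, so callers can write the usual "call or die" idiom without special-casing zero.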

T_UV

An unsigned integer.

T_IV

A signed integer. This is cast to the required integer type when passed to C and converted to an IV when passed back to Perl.

T_INT

A signed integer. This typemap converts the Perl value to a native integer type (the int type on the current platform). When returning the value to perl it is processed in the same way as for T_IV.

This behaviour is identical to using an int type in XS with T_IV.

T_ENUM

An enum value. Used to transfer an enum component from C. There is no reason to pass an enum value to C since it is stored as an IV inside perl.

T_BOOL

A boolean type. This can be used to pass true and false values to and from C.

T_U_INT

This is for unsigned integers. It is equivalent to using T_UV but explicitly casts the variable to type unsigned int. The default type for unsigned int is T_UV.

T_SHORT

Short integers. This is equivalent to T_IV but explicitly casts the return to type short. The default typemap for short is T_IV.

T_U_SHORT

Unsigned short integers. This is equivalent to T_UV but explicitly casts the return to type unsigned short. The default typemap for unsigned short is T_UV.

T_U_SHORT is used for type U16 in the standard typemap.

T_LONG

Long integers. This is equivalent to T_IV but explicitly casts the return to type long. The default typemap for long is T_IV.

T_U_LONG

Unsigned long integers. This is equivalent to T_UV but explicitly casts the return to type unsigned long. The default typemap for unsigned long is T_UV.

T_U_LONG is used for type U32 in the standard typemap.

T_CHAR

Single 8-bit characters.

T_U_CHAR

An unsigned byte.

T_FLOAT

A floating point number. This typemap guarantees to return a variable cast to a float.

T_NV

A Perl floating point number. Similar to T_IV and T_UV in that the return type is cast to the requested numeric type rather than to a specific type.

T_DOUBLE

A double precision floating point number. This typemap guarantees to return a variable cast to a double.

T_PV

A string (char *).

T_PTR

A memory address (pointer). Typically associated with a void * type.

T_PTRREF

Similar to T_PTR except that the pointer is stored in a scalar and the reference to that scalar is returned to the caller. This can be used to hide the actual pointer value from the programmer since it is usually not required directly from within perl.

The typemap checks that a scalar reference is passed from perl to XS.

T_PTROBJ

Similar to T_PTRREF except that the reference is blessed into a class. This allows the pointer to be used as an object. Most commonly used to deal with C structs. The typemap checks that the perl object passed into the XS routine is of the correct class (or part of a subclass).

The pointer is blessed into a class that is derived from the name of the type of the pointer, but with all '*' in the name replaced with 'Ptr'.

T_REF_IV_REF

Not yet documented.

T_REF_IV_PTR

Similar to T_PTROBJ in that the pointer is blessed into a scalar object. The difference is that when the object is passed back into XS it must be of the correct type (inheritance is not supported).

The pointer is blessed into a class that is derived from the name of the type of the pointer, but with all '*' in the name replaced with 'Ptr'.

T_PTRDESC

Not yet documented.

T_REFREF

Similar to T_PTRREF, except the pointer stored in the referenced scalar is dereferenced and copied to the output variable. This means that T_REFREF is to T_PTRREF as T_OPAQUE is to T_OPAQUEPTR. All clear?

Only the INPUT part of this is implemented (Perl to XSUB), and there are no known users in core or on CPAN.

T_REFOBJ

Not yet documented.

T_OPAQUEPTR

This can be used to store bytes in the string component of the SV. Here the representation of the data is irrelevant to perl and the bytes themselves are just stored in the SV. It is assumed that the C variable is a pointer (the bytes are copied from that memory location). If the pointer is pointing to something that is represented by 8 bytes then those 8 bytes are stored in the SV (and length() will report a value of 8). This entry is similar to T_OPAQUE.

In principle the unpack() command can be used to convert the bytes back to a number (if the underlying type is known to be a number).

This entry can be used to store a C structure (the number of bytes to be copied is calculated using the C sizeof function) and can be used as an alternative to T_PTRREF without having to worry about a memory leak (since Perl will clean up the SV).

T_OPAQUE

This can be used to store data from non-pointer types in the string part of an SV. It is similar to T_OPAQUEPTR except that the typemap retrieves the pointer directly rather than assuming it is being supplied. For example, if an integer is imported into Perl using T_OPAQUE rather than T_IV, the underlying bytes representing the integer will be stored in the SV but the actual integer value will not be available; i.e. the data is opaque to perl.

The data may be retrieved using the unpack function if the underlying type of the byte stream is known.

T_OPAQUE supports input and output of simple types. T_OPAQUEPTR can be used to pass these bytes back into C if a pointer is acceptable.
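
The byte-copy idea behind T_OPAQUEPTR can be sketched in standalone C. This is a model only: a plain byte buffer stands in for the SV's string component, the names pair_model, model_opaque_store, and model_opaque_load are hypothetical, and copying the bytes back out into a fresh struct is a simplification chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size payload; any C struct would do. */
struct pair_model {
    int a;
    int b;
};

/* Output direction: copy sizeof(*src) raw bytes into `bytes` (the
 * stand-in for the SV's string buffer). Returns the number of bytes
 * stored, i.e. what length() would report, or 0 if `cap` is too small. */
static size_t
model_opaque_store(const struct pair_model *src,
                   unsigned char *bytes, size_t cap)
{
    assert(src != NULL && bytes != NULL);
    if (cap < sizeof(*src))
        return 0;
    memcpy(bytes, src, sizeof(*src));
    return sizeof(*src);
}

/* Input direction: rebuild the struct from the stored bytes.
 * Returns 0 on success, -1 if the byte count does not match. */
static int
model_opaque_load(const unsigned char *bytes, size_t len,
                  struct pair_model *dst)
{
    assert(bytes != NULL && dst != NULL);
    if (len != sizeof(*dst))
        return -1;
    memcpy(dst, bytes, sizeof(*dst));
    return 0;
}
```

Because the stored bytes are just a copy of the in-memory representation, they are only meaningful to code that knows the layout of the struct, which is why the data is described as opaque to perl.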

Implicit array

xsubpp supports a special syntax for returning packed C arrays to perl. If the XS return type is given as

  array(type, nelem)

xsubpp will copy the contents of nelem * sizeof(type) bytes from RETVAL to an SV and push it onto the stack. This is only really useful if the number of items to be returned is known at compile time and you don't mind having a string of bytes in your SV. Use T_ARRAY to push a variable number of arguments onto the return stack (they won't be packed as a single string though).

This is similar to using T_OPAQUEPTR but can be used to process more than one element.

T_PACKED

Calls user-supplied functions for conversion. For OUTPUT (XSUB to Perl), a function named XS_pack_$ntype is called with the output Perl scalar and the C variable to convert from. $ntype is the normalized C type that is to be mapped to Perl. Normalized means that all * are replaced by the string Ptr. The return value of the function is ignored.

Conversely, for INPUT (Perl to XSUB) mapping, the function named XS_unpack_$ntype is called with the input Perl scalar as argument, and the return value is cast to the mapped C type and assigned to the output C variable.

An example conversion function for a typemapped struct foo_t * might be:

  static void
  XS_pack_foo_tPtr(SV *out, foo_t *in)
  {
    dTHX; /* alas, signature does not include pTHX_ */
    HV* hash = newHV();
    hv_stores(hash, "int_member", newSViv(in->int_member));
    hv_stores(hash, "float_member", newSVnv(in->float_member));
    /* ... */

    /* mortalize as thy stack is not refcounted */
    sv_setsv(out, sv_2mortal(newRV_noinc((SV*)hash)));
  }

The conversion from Perl to C is left as an exercise to the reader, but the prototype would be:

  static foo_t *
  XS_unpack_foo_tPtr(SV *in);

Instead of an actual C function that has to fetch the thread context using dTHX, you can define macros of the same name and avoid the overhead. Also, remember that the memory allocated by XS_unpack_foo_tPtr may need to be freed.

T_PACKEDARRAY

T_PACKEDARRAY is similar to T_PACKED. In fact, the INPUT (Perl to XSUB) typemap is identical, but the OUTPUT typemap passes an additional argument to the XS_pack_$ntype function. This third parameter indicates the number of elements in the output so that the function can handle C arrays sanely. The variable needs to be declared by the user and must have the name count_$ntype, where $ntype is the normalized C type name as explained above. For the example above and foo_t **, the signature of the function would be:

  static void
  XS_pack_foo_tPtrPtr(SV *out, foo_t **in, UV count_foo_tPtrPtr);

The type of the third parameter is arbitrary as far as the typemap is concerned. It just has to be in line with the declared variable.

Of course, unless you know the number of elements in the sometype ** C array within your XSUB, the return value from foo_t ** XS_unpack_foo_tPtrPtr(...) will be hard to decipher. Since the details are all up to the XS author (the typemap user), there are several solutions, none of which is particularly elegant. The most commonly seen solution has been to allocate memory for N+1 pointers and assign NULL to the (N+1)th to facilitate iteration.

Alternatively, using a customized typemap for your purposes in the first place is probably preferable.
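
The N+1 pointer convention mentioned above can be sketched in standalone C as follows. The type foo_model_t and the helpers model_alloc_ptr_array and model_count are hypothetical names for this illustration; a real XS_unpack function would additionally have to fill the slots from the Perl data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type standing in for foo_t. */
typedef struct {
    int int_member;
} foo_model_t;

/* Allocates room for n element pointers plus one terminating NULL.
 * Every slot starts out as NULL; the caller fills slots 0..n-1 and is
 * responsible for calling free() on the array.
 * Returns NULL if the allocation fails or the size would overflow. */
static foo_model_t **
model_alloc_ptr_array(size_t n)
{
    foo_model_t **arr;
    size_t i;

    if (n >= (size_t)-1 / sizeof(*arr))
        return NULL;
    arr = malloc((n + 1) * sizeof(*arr));
    if (arr == NULL)
        return NULL;
    for (i = 0; i <= n; i++)
        arr[i] = NULL;
    return arr;
}

/* Recovers the element count by walking to the NULL sentinel. */
static size_t
model_count(foo_model_t *const *arr)
{
    size_t n = 0;

    assert(arr != NULL);
    while (arr[n] != NULL)
        n++;
    return n;
}
```

The sentinel lets the XSUB iterate without a separate length variable, at the cost of one extra pointer and the restriction that no real element may be NULL.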

T_DATAUNIT

Not yet documented.

T_CALLBACK

Not yet documented.

T_ARRAY

This is used to convert the perl argument list to a C array and for pushing the contents of a C array onto the perl argument stack.

The usual calling signature is

  @out = array_func( @in );

Any number of arguments can occur in the list before the array, but the input and output arrays must be the last elements in the list.

When used to pass a perl list to C, the XS writer must provide a function (named after the array type but with 'Ptr' substituted for '*') to allocate the memory required to hold the list. A pointer should be returned. It is up to the XS writer to free the memory on exit from the function. The variable ix_$var is set to the number of elements in the new array.

When returning a C array to Perl, the XS writer must provide an integer variable called size_$var containing the number of elements in the array. This is used to determine how many elements should be pushed onto the return argument stack. This is not required on input since Perl knows how many arguments are on the stack when the routine is called. Ordinarily this variable would be called size_RETVAL.

Additionally, the type of each element is determined from the type of the array. If the array uses type intArray *, xsubpp will automatically work out that it contains variables of type int and use that typemap entry to perform the copy of each element. All pointer '*' and 'Array' tags are removed from the name to determine the subtype.
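
The subtype rule in the last paragraph can be illustrated with a small standalone C sketch. It models the rule as stated, not xsubpp's implementation: the function name model_subtype is hypothetical, and treating the '*' and 'Array' tags as trailing suffixes (with whitespace before the '*' ignored) is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Derives the element type name from an array type name by trimming
 * trailing '*' characters and whitespace, then one trailing "Array" tag.
 * Returns 0 on success, -1 if `out` (of size `outlen`) is too small. */
static int
model_subtype(const char *array_type, char *out, size_t outlen)
{
    static const char tag[] = "Array";
    const size_t taglen = sizeof(tag) - 1;
    size_t len;

    assert(array_type != NULL && out != NULL);
    len = strlen(array_type);

    /* Strip the pointer markers and any whitespace around them. */
    while (len > 0 && (array_type[len - 1] == '*'
                       || isspace((unsigned char)array_type[len - 1])))
        len--;

    /* Strip the "Array" suffix if present. */
    if (len >= taglen
        && memcmp(array_type + len - taglen, tag, taglen) == 0)
        len -= taglen;

    if (len + 1 > outlen)
        return -1;
    memcpy(out, array_type, len);
    out[len] = '\0';
    return 0;
}
```

With the example from the text, the input "intArray *" yields int, which is the type whose typemap entry would then be used for each element.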

T_STDIO

This is used for passing perl filehandles to and from C using FILE * structures.

T_INOUT

This is used for passing perl filehandles to and from C using PerlIO * structures. The filehandle can be used for reading and writing. This corresponds to the +< mode; see also T_IN and T_OUT.

See perliol for more information on the Perl IO abstraction layer. Perl must have been built with -Duseperlio.

There is no check to assert that the filehandle passed from Perl to C was created with the right open() mode.

Hint: The perlxstut tutorial covers the T_INOUT, T_IN, and T_OUT XS types nicely.

T_IN

Same as T_INOUT, but the filehandle that is returned from C to Perl can only be used for reading (mode <).

T_OUT