LNATION for LNATION

Learning XS - What is in my variable

Over the past year, I’ve been self-studying XS and have now decided to share my learning journey through a series of blog posts. This second post introduces the fundamentals of type checking variables in XS.

What is a variable in Perl?

In Perl, a variable is a named storage location that holds a value you can use and manipulate in your program. Perl fundamentally has three types of variables which are:

  1. Scalars ($) hold single values (numbers, strings, references)
  2. Arrays (@) hold ordered lists of scalars
  3. Hashes (%) hold key-value pairs

It's important to understand that arrays always store lists and hashes always store key-value pairs. Scalars are more flexible: they can hold numbers, strings, or references. This flexibility is why Perl provides the built-in ref function, which tells you what kind of reference a scalar holds. For a blessed reference, however, ref returns the class name rather than the underlying type, so to identify the actual data structure behind any reference, blessed or not, use the reftype function from Scalar::Util.
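To make the difference between ref and reftype concrete, here is a small sketch. The class name My::Animal is made up for the illustration and does not appear anywhere else in this post.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# My::Animal is a hypothetical class name, used only for this illustration.
my $plain  = { name => 'cat' };
my $object = bless { name => 'cat' }, 'My::Animal';

# ref() reports the class for blessed references, hiding the container type.
print ref($plain),  "\n";      # HASH
print ref($object), "\n";      # My::Animal

# reftype() always reports the underlying data structure.
print reftype($plain),  "\n";  # HASH
print reftype($object), "\n";  # HASH

# reftype() returns undef for anything that is not a reference.
print defined reftype('plain string') ? "a ref\n" : "not a ref\n";  # not a ref
```

This is why the Perl module later in this post uses reftype rather than ref when deciding how to unpack its argument.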

How does this translate to XS?

Every value that Perl passes to an XS function arrives on the argument stack as an 'SV*' (scalar value pointer). Arrays and hashes have their own internal types, 'AV*' for arrays and 'HV*' for hashes, but they share a common header with scalars, so an 'AV*' or 'HV*' can be cast to an 'SV*' and inspected with the same macros. When you pass an array or hash reference to XS, you receive an 'SV*' that holds a reference to the underlying 'AV' or 'HV'. The Perl C API offers several macros for inspecting an 'SV'. Some of the commonly used ones include:

  1. SvOK(sv): True if sv contains a defined value.
  2. SvROK(sv): True if sv is a reference.
  3. SvTYPE(sv): Returns the internal type of the SV.
  4. SvRV(sv): Dereferences an SV reference to get the underlying value.

SvTYPE is an important macro in Perl’s C API that returns the internal type of an SV. It tells you what kind of data structure the SV currently is, which is essential for type checking in XS. Below is a reference table of the constants you can compare SvTYPE against.

Macro        Meaning
SVt_NULL     Undefined SV with no body allocated
SVt_IV       Integer value (modern Perls also store references here)
SVt_NV       Floating-point value
SVt_RV       Reference to another SV (an alias for SVt_IV on modern Perls)
SVt_PV       String value
SVt_PVIV     String and integer value
SVt_PVNV     String and floating-point value
SVt_PVMG     Scalar with "magic" (special behaviors) attached, or a blessed scalar
SVt_PVAV     Array (AV*)
SVt_PVHV     Hash (HV*)
SVt_PVCV     Subroutine (CV*)
SVt_PVGV     Glob (GV*)
SVt_PVLV     Lvalue proxy (e.g., the result of substr or a tied element)
SVt_PVFM     Format
SVt_PVIO     I/O handle (IO*)
SVt_REGEXP   Compiled regular expression (REGEXP*)

The numeric values behind these macros differ between Perl releases, so always compare against the macro name rather than a literal number.
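If you want to peek at these internal types without writing any C, the core B module can help. The sketch below uses B::svref_2object, which wraps the referent in an object whose class name mirrors the internal SV type. The helper name internal_class is made up for this example.

```perl
use strict;
use warnings;
use B ();

# internal_class is a hypothetical helper name for this sketch.
sub internal_class {
    my ($ref) = @_;
    # svref_2object wraps the referent in a B:: object whose class
    # mirrors the internal SV type (B::AV for SVt_PVAV, and so on).
    return ref(B::svref_2object($ref));
}

my $number = 42;
my $string = 'hello';

print internal_class(\$number),  "\n";  # B::IV
print internal_class(\$string),  "\n";  # B::PV
print internal_class([]),        "\n";  # B::AV
print internal_class({}),        "\n";  # B::HV
print internal_class(sub { 1 }), "\n";  # B::CV
```

Devel::Peek's Dump function gives a far more detailed view of the same information, but it writes to STDERR, which makes it better suited to interactive debugging.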

Now let’s dive into the Perl code we’ll be porting today. We will create a fun little utility module that exports a single function called animal_sound. The function takes one parameter (a scalar, an array reference, or a hash reference) and returns an array reference of strings, one per input value, each describing an animal making its sound at that value.

package Animal::Util;

use parent 'Exporter';

our @EXPORT_OK = qw/animal_sound/;

use Scalar::Util qw/reftype/;

my @animals = (
    [cat    => "meows"],
    [dog    => "barks"],
    [cow    => "moos"],
    [duck   => "quacks"],
    [sheep  => "baas"],
    [horse  => "neighs"],
    [pig    => "oinks"],
    [lion   => "roars"],
    [frog   => "ribbits"],
    [owl    => "hoots"],
);

sub animal_sound {
    my ($param) = @_;

    my $ref_type = reftype($param) || 'STRING';

    my @list;
    if ($ref_type eq 'ARRAY') {
        @list = @$param;
    } elsif ($ref_type eq 'HASH') {
        @list = %$param;
    } else {
        @list = ($param);
    }

    my @result;
    for my $i (0..$#list) {
        my $val = $list[$i];
        my ($animal, $sound) = @{ $animals[$i % @animals] };
        if (!defined $val) {
            push @result, "A $animal $sound at nothing (undef)";
        } elsif (ref $val) {
            push @result, "A $animal $sound at a " . reftype($val) . " ref";
        } else {
            push @result, "A $animal $sound '$val'";
        }
    }
    return \@result;
}

1;

You can then use this module in your Perl script as follows:

use Animal::Util 'animal_sound';

my $sounds1 = animal_sound([42, "hello", undef, {foo=>1}]);
# [
#   "A cat meows '42'",
#   "A dog barks 'hello'",
#   "A cow moos at nothing (undef)",
#   "A duck quacks at a HASH ref"
# ]
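The Perl implementation pairs each input value with an animal using $i % @animals, so the list of animals repeats once there are more inputs than animals. Here is a minimal sketch of just that lookup, using only the animal names. The helper name animal_for_index is made up for the example.

```perl
use strict;
use warnings;

my @animals = qw(cat dog cow duck sheep horse pig lion frog owl);

# In numeric context an array yields its element count.
my $animal_len = scalar @animals;

# animal_for_index is a hypothetical helper: which animal is paired
# with the input at index $i?
sub animal_for_index {
    my ($i) = @_;
    return $animals[ $i % $animal_len ];
}

print animal_for_index(0),  "\n";  # cat
print animal_for_index(9),  "\n";  # owl
print animal_for_index(10), "\n";  # cat (wraps around)
print animal_for_index(13), "\n";  # duck
```

C arrays do not carry their length around, so the XS port has to compute the element count itself.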

Okay, let's port this to XS. We will create a new module called Animal::Util::XS and implement the animal_sound function in C.

module-starter --module="Animal::Util::XS" --author="Your Name" --email="your email"

Update the Makefile.PL to include XSMULTI => 1, then open the generated lib/Animal/Util/XS.pm file and replace the boilerplate code with the following:

package Animal::Util::XS;

use 5.006;
use strict;
use warnings;
our $VERSION = '0.01';

use parent 'Exporter';
our @EXPORT_OK = qw(animal_sound);

require XSLoader;
XSLoader::load("Animal::Util::XS", $VERSION);

1;

Next, let's add a new file called lib/Animal/Util/XS.xs. This file will contain the XS code that implements the animal_sound function. We will start with a minimal template so that the generated tests pass:

#define PERL_NO_GET_CONTEXT 
#include "EXTERN.h"        
#include "perl.h"          
#include "XSUB.h"           

MODULE = Animal::Util::XS  PACKAGE = Animal::Util::XS
PROTOTYPES: DISABLE

SV *
animal_sound(...)
    CODE:
        SV *animal = ST(0);
        AV *result = newAV();
        RETVAL = newRV_noinc((SV*)result);
    OUTPUT:
        RETVAL

With that in place, you should be able to build the module by running 'perl Makefile.PL', 'make' and 'make test'. We will now add a test file for the animal_sound function. Create a new file called t/01-animal_sound.t and add the following code:

use Test::More;

use Animal::Util::XS qw/animal_sound/;

is_deeply(animal_sound(), ["A cat meows at nothing (undef)"]);
is_deeply(animal_sound("123"), ["A cat meows '123'"]);

done_testing();

Now if you run 'make test' again, the tests will fail because we haven't implemented any logic yet and are just returning an empty array reference.

We will start simple and then extend the implementation. To get the simple scalar tests to pass, we first update the XS file to define a C array of animals and their sounds, which we can then use to build our response:

#define PERL_NO_GET_CONTEXT // we'll define thread context if necessary (faster)
#include "EXTERN.h"         // globals/constant import locations
#include "perl.h"           // Perl symbols, structures and constants definition
#include "XSUB.h"           // xsubpp functions and macros

static char *animals[][2] = {
    {"cat",   "meows"},
    {"dog",   "barks"},
    {"cow",   "moos"},
    {"duck",  "quacks"},
    {"sheep", "baas"},
    {"horse", "neighs"},
    {"pig",   "oinks"},
    {"lion",  "roars"},
    {"frog",  "ribbits"},
    {"owl",   "hoots"}
};

MODULE = Animal::Util::XS  PACKAGE = Animal::Util::XS
PROTOTYPES: DISABLE

SV *
animal_sound(...)
    CODE:
        SV *param = items > 0 ? ST(0) : &PL_sv_undef;
        char *animal = animals[0][0];
        char *sound = animals[0][1];
        AV * result = newAV();
        if (!SvOK(param)) {
            av_push(result, newSVpvf("A %s %s at nothing (undef)", animal, sound));
        } else {
            STRLEN len;
            const char *val = SvPV(param, len);
            av_push(result, newSVpvf("A %s %s '%s'", animal, sound, val));
        }
        RETVAL = newRV_noinc((SV*)result);
    OUTPUT:
        RETVAL
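As a cross-check, the XS function above can be mirrored line for line in pure Perl. This is only a sketch to show which Perl construct each XS call corresponds to. The sub name animal_sound_first_step is made up and is not part of either module.

```perl
use strict;
use warnings;

# Pure-Perl mirror of the single-argument XS version.
# animal_sound_first_step is a hypothetical name used only in this sketch.
sub animal_sound_first_step {
    # Same idea as: items > 0 ? ST(0) : &PL_sv_undef
    my $param = @_ > 0 ? $_[0] : undef;
    my ($animal, $sound) = ('cat', 'meows');   # animals[0][0], animals[0][1]

    my @result;                                # newAV()
    if (!defined $param) {                     # !SvOK(param)
        push @result, sprintf("A %s %s at nothing (undef)", $animal, $sound);
    } else {
        push @result, sprintf("A %s %s '%s'", $animal, $sound, $param);
    }
    return \@result;                           # newRV_noinc((SV*)result)
}

print animal_sound_first_step()->[0], "\n";       # A cat meows at nothing (undef)
print animal_sound_first_step("123")->[0], "\n";  # A cat meows '123'
```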

We have changed the code by defining the array of animals, and we have updated animal_sound to check whether at least one argument was passed (items > 0), falling back to undef otherwise. We are also using macros and functions that I haven't mentioned yet, such as 'newAV()' to create a new array and 'av_push()' to add elements to it. The 'newSVpvf()' function creates a new scalar from a format string, similar to Perl's 'sprintf'. Now if you run 'make test' again, both tests should pass, but we still need to implement the logic to handle references. Let's extend our test file by adding a test for an array reference (place it before done_testing()).

is_deeply(animal_sound([qw/1 2 3/]), [
    "A cat meows '1'",
    "A dog barks '2'",
    "A cow moos '3'"
]);

Now we will update the XS code to handle array references. We will refactor so that we build an array of results based on the type of the input parameter. We will then iterate that list to generate the results. Update your animal_sound function to the following:

SV *
animal_sound(...)
    CODE:
        SV *param = items > 0 ? ST(0) : &PL_sv_undef;
        AV * result = newAV();
        int len, i;
        if (SvROK(param)) {
            int type = SvTYPE(SvRV(param));
            if (type == SVt_PVAV) {
                AV * array = (AV*)SvRV(param);
                len = av_len(array) + 1;
                for (i = 0; i < len; i++) {
                    SV **svp = av_fetch(array, i, 0);
                    /* av_fetch returns NULL for holes in a sparse array */
                    SV *item = svp ? SvREFCNT_inc(*svp) : newSV(0);
                    av_push(result, item);
                }
            }
        } else {
            SvREFCNT_inc(param);
            av_push(result, param);
        }

        len = av_len(result) + 1;
        int animal_len = sizeof(animals) / sizeof(animals[0]);

        for (i = 0; i < len; i++) {
            char *animal = animals[i % animal_len][0];
            char *sound = animals[i % animal_len][1];

            SV *item = *av_fetch(result, i, 0);

            if (!SvOK(item)) {
                av_store(result, i, newSVpvf("A %s %s at nothing (undef)", animal, sound));
            } else {
                STRLEN len;
                const char *val = SvPV(item, len);
                av_store(result, i, newSVpvf("A %s %s '%s'", animal, sound, val));
            }
        }

        RETVAL = newRV_noinc((SV*)result);
    OUTPUT:
        RETVAL

To recap: we check whether the input parameter is a reference and, if so, use SvTYPE on the referent to see whether it is an array. If it is, we fetch each item from the array and push it onto our result array, which gives us a shallow copy of the AV. If the parameter is not a reference, we push it onto the result array directly. We then iterate over the result array, replacing each element with the final output string for the corresponding animal.
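One edge case worth keeping in mind when walking a Perl array from C is the sparse array. Perl lets you assign to an index beyond the end of an array, which leaves "holes" at the skipped indexes. For such holes av_fetch returns a NULL pointer instead of a pointer to an SV, so XS code should check the return value before dereferencing it. The sketch below shows what a hole looks like from the Perl side. The helper name slot_state is made up for the example.

```perl
use strict;
use warnings;

my @sparse;
$sparse[2] = 'x';    # indexes 0 and 1 are never assigned

# slot_state is a hypothetical helper that reports whether an index holds a value.
sub slot_state {
    my ($array, $index) = @_;
    return exists $array->[$index] ? 'exists' : 'hole';
}

print scalar(@sparse), "\n";           # 3
print slot_state(\@sparse, 0), "\n";   # hole
print slot_state(\@sparse, 2), "\n";   # exists
```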

Now if you run 'make test' again, all existing tests should pass, including the one for the array reference. Next, we will add a test for a hash reference.

is_deeply(animal_sound({foo => 1}), [
    "A cat meows 'foo'",
    "A dog barks '1'"
]);

I define my hash reference with only one key here because Perl hashes are unordered, so we cannot guarantee the order in which the keys come back. If you want to test against many keys, you will need to sort the keys before building the list, or use a tied hash that maintains insertion order. With our test suite now failing again, let's fix it by updating the XS file as follows:

SV *
animal_sound(...)
    CODE:
        SV *param = items > 0 ? ST(0) : &PL_sv_undef;
        AV * result = newAV();
        int len, i;
        if (SvROK(param)) {
            int type = SvTYPE(SvRV(param));
            if (type == SVt_PVAV) {
                AV * array = (AV*)SvRV(param);
                len = av_len(array) + 1;
                for (i = 0; i < len; i++) {
                    SV **svp = av_fetch(array, i, 0);
                    /* av_fetch returns NULL for holes in a sparse array */
                    SV *item = svp ? SvREFCNT_inc(*svp) : newSV(0);
                    av_push(result, item);
                }
            } else if (type == SVt_PVHV) {
                HV * hash = (HV*)SvRV(param);
                HE * entry;
                (void)hv_iterinit(hash);
                while ((entry = hv_iternext(hash))) {
                        SV * key = hv_iterkeysv(entry);
                        SV * val = hv_iterval(hash, entry);
                        SvREFCNT_inc(val);
                        SvREFCNT_inc(key);
                        av_push(result, key);
                        av_push(result, val);
                }
            }
        } else {
            SvREFCNT_inc(param);
            av_push(result, param);
        }

        len = av_len(result) + 1;
        int animal_len = sizeof(animals) / sizeof(animals[0]);

        for (i = 0; i < len; i++) {
            char *animal = animals[i % animal_len][0];
            char *sound = animals[i % animal_len][1];

            SV *item = *av_fetch(result, i, 0);

            if (!SvOK(item)) {
                av_store(result, i, newSVpvf("A %s %s at nothing (undef)", animal, sound));
            } else {
                STRLEN len;
                const char *val = SvPV(item, len);
                av_store(result, i, newSVpvf("A %s %s '%s'", animal, sound, val));
            }
        }

        RETVAL = newRV_noinc((SV*)result);
    OUTPUT:
        RETVAL

Not much has changed other than the check for type == SVt_PVHV, which tests whether the reference points to a hash. We then cast the referent to an HV and iterate over its entries using hv_iterinit and hv_iternext. For each entry we push both the key and the value onto our result array, incrementing their reference counts so that the result array holds its own reference to each and they are not freed prematurely. Note that a reference to anything else (a code reference passed as the top-level argument, for example) matches neither branch and produces an empty result, unlike the Perl version, which treats it as a single value.

We now have the basics working and 'make test' will pass again, but we are still missing some functionality: we do not check for references when building the strings in the for loop. A basic test to capture this is the following:
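Because hash iteration order is not predictable, a test with several keys only becomes deterministic if the keys are put into a known order first. Here is a small Perl sketch of that idea. The helper name flatten_sorted is made up, and an XS port would have to gather and sort the keys itself before pushing them.

```perl
use strict;
use warnings;

# flatten_sorted is a hypothetical helper: flatten a hash into
# (key, value, key, value, ...) in sorted key order.
sub flatten_sorted {
    my ($hash) = @_;
    return map { ($_, $hash->{$_}) } sort keys %$hash;
}

my @list = flatten_sorted({ foo => 1, bar => 2, baz => 3 });
print join(', ', @list), "\n";   # bar, 2, baz, 3, foo, 1
```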

is_deeply(animal_sound([{foo => 1}, ["bar"], sub { 1 }, qr/2/]), [
    "A cat meows at a HASH ref",
    "A dog barks at a ARRAY ref",
    "A cow moos at a CODE ref",
    "A duck quacks at a REGEXP ref"
]);

The fix is simple, because there is a function for exactly this task: sv_reftype, which is what Scalar::Util's reftype subroutine uses internally. To implement the fix, update the for loop to the following:

for (i = 0; i < len; i++) {
    char *animal = animals[i % animal_len][0];
    char *sound = animals[i % animal_len][1];

    SV *item = *av_fetch(result, i, 0);

    if (!SvOK(item)) {
        av_store(result, i, newSVpvf("A %s %s at nothing (undef)", animal, sound));
    } else if (SvROK(item)) {
        av_store(result, i, newSVpvf("A %s %s at a %s ref", animal, sound, (char*)sv_reftype(SvRV(item),0)));
    } else {
        STRLEN len;
        const char *val = SvPV(item, len);
        av_store(result, i, newSVpvf("A %s %s '%s'", animal, sound, val));
    }
}

Now run 'make test' and all tests should pass.
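If you are curious where the strings in that last test come from, the sketch below prints what ref and reftype report for each kind of reference used in it, plus a scalar reference for comparison. It runs against Scalar::Util only, so it shows the Perl-side behavior that sv_reftype is expected to match.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Label / value pairs covering the reference kinds from the final test.
my @samples = (
    [ 'hash'   => { foo => 1 } ],
    [ 'array'  => ['bar'] ],
    [ 'code'   => sub { 1 } ],
    [ 'regexp' => qr/2/ ],
    [ 'scalar' => \"text" ],
);

for my $sample (@samples) {
    my ($label, $value) = @$sample;
    printf "%-7s ref=%-7s reftype=%s\n", $label, ref($value), reftype($value);
}
```

The interesting row is the regular expression: ref reports the class Regexp because qr// returns a blessed reference, while reftype reports the underlying type REGEXP.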

Thanks for reading!
