The D Blog

The official blog for the D Programming Language.

Lost in Translation: Encapsulation

Nov 6, 2018 • DBlogAdmin • #Code, #The Language

I first learned programming in BASIC. Outgrew it, and switched to Fortran. Amusingly, my early Fortran code looked just like BASIC. My early C code looked like Fortran. My early C++ code looked like C. – Walter Bright, the creator of D

Programming in a language is not the same as thinking in that language. A natural side effect of experience with one programming language is that we view other languages through the prism of its features and idioms. Languages in the same family may look and feel similar, but there are guaranteed to be subtle differences that, when not accounted for, can lead to compiler errors, bugs, and missed opportunities. Even when good docs, books, and other materials are available, most misunderstandings are only going to be solved through trial-and-error.

D programmers come from a variety of programming backgrounds, C-family languages perhaps being the most common among them. Understanding the differences and how familiar features are tailored to D can open the door to more possibilities for organizing a code base, and designing and implementing an API. This article is the first of a few that will examine D features that can be overlooked or misunderstood by those experienced in similar languages.

We’re starting with a look at a particular feature that’s common among languages that support Object-Oriented Programming (OOP). There’s one aspect in particular of the D implementation that experienced programmers are sure they already fully understand and are often surprised to later learn they don’t.

Encapsulation

Most readers will already be familiar with the concept of encapsulation, but I want to make sure we’re on the same page. For the purpose of this article, I’m talking about encapsulation in the form of separating interface from implementation. Some people tend to think of it strictly as it relates to object-oriented programming, but it’s a concept that’s more broad than that. Consider this C code:

#include <stdio.h>
static size_t s_count;

void print_message(const char* msg) {
    puts(msg);
    s_count++;
}

size_t num_prints() { return s_count; }

In C, functions and global variables decorated with static become private to the translation unit (i.e. the source file along with any headers brought in via #include) in which they are declared. Non-static declarations are publicly accessible, usually provided in header files that lay out the public API for clients to use. Static functions and variables are used to hide implementation details from the public API.

Encapsulation in C is a minimal approach. C++ supports the same feature, but it also has anonymous namespaces that can encapsulate type definitions in addition to declarations. Like Java, C#, and other languages that support OOP, C++ also has access modifiers (alternatively known as access specifiers, protection attributes, visibility attributes) which can be applied to class and struct member declarations.

C++ supports the following three access modifiers, common among OOP languages:

  • public - accessible to the world

  • private - accessible only within the class

  • protected - accessible only within the class and its derived classes

An experienced Java programmer might raise a hand to say, “Um, excuse me. That’s not a complete definition of protected.” That’s because in Java, it looks like this:

  • protected - accessible within the class, its derived classes, and classes in the same package.

Every class in Java belongs to a package, so it makes sense to factor packages into the equation. Then there’s this:

  • package-private (not a keyword) - accessible within the class and classes in the same package.

This is the default access level in Java when no access modifier is specified. This combined with protected make packages a tool for encapsulation beyond classes in Java.

Similarly, C# has assemblies, which MSDN defines as “a collection of types and resources that forms a logical unit of functionality”. In C#, the meaning of protected is identical to that of C++, but the language has two additional forms of protection that relate to assemblies and that are analogous to Java’s protected and package-private.

  • internal - accessible within the class and classes in the same assembly.

  • protected internal - accessible within the class, its derived classes, and classes in the same assembly.

Examining encapsulation in other programming languages will continue to turn up similarities and differences. Common encapsulation idioms are generally adapted to language-specific features. The fundamental concept remains the same, but the scope and implementation vary. So it should come as no surprise that D also approaches encapsulation in its own, language-specific manner.

Modules

The foundation of D’s approach to encapsulation is the module. Consider this D version of the C snippet from above:

module mymod;

private size_t _count;

void printMessage(string msg) {
    import std.stdio : writeln;

    writeln(msg);
    _count++;
}

size_t numPrints() { return _count; }

In D, access modifiers can apply to module-scope declarations, not just class and struct members. _count is private, meaning it is not visible outside of the module. printMessage and numPrints have no access modifiers; they are public by default, making them visible and accessible outside of the module. Both functions could have been annotated with the keyword public.

Note that imports in module scope are private by default, meaning the symbols in the imported modules are not visible outside the module, and local imports, as in the example, are never visible outside of their parent scope.

Alternative syntaxes are supported, giving more flexibility to the layout of a module. For example, there’s C++ style:

module mymod;

// Everything below this is private until either another
// protection attribute or the end of file is encountered.
private:
    size_t _count;

// Turn public back on
public:
    void printMessage(string msg) {
        import std.stdio : writeln;

        writeln(msg);
        _count++;
    }

    size_t numPrints() { return _count; }

And this:

module mymod;

private {
    // Everything declared within these braces is private.
    size_t _count;
}

// The functions are still public by default
void printMessage(string msg) {
    import std.stdio : writeln;

    writeln(msg);
    _count++;
}

size_t numPrints() { return _count; }

Modules can belong to packages. A package is a way to group related modules together. In practice, the source files corresponding to each module should be grouped together in the same directory on disk. Then, in the source file, each directory becomes part of the module declaration:

// mypack/amodule.d
mypack.amodule;

// mypack/subpack/anothermodule.d
mypack.subpack.anothermodule; _Note that it’s possible to have package names that don’t correspond to directories and module names that don’t correspond to files, but it’s bad practice to do so. A deep dive into packages and modules will have to wait for a future post._

mymod does not belong to a package, as no packages were included in the module declaration. Inside printMessage, the function writeln is imported from the stdio module, which belongs to the std package. Packages have no special properties in D and primarily serve as namespaces, but they are a common part of the codescape.

In addition to public and private, the package access modifier can be applied to module-scope declarations to make them visible only within modules in the same package.

Consider the following example. There are three modules in three files (only one module per file is allowed), each belonging to the same root package.

// src/rootpack/subpack1/mod2.d
module rootpack.subpack1.mod2;
import std.stdio;

package void sayHello() {
    writeln("Hello!");
}

// src/rootpack/subpack1/mod1.d
module rootpack.subpack1.mod1;
import rootpack.subpack1.mod2;

class Speaker {
    this() { sayHello(); }
}

// src/rootpack/app.d
module rootpack.app;
import rootpack.subpack1.mod1;

void main() {
    auto speaker = new Speaker;
}

Compile this with the following command line:

cd src
dmd -i rootpack/app.d

The -i switch tells the compiler to automatically compile and link imported modules (excluding those in the standard library namespaces core and std). Without it, each module would have to be passed on the command line, else they wouldn’t be compiled and linked.

The class Speaker has access to sayHello because they belong to modules that are in the same package. Now imagine we do a refactor and we decide that it could be useful to have access to sayHello throughout the rootpack package. D provides the means to make that happen by allowing the package attribute to be parameterized with the fully-qualified name (FQN) of a package. So we can change the declaration of sayHello like so:

package(rootpack) void sayHello() {
    writeln("Hello!");
}

Now all modules in rootpack and all modules in packages that descend from rootpack will have access to sayHello. Don’t overlook that last part. A parameter to the package attribute is saying that a package and all of its descendants can access this symbol. It may sound overly broad, but it isn’t.

For one thing, only a package that is a direct ancestor of the module’s parent package can be used as a parameter. Consider a module rootpack.subpack.subsub.mymod. That name contains all of the packages that are legal parameters to the package attribute in mymod.d, namely rootpack, subpack, and subsub. So we can say the following about symbols declared in mymod:

  • package - visible only to modules in the parent package of mymod, i.e. the subsub package.

  • package(subsub) - visible to modules in the subsub package and modules in all packages descending from subsub.

  • package(subpack) - visible to modules in the subpack package and modules in all packages descending from subpack.

  • package(rootpack) - visible to modules in the rootpack package and modules in all packages descending from rootpack.

This feature makes packages another tool for encapsulation, allowing symbols to be hidden from the outside world but visible and accessible in specific subtrees of a package hierarchy. In practice, there are probably few cases where expanding access to a broad range of packages in an entire subtree is desirable.

It’s common to see parameterized package protection in situations where a package exposes a common public interface and hides implementations in one or more subpackages, such as a graphics package with subpackages containing implementations for DirectX, Metal, OpenGL, and Vulkan. Here, D’s access modifiers allow for three levels of encapsulation:

  • the graphics package as a whole

  • each subpackage containing the implementations

  • individual modules in each package

Notice that I didn’t include class or struct types as a fourth level. The next section explains why.

Classes and structs

Now we come to the motivation for this article. I can’t recall ever seeing anyone come to the D forums professing surprise about package protection, but the behavior of access modifiers in classes and structs is something that pops up now and then, largely because of expectations derived from experience in other languages.

Classes and structs use the same access modifiers as modules: public, package, package(some.pack), and private. The protected attribute can only be used in classes, as inheritance is not supported for structs (nor for modules, which aren’t even objects). public, package, and package(some.pack) behave exactly as they do at the module level. The thing that surprises some people is that private also behaves the same way.

import std.stdio;

class C {
    private int x;
}

void main() {
    C c = new C();
    c.x = 10;
    writeln(c.x);
}

Run this example online

Snippets like this are posted in the forums now and again by people exploring D, accompanying a question along the lines of, “Why does this compile?” (and sometimes, “I think I’ve found a bug!”). This is an example of where experience can cloud expectations. Everyone knows what private means, so it’s not something most people bother to look up in the language docs. However, those who do would find this:

Symbols with private visibility can only be accessed from within the same module.

private in D always means private to the module. The module is the lowest level of encapsulation. It’s easy to understand why some experience an initial resistance to this, that it breaks encapsulation, but the intent behind the design is to strengthen encapsulation. It’s inspired by the C++ friend feature.

Having implemented and maintained a C++ compiler for many years, Walter understood the need for a feature like friend, but felt that it wasn’t the best way to go about it.

Being able to declare a “friend” that is somewhere in some other file runs against notions of encapsulation.

An alternative is to take a Java-like approach of one class per module, but he felt that was too restrictive.

One may desire a set of closely interrelated classes that encapsulate a concept, and those should go into a module.

So the way to view a module in D is not just as a single source file, but as a unit of encapsulation. It can contain free functions, classes, and structs, all operating on the same data declared in module scope and class scope. The public interface is still protected from changes to the private implementation inside the module. Along those same lines, protected class members are accessible not just in derived classes, but also in the module.

Sometimes though, there really is a benefit to denying access to private members in a module. The bigger a module becomes, the more of a burden it is to maintain, especially when it’s being maintained by a team. Every place a private member of a class is accessed in a module means more places to update when a change is made, thereby increasing the maintenance burden. The language provides the means to alleviate the burden in the form of the special package module.

In some cases, we don’t want to require the user to import multiple modules individually. Splitting a large module into smaller ones is one of those cases. Consider the following file tree:

-- mypack
---- mod1.d
---- mod2.d We have two modules in a package called `mypack`. Let’s say that `mod1.d` has grown extremely large and we’re starting to worry about maintaining it. For one, we want to ensure that private members aren’t manipulated outside of class declarations with hundreds or thousands of lines in between. We want to split the module into smaller ones, but at the same time we don’t want to break user code. Currently, users can get at the module’s symbols by importing it with `import mypack.mod1`. We want that to continue to work. Here’s how we do it:

-- mypack
---- mod1
------ package.d
------ split1.d
------ split2.d
---- mod2.d We’ve split `mod1.d` into two new modules and put them in a package named `mod1`. We’ve also created a special `package.d` file, which looks like this:

module mypack.mod1;

public import mypack.mod1.split1,
              mypack.mod1.split2; When the compiler sees `package.d`, it knows to treat it specially. Users will be able to continue using `import mypack.mod1` without ever caring that it’s now split into two modules in a new package. The key is the module declaration at the top of `package.d`. It’s telling the compiler to treat this package as the module `mod1`. And instead of automatically importing all modules in the package, the requirement to list them as public imports in `package.d` allows more freedom in implementing the package. Sometimes, you might want to require the user to explicitly import a module even when a `package.d` is present.

Now users will continue seeing mod1 as a single module and can continue to import it as such. Meanwhile, encapsulation is now more stringently enforced internally. Because split1 and split2 are now separate modules, they can’t touch each other’s private parts. Any part of the API that needs to be shared by both modules can be annotated with package protection. Despite the internal transformation, the public interface remains unchanged, and encapsulation is maintained.

Wrapping up

The full list of access modifiers in D can be defined as such:

  • public - accessible everywhere.

  • package - accessible to modules in the same package.

  • package(some.pack) - accessible to modules in the package some.pack and to the modules in all of its descendant packages.

  • private - accessible only in the module.

  • protected (classes only) - accessible in the module and in derived classes.

Hopefully, this article has provided you with the perspective to think in D instead of your “native” language when thinking about encapsulation in D.

Thanks to Ali Çehreli, Joakim Noah, and Nicholas Wilson for reviewing and providing feedback on this article.

DMD 2.083.0 Released

Nov 2, 2018 • DBlogAdmin • #Code, #Compilers & Tools, #DMD Releases

Version 2.083.0 of DMD, the D reference compiler, is ready for download. The changelog lists 47 fixes and enhancements across all components of the release package. Notable among them are some C++ compatibility enhancements and some compile-time love.

C++ compatibility

D’s support for linking to C++ binaries has been evolving and improving with nearly every release. This time, the new things aren’t very dramatic, but still very welcome to those who work with both C++ and D in the same code base.

What’s my runtime?

For a while now, D has had predefined version identifiers for user code to detect the C runtime implementation at compile time. These are:

  • CRuntime_Bionic

  • CRuntime_DigitalMars

  • CRuntime_Glibc

  • CRuntime_Microsoft

  • CRuntime_Musl

  • CRuntime_UClibc

These aren’t reliable when linking against C++ code. Where the C runtime in use often depends on the system, the C++ runtime is compiler-specific. To remedy that, 2.083.0 introduces a few new predefined versions:

  • CppRuntime_Clang

  • CppRuntime_DigitalMars

  • CppRuntime_Gcc

  • CppRuntime_Microsoft

  • CppRuntime_Sun

Why so much conflict?

C++ support also gets a new syntax for declaring C++ linkage, which affects how a symbol is mangled. Consider a C++ library, MyLib, which uses the namespace mylib. The original syntax for binding a function in that namespace looks like this:

/*
 The original C++:
 namespace mylib { void cppFunc(); }
*/
// The D declaration
extern(C++, mylib) void cppFunc();

This declares that cppFunc has C++ linkage (the symbol is mangled in a manner specific to the C++ compiler) and that the symbol belongs to the C++ namespace mylib . On the D side, the function can be referred to either as cppFunc or as mylib.cppFunc.

In practice, this approach creates opportunities for conflict when a namespace is the same as a D keyword. It also has an impact on how one approaches the organization of a binding.  It’s natural to want to name the root package in D mylib, as it matches the library name and it is a D convention to name modules and packages using lowercase. In that case, extern(C++, mylib) declarations will not be compilable anywhere in the mylib package because the symbols conflict.

To alleviate the problem, an alternative syntax was proposed using strings to declare the namespaces in the linkage attribute, rather than identifiers:

/*
 The original C++:
 namespace foo { void cppFunc(); }
*/
// The D declaration
extern(C++, "foo") void cppFunc();

With this syntax, no mylib symbol is created on the D side; it is used solely for name mangling. No more conflicts with keywords, and D packages can be used to match the C++ namespaces on the D side. The old syntax isn’t going away anytime soon, though.

New compile-time things

This release provides two new built-in traits for more compile-time reflection options. Like all built-in traits, they are accessible via the __traits expression. There’s also a new pragma that lets you bring some linker options into the source code in a very specific circumstance.

Are you a zero?

isZeroInit can be used to determine if the default initializer of a given type is 0, or more specifically, it evaluates to true if all of the init value’s bits are zero. The example below uses compile-time asserts to verify the zeroness and nonzeroness of a few default init values, but I’ve saved a version that prints the results at runtime, for more immediate feedback, and can be compiled and run from the browser.

struct ImaZero {
    int x;
}

struct ImaNonZero {
    int x = 10;
}

// double.init == double.nan
static assert(!__traits(isZeroInit, double));

// int.init == 0
static assert(__traits(isZeroInit, int));

// ImaZero.init == ImaZero(0)
static assert(__traits(isZeroInit, ImaZero));

// ImaNonZeror.init == ImaZero(10)
static assert(!__traits(isZeroInit, ImaNonZero));

Computer, query target.

The second new trait is getTargetInfo, which allows compile-time queries about the target platform. The argument is a string that serves as a key, and the result is “an expression describing the requested target information”. Currently supported strings are “cppRuntimeLibrary”, “floatAbi”, and “ObjectFormat”.

The following prints all three at compile time.

pragma(msg, __traits(getTargetInfo, "cppRuntimeLibrary"));
pragma(msg, __traits(getTargetInfo, "floatAbi"));
pragma(msg, __traits(getTargetInfo, "objectFormat")); On Windows, using the default (`-m32`) DigitalMars toolchain, I see this:

snn
hard
omf With the Microsoft Build Tools (via VS 2017), compiling with `-m64` and `-m32mscoff`, I see this:

libcmt
hard
coff ### Yo! Linker! Take care of this, will ya?

D has long supported a lib pragma, allowing programmers to tell the compiler to pass a given library to the linker in source code rather than on the command line. Now, there’s a new pragma in town that let’s the programmer specify specific linker commands in source code and behaves rather differently. Meet the linkerDirective pragma:

pragma(linkerDirective, "/FAILIFMISMATCH:_ITERATOR_DEBUG_LEVEL=2"); The behavior is specified as “Implementation Defined”. The current implementation is specced to behave like so:
  • The string literal specifies a linker directive to be embedded in the generated object file.

  • Linker directives are only supported for MS-COFF output.

Just to make sure you didn’t gloss over the first list item, look at it again. The linker directive is not passed by the compiler to the linker, but emitted to the object file. Since it is only supported for MS-COFF, that means its only a thing for folks on Windows when they are compiling with -m64 or -m32mscoff. And some say the D community doesn’t care about Windows!

Of course there’s more!

The above are just a few cherries I picked from the list. For a deeper dive, see the full changelog. And head over to the Downloads page to get the installer for your platform. It looks a lot nicer than the boring list of files linked in the changelog.

Interfacing D with C: Arrays Part 1

Oct 17, 2018 • DBlogAdmin • #Code, #D and C, #Tutorials

This post is part of an ongoing series on working with both D and C in the same project. The previous post showed how to compile and link C and D objects. This post is the first in a miniseries focused on arrays.

When interacting with C APIs, it’s almost a given that arrays are going to pop up in one way or another (perhaps most often as strings, a subject of a future article in the “D and C” series). Although D arrays are implemented in a manner that is not directly compatible with C, the fundamental building blocks are the same. This makes compatibility between the two relatively painless as long as the differences are not forgotten. This article is the first of a few exploring those differences.

When using a C API from D, it’s sometimes necessary to translate existing code from C to D. A new D program can benefit from existing examples of using the C API, and anyone porting a program from C that uses the API would do well to keep the initial port as close to the original as possible. It’s on that basis that we’re starting off with a look at the declaration and initialization syntax in both languages and how to translate between them. Subsequent posts in this series will cover multidimensional arrays, the anatomy of a D array, passing D arrays to and receiving C arrays from C functions, and how the GC fits into the picture.

My original concept of covering this topic was much smaller in scope, my intent to brush over the boring details and assume that readers would know enough of the basics of C to derive the why from the what and the how. That was before I gave a D tutorial presentation to a group among whom only one person had any experience with C. I’ve also become more aware that there are regular users of the D forums who have never touched a line of C. As such, I’ll be covering a lot more ground than I otherwise would have (hence a two-part article has morphed into at least three). I urge those for whom much of said ground is old hat not to get complacent in their skimming of the page! A comfortable experience with C is more apt than none at all to obscure some of the pitfalls I describe.

Array declarations

Let’s start with a simple declaration of a one-dimensional array:

int c0[3]; This declaration allocates enough memory on the stack to hold three `int` values. The values are stored contiguously in memory, one right after the other. `c0` may or may not be initialized, depending on where it’s declared. Global variables and `static` local variables are default initialized to `0`, as the following C program demonstrates.

definit.c

#include <stdio.h>

// global (can also be declared static)
int c1[3];

void main(int argc, char** argv)
{
    static int c2[3];       // static local
    int c3[3];              // non-static local

    printf("one: %i %i %i\n", c1[0], c1[1], c1[2]);
    printf("two: %i %i %i\n", c2[0], c2[1], c2[2]);
    printf("three: %i %i %i\n", c3[0], c3[1], c3[2]);
}

For me, this prints:

one: 0 0 0
two: 0 0 0
three: -1 8 0 The values for `c3` just happened to be lying around at that memory location. Now for the equivalent D declaration:


int[3] d0; _[Try it online](https://run.dlang.io/is/moXqNt)_

Here we can already find the first gotcha.

A general rule of thumb in D is that C code pasted into a D source file should either work as it does in C or fail to compile. For a long while, C array declaration syntax fell into the former category and was a legal alternative to the D syntax. It has since been deprecated and subsequently removed from the language, meaning int d0[3] will now cause the compiler to scold you:

Error: instead of C-style syntax, use D-style int[3] d0 It may seem an arbitrary restriction, but it really isn’t. At its core, it’s about consistency at a couple of different levels.

One is that we read declarations in D from right to left. In the declaration of d0, everything flows from right to left in the same order that we say it: “(d0) is an (array of three) (integers)”. The same is not true of the C-style declaration.

Another is that the type of d0 is actually int[3]. Consider the following pointer declarations:

int* p0, p1; The type of both `p0` and `p1` is `int*` (in C, only `p0` would be a pointer; `p1` would simply be an `int`). It’s the same as all type declarations in D—type on the left, symbol on the right. Now consider this:


int d1[3], d2[3];
int[3] d4, d5; Having two different syntaxes for array declarations, with one that splits the type like an infinitive, sets the stage for the production of inconsistent and potentially confusing code. By making the C-style syntax illegal, consistency is enforced. Code readability is a key component of maintainability.

Another difference between d0 and c0 is that the elements of d0 will be default initialized no matter where or how it’s declared. Module scope, local scope, static local… it doesn’t matter. Unless the compiler is told otherwise, variables in D are always default initialized to the predefined value specified by the init property of each type. Array elements are initialized to the init property of the element type. As it happens, int.init == 0. Translate definit.c to D and see it for yourself (open up run.dlang.io and give it a go).

When translating C to D, this default initialization business is a subtle gotcha. Consider this innocently contrived C snippet:

// static variables are default initialized to 0 in C
static float vertex[3];
some_func_that_expects_inited_vert(vertex); A direct translation straight to D will not produce the expected result, as `float.init == float.nan`, not `0.0f`!

When translating between the two languages, always be aware of which C variables are not explicitly initialized, which are expected to be initialized, and the default initialization value for each of the basic types in D. Failure to account for the subtleties may well lead to debugging sessions of the hair-pulling variety.

Default initialization can easily be disabled in D with = void in the declaration. This is particularly useful for arrays that are going to be loaded with values before they’re read, or that contain elements with an init value that isn’t very useful as anything other than a marker of uninitialized variables.

float[16] matrix = void;
setIdentity(matrix); On a side note, the purpose of default initialization is not to provide a convenient default value, but to make uninitialized variables stand out (a fact you may come to appreciate in a future debugging session). A common mistake is to assume that types like `float` and `char`, with their “not a number” (`float.nan`) and invalid UTF–8 (`0xFF`) initializers, are the oddball outliers. Not so. Those values are great markers of uninitialized memory because they aren’t useful for much else. It’s the integer types (and `bool`) that break the pattern. For these types, the entire range of values has potential meaning, so there’s no single value that universally shouts “Hey! I’m uninitialized!”. As such, integer and `bool` variables are often left with their default initializer since `0` and `false` are frequently the values one would pick for explicit initialization for those types. Floating point and character values, however, should generally be explicitly initialized or assigned to as soon as possible.

Explicit array initialization

C allows arrays to be explicitly initialized in different ways:

int ci0[3] = {0, 1, 2};  // [0, 1, 2]
int ci1[3] = {1};        // [1, 0, 0]
int ci2[]  = {0, 1, 2};  // [0, 1, 2]
int ci3[3] = {[2] = 2, [0] = 1}; // [1, 0, 2]
int ci4[]  = {[2] = 2, [0] = 1}; // [1, 0, 2]

What we can see here is:

  • elements are initialized sequentially with the constant values in the initializer list

  • if there are fewer values in the list than array elements, then all remaining elements are initialized to 0 (as seen in ci1)

  • if the array length is omitted from the declaration, the array takes the length of the initializer list (ci2)

  • designated initializers, as in ci3, allow specific elements to be initialized with [index] = value pairs, and indexes not in the list are initialized to 0

  • when the length is omitted from the declaration and a designated initializer is used, the array length is based on the highest index in the initializer and elements at all unlisted indexes are initialized to 0, as seen in ci4

Initializers aren’t supposed to be longer than the array (gcc gives a warning and initializes a three-element array to the first three initializers in the list, ignoring the rest).

Note that it’s possible to mix the designated and non-designated syntaxes in a single initializer:

// [0, 1, 0, 5, 0, 0, 0, 8, 44]
int ci5[] = {0, 1, [3] = 5, [7] = 8, 44};

Each value without a designation is applied in sequential order as normal. If there is a designated initializer immediately preceding it, then it becomes the value for the next index, and all other elements are initialized to 0. Here, 0 and 1 go to indexes ci5[0] and ci5[1] as normal, since they are the first two values in the list. Next comes a designator for ci5[3], so ci5[2] has no corresponding value in this list and is initialized to 0. Next comes the designator for ci5[7].  We have skipped ci5[4], ci5[5], and ci5[6],  so they are all initialized to 0. Finally, 44 lacks a designator, but immediately follows [7], so it becomes the value for the element at ci5[8]. In the end, ci5 is initialized to a length of 9 elements.

Also note that designated array initializers were added to C in C99. Some C compiler versions either don’t support the syntax or require a special command line flag to enable it. As such, it’s probably not something you’ll encounter very much in the wild, but still useful to know about when you do.

Translating all of these to D opens the door to more gotchas. Thankfully, the first one is a compiler error and won’t cause any heisenbugs down the road:

int[3] wrong = {0, 1, 2};
int[3] right = [0, 1, 2];

Array initializers in D are array literals. The same syntax can be used to pass anonymous arrays to functions, as in writeln([0, 1, 2]). For the curious, the declaration of wrong produces the following compiler error:

Error: a struct is not a valid initializer for a int[3] The `{}` syntax [is used for `struct` initialization](https://dlang.org/spec/struct.html#static_struct_init) in D (not to be confused with struct literals, which can also [be used to initialize a `struct` instance](https://dlang.org/spec/struct.html#struct-literal)).

The next surprise comes in the translation of ci1.

// int ci1[3] = {1};
int[3] di1 = [1];

This actually produces a compiler error:

Error: mismatched array lengths, 3 and 1 What gives? First, take a look at the translation of `ci2`:
// int ci2[] = {0, 1, 2};
int[] di2 = [0, 1, 2];

In the C code, there is no difference between ci1 and ci2. They both are fixed-length, three-element arrays allocated on the stack. In D, this is one case where that general rule of thumb about pasting C code into D source modules breaks down.

Unlike C, D actually makes a distinction between arrays of types int[3] and int[]. The former is, like C, a fixed-length array, commonly referred to in D as a static array. The latter, unlike C, is a dynamic-length array, commonly referred to as a dynamic array or a slice. Its length can grow and shrink as needed.

Initializers for static arrays must have the same length as the array. D simply does not allow initializers shorter than the declared array length. Dynamic arrays take the length of their initializers. di2 is initialized with three elements, but more can be appended. Moreover, the initializer is not required for a dynamic array. In C, int foo[]; is illegal, as the length can only be omitted from the declaration when an initializer is present.

// gcc says "error: array size missing in 'illegalC'"
// int illegalC[]
int[] legalD;
legalD ~= 10; `legalD` is an empty array, with no memory allocated for its elements. Elements can be added via the append operator, `~=`.

Memory for dynamic arrays is allocated at the point of declaration only when an explicit initializer is provided, as with di2. If no initializer is present, memory is allocated when the first element is appended. By default, dynamic array memory is allocated from the GC heap (though the compiler may determine that it’s safe to allocate on the stack as an optimization) and space for more elements than needed is initialized in order to reduce the need for future allocations (the reserve function can be used to allocate a large block in one go, without initializing any elements). Appended elements go into the preallocated slots until none remain, then the next append triggers a new allocation. Steven Schveighoffer’s excellent array article goes into the details, and also describes array features we’ll touch on in the next part.

Often, when translating a declaration like ci2 to D, the difference between the fixed-length, stack-allocated C array and the dynamic-length, GC-allocated D array isn’t going to matter one iota. One case where it does matter is when the D array is declared inside a function marked @nogc:

@nogc void main()
{
    int[] di2 = [0, 1, 2];
}

Try it online

The compiler ain’t letting you get away with that:

Error: array literal in @nogc function D main may cause a GC allocation The same error isn’t triggered when the array is static, since it’s allocated on the stack and the literal elements are just shoved right in there. New C programmers coming to D for the first time tend to reach for `@nogc` almost as if it goes against their very nature not to, so this is something they will bump into until they eventually come to the realization that [the GC is not the enemy of the people](https://dlang.org/blog/the-gc-series/).

To wrap this up, that big paragraph on designated array initializers in C is about to pull double duty. D also supports designated array initializers, just with a different syntax.

// [0, 1, 0, 5, 0, 0, 0, 8, 44]
// int ci5[] = {0, 1, [3] = 5, [7] = 8, 44};
int[] di5 = [0, 1, 3:5, 7:8, 44];
int[9] di6 = [0, 1, 3:5, 7:8, 44];

Try it online

It works with both static and dynamic arrays, following the same rules and producing the same initialization values as in C.

The main takeaways from this section are:

  • there is a distinction in D between static and dynamic arrays, in C there is not

  • static arrays are allocated on the stack

  • dynamic arrays are allocated on the GC heap

  • uninitialized static arrays are default initialized to the init property of the array elements

  • dynamic arrays can be explicitly initialized and take the length of the initializer

  • dynamic arrays cannot be explicitly initialized in @nogc scopes

  • uninitialized dynamic arrays are empty

This is the time on the D Blog when we dance

There are a lot more words in the preceding sections than I had originally intended to write about array declarations and initialization, and I still have quite a bit more to say about arrays. In the next post, we’ll look at the anatomy of a D array and dig into the art of passing D arrays across the language divide.

Symmetry Autumn of Code is Underway

Sep 15, 2018 • DBlogAdmin • #Community, #Companies, #D Foundation, #SAoC

Earlier this year, Laeeth Isharc brought an idea to the D Foundation for Symmetry Investments to sponsor a summer of code. He was eager to provide a few motivated individuals the incentive to get some great work done for D and the D community. Sporadic email discussions preceded some chatting at DConf 2018 Munich and the idea subsequently began to pick up steam. By the time the details were sketched out, it had transformed into the Symmetry Autumn of Code.

The SAoC projects

Eight applicants submitted their resumes and project proposals for three slots. Nearly all of the proposals were taken from the SAoC suggestions page at the D Wiki. Given the limited window, the selection process went fairly quick. Of the three selected, two had mentors attached. It took a little while to find a mentor for the third selection, so we extended the milestone deadline for that participant. Now I can happily say that all three are well underway.

A fork-based concurrent GC for DRuntime

Francesco Mecca proposed this project. The goal is to take Leandro Lucarella’s original D1 fork-based GC, port it to D2, adapt it to DRuntime’s GC interface, and culminate with a pull request for DRuntime to present the work and open discussion. Leandro agreed to mentor this project and is working with Francesco to develop a test suite that the port must pass as part of Milestone 2. They have also included documentation in their milestone list, which is good news.

vibe.d HTTP/2 implementation

This one came from Francesco Galla, who is currently pursuing a MSc in Network and Security. The goal here is to enhance the vibe-http library to support HTTP/2. Fittingly, Sönke Ludwig, the maintainer of vibe.d, agreed to mentor the project, but requested someone to share the load due to his schedule. Sebastian Wilzbach stepped up as co-mentor.

This project involves rewriting the current HTTP/1 API, ensuring it works as expected, then incrementally adding support for HTTP/2. Portions of the rewrite were already completed before SAoC came along, but had not yet been tested. As such, testing and bug fixing will be a significant portion of the first milestone.

Porting the Mago debugger to D

This project was proposed by László Szerémi. He’s a heavy user of debuggers in D and wants to see the situation improved. He believes that porting the Mago debugger to D is a major step in that direction. His first two milestones are concerned with translating, testing, and bug fixing. To cap off the SAoC event, he intends to get a GUI frontend up and running with some basic features.

László had no mentor when he applied, and no one had specifically volunteered to mentor any debugger projects, so we put out a call in the forums and I reached out directly to a few people. In the end, Stefan Koch agreed to take it on.

SAoC is not the end

I know that one of the goals Laeeth had behind his initial suggestion for this event was to enhance the D ecosystem. None of the selected projects are simple, one-shot tasks. These are projects which will all require attention, care, and effort beyond SAoC. The participants are getting them started and we hope they’ll continue to maintain them for some time to come, but in the end, these are projects for the community. When SAoC 2018 is behind us, it will ultimately be up to the community to determine if the projects live long and prosper or die young.

I’ll post more about SAoC and the participants as the event goes on. We wish them the best in meeting their milestones!

DMD 2.082.0 Released

Sep 4, 2018 • DBlogAdmin • #Code, #Compilers & Tools, #DMD Releases

DMD 2.082.0 was released over the weekend. There were 28 major changes and 76 closed Bugzilla issues in this release, including some very welcome improvements in the toolchain. Head over to the download page to pick up the official package for your platform and visit the changelog for the details.

Tooling improvements

While there were several improvements and fixes to the compiler, standard library, and runtime in this release, there were some seemingly innocuous quality-of-life changes to the tooling that are sure to be greeted with more enthusiasm.

DUB gets dubbier

DUB, the build tool and package manager for D that ships with DMD, received a number  of enhancements, including better dependency resolution, variable support in the build settings, and improved environment variable expansion.

Arguably the most welcome change will be the removal of the regular update check. Previously, DUB would check for dependency updates once a day before starting a project build. If there was no internet connection, or if there were any errors in dependency resolution, the process could hang for some time. With the removal of the daily check, upgrades will only occur when running dub upgrade in a project directory. Add to that the brand new --dry-run flag to get a list of any upgradable dependencies without executing the upgrades.

Signed binaries for Windows

For quite some time users of DMD on Windows have had the annoyance of seeing a warning from Windows Smartscreen when running the installer, and the occasional false positive from AntiVirus software when running DMD.

Now those in the Windows D camp can do a little victory dance, as all of the binaries in the distribution, including the installer, are signed with the D Language Foundation’s new code signing certificate. This is one more quality-of-life issue that can finally be laid to rest. On a side note, the cost of the certificate was the first expense entered into our Open Collective page.

Compiler and libraries

Many of the changes and updates in the compiler and library department are unlikely to compel anyone to shout from the rooftops, but a handful are nonetheless notable.

The compiler

One such is an expansion of the User-Defined Attribute syntax. Previously, these were only allowed on declarations. Now, they can be applied to function parameters:

// Previously, it was illegal to attach a UDA to a function parameter
void example(@(22) string param)
{
    // It's always been legal to attach UDAs to type, variable, and function declarations.
    @(11) string var;
    pragma(msg, [__traits(getAttributes, var)] == [11]);
    pragma(msg, [__traits(getAttributes, param)] == [22]);
}

Run this example online

The same goes for enum members (it’s not explicitly listed in the highlights at the top of the changelog, but is mentioned in the bugfix list):

enum Foo {
@(10) one,
@(20) two,
}

void main()
{
pragma(msg, [__traits(getAttributes, Foo.one)] == [10]);
pragma(msg, [__traits(getAttributes, Foo.two)] == [20]);
}

Run this example online

The DasBetterC subset of D is enhanced in this release with some improvements. It’s now possible to use array literals in initializers. Previously, array literals required the use of TypeInfo, which is part of DRuntime and therefore unavailable in -betterC mode. Moreover, comparing arrays of structs is now supported and comparing arrays of byte-sized types should no longer generate any linker errrors.

import core.stdc.stdio;
struct Sint
{
    int x;
    this(int v) { x = v;}
}

extern(C) void main()
{
    // No more TypeInfo error in this initializer
    Sint[6] a1 = [Sint(1), Sint(2), Sint(3), Sint(1), Sint(2), Sint(3)];
    foreach(si; a1) printf("%i\n", si.x);

    // Arrays/slices of structs can now be compared
    assert(a1[0..3] == a1[3..$]);

    // No more linker error when comparing strings, either explicitly
    // or implicitly such as in a switch.
    auto s = "abc";
    switch(s)
    {
        case "abc":
            puts("Got a match!");
            break;
        default:
            break;
    }

    // And the same goes for any byte-sized type
    char[6] a = [1,2,3,1,2,3];
    assert(a[0..3] >= a[3..$]);

    puts("All the asserts passed!");
}

Run this example online

DRuntime

Another quality-of-life fix, this one touching on the debugging experience, is a new run-time flag that can be passed to any D program compiled against the 2.082 release of the runtime or later, --DRT-trapException=0. This allows exception trapping to be disabled from the command line.

Previously, this was supported only via a global variable, rt_trapExceptions. To disable exception trapping, this variable had to be set to false before DRuntime gained control of execution, which meant implementing your own extern(C) main and calling _d_run_main to manually initialize DRuntime which, in turn, would run the normal D main—all of which is demonstrated in the Tip of the Week from the August 7, 2016, edition of This Week in D (you’ll also find there a nice explanation of why you might want to disable this feature. HINT: running in your debugger). A command-line flag is sooo much simpler, no?

Phobos

The std.array module has long had an array function that can be used to create a dynamic array from any finite range. With this release, the module gains a staticArray function that can do the same for static arrays, though it’s limited to input ranges (which includes other arrays). When the length of a range is not knowable at compile time, it must be passed as a template argument. Otherwise, the range itself can be passed as a template argument.

import std.stdio;
void main()
{
    import std.range : iota;
    import std.array : staticArray;

    auto input = 3.iota;
    auto a = input.staticArray!2;
    pragma(msg, is(typeof(a) == int[2]));
    writeln(a);
    auto b = input.staticArray!(long[4]);
    pragma(msg, is(typeof(b) == long[4]));
    writeln(b);
}

Run this example online

September pumpkin spice

Participation in the #dbugfix campaign for this cycle was, like last cycle, rather dismal. Even so, I’ll have an update on that topic later this month in a post of its own.

Three of eight applicants were selected for the Symmetry Autumn of Code, which officially kicked off on September 1. Stay tuned here for a post on that topic as well.

The blog has been quiet for a few weeks, but the gears are slowly and squeakily starting to grind again. Other posts lined up for this month include the next long-overdue installment in the GC Series and the launch of a new ‘D in Production’ profile.

Funding code-d

Jul 13, 2018 • DBlogAdmin • #Community, #D Foundation, #Donations

At the DConf 2018 Hackathon, I announced that the D Language Foundation would be raising money to further the development of the suite of tools comprising code-d, the Visual Studio Code plugin. The project is developed and maintained by Jan Jurzitza, a.k.a. Webfreak001.

The plan was that if we reached $3,000 in donations on our Open Collective page, we’d make the money available to Jan. When we finally reached the goal, I was quite happy to tweet about it:

Thanks to a burst of donations in the past few days, we’ve hit our goal at the #dlang Open Collective to support the code-d/serve-d for VS Code. Great work, everyone! Details to come. https://t.co/T1KtXJS7gL

— D Language (@D_Programming) July 1, 2018

Jan and I still hadn’t finalized the details at that point, but we finally got it all sorted a few days later. Before I show how we’ll be using the money, I’d first like to talk about why we’re doing this.

The motivation

As the D programming language community has grown, so has the amount of work that needs to be done. Since D is a community-driven language, much of the work that gets completed is done by volunteers. Unfortunately, the priorities for those who actively contribute aren’t necessarily aligned with those of everyone else, and this can lead to some users feeling that there is little progress at all, or that their concerns are being deliberately ignored. That’s just the nature of the beast. More potential pain points and unfulfilled wishlists means there’s a continual need for more manpower.

Therein lies one of the most fundamental issues that has plagued D’s development for years: how can more people be motivated to directly participate so we can keep pace with the consequences of progress?

To be clear, everyone involved in the core team is aware that lack of manpower is not the only negative side effect we suffer from growth, but it is certainly a big one. Part of my role these days is to figure out, along with Sebastian Wilzbach, ways to mitigate it. My inbox is littered with conversations exploring this topic, along with recommendations from Andrei to research how other projects handle one thing or another for inspiration.

To that end, in early February I launched the #dbugfix campaign with the goal of increasing community participation in the bug fixing process by providing a simple way for folks to bring attention to their pet peeve issues (keep it going people – participation has slowed and we need to pick it up!). We also announced the State of D Survey, initiated by Seb, at the end of February to get an idea of what some of the personal pain points are and where people think things are going swell. We’ve got other ideas in development, one of which I’ll be telling you about in the coming days.

Among the data that jumped out at us from the survey is that Visual Studio and Visual Studio Code were the most popular IDEs/editors among participants. The plugins for both Visual Studio (Visual D) and VS Code (code-d/serve-d/workspace-d) are each maintained by solo developers.

That’s a heavy workload for each of them. Look at the issues list for code-d and you’ll find several there (39 open/132 closed as I write), but the PR list is much shorter (1 open/21 closed as I write). What I infer from that is that the demand for improvements is higher than the supply for implementing them. And from personal experience, I know there are projects like code-d whose maintainers would love to add new features, or improve existing ones, but simply can’t (or aren’t motivated to) make the time for it.

Here then is a good place for a little experiment. What if the D Language Foundation started raising money to help fund the development of specific projects like code-d? By creating the opportunity for those with extra money but little time to directly contribute toward fixing their personal pain points and shrinking their wishlists, can development be focused toward getting bugs fixed and improvements implemented more quickly?

Long story short, I picked the VS Code plugin for our trial run because it was prominent in the survey data, it’s cross-platform, and the serve-d component can be beneficial to other IDE/editor plugins. Open Collective launched their goal system just as I was trying to determine how best to go about it, so it seemed the obvious choice (though we didn’t learn until after our announcement that it wasn’t the system we thought it was, i.e. it’s not a system of multiple, targeted fund raising goals but instead a milestone on the global donation progress bar at the top of our Open Collective page). I contacted Jan just before DConf to gauge his interest, then finally had a meeting with the Foundation board the night before the Hackathon to get approval and decide how to go about doling out the money (we decided the money should be paid out in increments based on completion of milestones, with the option to provide money as needed for targeted development costs, e.g. a Windows license, bug bounties, etc). It all came together fairly quickly.

We plan to continue this initiative and raise money for more work in the future. Some will be like this trial run, with the goal of motivating project maintainers to focus development and spend that extra hour or two a day to get things done. Some will be aimed at specific development tasks, as bounties or as payment for a single part-time or full-time developer. We’ll experiment and see what works.

We won’t be using Open Collective’s goals for this (or any other service) going forward. Instead, we’re going to add a page to the web site where we can define targets, allow donations through Open Collective or PayPal, and track donation progress. Each target will allow us to lay out exactly what the donations are being used for, so potential donors can see in advance where their money is going. We’ll be using the State of D Survey as a guide to begin with, but we’ll always be open to suggestions, and we’ll adapt to what works over what doesn’t as we go along.

As long as there are people willing to put either time or money into the D language, community, and ecosystem, this will surely pay off in the long run with more visible progress, hopefully solving a wider range of pain points than currently happens in our “what-I-want-to-work-on” system of community development. We’re looking forward to see the synergy that results from this and other initiatives going forward.

The milestones

Jan has some specific ideas about how he wants to improve his plugin tools and was happy to take this opportunity to put them all front and center. I asked him to make a list, then estimate how much time it would take to complete each task if he were working on it full time, then divide that list into roughly equal milestones of 3–4 weeks each. That does not mean he will be working full time, nor that each milestone will take 3–4 weeks; payment is not dependent on deadlines. It was simply a means of organizing his task list.

We ended up with two sets of tasks as a starting point and agreed to a $500 payout upon completion of both milestones. Once all the tasks in a milestone are completed, the $500 will be paid. At the end of the second milestone, we’ll determine what to do next – another milestone of feature improvements, a heavy round of bug fixing, or something else – and how to make use of the remaining $2000.

Following is a list of the tasks in both of the initial milestones, with a brief description of each directly from Jan (lightly edited).

Improvements & maintainence

  • add automated tests for windows 7, 10, archlinux, ubuntu, fedora, osx for startup, from scratch project initialization, reloading, adding dependencies
    There are automated tests in the workspace-d repo which test different scenarios and combinations with dub, DCD, and d-scanner, ranging from general usage to weird setups. Identifying more issues (especially on windows), creating more tests and especially running all the tests on several platforms would be a good first step towards a less buggy version of serve-d which should work on all platforms equally well.

  • Single file mode
    A requested feature and one which makes the usage of the plugin a lot easier would be single file usage. Currently, single file usage is very limited and without an open workspace it doesn’t work at all. Adding this (with quick startup time and low resource usage) will make code-d a good alternative to vim or notepad for quickly editing a D file.

  • fix unknown DDoc macro parsing
    Currently, if any unknown DDoc macros are used, they are omitted from the documentation renderer, which makes some documentation unreadable and weird. For this, libddoc needs to be worked on by identifying where exactly the macros are discarded, how they could be kept track of, and how to make it more obvious in the API without breaking backwards compatibility.

  • project template creator as separate plugin (with standardized template format if possible)
    Currently the “Create Project” command is a very static command reading from a directory in the plugin. Making this a separate VS Code plugin with an API for other plugins to extend with more functionality would make it more flexible and open it to new platforms and APIs. Also, research on if there are already some standards for this and if we can parse other template formats.

  • detect used imports & add command to remove unused ones & add Remove unused imports + sort command
    Add a way to find out if imports are used and add a command to remove unused ones. DCD could probably be extended for this.

  • code folding on grammar
    The expand/collapse feature on arrays & functions (bracket based) and lambdas. (Instead of indentation based)

  • parse dub commandline arguments
    Dub command line arguments should be able to be passed into the build buttons and future tasks. This will require parsing them because dub is used as an API and not as a program.

  • experiment with adding a mode to DCD to make all symbols available
    A mode to always give all public symbols regardless of imports and additionally provide the import where the symbol comes from (so when completing it, the IDE adds an import)

UX & Tools for beginners

  • improve UI/UX for building and running projects (implement tasks api) 
    As the tasks API is now actually released, I can incorporate the different build types into the task system of VS Code. This would include having build (debug, release), run, test for each detected dub project and submodule.

  • **make workspace symbol search show more symbols **
    The workspace symbol search (Ctrl-P -> #) currently is using just DCD to find exact matches for types. I would include DScanner ctag/etag results in this response to make the search better.

  • code action: improve implement interface (parse existing functions and not include them)
    Test cases need to be written, existing interfaces must be parsed, cross-file it should all still work, inherit documentation, etc. Need to check in real world applications where interfaces and classes are used. Also add support for abstract classes.

  • auto complete in diet templates (both HTML tags and D code)
    Add static HTML auto completion for tags & attributes and create a virtual D file for D code for DCD for autocompletion.

  • code action: create getter/setter for variable
    For code like int _foo;, create the property functions int foo() @property const { return _foo; } int foo(int value) @property { return _foo = value; } with automatic detection if const can be used. This makes writing classes easier (snippets already exist, but this is more intuitive) and shows beginners how properties are created. The property should be added where other properties are. It could check the comments for “Getters”, “Setters” and “Properties” if possible, otherwise just put it after a block of members.

  • code action: convert boolean parameter to Flag (+import)
    This should change the definition of a function void foo(bool createDirectory) to alias CreateDirectory = Flag!"createDirectory"; void foo(CreateDirectory createDirectory) and add the appropriate import if it doesn’t exist yet.

  • code action: make foreach parallel (with import if neccessary)

Replace a foreach (a; array) with foreach (a; parallel(array)) and import std.parallelism.

  • code action: convert enum to bitflag enum (isBitFlagEnum assert + modify values)
    Convert any enum enum Foo { a, b, c } to enum Foo { a = 1 << 0, b = 1 << 1, c = 1 << 2 }, add static assert(isBitFlagEnum!Foo); and import std.typecons if neccessary.

  • search for unknown symbol or import online
    Use dpldocs.info. Example: import mongoschema; -> Error importing -> Search online code fix -> Add dub package mongoschemad, etc. For undefined symbols the same: check local projects for symbols, otherwise see if online providers know anything.

  • show local references from DCD & local rename
    Highlight a symbol in the document when hovering on a member and provide the ability to rename it (not cross-file, because DCD doesn’t support that).

Thank you

Thanks again to all of the donors who pushed us over the goal line for this. Though we realize not everyone will be entirely pleased with the milestone list, and that it would have been better if we had finalized everything before asking for donations rather than coming at it haphazardly like we did, we hope that most of you find more here to like than not.

It will be a little while yet before we set up the new system and announce the next goal, as we’ve got other initiatives taking priority at the moment. So please be patient. We haven’t forgotten about the State of D Survey, nor are we going to let the results sit and bitrot. This train may be slow sometimes, but it’s rolling on.

DMD 2.081.0 Released

Jul 4, 2018 • DBlogAdmin • ##dbugfix, #Code, #Compilers & Tools, #D Foundation, #DMD Releases

DMD 2.081.0 is now ready for download. Things that stand out in this release are a few deprecations, the implementation of a recently approved DIP (D Improvement Proposal), and quite a bit of work on C++ compatibility. Be sure to check the changelog for details.

Improving C++ interoperability

D has had binary compatibility with C from the beginning not only because it made sense, but also because it was relatively easy to implement. C++ is a different beast. Its larger-than-C feature set and the differences between it and D introduce complexities that make implementing binary compatibility across all supported platforms a challenge to get right. Because of this, D’s extern(C++) feature has been considered a work in progress since its initial inception.

DMD 2.081.0 brings several improvements to the D <-> C++ story, mostly in the form of name mangling bug fixes and improvements. The mangling of constructors and destructors in extern(C++) now properly match the C++ side, as does that of most of D’s operator overloads (where they are semantically equivalent to C++).

Proper mangling of nullptr_t is implemented now as well. On the D side, use typeof(null):

alias nullptr_t = typeof(null);
extern(C++) void fun(nullptr_t); The alias in the example is not required, but may help with usability and readability when interfacing with C++. As typing `null` everywhere is likely reflexive for most D programmers, `nullptr_t` may be easier to keep in mind than `typeof(null)` for those special C++ cases.

Most of the D operator overloads in an extern(C++) class will now correctly mangle. This means it’s now possible to bind to operator overloads in C++ classes using the standard D opBinary, opUnary, etc. The exceptions are opCmp, which has no compatible C++ implementation, and the C++ operator!, which has no compatible D implementation.

In addition to name mangling improvements, a nasty bug where extern(C++) destructors were being placed incorrectly in the C++ virtual table has been fixed, and extern(C++) constructors and destructors now semantically match C++. This means mixed-language class hierarchies are now possible and you can now pass extern(C++) classes to object.destroy when you’re done with them.

Indirectly related,  __traits(getLinkage, ...) has been updated to now tell you the ABI with which a struct, class, or interface has been declared, so you can now filter out your extern(C++) aggregates from those which are extern(D) and extern(Objective-C).

The following shows some of the new features in action. First, the C++ class:

#include <iostream>
class CClass {
private:
    int _val;
public:
    CClass(int v) : _val(v) {}
    virtual ~CClass() { std::cout << "Goodbye #" << _val << std::endl; }
    virtual int getVal() { return _val; }
    CClass* operator+(CClass const * const rhs);
};

CClass* CClass::operator+(CClass const * const rhs) {
    return new CClass(_val + rhs->_val);
}

And now the D side:

extern(C++) class CClass
{
    private int _val;
    this(int);
    ~this();
    int getVal();
    CClass opBinary(string op : "+")(const CClass foo);
}

class DClass : CClass
{
    this(int v)
    {
        super(v);
    }
    ~this() {}
    override extern(C++) int getVal() { return super.getVal() + 10; }
}

void main()
{
    import std.stdio : writeln;

    writeln("CClass linkage: ", __traits(getLinkage, CClass));
    writeln("DClass linkage: ", __traits(getLinkage, DClass));

    DClass clazz1 = new DClass(5);
    scope(exit) destroy(clazz1);
    writeln("clazz1._val: ", clazz1.getVal());

    DClass clazz2 = new DClass(6);
    scope(exit) destroy(clazz2);
    writeln("clazz2._val: ", clazz2.getVal());

    CClass clazz3 = clazz1 + clazz2;
    scope(exit) destroy(clazz3);
    writeln("clazz3._val: ", clazz3.getVal);
}

Compile the C++ class to an object file with your C++ compiler, then pass the object file to DMD on the command line with the D source module and Bob’s your uncle (just make sure on Windows to pass -m64 or -m32mscoff to dmd if you compile the C++ file with the 64-bit or 32-bit Microsoft Build Tools, respectively).

This is still a work in progress and users diving into the deep end with C++ and D are bound to hit shallow spots more frequently than they would like, but this release marks a major leap forward in C++ interoperability.

DIP 1009

Given the amount of time DIP 1009 spent crawling through the DIP review process, it was a big relief for all involved when it was finally approved. The DIP proposed a new syntax for contracts in D. For the uninitiated, the old syntax looked like this:

int fun(ref int a, int b)
in
{
    // Preconditions
    assert(a > 0);
    assert(b >= 0, "b cannot be negative");
}
out(result) // (result) is optional if you don't need to test it
{
    // Postconditions
    assert(result > 0, "returned result must be positive");
    assert(a != 0);
}
do
{
 	// The function body
    a += b;
    return b * 100;
}

Thanks to DIP 1009, starting in DMD 2.081.0 you can do all of that more concisely with the new expression-based contract syntax:

int fun(ref int a, int b)
    in(a > 0)
    in(b >= 0, "b cannot be negative")
    out(result; result > 0, "returned result must be positive")
    out(; a != 0)
{
    a += b;
    return b * 100;
}

Note that result is optional in both the old and new out contract syntaxes and can be given any name. Also note that the old syntax will continue to work.

Deprecations

There’s not much information to add here beyond what’s already in the changelog, but these are cases that users should be aware of:

The #dbugfix campaign

The inaugural #dbugfix round prior to the release of DMD 2.080 was a success, but Round 2 has been much, much quieter (few nominations, very little discussion, and no votes).

One of the two nominated bugs selected from Round 1 was issue #18068. It was fixed and merged into the new 2.081.0 release. The second bug selected was issue #15984, which has not yet been fixed.

In Round 2, the following bugs were nominated with one vote each:

I’ll hand this list off to our team of bug fixing volunteers and hope there’s something here they can tackle.

Round 3 of the #dbugfix campaign is on now. Please nominate the bugs you want to see fixed! Create a thread in the General Forum with #dbugfix and the issue number in the title, or send out a tweet containing #dbugfix and the issue number. I’ll tally them up at the end of the cycle (September 28).

And please, if you do use the #dbugfix in a tweet, remember that it’s intended for nominating bugs you want fixed and not for bringing attention to your pull requests!

How an Engineering Company Chose to Migrate to D

Jun 20, 2018 • BasVeel • #Code, #Companies, #Compilers & Tools, #Guest Posts, #User Stories

Bastiaan Veelo is the lead developer of a specialised program for the computer aided geometric design of ship hulls called Fairway, for the company SARC in the Netherlands.


Imagine there is this little-known programming language in which you enjoy programming in your free time. You know it is ready for prime time and you dream about using it at work everyday. This is the story about how I made a dream like that come true.

My early acquaintance with D

Back when “google” was not yet a common verb, I was doing a web search for “parsing C++”. The reason was that writing a report for an assignment had derailed into writing a syntax highlighter for noweb using bison and flex, and I found out firsthand that C++ is not easy to parse. That web search brought up this page (present version) with an overview of the D Programming Language, and the following statement has had me hooked ever since:

D’s lexical analyzer and parser are totally independent of each other and of the semantic analyzer. This means it is easy to write simple tools to manipulate D source perfectly without having to build a full compiler. It also means that source code can be transmitted in tokenized form for specialized applications.

“Genius,” I thought, “here we have someone who knows what he’s doing.” This is representative of the pragmatic professionalism that still radiates from the D community, and it combines with an unpretentious flair that makes it pleasant to be around. This funny quote decorated its homepage for many years:

“Great, just what I need.. another D in programming.” – Segfault

Nevertheless, I didn’t have many opportunities to use the language and I largely remained sitting on the fence, observing its development.

Programming professionally

With mostly academic programming experience, I started programming professionally in 2006 for SARC, a Dutch engineering company serving the maritime industry. Since the early ’80s they have been developing software for ship design and onboard loading calculations, which today amounts to roughly half a million lines of code. I think their success can partly be attributed to their choice of programming language: Extended Pascal (the ISO 10206 standard, not one of the many proprietary extensions of Pascal).

Extended Pascal was a great improvement over ISO 7185 Pascal. Its compiler, by Prospero Software from England, was fast and well documented. The language is small enough and its syntax appropriately verbose to make engineering professionals quickly productive in programming. Personally though, I spent most of my time programming in C++, modernizing their system for computer aided design of ship hulls using Qt and Coin3D.

When your company outlives a programming language

Although selecting an ISO standard in favor of a proprietary Pascal dialect seemed wise at the time, it is apparent now that the company has outlived the language. Prospero Development Software Ltd was officially dissolved 15 years ago. Still, its former director, Tony Hetherington, continued giving support many years after, but he’d be close to 86 years old now and can no longer be reached. Its website is gone, last archived in 2013. There’s GNU Pascal, which also supports ISO 10206, but that project has stopped moving and long ago lost synchrony with gcc. Although there is no immediate crisis, it is clear that something needs to happen sometime if the company wants to continue its activities in the coming decades.

Changing the odds

A couple of years ago, I secretly started playing with the fantasy of replacing Extended Pascal with D. Even though D’s syntax is somewhat different from Pascal, it shares at least four important similarities: support for nested functions, boundary checking, modules, and compilation speed. In addition, it has many traits that make the language attractive to engineers: good focus on performance and numerics, garbage collection, dynamic arrays, easy parallelization, understandable templates, contract programming, memory safety, unit tests, and even wysiwyg strings and formatted numerals. D’s language features encourage experimentation, which resonates well with engineers.

So I wondered what I could do to highlight D’s significance to my employer and show it’s an attractive language to switch to. I thought I could make a compelling case if I could write a parser in D that would take Extended Pascal source and transpile it to D source. At least I would have fun trying!

So I went over to code.dlang.org to see if there were any D alternatives to flex and bison. There, I found Pegged, and instantly the fun began. Pegged combines the functionality of flex and bison in one incredibly easy to use package, for which its creator Philippe Sigaud obviously enjoyed writing excellent documentation. Nowadays, Pegged is part of the D language tour and you can try it out on-line without having to install a thing. The beauty is that the grammar from the Extended Pascal language specification maps almost linearly to the PEG from which Pegged generates the parser. For this it makes heavy use of D’s generic programming capabilities and compile-time function evaluation — it can generate a parser at compile time if you want it to!

However, it wasn’t smooth sailing all along. As I was testing D, I suddenly found myself being tested as well. I learned the hard way that there is a phenomenon called left-recursion, from which a PEG parser typically cannot break out of. And the Extended Pascal grammar is left-recursive in several ways. Consequently, I spent many evenings and weekends researching parsing theory, until eventually I managed to extend Pegged with support for all kinds of left-recursion! From one thing came another, and I added longest match alternation, case insensitive literals, the toHTML() method for dynamically browsing the syntax tree, and a tracer for logging the parsing process.

Obviously, I was having fun. But more importantly, I was demonstrating that the D programming language is accessible enough that a naval architect can understand other people’s code and expand it in non-trivial ways. The icing on the cake came when I was asked to present my experiences at DConf 2017 in Berlin, which you can watch here (and here’s the extra bit I presented at lunch time for the livestream audience).

At this time, I was able to automatically translate the following trivial example:

program hello(output);

begin
    writeln('Hello D''s "World"!');
end.

into D:

import std.stdio;

// Program name: hello
void main(string[] args)
{
    writeln("Hello D's \"World\"!");
}

Language competition

Having come this far, the founder of SARC agreed that it was time to investigate the merits of various alternative programming languages. We would do a thorough and objective comparison based on trial translations of a comprehensive set of language features. Due to the amount of manual labor that this requires, we had to drastically prune the space of programming languages in an initial review round. Note that what I am about to present does not declare which programming language is the best in our industry. What we are looking for is a language that allows an efficient transition from Extended Pascal without interrupting our business, and which enables us to take advantage of modern insights and tools.

In the initial review round we looked at general language characteristics. Here I’ll just highlight what fell through the sieve and why.

Performance is important to us, which is why we did not consider interpreted languages. C++ is in use for one component of our software, but that was written from the ground up. We feel that the options for translation are not favorable, that its long compile times are a serious hindrance to productivity, and that there are too many ways in which one can shoot one’s self in the foot. We cannot require our expert naval architects to also become experts in C++.

Nowadays, whenever D is publicly evaluated, the younger languages Go and Rust are often brought up as alternatives. Here, we need not go into an in-depth comparison of these languages because both Rust and Go lack one feature that we rely on heavily: nested functions with access to variables in their enclosing scope. Solutions for eliminating nested functions, like bringing them into global scope and passing extra variables, or breaking files up into smaller modules, we find unattractive because it would complicate automated translation, and we’d like to preserve the structure and style of our current code. GNU C does offer nested functions, but it is a non-standard extension and it has been predicted that many will move away from C due to its unsafe features. After this initial pruning, three languages remained on our shortlist: Free Pascal, Ada and D.

As a basis for our detailed comparison, we wrote fifteen small programs that each used a specific feature of Extended Pascal that is important in our current code base. We then translated those programs into each language on our shortlist. We kept a simple score board on how well these features were represented in each language: +1 if the feature is supported or can be implemented, 0 if the lack of the feature can be worked around, and -1 if it can’t. This is what came out of that evaluation:

Test Free Pascal Ada D
Arrays beginning at arbitrary indices +1 +1 +1
Sets 0 0 +1
Schema types 0 0 +1
Types with custom initial values -1 0 +1
Classes +1 +1 +1
Casts +1 +1 +1
Protection against use of dangling pointers -1 +1 +1
Thread safe memory [de]allocation +1 +1 +1
Calling into Windows API +1 +1 +1
Forwarding Windows callbacks to nested functions +1 +1 +1
Speed of calculations +1 +1 +1
Calling procedures written in assembly +1 0 +1
Calling procedures in a DLL +1 +1 +1
Binary compatibility of strings 0 +1 +1
Binary compatible file i/o -1 0 0
**Score** **6** **10** **14**

So, Free Pascal is the only candidate with negative scores, Ada positions itself in the middle, and D achieves an almost perfect score. Not effortlessly, though; we’ll talk about some of the technical challenges later. Because Free Pascal is, like D, fully Open Source and written in itself, extending the language and filling in the gaps is theoretically possible. Although some of its deficiencies could certainly be resolved that way, others would be quite complicated and/or unlikely to be accepted upstream.

We also estimated the productivity of the languages. Free Pascal scored high because it is closest to what we are used to. Despite its dissimilar syntax, D scored high because of its expressiveness and flexibility. Ada scored lowest because of its rigidity and because of the extra work the programmer has to put in (most importantly casts and conversions). Ada is more verbose than Pascal which was disliked by some of us because it can somewhat obscure the essence of what a piece of code tries to express, and frequently the code became not only verbose but cryptic, which was unanimously disliked.

Third, we estimated the future prospects and the advantages each language could bring to the table. Although Free Pascal has a more active community than we expected it to have, we do not see great potential for growth. Ada is renowned for its support for writing reliable code (although it has no monopoly in that field) but it does come at a cost and requires real effort. D has a dynamic and open community, supports both script-like productivity and high performance, includes various features for writing reliable software (approaching Ada but at a much lower cost), and offers some unique advanced features with which wonders can be accomplished.

Finally, we estimated the effort of translation. Although Free Pascal is very similar to Extended Pascal, missing features pose a real problem and would require a high degree of manual translation and rewriting. Although p2ada exists, it only works partially in our case and does not fully support Extended Pascal. Because Ada frequently requires additional code (casting to the correct type, pulling in a package, instantiating a generic type, adding a pragma, splitting up Put_Lines etc.), writing or extending a reliable transpiler into Ada would be more difficult than doing the same into D.

Selecting a winner

I gave away the winner in the title, but we landed at that conclusion as follows. Ada was the first language to be dropped. We really felt that the extra work that the programmer has to put in is a brake on productivity and creativity. Although it barely played a role in our evaluation, illustrative is the difference between the Ada and D equivalents to the Expressive C++17 Challenge. The D solution is both concise and expressive, the Ada solution is hardly expressive and consists of more lines than I want to write or read. Also of secondary importance, but difficult to ignore, is the difference between the communities surrounding the languages, which in Ada’s case is AdaCore Support, who has no problems demanding annual five-figure subscription fees.

Although akin to our current language, Free Pascal was mainly dropped due to its porting challenges and our estimation that its potential is lower and its future outlook is less optimistic than that of D. If we were to choose Free Pascal, we would basically invest a lot of effort only to arrive at a technological solution that we felt would be of lower quality than we currently have.

And that’s were I saw a dream come true: A clap on the table by the company founder and it was decided to commit to the effort of bringing twenty-five years worth of Extended Pascal code to D!

What makes a difference

In short, my experience is that if a feature is not present in the language, D is powerful enough that the feature can be implemented in a library. Translating each sample program by hand has really helped to focus on replicating functionality, leaving the translation process for later concern. This has led to writing a compatibility library with types and functions that are vital for the conversion. Now that equivalents are known and the parser is done, I just have to implement code generation.

Below follows another example that currently translates automatically and executes identically. It iterates over a fixed length array running from 2 to 20 inclusive, fills it with values, prints the memory footprint and writes it to binary file:

program arraybase(input,output);

type t = array[2..20] of integer;
var a : t;
    n : integer;
    f : bindable file of t;

begin
  for n := 2 to 20 do
    a[n] := n;
  writeln('Size of t in bytes is ',sizeof(a):1); { 76 }
  if openwrite(f,'array.dat') then
    begin
      write(f,a);
      close(f);
    end;
end.

Transpiled to D (or should I say Dascal?) and post-processed by dfmt to fix up formatting:

import epcompat;
import std.stdio;

// Program name: arraybase
alias t = StaticArray!(int, 2, 20);

t a;
int n;
Bindable!t f;

void main(string[] args)
{
    for (n = 2; n <= 20; n++)
        a[n] = n;
    writeln("Size of t in bytes is ", a.sizeof); // 76
    if (openwrite(f, "array.dat"))
    {
        epcompat.write(f, a);
        close(f);
    }
}

Of course this is by no means idiomatic D, but the fact that it is recognizable and readable is nice, especially for my colleagues who will have to go through an unusual transition. By the way, did you notice that code comments are preserved?

One very-nice-to-have feature is binary file compatibility; In fact it may have been the killer feature, without which D might not have been so victorious. The case is that whenever a persistent data structure is extended in our software, we make sure that we can still read and convert that structure from its prior format. That way, if a client pulls out an old design from its archives and runs it through our current software, it will still work without the user even being aware that conversion occurs, possibly in multiple steps. Not having to give up that ability is very attractive.

But it wasn’t easy to get there. The main difficulty is the difference in how strings are represented in D and the Prospero implementation of Extended Pascal, in memory and on file. This presented the challenge of how to preserve binary compatibility in file I/O with data structures that contain string members.

Strings

In Prospero Extended Pascal, strings are implemented as a schema type, which is a parameterized type that can be used in the following ways:

type string80 = string(80);
var str1 : string80;
    str2 : string(60);
procedure foo(s : string); This defines `string80` to be an alias for a string type discriminated to have a capacity of 80 characters. Discriminated string variables, like `str1` and `str2`, can be passed to functions and procedures that take undiscriminated strings as arguments, like `foo`, which thereby work on strings of any capacity. In memory, `str1` is laid out as a sequence of 80 `char`s, followed by a `ushort` that encodes the length of the string. I say _encodes_ because a shorter string is padded with `\0`s up to the capacity and the `ushort` actually contains the length of that padding. This way, when a pointer to the string is passed to a C function and the contents of the string occupy its full capacity, the `0` in the padding length doubles as the terminating `\0` of the C string.

My first thought was to mimic this data representation with a D template. But that would require procedures like foo to be turned into templates as well, which would escalate horribly into template bloat, a problem with multiple string arguments and argument ordering, and would complicate translation. Besides, schema types can also be discriminated at run time, which does not translate to a template.

Could some sort of inheritance scheme be the solution? Not really, because instances of D classes live on the heap, so a string embedded in a struct would just be a pointer instead of the char array and ushort.

But binary layout is actually only relevant in files, and in a stroke of insight I realized that this must be why user-defined attributes, or UDAs, exist. If I annotate the string with the correct capacity for file I/O, then I can just use native D strings everywhere, which genuinely must be the best possible translation and solves the function argument issue. Annotation can be done with an instance of a struct like

struct EPString
{
    ushort capacity;
}

The above Pascal snippet then translates to D like so:

@EPString(80) struct string80 { string _; alias _ this; }
string80 str1;
@EPString(60) string str2;
void foo(string s);

Notice how the string80 alias is translated into the slightly convoluted struct instead of a normal D alias, which would have looked like

@EPString(80) alias string80 = string;
</code> Although that compiles, there is no way to retrieve the UDA in that case because plain `alias` does not introduce a symbol. Then `hasUDA!(typeof(str1), EPString)` would have been equivalent to `hasUDA!(string, EPString)` which evaluates to `false`. By using the `struct`, `string80` is a symbol so `typeof(str1)` gives `string80`, and `hasUDA!(string80, EPString)` evaluates to `true` in this example.

There is one side effect that we will have to learn to accept, and that is that taking a slice of a string does not produce the same result in D as it does in Extended Pascal. That is because string indices start at 1 in Extended Pascal and at 0 in D. My strategy is to eliminate slices from the source and replace them with a call to the standard substr function, which I can implement with index correction. Finding all string slices can be accomplished with a switch in the transpiler that makes it insert a static if to test if the slice is being taken on a string, and abort compilation if it is. (Arrays are transpiled into a custom array type that handles slices and indices compatibly with Extended Pascal.)

Binary compatible file I/O

Now, to write structs to file and handle any embedded @EPString()-annotated strings specially, we can use compile-time introspection in an overload to toFile that acts on structs as shown below. I have left out handling of aliased strings for clarity, as well as shortstring, which is a legacy string type with yet a different binary format.

void toFile(S)(S s, File f) if (is(S == struct))
{
    import std.traits;
    static if (!hasIndirections!S)
        f.lockingBinaryWriter.put(s);
    else
        // TODO unions
        foreach(field; FieldNameTuple!S)
        {
            // If the member has itself a toFile method, call it.
            static if (hasMember!(typeof(__traits(getMember, s, field)), "toFile") &&
                       __traits(compiles, __traits(getMember, s, field).toFile(f)))
                __traits(getMember, s, field).toFile(f);
            // If the member is a struct, recurse.
            else static if (is(typeof(__traits(getMember, s, field)) == struct))
                toFile(__traits(getMember, s, field), f);
            // Treat strings specially.
            else static if (is(typeof(__traits(getMember, s, field)) == string))
            {
                // Look for a UDA on the member string.
                static if (hasUDA!(__traits(getMember, s, field), EPString))
                {
                    enum capacity = getUDAs!(__traits(getMember, s, field), EPString)[0].capacity;
                    static assert(capacity > 0);
                    writeAsEPString(__traits(getMember, s, field), capacity, f);
                }
                else static assert(false, `Need an @EPString(n) in front of ` ~ fullyQualifiedName!S ~ `.` ~ field );
            }
            // Just write other data members.
            else static if(!isFunction!(__traits(getMember, s, field)))
                f.lockingBinaryWriter.put(__traits(getMember, s, field));
        }
}

At the time of writing, I still have work to do for unions, which are used in the translation of variant records (including considering the use of one of the seven existing library solutions 1, 2, 3, 4, 5, 6, 7).

Currently, detecting unions is a bit involved . Also, there is a complication in the determination of the size of a union when the largest variant contains strings: the D version of that variant may not be the largest because D strings are just slices. I’ll probably work around this by adding a dummy variant that is a fixed size array of bytes to force the size of the union to be compatible with Extended Pascal. This is the reason why D scored a mere 0 in file format compatibility. It is amazing what D allows you to do though, so I may be able to do all of that automatically and award D a perfect score retroactively. On the other hand, it is probably easiest to just add the dummy variant in the Pascal source at the few places where it matters and be done with it.

The way forward

Obviously, this is long term planning. It has taken years to grow into D; it will possibly take a year, and probably longer, to migrate to D. Unless others turn up who are in the same boat as us (please contribute!) it’ll be me who has to row this ship to D-land and I still have my regular duties to attend to. My colleagues will continue to develop in Extended Pascal as usual, and once my transpiler is able to translate all or almost all of it, we will make the switch to D overnight. From then on, we’ll be in it for the long run. We trust to be with D and D to be with us for decades to come!

DasBetterC: Converting make.c to D

Jun 11, 2018 • WalterBright • #BetterC, #Code, #Core Team

Walter Bright is the BDFL of the D Programming Language and founder of Digital Mars. He has decades of experience implementing compilers and interpreters for multiple languages, including Zortech C++, the first native C++ compiler. He also created Empire, the Wargame of the Century. This post is the third in a series about D’s BetterC mode


D as BetterC (a.k.a. DasBetterC) is a way to upgrade existing C projects to D in an incremental manner. This article shows a step-by-step process of converting a non-trivial C project to D and deals with common issues that crop up.

While the dmd D compiler front end has already been converted to D, it’s such a large project that it can be hard to see just what was involved. I needed to find a smaller, more modest project that can be easily understood in its entirety, yet is not a contrived example.

The old make program I wrote for the Datalight C compiler in the early 1980’s came to mind. It’s a real implementation of the classic make program that’s been in constant use since the early 80’s. It’s written in pre-Standard C, has been ported from system to system, and is a remarkably compact 1961 lines of code, including comments. It is still in regular use today.

Here’s the make manual, and the source code. The executable size for make.exe is 49,692 bytes and the last modification date was Aug 19, 2012.

The Evil Plan is:

  1. Minimize diffs between the C and D versions. This is so that if the programs behave differently, it is far easier to figure out the source of the difference.

  2. No attempt will be made to fix or improve the C code during translation. This is also in the service of (1).

  3. No attempt will be made to refactor the code. Again, see (1).

  4. Duplicate the behavior of the C program as exactly and as much as possible, bugs and all.

  5. Do whatever is necessary as needed in the service of (4).

Once that is completed, only then is it time to fix, refactor, clean up, etc.

Spoiler Alert!

The completed conversion. The resulting executable is 52,252 bytes (quite comparable to the original 49,692). I haven’t analyzed the increment in size, but it is likely due to instantiations of the NEWOBJ template (a macro in the C version), and changes in the DMC runtime library since 2012.

Step By Step

Here are the differences between the C and D versions. It’s 664 out of 1961 lines, about a third, which looks like a lot, but I hope to convince you that nearly all of it is trivial.

The #include files are replaced by corresponding D imports, such as replacing #include <stdio.h> with import core.stdc.stdio;. Unfortunately, some of the #include files are specific to Digital Mars C, and D versions do not exist (I need to fix that). To not let that stop the project, I simply included the relevant declarations in lines 29 to 64. (See the documentation for the import declaration.)

#if _WIN32 is replaced with version (Windows). (See the documentation for the version condition and predefined versions.)

extern (C): marks the remainder of the declarations in the file as compatible with C. (See the documentation for the linkage attribute.)

A global search/replace changes uses of the debug1, debug2 and debug3 macros to debug printf. In general, #ifdef DEBUG preprocessor directives are replaced with debug conditional compilation. (See the documentation for the debug statement.)

/* Delete these old C macro definitions...
#ifdef DEBUG
-#define debug1(a)       printf(a)
-#define debug2(a,b)     printf(a,b)
-#define debug3(a,b,c)   printf(a,b,c)
-#else
-#define debug1(a)
-#define debug2(a,b)
-#define debug3(a,b,c)
-#endif
*/

// And replace their usage with the debug statement
// debug2("Returning x%lx\n",datetime);
debug printf("Returning x%lx\n",datetime);

The TRUE, FALSE and NULL macros are search/replaced with true, false, and null.

The ESC macro is replaced by a manifest constant. (See the documentation for manifest constants.)

// #define ESC     '!'
enum ESC =      '!';

The NEWOBJ macro is replaced with a template function.

// #define NEWOBJ(type)    ((type *) mem_calloc(sizeof(type)))
type* NEWOBJ(type)() { return cast(type*) mem_calloc(type.sizeof); }

The filenamecmp macro is replaced with a function.

Support for obsolete platforms is removed.

Global variables in D are placed by default into thread-local storage (TLS). But since make is a single-threaded program, they can be inserted into global storage with the __gshared storage class. (See the documentation for the __gshared attribute.)

// int CMDLINELEN;
__gshared int CMDLINELEN D doesn’t have a separate struct tag name space, so the typedefs are not necessary. An [`alias`](https://github.com/DigitalMars/Compiler/commit/a25bee1ba3a3b5231fd80aa1b24aa18ce3eb3a34#diff-14c44b49b9f46b1aeab33a6684214e55R113) can be used instead. (See the documentation [for `alias` declarations](https://dlang.org/spec/declaration.html#alias).) Also, [`struct`](https://github.com/DigitalMars/Compiler/commit/a25bee1ba3a3b5231fd80aa1b24aa18ce3eb3a34#diff-14c44b49b9f46b1aeab33a6684214e55R131) is omitted from variable declarations.
/*
typedef struct FILENODE
        {       char            *name,genext[EXTMAX+1];
                char            dblcln;
                char            expanding;
                time_t          time;
                filelist        *dep;
                struct RULE     *frule;
                struct FILENODE *next;
        } filenode;
*/
struct FILENODE
{
        char            *name;
        char[EXTMAX1]  genext;
        char            dblcln;
        char            expanding;
        time_t          time;
        filelist        *dep;
        RULE            *frule;
        FILENODE        *next;
}

alias filenode = FILENODE;

macro is a keyword in D, so we’ll just use MACRO instead.

Grouping together multiple pointer declarations is not allowed in D, use this instead:

// char *name,*text;
// In D, the * is part of the type and 
// applies to each symbol in the declaration.
char* name, text; [C array declarations](https://github.com/DigitalMars/Compiler/commit/a25bee1ba3a3b5231fd80aa1b24aa18ce3eb3a34#diff-14c44b49b9f46b1aeab33a6684214e55L111) are transformed to [D array declarations](https://github.com/DigitalMars/Compiler/commit/a25bee1ba3a3b5231fd80aa1b24aa18ce3eb3a34#diff-14c44b49b9f46b1aeab33a6684214e55R131). (See the documentation [for D’s declaration syntax](https://dlang.org/spec/declaration.html#declaration_syntax).)

// char            *name,genext[EXTMAX+1];
char            *name;
char[EXTMAX+1]  genext; [`static`](https://github.com/DigitalMars/Compiler/commit/a25bee1ba3a3b5231fd80aa1b24aa18ce3eb3a34#diff-14c44b49b9f46b1aeab33a6684214e55L159) has no meaning at module scope in D. `static` globals in C are equivalent to `private` module-scope variables in D, but that doesn’t really matter when the module is never imported anywhere. They still need to be `__gshared` and that can be [applied to an entire block of declarations](https://github.com/DigitalMars/Compiler/commit/a25bee1ba3a3b5231fd80aa1b24aa18ce3eb3a34#diff-14c44b49b9f46b1aeab33a6684214e55R191). (See the documentation [for the `static` attribute](https://dlang.org/spec/attribute.html#static))
/*
static ignore_errors = FALSE;
static execute = TRUE;
static gag = FALSE;
static touchem = FALSE;
static debug = FALSE;
static list_lines = FALSE;
static usebuiltin = TRUE;
static print = FALSE;
...
*/

__gshared
{
    bool ignore_errors = false;
    bool execute = true;
    bool gag = false;
    bool touchem = false;
    bool xdebug = false;
    bool list_lines = false;
    bool usebuiltin = true;
    bool print = false;
    ...
}

Forward reference declarations for functions are not necessary in D. Functions defined in a module can be called at any point in the same module, before or after their definition.

Wildcard expansion doesn’t have much meaning to a make program.

Function parameters declared with array syntax are pointers in reality, and are declared as pointers in D.

// int cdecl main(int argc,char *argv[])
int main(int argc,char** argv) [`mem_init()` expands to nothing](https://github.com/DigitalMars/Compiler/commit/a25bee1ba3a3b5231fd80aa1b24aa18ce3eb3a34#diff-14c44b49b9f46b1aeab33a6684214e55L248) and we previously removed the macro.

C code can play fast and loose with arguments to functions, D demands that function prototypes be respected.

void cmderr(const char* format, const char* arg) {...}

// cmderr("can't expand response file\n");
cmderr("can't expand response file\n", null);

Global search/replace C’s arrow operator (->) with the dot operator (.), as member access in D is uniform.

Replace conditional compilation directives with D’s version.

/*
 #if TERMCODE
    ...
 #endif
*/
    version (TERMCODE)
    {
        ...
    }

The lack of function prototypes shows the age of this code. D requires proper prototypes.

// doswitch(p)
// char *p;
void doswitch(char* p) [`debug` is a D keyword](https://github.com/DigitalMars/Compiler/commit/a25bee1ba3a3b5231fd80aa1b24aa18ce3eb3a34#diff-14c44b49b9f46b1aeab33a6684214e55L335). Rename it to `xdebug`.

The \n\ line endings for C multiline string literals are not necessary in D.

Comment out unused code using D’s /+ +/ nesting block comments. (See the documentation for line, block and nesting block comments.)

static if can replace many uses of #if. (See the documentation for the static if condition.)

Decay of arrays to pointers is not automatic in D, use .ptr.

// utime(name,timep);
utime(name,timep.ptr); [Use `const` for C-style strings](https://github.com/DigitalMars/Compiler/commit/a25bee1ba3a3b5231fd80aa1b24aa18ce3eb3a34#diff-14c44b49b9f46b1aeab33a6684214e55L658) derived from string literals in D, because D won’t allow taking mutable pointers to string literals. (See the documentation [for `const` and `immutable`](https://dlang.org/spec/const3.html#const_and_immutable).)

// linelist **readmakefile(char *makefile,linelist **rl)
linelist **readmakefile(const char *makefile,linelist **rl) [`void*` cannot be implicitly cast to `char*`](https://github.com/DigitalMars/Compiler/commit/a25bee1ba3a3b5231fd80aa1b24aa18ce3eb3a34#diff-14c44b49b9f46b1aeab33a6684214e55R920). Make it explicit.
// buf = mem_realloc(buf,bufmax);
buf = cast(char*)mem_realloc(buf,bufmax);

Replace unsigned with uint.

inout can be used to transfer the “const-ness” of a function from its argument to its return value. If the parameter is const, so will be the return value. If the parameter is not const, neither will be the return value. (See the documentation for inout functions.)

// char *skipspace(p) {...}
inout(char) *skipspace(inout(char)* p) {...}

arraysize can be replaced with the .length property of arrays. (See the documentation for array properties.)

// useCOMMAND  |= inarray(p,builtin,arraysize(builtin));
useCOMMAND  |= inarray(p,builtin.ptr,builtin.length) String literals are immutable, so it is necessary to [replace mutable ones with a stack allocated array](https://github.com/DigitalMars/Compiler/commit/a25bee1ba3a3b5231fd80aa1b24aa18ce3eb3a34#diff-14c44b49b9f46b1aeab33a6684214e55L1683). (See the documentation [for string literals](https://dlang.org/spec/lex.html#string_literals).)

// static char envname[] = "@_CMDLINE";
char[10] envname = "@_CMDLINE"; [`.sizeof` replaces C’s `sizeof()`](https://github.com/DigitalMars/Compiler/commit/a25bee1ba3a3b5231fd80aa1b24aa18ce3eb3a34#diff-14c44b49b9f46b1aeab33a6684214e55L1688). (See the documentation [for the `.sizeof` property](https://dlang.org/spec/property.html#sizeof)).
// q = (char *) mem_calloc(sizeof(envname) + len);
q = cast(char *) mem_calloc(envname.sizeof + len);

Don’t care about old versions of Windows.

Replace ancient C usage of char * with void*.

And that wraps up the changes! See, not so bad. I didn’t set a timer, but I doubt this took more than an hour, including debugging a couple errors I made in the process.

This leaves the file man.c, which is used to open the browser on the make manual page when the -man switch is given. Fortunately, this was already ported to D, so we can just copy that code.

Building make is so easy it doesn’t even need a makefile:

\dmd2.079\windows\bin\dmd make.d dman.d -O -release -betterC -I. -I\dmd2.079\src\druntime\import\ shell32.lib ## Summary

We’ve stuck to the Evil Plan of translating a non-trivial old school C program to D, and thereby were able to do it quickly and get it working correctly. An equivalent executable was generated.

The issues encountered are typical and easily dealt with:

  • Replacement of #include with import

  • Lack of D versions of #include files

  • Global search/replace of things like ->

  • Replacement of preprocessor macros with:

* manifest constants

 	
* simple templates

 	
* functions

 	
* version declarations

 	
* debug declarations
  • Handling identifiers that are D keywords

  • Replacement of C style declarations of pointers and arrays

  • Unnecessary forward references

  • More stringent typing enforcement

  • Array handling

  • Replacing C basic types with D types

None of the following was necessary:

  • Reorganizing the code

  • Changing data or control structures

  • Changing the flow of the program

  • Changing how the program works

  • Changing memory management

Future

Now that it is in DasBetterC, there are lots of modern programming features available to improve the code:

Action

Let us know over at the D Forum how your DasBetterC project is coming along!

Driving Continuous Improvement in D

Jun 2, 2018 • JackStouffer • #Code, #Compilers & Tools

Jack Stouffer is a member of the Phobos team and a contributor to dlang.org. You can check out more of his writing on his blog.


In my previous article, I went over the techniques we use in the D Standard Library (a.k.a Phobos) to develop a wide variety of testing mechanisms. I also briefly mentioned our style checker, dscanner. In this article, I’ll detail how we use dscanner to prevent style, documentation, and best practices regressions in the code, none of which can be covered by standard unit tests.

Keeping a level of quality in software projects is a neverending battle. Bad practices and shortsighted design decisions make their way into code over time, whether from poor oversight, rushing things through, or simple code rot. There are three things we do in Phobos, other than tests, to fight this entropy.

First, Pull Requests are required to be about one thing and one thing only. What counts as “one thing” is subjective, but, for example, a PR to fix a bug in a specific piece of code mustn’t also edit the documentation of that function. This allows Phobos reviewers to merge the documentation change quickly if there’s an issue in the bug fix preventing it from being merged (or vice versa). At the same time, it keeps the reviewers focused on one set of changes.

Second, PRs need to be small; ideally less than 100 lines of code changed. If that’s not possible, they need to be broken into multiple commits of smaller changes. This really helps reviewers keep D best practices in mind, while also fully understanding and internalizing the new code.

Third, we continuously improve the existing code with dscanner, which, among other things, is D’s official linting tool.

What dscanner Can Do

As an example of the checks that dscanner can provide, let’s take a look at some code that contains a very hard-to-spot bug. The following code creates a type that mimics an int, but allows a null state:

struct NullableInt
{
    private int value;
    bool isNull;

    int get()
    {
        assert(!isNull, "can't get a null");
        return value;
    }

    void nullify() { isNull = true; }

    bool opEquals(T rhs) // D's equality overloading function
    {
        if (isNull && rhs.isNull)
            return true;
        if (isNull != rhs.isNull)
            return false;
        return value == rhs.value;
    }
}

The bug is not in the code that’s there, it’s in the code that’s not there. All structs in D have default versions of the standard operator overloading functions if they aren’t defined by the user. One of those functions provides a hash to represent the value to use in D’s built-in associative arrays. The default version uses all the type fields to make the hash, which is a problem for NullableInt, as we’ve decided that all instances of the type that are null are equal. Here’s an illustration of the bug:

void main()
{
    auto a = NullableInt(10, false);
    auto b = NullableInt(10, false);
    auto c = NullableInt(10, true);
    auto d = NullableInt(0, true);

    assert(a == b);
    assert(b != c);
    assert(c == d);

    import core.internal.hash : hashOf; // D's internal hashing function
    assert(c.hashOf != d.hashOf);
}

dscanner will emit a message every time it finds a type that defines a custom opEquals, but doesn’t define a custom toHash as well, alerting us to this bug.

Continuous Improvement and “Ratcheting” Quality

dscanner ties into continuous improvement, the philosophy that improvements to processes or designs should be made in a periodic, timely manner, rather than as one-time “breakthroughs”. Such large breakthrough changes tend to be pushed back infinitely; in a phrase, it’s letting the perfect be the enemy of the good. This easily fits with the continuous delivery or rapid release philosophies, and can vastly reduce bugs in software given enough time.

By using the warnings from dscanner, we can ratchet Phobos’s quality over time. If you’ve never used a socket wrench, a ratchet is a mechanism that allows the user to freely move the wrench back and forth, but allows the bolt to spin only when the wrench moves clockwise. Similarly, we can move the quality of Phobos forward, while not letting it ever slip backwards, with dscanner. It works as follows:

  • Run dscanner with every check but one turned off on all files.

  • Populate a black list for that check in dscanner’s configuration file containing each of the files that issued a warning.

  • Repeat until the current code passes with all checks turned on with the black-list enabled.

Once we have our list-of-blacklists, new PRs will trigger warnings for issues detected by dscanner for files that weren’t included in the blacklists, thereby keeping the status quo of quality.

Next, we ratchet quality by making a PR that fixes one issue in one file, and then removing that blacklist entry from the configuration. This way, that file will be checked in the future for every new PR. Over time, quality issues will be removed from Phobos, and they won’t crop up again. For example, from the release of 2.076 to now, Phobos has gone from 624 public symbols without examples, to 328.

Currently, we’re using this ratchet technique with dscanner to

  • remove unused variables.

  • add const or immutable to variables that aren’t modified after their construction.

  • make sure every public symbol is documented.

  • make sure every symbol in Phobos that has documentation, has a code example in that documentation.

  • force every assert to have an error message to print in case it fails.

  • make sure only Exceptions, and not Errors, are caught in try/catch blocks.

among other things.

This process also has the added benefit of dogfooding dscanner, which helps us find bugs and know which features would be helpful to add from a user perspective. If you have a project that isn’t using a linting tool as part of your test suite, it’s only a matter of time before code rot creeps in.