Nikolaus Rath's Website

MATLAB is a terrible programming language

I consider it fairly uncontroversial that, as a programming language, MATLAB is a terrible choice. However, I found out that to some people this isn't actually obvious at all - especially when their first exposure to programming was through MATLAB. Explaining why the MATLAB language is so bad isn't easy to do in a quick hallway conversation, so I wrote this blog post as a resource I can refer people to.

This post is inspired by Eevee's excellent PHP: A fractal of bad design blog post. While I wouldn't say that MATLAB is quite as bad as PHP, there are some interesting similarities. The MATLAB language was originally designed for numerical computation (like PHP was designed to insert small dynamic elements into mostly static HTML pages), but then kept gaining features that turned it into something closer to a general purpose programming language. And as with PHP, it is difficult to point at one specific thing and say that this is what makes it a bad language - it's more that there's a ton of small things that are all slightly wrong. Individually, none of them makes the language bad, but taken together, they make writing and maintaining non-trivial MATLAB code a rather painful exercise.

Here comes the list:

Lack of Documentation

The first problem with the MATLAB language is that it is not documented. I am not talking about documentation of the functions that the language provides - there is plenty of that. I am talking about the syntax and semantics of the language itself. Every worthwhile general purpose programming language that I know has a description of the language, and a description of the standard library (though it may not be called that). MATLAB only has the latter. If you still don't know what I'm talking about, maybe an example will help: the Python language is documented in the Python Language Reference, while the Python standard library is documented in the Python Library Reference. Take a look at both, and you will (hopefully) note how the latter describes the functions that Python provides, while the former describes how to write Python code. In the MATLAB documentation, this part is missing. The only way to find out what is valid MATLAB code and what isn't is to look at other code or to resort to trial and error. A language reference would answer questions like "where exactly can I write end?" From a first glance at MATLAB code, you can tell that end is used to signal the end of e.g. an if block (if foo; do_something(); end). On a second glance, you may notice that you can also use it to index into an array (my_numbers(end), and even my_numbers(end-1)). You may then conclude that you can also pass end as a function parameter (print_number(my_numbers, end)) or save it in a variable (idx = end). Will this work? There is no way to find out from the documentation.

Ambiguous Syntax

What do you think happens in the following code: multiply(2)? You may be forgiven if you think it calls a function called multiply with a parameter of 2. You may also be forgiven if you think it returns the second element of the multiply vector. This is because you can't tell. To understand what this code does, you first have to find out how multiply is actually defined. On the other hand, MATLAB has a datatype called cell array (similar to e.g. a Python list) whose elements you have to access with curly braces (bru{3}); indexing it with () does not give you the element, but a one-element cell array wrapping it. So there are two kinds of brackets supported by MATLAB, and instead of using one kind for function calls and the other kind for indexing, one kind is overloaded to mean both, and the second kind is (seemingly randomly) assigned to work with just some datatypes.
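
To make the ambiguity concrete, here is a minimal sketch meant to be pasted at the command prompt or run as a script. The name multiply is simply reused from the paragraph above; the values are made up for illustration.

```matlab
% The expression multiply(2) is written identically in both cases;
% only the definition of multiply decides what it means.

multiply = @(x) 2 * x;    % multiply is a function handle
a = multiply(2);          % this is a call: a is 4

multiply = [5 7 9];       % now multiply is a vector
b = multiply(2);          % the same text is now indexing: b is 7

fprintf('call gave %d, indexing gave %d\n', a, b);
```

Nothing at the call site distinguishes the two lines, so a reader has to track down the most recent definition of multiply to know which one applies.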

Counterintuitively limited syntax

Suppose I told you that get_numbers() calls a function that returns an array of numbers, and that bar(4) accesses the 4th element of the array bar, what do you think get_numbers()(4) will do? If you think it gives you the 4th element of the vector returned by the get_numbers() function, you think like me and you are wrong. It will actually give you an error. Instead, you have to first assign the result into a variable, and then index into it (tmp = get_numbers(); tmp(4)). The same applies to a variety of other operations - they don't work on arbitrary expressions, but only on some specific ones (e.g. just on plain variable names).
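
The workaround can be sketched as follows. Here get_numbers is a stand-in defined as an anonymous function, and nth is a hypothetical helper that is not part of MATLAB; both exist only for this illustration.

```matlab
% Stand-in for a function that returns an array of numbers.
get_numbers = @() [10 20 30 40 50];

% fourth = get_numbers()(4);   % rejected, as described above

% Workaround 1: go through a temporary variable.
tmp = get_numbers();
fourth = tmp(4);

% Workaround 2: a tiny (hypothetical) helper that does the indexing.
nth = @(values, k) values(k);
also_fourth = nth(get_numbers(), 4);

fprintf('%d %d\n', fourth, also_fourth);
```

Either variant works, but both add a name that exists only to get around the restriction.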

Function semantics are needlessly overloaded

Several MATLAB functions fulfill multiple, but completely unrelated purposes. For example, the exist function checks if its argument is either a variable declared in the current workspace, a file or directory in the current directory, a file with an extension known to MATLAB somewhere in the MATLAB search path, or a Java class - unless MATLAB is started with the -nojvm argument. If you want to check specifically if a file with a given name exists, you have to pass an extra parameter to exist telling it to look only for files and directories - but even then you have to take extra care, because calling exist('myfile', 'file') still returns a true value if there is no myfile, but myfile.m exists somewhere in the search path. This is ridiculous. Why would you ever want to check if something is either a file or a builtin MATLAB function? This is just inviting trouble. There should be separate functions (e.g. file_exist, class_exist, var_exist) for separate purposes.
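
A short sketch of how the overloading looks in practice. The variable x and the file name myfile are made up; isfile is, to my knowledge, only available from R2017b onwards and checks a path without consulting the search path, so treat that part as an assumption about your MATLAB version.

```matlab
x = 42;

% Without a second argument, exist reports whatever kind of thing it finds.
kind_any = exist('x');            % 1 means "variable in the workspace"

% With a second argument the search is narrowed to one category.
kind_var = exist('x', 'var');     % 1: x is a variable
kind_dir = exist(pwd, 'dir');     % 7: the current folder is a folder

% Assumption: isfile (R2017b or later) looks only at the given path.
candidate  = fullfile(tempdir, 'myfile');   % hypothetical file name
found_file = isfile(candidate);

fprintf('exist: %d %d %d, isfile: %d\n', kind_any, kind_var, kind_dir, found_file);
```

Note that the return value of exist is a category code rather than a plain true or false, which is another consequence of packing several unrelated checks into one function.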

Everything is in the same namespace

MATLAB does not have namespaces; everything sits in the same global namespace. Nobody is able to remember which names are already in use by MATLAB and which ones aren't. Predefined names are also all over the place, from short and obscure abbreviations like lqrd through common verbs like find and who to long descriptions like SimulinkRealTime. There is no convention you could follow to avoid accidentally redefining (or using) a name that MATLAB already uses for something else.
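
Here is a minimal sketch of an accidental redefinition, assuming the lines are typed at the command prompt or run as a script (inside a function file the behaviour may differ). The variable names are made up.

```matlab
values = [1 2 3];

sum = 10;                     % silently hides the built-in sum function

shadowed = false;
try
    total = sum(values);      % now indexes the scalar variable and fails
catch err
    shadowed = true;
    fprintf('sum is shadowed: %s\n', err.message);
end

clear sum                     % remove the variable again
total = sum(values);          % back to the built-in function: total is 6
```

MATLAB gives no warning at the assignment, so the mistake only shows up later, at the point where the built-in was expected.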

Parameter names are treated as strings

Many MATLAB functions accept parameters in the form function(arg1, arg2, 'NameOfArg3', arg3, 'NameOfArg4', arg4). In other words, parameter names are passed as parameters themselves. This is fundamentally bad because parameter names are not strings and should not be treated as such.
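
One practical consequence, sketched with cellfun and its 'UniformOutput' option: because the name is just a string, a typo in it cannot be caught before the call actually runs. The misspelling below is deliberate.

```matlab
words = {'foo', 'quux'};

% Correct spelling of the option name: returns a cell array of lengths.
lengths = cellfun(@numel, words, 'UniformOutput', false);

% Misspelled option name: nothing complains until this line is executed.
caught = false;
try
    cellfun(@numel, words, 'UniformOutptu', false);
catch err
    caught = true;
    fprintf('Only detected at run time: %s\n', err.message);
end
```

In a language with real keyword arguments, a misspelled parameter name is typically flagged as an unknown argument rather than being interpreted as data.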

No 1-D arrays

MATLAB does not have support for one-dimensional arrays (or lists, cell arrays, etc). This means that if you need to represent an ordered set of elements, you have to make an awkward decision between representing it as a 1xN or an Nx1 data structure. Even worse, the two variants are treated the same way in some situations, but differently in others. For example, you can use a single index (foo(3)) to index both a 1xN and an Nx1 array, but if you attempt to loop over it (for el=array), it will work with only 1xN arrays. To add insult to injury, some functions (like Simulink.sdi.getAllRunIDs) return 1xN arrays when they have something to return, but 0x1 arrays when the list is empty. To handle this correctly, you have to use incantations like for el=reshape(array, 1, []).
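
The difference is easy to demonstrate by counting loop iterations. This is a minimal sketch with made-up data:

```matlab
row = [1 2 3];      % 1x3
col = [1; 2; 3];    % 3x1

% Single-index access behaves the same for both shapes.
same_element = (row(3) == col(3));

% Looping does not: for iterates over columns.
n_row = 0;
for el = row
    n_row = n_row + 1;        % runs three times, el is a scalar
end

n_col = 0;
for el = col
    n_col = n_col + 1;        % runs once, el is the whole 3x1 column
end

% The reshape incantation forces a row, whatever the input shape was.
n_fixed = 0;
for el = reshape(col, 1, [])
    n_fixed = n_fixed + 1;    % runs three times again
end

fprintf('%d %d %d\n', n_row, n_col, n_fixed);
```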

Cell Array Iteration is awkward

Another iteration problem comes up when iterating over cell arrays: instead of each element of the array, the loop variable is assigned a one-element cell array containing that element. So the following won't work:

data = { 'foo', 'bar', 'com' };
for el=data
    fprintf('Processing %s...\n', el);
end

Instead you have to index into the loop variable first:

data = { 'foo', 'bar', 'com' };
for el=data
    el = el{1};
    fprintf('Processing %s...\n', el);
end
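
A common alternative, sketched here under the assumption that you don't mind an explicit counter, is to loop over indices and use curly braces directly. The processed variable is only there to make the effect visible.

```matlab
data = {'foo', 'bar', 'com'};
processed = cell(size(data));

for k = 1:numel(data)
    % data{k} is the element itself, not a one-element cell array.
    fprintf('Processing %s...\n', data{k});
    processed{k} = upper(data{k});
end
```

This variant also works regardless of whether data is 1xN or Nx1, at the cost of carrying an index around.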

Semicolon Changes Semantics

In MATLAB, the semicolon acts both as a statement terminator and to suppress printing of the evaluated expression. This is reasonable, but it turns out that sometimes whether an expression is printed determines how the expression behaves. Calling tg = slrt('barf'); will give you an object to communicate with the barf system. Calling tg = slrt('barf'), however, will also attempt to connect to the system - i.e. it may block or return a whole new class of errors.

Functions are too clever

Many MATLAB functions try to be particularly clever in anticipating the user's needs. Unfortunately, that cleverness cannot be turned off when it is not wanted. For example, the delete function can be used to delete files from disk. (It may also be used to unset a variable or release memory, which is another example of pointless overloading of function semantics, but this is not what I am complaining about here.) The problem is that if delete finds that the given filename contains an asterisk (*), it magically expands the asterisk and deletes all matching files. I have no doubt that this can be useful, but it is terrible when it cannot be switched off. If I have code that is intended to delete one file and I pass it a filename containing *, I expect it to delete a file with exactly that name. If filenames with * characters are not supported (Hello, Microsoft Windows) I expect the function to return an error, and not to silently delete other files.
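
Since the expansion cannot be switched off, the only defence I know of is to check the name yourself before calling delete. The following is a minimal sketch; the file name and the error identifier demo:wildcardInFilename are made up for illustration.

```matlab
filename = 'results*.csv';    % hypothetical name containing a literal asterisk

try
    if any(filename == '*')
        % Refuse instead of letting delete expand the pattern.
        error('demo:wildcardInFilename', ...
              'Refusing to delete "%s": name contains an asterisk.', filename);
    end
    delete(filename);
catch err
    fprintf('%s\n', err.message);
end
```

This turns the silent deletion of other files into an explicit error, which is the behaviour argued for above, but every caller has to remember to do it.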

No way to store static data

If you have some static data that you'd like to use in multiple MATLAB files (e.g. a mapping from error codes to textual descriptions), there is no direct way to do that. You can create a .m file that defines a variable, but you cannot get access to this variable from another file without terrible contortions (aka loading the file as a string and passing it to eval()). The only feasible workaround is to create a class that encapsulates your single variable, and then work with that class, or to define a function that re-creates the data on every call and then returns it.
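
A sketch of the function-based workaround, with one variation: a persistent variable caches the data so it is built only on the first call. The file name error_descriptions.m, the keys and the descriptions are all hypothetical.

```matlab
% File: error_descriptions.m (hypothetical name)
function descriptions = error_descriptions()
    % Returns a map from error codes to textual descriptions.
    persistent cache
    if isempty(cache)
        % Built once, on the first call, then reused.
        cache = containers.Map( ...
            {'ENOENT', 'EACCES'}, ...
            {'No such file or directory', 'Permission denied'});
    end
    descriptions = cache;
end
```

Any other file can then write descriptions = error_descriptions(); followed by descriptions('EACCES'). Keep in mind that containers.Map is a handle object, so a caller that modifies the returned map changes it for everyone else as well.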

Programmatic error handling is near impossible

This is one of my biggest gripes: handling errors programmatically in MATLAB in a reliable way is near impossible. The problems are:

  • The documentation does not contain any information about how a given MATLAB function may fail. The only way to find out is to somehow trigger all the error cases you can think of, and then call the function to figure out what exception it will throw (if any), or if it is going to return a special return value, or if it will silently do nothing.
  • There is no consistency in the failure modes. Some functions may throw an exception if the relevant file isn't found (e.g. renamefile()), others may return a special value (e.g. fopen()). In other words, there is no way to generalize the knowledge you have empirically gained.
  • When functions raise exceptions, the exception identifiers are way too broad to be useful. For example, any kind of problem with files (if it leads to an exception in the first place) gives you a generic I/O error. To distinguish between e.g. "file not found" or "permission denied" you have to parse the string representation of the error message.
  • Inexplicably, MATLAB's exception handling construct doesn't allow you to restrict the exceptions that you want to catch -- it's all or nothing. This means that every exception handling code first starts with an if statement to determine if this is actually an exception that should be handled, and has to re-throw the exception if not.
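
The inconsistency in failure modes can be seen with two functions that both read files. This is a minimal sketch; the file name is made up and assumed not to exist.

```matlab
missing = fullfile(tempdir, 'no_such_file_for_demo_1234.txt');

% fopen signals failure through its return value and never throws here.
[fid, msg] = fopen(missing, 'r');
if fid == -1
    fprintf('fopen returned -1: %s\n', msg);
else
    fclose(fid);
end

% fileread signals the same problem by throwing an exception.
threw = false;
try
    contents = fileread(missing);
catch exc
    threw = true;
    fprintf('fileread threw %s\n', exc.identifier);
end
```

Code that uses both functions therefore needs both styles of checking for what is conceptually the same error.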

All in all, this means that "error-aware" code in MATLAB typically looks like this:

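try
    res = do_something();
    if res == special_value1
        % Handle problem signalled by a special return value
    elseif res == special_value2
        % Handle problem signalled by another special return value
    end
catch exc
    % The identifier is too broad, so the message text has to be searched too
    if strcmp(exc.identifier, 'IOError') && ...
       ~isempty(strfind(exc.message, 'File not found'))
        % Handle problem
    else
        exc.rethrow();
    end
end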

The amount of boilerplate here is incredible. Why can't this be written as e.g.:

try
    res = do_something();
catch 'IOError:FileNotFound' as exc
    % Handle problem
catch 'OtherError1' as exc
    % Handle other problem
end % pass through all other problems
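
The closest approximation in real MATLAB that I am aware of is a switch on the exception identifier. The sketch below assumes that the failing code uses sufficiently specific identifiers, which, as discussed above, is often not the case; do_something and the demo:io:... identifiers are made up.

```matlab
% Stand-in for a function that fails with a specific identifier.
do_something = @() error('demo:io:fileNotFound', ...
                         'File not found: %s', 'data.txt');

handled = '';
try
    do_something();
catch exc
    switch exc.identifier
        case 'demo:io:fileNotFound'
            handled = 'missing file';      % Handle problem
        case 'demo:io:permissionDenied'
            handled = 'no permission';     % Handle other problem
        otherwise
            rethrow(exc);                  % pass through all other problems
    end
end

fprintf('Handled: %s\n', handled);
```

This avoids parsing the message text, but the catch-all block and the explicit rethrow are still there.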
