PHP: Removing blank strings from arrays and array_unique() - the right way
edited March 24 2018

array_filter() and array_unique() are broken in PHP and I don't think most people realize it. Let's take a look.

Using array_filter() to strip empty strings from arrays

You have an array, something like, array("hello", "", "world"), and you want to remove any empty strings from it. You didn't find anything in the PHP docs — because there are no PHP builtins to do this — and you're reluctant to write a function just to do this one simple thing, so maybe you went to Google and you came across this StackOverflow article:

Simply use array_filter(), which conveniently handles all this for you: print_r(array_filter($linksArray));

(from the top answer on that page, with at least 400-odd positive votes so far.)

What happens if you mix a few types in your array?

$ php -r "print_r(array_filter(array(0, 1, 2, true, false, '', 'hello', 'world')));"
Array
(
    [1] => 1
    [2] => 2
    [3] => 1
    [6] => hello
    [7] => world
)
    

array_filter() removed ''. It also removed 0 and false.

What if you can guarantee your array has nothing but strings in it? Maybe you're feeding the result from explode() into it?

$ php -r "print_r(array_filter(explode(' ', 'hello world, there are 0 php bugs today')));"
Array
(
    [0] => hello
    [1] => world,
    [2] => there
    [3] => are
    [5] => php
    [6] => bugs
    [7] => today
)
    

Whoopsie. array_filter() without a callback function will also remove the string "0". It will remove any value that PHP's loose type comparison will evaluate to "false". The array_filter() documentation is fairly explicit about this behavior, but the StackOverflow responses mostly missed it (and there are no warnings about this behavior anywhere near the top answer on that page).

I think this behavior should be considered wrong in any case, or at the very least there should be a STRICT flag like in_array() has, but that's not my fight to fight.

And no, don't use array_diff()

There are some suggestions floating around to use array_diff() in place of array_filter(). The documentation for array_diff() says, “Two elements are considered equal if and only if (string) $elem1 === (string) $elem2”. Seems reasonable. Let's see what happens:

$ php -r "print_r(array_diff(array(false, '', 0, '0', 1), array(false)));"
Array
(
    [2] => 0
    [3] => 0
    [4] => 1
)
    

Both false and '' were removed from the input array. The way I read the documentation, that means that array_diff() thinks false === '', I guess. I don't know. It doesn't make a whole lot of sense. Maybe it's time for a beer.

But wait, there's more: array_unique()

So PHP thinks that false and '' are equivalent in array_diff(), and false, 0, '0', and '' are equivalent in array_filter(). What about array_unique()?

The documentation for array_unique() says the same thing as array_diff(): “Two elements are considered equal if and only if (string) $elem1 === (string) $elem2”.

$ php -r "print_r(array_unique(array(false, '', 0, '0', 1, true)));"
Array
(
    [0] => 
    [2] => 0
    [4] => 1
)
    

false and '' are equivalent, but false and 0 are not, but 0 and '0' are, and 1 and true are.

And the coup de grâce: array_flip()

Since array_unique() is hilariously slow, there are all kinds of suggestions out there to use array_flip() instead:

$ php -r "print_r(array_keys(array_flip(array('a', 0, '0', 1, '1'))));"
Array
(
    [0] => a
    [1] => 0
    [2] => 1
)
    

array_flip() does loose type comparisons too. As the docs note, it also really hates arrays that have values that aren't a string or integer type:

$ php -r "print_r(array_keys(array_flip(array('a', 0, '0', 1, '1', true, false, array(1)))));"
PHP Warning:  array_flip(): Can only flip STRING and INTEGER values! in Command line code on line 1
PHP Warning:  array_flip(): Can only flip STRING and INTEGER values! in Command line code on line 1
PHP Warning:  array_flip(): Can only flip STRING and INTEGER values! in Command line code on line 1
Array
(
    [0] => a
    [1] => 0
    [2] => 1
)
    

Oh who cares, anyway?

I'm not a PHP hater. It's a programming language, all programming languages have warts. These particular functions are all dangerously broken, but big deal. Write your own:

function array_drop ($array, $filter_values, $unique = false)
{
    //  Remove matching values from an array and optionally make them unique.
    //  This replaces array_unique() and some uses of array_filter()
    //  or array_diff(). It is very careful to use strict type checking.
    //  This function isn't real fast. Don't use it unless necessary.
    //  array_drop() is only intended to work on sequential arrays at this time.
    //  array_drop() will preserve keys; if you want to reindex the array,
    //  just use array_values().
    //  array_drop() does not change the order of values in the input array.
    //  It would be slightly faster to preload $drop_elements with all known
    //  PHP types but that wouldn't be forward-compatible; PHP might add new
    //  types in the future or change the way that gettype() works.
    $drop_elements = array();
    $out_array = array();
    foreach ($filter_values as $filter_value)
    {
        //  $drop_elements gets elements added to it in buckets
        //  organized by the type of the value to be filtered.
        $value_type = gettype($filter_value);
        if ( ! array_key_exists($value_type, $drop_elements) ) $drop_elements[$value_type] = array();
        $drop_elements[$value_type][] = $filter_value;
    }
    //  in_array() becomes safe to use if it's only looking at values of identical types.
    if ( $unique )
    {
        foreach ($array as $key => $value)
        {
            $value_type = gettype($value);
            if ( ! array_key_exists($value_type, $drop_elements) ) $drop_elements[$value_type] = array();
            if ( ! in_array($value, $drop_elements[$value_type]) ) $drop_elements[$value_type][] = $out_array[$key] = $value;
        }
        return $out_array;
    }
    foreach ($array as $key => $value)
    {
        $value_type = gettype($value);
        if ( ! array_key_exists($value_type, $drop_elements) || ! in_array($value, $drop_elements[$value_type]) ) $out_array[$key] = $value;
    }
    return $out_array;
}
    

But I wonder how much broken code has been written that's followed misleading documentation and comments on programmer forums?