Tracking down the origins of slow SQL code in WordPress

I’ve been optimising a WP All Import job for a client with 600,000 products and while doing so I was hitting refresh on SHOW FULL PROCESSLIST to catch queries which take too long to run. I kept seeing the following query:

SELECT status, COUNT( * ) FROM wp_posts WHERE post_type = 'product' GROUP BY post_status;

This might look like a simple query, but what that query does is count all the statuses for all of your products. Statuses are strings like ‘publish’, ‘pending’, ‘draft’, ‘trash’. There is no real filter on this query other than post_type = ‘product’ so this query is almost certainly going to perform a table scan. Possibly, depending on the statistics on your table, this query could perform an index scan rather than a table scan, but still this is a big query.

The type_status_date index is probably going to be scanned by this query, but still on large WooCommerce stores it involves reading most of the data from your wp_posts table

This query gets slower and slower the larger your database becomes. In this case, my client with 600,000 products is seeing a 30 second query time added to most wp-admin pages.

How slow this aggregate query is from a cold cache – 30 seconds for a 600,000 product store

Finding the culprit

Firstly, I performed a grep on the codebase:

grep "select post_status, count( * )" . -r

This highlighted the function created in wp_count_posts in wp_includes/post.php and called by MANY WooCommerce functions.

wp_count_posts is defined in wp-includes.post.php and referenced in quite a few plugins

If you look inside the wp_count_posts function, you can see that it does try to cache the results of this query and fetch the result from the cache the next time it runs. However, if you are running an import job then this cache will be getting busted constantly so the cache will never be used.

Depending on which importing tool you are using, probably it runs imports in batchs of X number of rows. At the end of this number of rows, a full WP PHP script reload will probably occur, at which point, any function which calls this wp_count_rows will be doing so against an invalidated cache.

Knowing the likely culprit of the function which is generating this SQL is one thing, but discovering for sure which functions are calling it is another thing. We need to be sure in order to optimise, so I have this little pattern I use when I don’t have access to xDebug to place breakpoints inside functions.

$e = new Exception;
error_log('wp_count_posts callstack');
error_log(print_r($e->getTraceAsString(), true));

What the code above does is output a stack trace to our debug.log file. The stack trace is great for showing us the sequence of events which lead to this specific code point being reached.

Depending on the scenario, I may put the code inside a conditional to check for a specific piece of data and avoid flooding the debug.log with too much irrelevant data. In this case I just want to get the trace into debug.log when it counts statuses for products. The altered code ended up looking like this:

Adding my stack trace debug code to wp_count_posts

Now continuing the import job, I get a few of these kind of entries in my debug.log file:

Output of getTraceAsString which is very useful to help identify how we reached this point of the code, particularly when we don’t have access to xDebug

What that tells you, at #1, is the last function prior to wp_count_posts was WC_Install::is_new_install(). The code for that function is located in woocommerce/includes/class-wc-install.php and looks like this:

    public static function is_new_install() {
        $product_count = array_sum( (array) wp_count_posts( 'product' ) );
        return is_null( get_option( 'woocommerce_version', null ) ) || ( 0 === $product_count && -1 === wc_get_page_id( 'shop' ) );
    }

Fixing the is_new_install() function

Two little lines of code and so much damage to larger stores. Two lines that do this:

  1. Count all the statuses for all the posts, regardless of which WooCommerce version number we are on, i.e. EVERY TIME this function is called from wp-admin.
  2. If the version is null OR there are 0 products and the shop page has not been set then return true, else return false

I’m going to let that sink in for a little bit. On pretty much EVERY page on wp-admin this function gets called by WooCommerce to check if we are dealing with a new installation. On every single fresh batch of our import job we are rerunning this code. Yes, there is code inside the function to cache the code, but still there are plenty of occasions when the cache gets busted – like adding a new product, or deleting a product. Here is how this code SHOULD look:

Rather than scanning the entire table just to check if a single product exists, there are many other ways we can code this to have the query return in 0.001 second

For your convenience, if you are suffering from this problem, here is the code snippet replacement for the is_new_install() function:

public static function is_new_install() {
	global $wpdb;
	// this query will give you either 0 or 1 depending on whether a product exists or not
	$product_exists = $wpdb->get_var("select CASE WHEN EXISTS (SELECT * from $wpdb->posts where post_type = 'product') THEN 1 ELSE 0 END as product_exists;");
	return is_null( get_option( 'woocommerce_version', null ) ) || ( !$product_exists && -1 === wc_get_page_id( 'shop' ) );
}

What the solution above does is read one single row. Regardless of how large your database becomes, it will read one single row to check if a single product exists. That one single row will be read from the index on post_type, so it doesn’t even touch the actual table. One single row versus your entire database means you can scale. On my foundthru store, this makes wp-admin load in under 1 second rather than 35 seconds. If you are not regularly importing or adding new products, you might not notice as much impact, but if you are adding products or performing imports this will help transform the speed of your imports by avoiding these index scans every time wp-admin is loaded.

It basically becomes infinitely faster to query in the way I’ve done versus the way WooCommerce check if a product exists

Another instance of wp_count_posts being misused

After fixing the above, I re-ran my imports and found another case of wp_count_posts being called. Again, the culprit is WooCommerce onboarding related, this time in the OnboardingTasks.php file.

The OnboardingTasks.php file also calls wp_count_posts just to check if there are more than 4 products

On this occasion, this function is actually deciding on a particular template to show and in this case, it may depend on whether the current user is admin or not.

Inside wp_count_posts, the post_status field may also be checked depending on user privileges

However, if you look at the function definition you will see that $perm defaults to an empty string and the calling function does not change this default, so post_status is not checked and the query is again identical to the one avoided above.

$perm is set to default in wp_count_posts and OnBoardingTasks does not override it when checking that at least 4 products exist

However, we do see that this line checks for the count against post_status = publish like so:

if ( $products->publish >= 4 ) {

So – to replace this in efficient fashion we need a query similar to this:

By performing a simple SELECT with LIMIT 5 and no ORDER BY statement and no GROUP BY statement we turn this 30 second query into a 0.001 second query

The above query fetches the first 5 published product IDs from the database. No order by, no table scan, just a very quick index seek which will always only read 5 rows maximum from that index. We get 0.001 second query time versus 30 second query time.

Adding this WooCommerce code optimsation to OnboardingTasks is fairly simple having investigated the underlying code

Measuring the impact of these optimisations

Whilst I started this investigation to speed up my clients imports, this fix also speeds up their wp-admin. For brand new imports we are seeing imports boosted to about 500 rows per minute rather than ……..

For wp-admin, the impact is also severe (severely good!) – previously while running imports, 4 seconds was added to every wp-admin page load.

4.5 seconds added to wp-admin page load because of wp_count_posts calls made by WooCommerce

Quit Moaning and become part of the solution

Normally, when I discover performance issues like this I add a fix to my plugin Scalability Pro to somehow fix or bypass the poor coding performed by other plugins or themes. But in this case, there really is no filter or action I can hook into to fix this.

You can see one other instance of wp_count_posts which needs to be fixed but there’s no real pull request possible for this. Because of this, I’m trying to fix this inside Scalability Pro and you can see the entire coverage of that in my article about the latest update for Scalability Pro.

Theoretically, I could – I could hook into the ‘query’ filter and check the contents of the stack trace and if this was called by is_new_install() I could rewrite the function, but performing a string search like this on every single database query is going to slow down the rest of the website.

Instead, I’ve submitted my pull request to WooCommerce in the hope that they apply this fix fairly quickly.

https://github.com/woocommerce/woocommerce/pull/35061

Hopefully we can move towards having a far more scalable ecosystem with WordPress and WooCommerce. If this pull request goes well, I’ll be submitting further pull requests in future.

In the meantime, if you want to take benefit of these performance boosts, the code is provided above – there are two files you need to edit to bring your wp-admin pages back to sub-second speed.

We will be happy to hear your thoughts

Leave a reply

Super Speedy Plugins
Logo