What is Crawlability in SEO? How to Boost Crawlability on a Startup Site

Pawel Grabowski

In this guide, you'll learn everything about crawlability in SEO. You'll discover why Google still hasn't crawled and indexed some of your web pages, and what you could do to speed the process up.

You know, even though it mightn't seem like it, crawlability is actually a pretty big deal in SEO.  

For one, it’s a common misconception that Google indexes all pages on the Internet.

It actually doesn't.

Far from it.

Now, to be fair, in some cases, we consciously block parts of our websites from being indexed.

But equally often, we also unintentionally prevent search engines from accessing, crawling, analyzing, and, ultimately, ranking web pages.

The result is always the same. Those pages never enter the index and never appear in the SERP.

Here's a quick snapshot of the indexing report from Google Search Console showing the fluctuations in indexing over time.

SEO Crawlability.

And here's another one showing the various reasons why so many pages are not indexed.

Another image showing crawlability in SEO.

Now, surely, some of these are intentional. Some pages are deliberately blocked from crawlers with the "noindex" directive.

But others seem more of a result of something being wrong.

For example, some pages haven't been indexed because of a 4xx issue. Others are duplicates. And there's a whole bunch of pages that Google has crawled but decided not to index.

Now, here's the important bit:

Issues with crawlability mightn’t be a massive problem for a huge site. A teeny tiny percentage of their thousands of pages isn’t getting crawled… Big deal. Most likely, they have hundreds of other, equally good pages that could rank for those keywords anyway.  

However, the situation is quite different for your startup. For you, every page matters. If some of those aren’t getting crawled, it means losing significant opportunities to build search visibility and generate organic growth.

Let me drive this idea home even further:

When some of your pages remain out of index, you lose more than just the opportunity to rank them.

  • These pages don't help you build topical authority, either. Google ignores them (or doesn't even know they exist) and can't build the image of your authority based on them.
  • You lose potential for internal linking and boosting the overall strength of your site, too. And that's connected to topical authority as well.
  • And needless to say, you give away precious clicks to others - most likely your competitors.  

That's why, in this guide, I decided to show you how to avoid all that by minimizing the crawlability issues on your site. So... let's go.

What is Crawlability in SEO

When we talk about crawlability in SEO, we mean the search engine’s ability to access and crawl content on the page.

Now, I admit, explaining crawlability like that doesn’t really make anything clear. So, let me explain further.

The whole issue of crawlability relates to the process of ranking a page.

To simplify it, for a page to rank, search engine crawlers need to do 4 things:

  1. They need to discover it first.
  2. Then, they need to crawl it. In practice, it means accessing the page and going through all of its content (and the code that makes it up, too.)  
  3. Based on that crawl, they understand the context and what keywords or search queries to rank that page for.
  4. Finally, with all that information at hand, they add it to the index to rank for relevant queries.

It’s only then, once all those four steps are done, that the page can appear in the search results, start gaining better rankings, and attract organic traffic.

And in theory, it seems like a relatively simple process, doesn’t it?

In fact, it's quite logical to say that as long as your content goes live, the search engine crawler should have no problems accessing it, crawling it, understanding it, and indexing the page.

Well, unfortunately, that’s not always the case.

Quite often, many seemingly small and insignificant factors might prevent the crawler from even discovering your new content. Before I show you what they are, let’s discuss how crawlers find, access, and evaluate content in the first place.

How search engine crawlers work

The process is actually quite simple.

Firstly, crawlers follow links on pages.

Crawlers behave in pretty much the same way you do when you land on a page.

They use the navigation, as well as both internal and external links, to find and access other content. Crawlers go from link to link and collect information about each page they encounter. This information, in turn, helps the search engine index those pages correctly and rank them for relevant search queries.

This pretty much means that they can learn about the new content because you (or someone else, in the case of an external link) have linked to the page.
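
To make that link-hopping a bit more concrete, here's a minimal sketch of the behavior in Python. This isn't how Googlebot works internally, just an illustration of discovery by links; the requests and beautifulsoup4 packages, and the example.com start URL, are assumptions you'd swap for your own.

    # A minimal sketch of link-based discovery (assumes the third-party
    # "requests" and "beautifulsoup4" packages; example.com is a placeholder).
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    def crawl(start_url, max_pages=20):
        """Follow internal links breadth-first and collect every page found."""
        domain = urlparse(start_url).netloc
        queue, seen = [start_url], {start_url}
        while queue and len(seen) <= max_pages:
            url = queue.pop(0)
            response = requests.get(url, timeout=10)
            if response.status_code != 200:
                continue  # an unreachable page is simply skipped
            soup = BeautifulSoup(response.text, "html.parser")
            for link in soup.find_all("a", href=True):
                target = urljoin(url, link["href"]).split("#")[0]
                # Stay on the same site and skip URLs already queued
                if urlparse(target).netloc == domain and target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen

    print(crawl("https://example.com/"))

Notice that a page only shows up in the result if some other page links to it. That's the whole point of the next few sections.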

Crawlers also scan your sitemap for any new (or newly updated) pages.

In this case, you notify them of any changes to your content by placing relevant information in the sitemap (which, in most cases, happens automatically through your CMS.)  

Here's an image showing the sitemap where the crawler learned about the page.

Image of a sitemap.
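
If you want to double-check that a freshly published page actually made it into your sitemap, a few lines of Python will do. Again, just a sketch: the sitemap and page URLs below are placeholders for your own.

    # Confirm a new page is listed in the sitemap (URLs are placeholders).
    import xml.etree.ElementTree as ET

    import requests

    SITEMAP_URL = "https://example.com/sitemap.xml"
    NEW_PAGE = "https://example.com/blog/new-post/"

    root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).text)
    # Standard sitemap namespace for the <urlset>/<url>/<loc> elements
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    listed = {loc.text.strip() for loc in root.findall(".//sm:loc", ns)}

    print("Listed in sitemap" if NEW_PAGE in listed else "Missing from sitemap")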

The above process is exactly what we’re referring to as crawlability in SEO.

And it straight away suggests that if there’s any problem with links pointing to your new page (or there are no links pointing to a new page,) and the page isn't in the sitemap, crawlers will have a hard time accessing it.

Naturally, there are other issues too, so let's go through them.

What causes problems with crawlability?

We've covered issues with internal or external links already.

You know that if no link points to the page, crawlers might not be able to find it.

Similarly, if you don't have a sitemap, crawlers might not be able to learn about the new page.

But...

A similar thing will occur if the page is buried too deep in the site structure.

In this case, a crawler would have to “click” too many times, so to speak, just to get to it. And it might decide not to do it. It might drop off along the way or go elsewhere, without ever reaching that page.
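
One rough way to see whether a page is buried too deep is to measure its "click depth" - how many links have to be followed from the homepage before the page is reached. Here's a hedged sketch of that idea, reusing the breadth-first approach from the earlier crawler example; the URLs and packages are again assumptions.

    # Measure how many clicks from the homepage it takes to reach a page
    # (assumes requests and beautifulsoup4; URLs are placeholders).
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    def click_depth(home_url, target_url, max_pages=200):
        domain = urlparse(home_url).netloc
        queue, seen = [(home_url, 0)], {home_url}
        while queue and len(seen) <= max_pages:
            url, depth = queue.pop(0)
            if url == target_url:
                return depth
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue
            for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
                link = urljoin(url, a["href"]).split("#")[0]
                if urlparse(link).netloc == domain and link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
        return None  # not reachable within the crawl budget

    print(click_depth("https://example.com/", "https://example.com/some/buried/page/"))

If the number that comes back feels high, the fix is usually structural: link to the page from somewhere closer to the homepage.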

Broken redirects will also stop crawlers in their tracks.

So, if, at some point, you’ve redirected the page to a new URL, and either have an error (like a typo) in that redirect, or that new URL no longer exists, the crawler will hit a wall. And since it can’t go anywhere, it won’t.

The same happens if you have too many redirects.

Let’s say that you redirected one page to a new URL, then changed that URL and redirected it to a new one, and so on. In this case, the crawler might decide not to follow that chain of redirects. In other words, it will drop off as well and never reach the new page.

Image showing how a redirect chain affects crawlability in SEO.
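
You can spot both problems (a broken redirect and a chain that's too long) yourself with a quick check. Here's a small sketch using the requests library; the URL is a placeholder.

    # Inspect where a redirected URL actually ends up (URL is a placeholder).
    import requests

    response = requests.get("https://example.com/old-page/", timeout=10, allow_redirects=True)

    # response.history holds every hop the request was redirected through
    hops = [hop.url for hop in response.history] + [response.url]
    print(" -> ".join(hops))
    print("Final status:", response.status_code)

    if response.status_code >= 400:
        print("Broken redirect: the destination returns an error")
    elif len(response.history) > 1:
        print("Redirect chain: point the old URL straight at the final destination")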

Server errors can also cause crawlers to stop and go elsewhere.

If a page is inaccessible at the very moment the crawler tries to reach it, it'll go elsewhere and won't try again. At least not during this crawl.

There are situations where you might have deliberately blocked crawlers from accessing the page, and then forgotten to remove those restrictions.

  • For example, you may have restricted crawling because the page was not ready. The page eventually went live but it’s still blocked with the "noindex" tag or in the robots.txt file.
  • You may have been working on a new site template, and had it blocked from crawling, well, again, because it wasn’t live. And the same thing happens: it went live, but the directive restricting crawlers from accessing it remained.
  • Or you may have blocked those pages by mistake. I’ve seen it happen too, particularly if you switch to a new SEO plugin in your CMS. It’s easy to just tweak the wrong setting, for example, and block public access to a page.
Crawlability and indexing setting in a CMS SEO plugin.

(A setting in the Yoast SEO Plugin controlling whether search engines are allowed to crawl the page.)
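
A quick way to catch the "forgot to unblock it" scenario is to check the two usual suspects directly: robots.txt and the noindex directive. Here's a hedged sketch; the URLs are placeholders, and the requests and beautifulsoup4 packages are assumed.

    # Is the page blocked in robots.txt, or does it carry a noindex directive?
    from urllib import robotparser

    import requests
    from bs4 import BeautifulSoup

    PAGE = "https://example.com/new-feature/"   # placeholder

    # 1. robots.txt - is Googlebot allowed to fetch this URL at all?
    rp = robotparser.RobotFileParser("https://example.com/robots.txt")
    rp.read()
    print("Allowed by robots.txt:", rp.can_fetch("Googlebot", PAGE))

    # 2. noindex - either a meta robots tag or an X-Robots-Tag response header
    response = requests.get(PAGE, timeout=10)
    meta = BeautifulSoup(response.text, "html.parser").find("meta", attrs={"name": "robots"})
    meta_noindex = bool(meta and "noindex" in meta.get("content", "").lower())
    header_noindex = "noindex" in response.headers.get("X-Robots-Tag", "").lower()
    print("Carries a noindex directive:", meta_noindex or header_noindex)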

And finally, the crawler might have actually accessed the page but decided not to index it.

This is, probably, the most frustrating issue with crawlability. In this case, the search engine knows about your page but, for some reason, it doesn't consider it worthy of indexing.

There are actually two separate scenarios for this issue:

1/ The search engine learns about the page but decides not to crawl it.

You can tell which pages are affected by the "Discovered - Currently not indexed" status in Google Search Console's Pages report.

Indexing issue.

This is, by far, the lesser issue of the two. In most cases, what happens is that the search engine actually wants to crawl and index the page, but decides to put it off. It's as if it were too busy to do it now, so it adds it to a list to do later. And, usually, that's exactly what happens. After a while, the search engine finally gets to crawling your page, and indexes it.

So, with this one, I'd recommend that you wait and let the search engine crawl the page naturally.

The other issue is more problematic.

2/ The search engine has crawled your page and decided it's not worthy of indexing.

Such pages are marked with the "Crawled - Currently not indexed" issue, and this one is actually quite big.

Another indexing issue caused by crawlability.

This issue suggests that something may be wrong with the page:

  • Either its content isn't up to the search engine's standards or
  • Another (usually technical) issue renders this content not worthy of indexing.

Unfortunately, Google rarely provides any indication of what the issue is so you need to do a little bit of investigative work to figure it out. There might be hundreds of such issues but, from experience, they usually center around things like this:

  • The page's content only repeats what others have said already. In this case, it really does make no sense for the page to rank.
  • The content is poorly written, or its readability is questionable.
  • You already have other pages that cover the same information. Again, it's quite logical that Google decides not to clutter its index with yet another of your pages on the same topic.
  • The page is a duplicate of your or someone else's content.
  • The page has poor UX.

Review your page and be honest with yourself. Is this content really worth being in the index? Is it really original? Does it provide any meaningful information to users?

Remember, Google's (and other search engines') goal is to provide the most valuable information to users. So, they won't index anything that does not help them deliver on that objective.

How to Prevent Crawlability Issues on a Startup Site

There are several things that you can do:

1. Crawl your site regularly

This is, probably, the most important thing to do to manage crawlability (and one that's usually greatly overlooked.)

Many SEO platforms have a site audit capability. Run these audits regularly, as they will identify any crawlability issues so you can fix them before they become a problem.

SEMrush, the platform I use, shows me if I have any orphaned pages (ones with no internal links pointing to them) in my site's taxonomy.

Site audit data showing crawlability issues.

And this is just one of the reports that can help me identify crawlability issues.

Website report showing SEO crawlability issues.

You can also use a dedicated site crawler and crawl any new page manually.

This can also confirm whether the content is accessible to search engines and if not, what might be causing the problem.

For example, this report from Screaming Frog, an amazing site crawler that I've been using since pretty much forever, shows data for a single URL, along with crawlability information.

Site crawler data with crawlability and indexability issues.
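
If you don't have access to a paid audit tool yet, you can approximate the orphaned-pages check yourself: crawl the site from the homepage, pull the sitemap, and flag any sitemap URL that no internal link ever reaches. A rough sketch, with placeholder URLs and the same assumed packages as before:

    # Flag sitemap URLs that the internal-link crawl never reaches (orphans).
    import xml.etree.ElementTree as ET
    from urllib.parse import urljoin, urlparse

    import requests
    from bs4 import BeautifulSoup

    HOME = "https://example.com/"
    SITEMAP = "https://example.com/sitemap.xml"

    def crawl(start, max_pages=500):
        domain, queue, seen = urlparse(start).netloc, [start], {start}
        while queue and len(seen) <= max_pages:
            url = queue.pop(0)
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue
            for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
                link = urljoin(url, a["href"]).split("#")[0]
                if urlparse(link).netloc == domain and link not in seen:
                    seen.add(link)
                    queue.append(link)
        return seen

    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(requests.get(SITEMAP, timeout=10).text)
    in_sitemap = {loc.text.strip() for loc in root.findall(".//sm:loc", ns)}

    for orphan in sorted(in_sitemap - crawl(HOME)):
        print("No internal links found pointing to:", orphan)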

2. Link internally to any new page

This is by far the simplest method to help crawlers learn about any new content.

But...

I admit that this might be difficult to do when you’re only starting to publish SEO content.

After all, you don’t have many pages you could add links to, and you shouldn't (I mean it!) clog your existing pages with hundreds of internal links. This would never look good to Google.

However, if you do have other content where such internal links would make sense, place them.

Also, if you can, place a link on the homepage to any super important content that you've published.

This is particularly helpful for new sites. The homepage might be the most crawled page on your site, and so, having a link to that new content, even temporarily, will increase the chances of crawlers discovering it.

Again, I wouldn't recommend placing too many links on the homepage. But if you've published a major pillar page, for example, add a link to it from the footer. It will boost that page's authority, too, and will help you establish your startup in a relevant category.

For example, here's how I'm doing it on one of my clients' sites. We have a ton of content (I've been managing their SEO since 2019,) but we link only to the most important assets on the homepage.

Page with internal links to pillar content.

3. Keep your sitemap up to date

Crawlers use the sitemap to learn about new content too, after all.

So, having the page listed there might increase your chances of having it crawled.

The trouble is that many startup sites are static. I've found that especially if the company is run by a technical founder, the website tends to be hardcoded without any CMS in place.

This isn't a major problem, of course. But it means that there is no system in place to automatically update the sitemap.xml file every time you publish new content.

So, if you are in that situation, remember to manually update the file every time a new page goes live.

Crawlers like Screaming Frog or Sitebulb offer the option to create a sitemap for a domain and download it as an .xml file.
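
And if you'd rather script it than export it from a crawler, the file is simple enough to regenerate yourself. Here's a minimal sketch using only Python's standard library; the list of URLs is a placeholder you'd maintain alongside your pages.

    # Regenerate sitemap.xml for a static site from a hand-maintained URL list.
    from datetime import date
    from xml.etree.ElementTree import Element, SubElement, ElementTree

    PAGES = [
        "https://example.com/",
        "https://example.com/pricing/",
        "https://example.com/blog/new-post/",   # add each new page here
    ]

    urlset = Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in PAGES:
        entry = SubElement(urlset, "url")
        SubElement(entry, "loc").text = page
        SubElement(entry, "lastmod").text = date.today().isoformat()

    ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
    print("sitemap.xml rewritten with", len(PAGES), "URLs")

Upload the resulting file to your web root and reference it in robots.txt (Sitemap: https://example.com/sitemap.xml) so crawlers can find it.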

4. Request a manual crawl for problematic pages

Sometimes (and unfortunately, more often than it should happen) you'll have pages that remain uncrawled, and that's in spite of you doing everything that you should.

In such cases, you can request the search engine to crawl them.

You do that by first inspecting the URL in Google Search Console, and then, clicking the Request Indexing button.

Request indexing option in Google Search Console.

A quick note - Requesting indexing doesn't guarantee an immediate crawl. By clicking the button, you are only notifying the search engine that this URL has changed, and you recommend that it crawls it. But once you've requested indexing, you need to remain patient. The search engine crawler will do this in its own time. For the same reason, there is no point in requesting indexing more than once. It will have no effect on when your page gets crawled.

Something to remember about crawlability on a startup site

To close this guide off, I want to touch on one final issue. It's super important so please do not skip this section.

You see... even though you'll start publishing new content regularly, interlinking pages, and so on, Google will be slow to crawl your site at first.

It will not visit it too often. It might not crawl the entire site during each crawl (and that's even though your site is small,) and it will have little trust in you at first.

You might see more pages stuck in the "crawled - currently not indexed" or "discovered - currently not indexed" state, too. And that's in spite of you doing everything by the book - creating amazing content, interlinking, updating the sitemap regularly, etc.

This is perfectly normal and nothing to worry about.

Here's why:

  • Your site is new (and that's even if you launched it a couple of months ago. It's new from the SEO perspective.)
  • Google does not know whether you will stick around.
  • It does not know whether you are an expert (or even, what you are an expert at.)

As a result, it usually treads carefully. It doesn't dedicate too many resources to your site, primarily because it does not know whether you'll stick around. But as you continue publishing content and your site's authority grows, the frequency of crawls will naturally increase. Until that happens, though, you will have to be patient and slowly build up the foundation for your SEO program.

And that’s it.

Portrait picture of Pawel Grabowski, owner of Stacking Pancakes.

Hey there...

My name is Pawel Grabowski. I am a startup SEO consultant specializing in helping early-stage startups develop and deploy successful SEO programs.

Learn more about me or hire me to run SEO for your startup.