If you are not seeing a page in your index, there are a few different ways to troubleshoot and identify the problem.
Step 1: Check to see if the page has been indexed
First, check whether the page is in the index by searching for it in the crawler log. Go to Settings > Crawlers and click the date to open the Crawler Log.
The log shows the crawler's response message for each page. Pages that were indexed successfully are marked as such.
If a page was NOT included in the index, it may be missing required fields, have a canonical tag that points to another page, or have a noindex tag. Search for "missing" in the crawler log to view pages that were not indexed due to missing fields.
Step 2: Test the crawler
Next, test the crawler against the specific page in question. Go to Settings, select the crawler, and open Step 4, Test your crawler. If the page could not be indexed, the reason appears in red text.
Troubleshooting why a page will not be crawled:
Returned HTTP status code NotFound instead of 200 (OK) and will be ignored
The URL does not resolve to a live page. Review your site for the correct or new URL, then re-test it.
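Before re-testing in the crawler, it can help to confirm what status code the server returns for the URL. The following is a minimal sketch using only the Python standard library; the function names `fetch_status` and `describe_status` are hypothetical helpers and are not part of the crawler. It assumes the server answers HEAD requests, which some servers reject.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for `url` (redirects are followed)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is on the exception.
        return err.code


def describe_status(code: int) -> str:
    """Explain, in the terms used by this article, what a status code suggests."""
    if code == 200:
        return "OK: the URL resolves and can be re-tested in the crawler"
    if code == 404:
        return "NotFound: the URL does not exist; look for the correct or new URL"
    return f"Unexpected status {code}: check for redirects or server errors"
```

For example, `describe_status(fetch_status("https://www.example.com/some-page"))` reports whether a placeholder URL is reachable. Replace the URL with the page you are troubleshooting.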
Out of base domains
The page is not within the domains to be crawled that are defined in Step 1. To add this page to the index, add its URL under domains to be crawled. Keep in mind that a domain to be crawled is only the starting location for the crawler: any page that follows the same URL structure will also be crawled, provided it is linked from that starting location.
Another way to add a single page is to whitelist its URL within Advanced Settings. Whitelisting a URL adds that page to the index without crawling any pages beyond it.
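To see quickly whether a page is likely to be reported as out of base domains, you can compare its URL with your list of domains to be crawled. The sketch below is an approximation, not the crawler's actual matching logic: it assumes a page is in scope when it shares a host with a base URL and sits under that base URL's path. The function name `is_within_base` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def is_within_base(url: str, base_urls: list[str]) -> bool:
    """Return True if `url` shares a host with, and sits under the path of, any base URL."""
    page = urlparse(url)
    for base in base_urls:
        root = urlparse(base)
        same_host = page.netloc.lower() == root.netloc.lower()
        # Drop the trailing slash so "/blog" and "/blog/" are treated alike.
        base_path = root.path.rstrip("/")
        under_path = page.path == base_path or page.path.startswith(base_path + "/")
        if same_host and under_path:
            return True
    return False
```

A page that fails this check is a candidate for either a new entry under domains to be crawled or a whitelisted URL.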
Missing required fields: Title, Description, Category and/or will be ignored
The page does not have the required fields needed to appear in the index. The best way to fix the issue is to review your settings for Fields to be crawled for content pages in Step 2. Title and description are mandatory because they are needed to display the page on the search engine results page. Any other field that your settings mark as required will also keep the page out of the index when the page does not contain it.
To fix this:
- Edit the page so that it includes the field the crawler is looking for.
- Adjust the settings for the field to be crawled so that it captures a field the page already includes. NOTE: This changes how every page in your index is captured. Automatic extraction is recommended. If you have questions about getting a page to be included, please submit a support ticket.
- Category: If categories are set up and the page is not being indexed due to a missing category, either create a new category that captures the page, or set the "Default value if the string isn't found" option so that pages outside the other categories still receive a category.
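You can also inspect a page's HTML yourself to see which mandatory fields are absent. This is a minimal sketch that assumes the title is read from the `<title>` element and the description from `<meta name="description">`; your own Fields to be crawled settings may point at different elements, so treat the selectors as assumptions. `FieldExtractor` and `missing_fields` are hypothetical names.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect the <title> text and the meta description from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.description = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def missing_fields(html: str) -> list[str]:
    """Return the names of the mandatory fields that are empty or absent."""
    parser = FieldExtractor()
    parser.feed(html)
    found = {"Title": parser.title.strip(), "Description": parser.description}
    return [name for name, value in found.items() if not value]
```

An empty list means both fields were found under these assumptions; anything else names the field to add to the page or to remap in the crawler settings.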
Has a canonical tag that refers to another page and will be ignored
Canonicalization helps with duplicated content. In this case, the page in question has a canonical tag that tells the crawler to reference another page. That other page may be outside of the domains to be crawled, in which case it will be ignored. To fix this, check where the rel canonical is pointing, then either add that page within domains to be crawled or update the rel canonical as needed.
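To find out where a page's rel canonical points, you can read it straight from the page's HTML. The sketch below uses the Python standard library and assumes the canonical is declared with a `<link rel="canonical" href="...">` element; it treats URLs that differ only by a trailing slash as the same page, which may not match how the crawler compares them. `CanonicalFinder` and `canonical_points_elsewhere` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_points_elsewhere(page_url: str, html: str) -> bool:
    """Return True if the page declares a canonical URL other than itself."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonical:
        return False
    # Resolve relative hrefs against the page URL before comparing.
    target = urljoin(page_url, finder.canonical)
    return target.rstrip("/") != page_url.rstrip("/")
```

If this returns True for the page you are troubleshooting, the canonical target is the URL to check against your domains to be crawled.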
Please feel free to contact support with any specific questions you have regarding your index!