
Include and Exclude Pages

By default, Sitepager scans all accessible pages it can find from your Website URL. Use these settings to control exactly which pages are included in a scan.

Find these settings under Advanced Settings > Update scan scope in your scan settings.

Sitepager uses three controls to determine which pages to scan:

  • Website URL: sets the starting point. Sitepager crawls from this URL and follows links to discover pages.
  • Exclude URL patterns: removes pages that match a pattern. Any page matching an exclude pattern is skipped.
  • Include URLs: adds specific pages that are always scanned, even if they match an exclude pattern.

Exclude patterns are applied first. Include URLs override them.

  • Sitepager starts from your Website URL and discovers pages by following links
  • Discovered pages are checked against exclude patterns. Matches are skipped.
  • Include URLs are always scanned, even if they match an exclude pattern
  • If Crawl included URLs is enabled, links found on included pages are also followed and still subject to exclude patterns
  • The total number of pages scanned is capped by your max pages setting
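The rules above can be sketched as a small scope resolver. This is an illustrative model that follows the list literally, not Sitepager's actual implementation. The `resolve_scope` function, the in-memory `links` map standing in for real page fetches, plain substring matching for patterns, and the choice to queue included URLs first are all assumptions made for the example.

```python
from collections import deque


def is_excluded(url, exclude_patterns):
    """Assumed matching rule: a URL is excluded if it contains any pattern."""
    return any(pattern in url for pattern in exclude_patterns)


def resolve_scope(start_url, links, exclude_patterns=(), include_urls=(),
                  crawl_included=False, max_pages=100):
    """Return the URLs that would be scanned, in discovery order.

    `links` maps a URL to the URLs it links to (a stand-in for crawling).
    """
    scanned = []
    seen = set()
    # Queue entries are (url, follow_links, forced).
    # Included URLs are forced: they bypass the exclude patterns.
    queue = deque((url, crawl_included, True) for url in include_urls)
    queue.append((start_url, True, False))

    while queue and len(scanned) < max_pages:
        url, follow, forced = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        if not forced and is_excluded(url, exclude_patterns):
            continue  # discovered page matches an exclude pattern
        scanned.append(url)
        if follow:
            # Links found here are ordinary discoveries, so they remain
            # subject to the exclude patterns.
            for target in links.get(url, []):
                queue.append((target, True, False))
    return scanned


# Hypothetical link graph for a small site.
links = {
    "https://sitepager.io": [
        "https://sitepager.io/pricing",
        "https://sitepager.io/blog",
    ],
    "https://sitepager.io/blog": ["https://sitepager.io/blog/post-1"],
}

result = resolve_scope(
    "https://sitepager.io",
    links,
    exclude_patterns=["/blog"],
    include_urls=["https://sitepager.io/blog/post-1"],
)
print(result)
```

In this sketch, /blog is skipped because it matches the exclude pattern, while /blog/post-1 is still scanned because it is listed as an included URL.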

Subdomains are excluded by default. Each subdomain should be scanned separately so it can have its own baseline and run history.
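As a rough illustration of the subdomain rule, the check below compares hostnames exactly. The `same_host` helper is hypothetical, and exact hostname comparison is an assumption: this page does not say how Sitepager treats variants such as www.

```python
from urllib.parse import urlparse


def same_host(url, website_url):
    """True when `url` is on exactly the same host as the Website URL."""
    return urlparse(url).hostname == urlparse(website_url).hostname


website_url = "https://sitepager.io"
on_site = same_host("https://sitepager.io/pricing", website_url)
on_subdomain = same_host("https://admin.sitepager.io/login", website_url)
print(on_site, on_subdomain)
```

Under this assumption, a link to admin.sitepager.io found during a scan of sitepager.io would be left out and would need its own scan.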

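The examples on this page use the following site structure: the sitepager.io homepage, pages and sections such as /blog, /features (with /features/beta beneath it), and /pricing, and a separate admin.sitepager.io subdomain.

[Figure: Example site structure]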

The Website URL defines the entry point of the crawl and determines which pages are included in your scan. Here is how to pick the right Website URL based on your goals.

| What you want to scan | Website URL | Additional configuration |
| --- | --- | --- |
| The entire site | Homepage (yoursite.com) | Use Exclude URL patterns to skip sections |
| A specific section | Section URL (yoursite.com/features) | Use Exclude URL patterns for finer control |
| Specific key pages only | Homepage (yoursite.com) | Use Include URLs for exact pages. Use Crawl included URLs to control depth |
| A subdomain | Subdomain URL (subdomain.yoursite.com) | Subdomains must be scanned separately, each as its own Website URL |

Use exclude patterns to skip pages or sections you do not need in your scan.

Open Advanced Settings > Update scan scope and add patterns under Exclude URL patterns. To add multiple patterns, separate them with commas or press Enter after each one.

Examples:

  • /blog skips URLs containing /blog (for example /blog/post-1)
  • /admin skips URLs containing /admin (for example /admin/dashboard)
  • /api skips URLs containing /api (for example /api/v1)

Patterns support simple text matching. If you enter /blog, any URL containing /blog is skipped.

For advanced control, you can also use regex patterns. For example, ^(?!.*\/blog\/).*$ matches every URL that does not contain /blog/, so all of those pages are excluded and only pages under /blog/ remain in the scan.
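The two matching modes can be compared with a short sketch. The `matches_exclude` helper and its `use_regex` flag are hypothetical: this page does not say how Sitepager tells a regex from plain text, so the mode is passed explicitly here. Python's `re` module is used only to show how the example regex behaves.

```python
import re


def matches_exclude(url, pattern, use_regex=False):
    """Return True if `url` would be skipped by `pattern`."""
    if use_regex:
        return re.search(pattern, url) is not None
    # Simple text matching: the pattern may appear anywhere in the URL.
    return pattern in url


not_blog = r"^(?!.*\/blog\/).*$"  # regex from the example above

urls = [
    "https://sitepager.io/pricing",
    "https://sitepager.io/blog/post-1",
    "https://sitepager.io/blog",
]

for url in urls:
    print(url,
          matches_exclude(url, "/blog"),
          matches_exclude(url, not_blog, use_regex=True))
```

Two details are worth checking before relying on a pattern. Plain text matching is a substring test, so /blog would also match a path such as /blogroll. The example regex looks for /blog/ with a trailing slash, so a blog index at /blog (no trailing slash) counts as "not under /blog/" and is excluded along with everything else.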

Use Include URLs to add specific pages that are always scanned, regardless of exclude patterns.

Add URLs under Include URLs in the scan scope settings. You can enter full URLs like https://yoursite.com/pricing or paths like /pricing.
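One way to picture how a path and a full URL can refer to the same page is to resolve each entry against the Website URL. The `normalize_include` helper is hypothetical, and resolving paths against the site root with `urljoin` is an assumption about how path entries are interpreted.

```python
from urllib.parse import urljoin


def normalize_include(entry, website_url):
    """Resolve an include entry (full URL or path) to an absolute URL."""
    return urljoin(website_url, entry)


website_url = "https://sitepager.io"
from_path = normalize_include("/pricing", website_url)
from_full_url = normalize_include("https://sitepager.io/pricing", website_url)
homepage = normalize_include("/", website_url)
print(from_path, from_full_url, homepage)
```

Under this assumption, /pricing and https://sitepager.io/pricing resolve to the same absolute URL, and / resolves to the homepage.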

Crawl included URLs is an option below the include list.

  • Disabled: only the exact URLs you listed are scanned
  • Enabled: Sitepager also follows links found on those pages and scans them too, within your max pages limit

Goal: Scan all public pages but skip blog posts.

[Figure: Exclude blog section]

Configuration:

  • Website URL: https://sitepager.io
  • Exclude URL patterns: /blog

Result: Sitepager crawls all pages linked from the homepage except anything under /blog. The admin.sitepager.io subdomain is skipped by default. Subdomains require a separate scan.

Scan a specific section and skip a subsection


Goal: Scan all pages under /features but skip beta features.

[Figure: Exclude beta features]

Configuration:

  • Website URL: https://sitepager.io/features
  • Exclude URL patterns: /features/beta

Result: Sitepager crawls all pages under /features, excluding anything under /features/beta.

Goal: Scan only your most important pages: homepage, pricing, and features.

[Figure: Include key pages]

Configuration:

  • Website URL: https://sitepager.io
  • Include URLs: https://sitepager.io/pricing, https://sitepager.io/features
  • Crawl included URLs: Disabled or Enabled

Result:

  • Crawl included URLs disabled: Sitepager scans only the pricing page and features page. Child pages under /features are not scanned.
  • Crawl included URLs enabled: Sitepager scans the pricing page, features page, and all pages linked from the included URLs.

To include the homepage, add / to the include list.