2.19.5.1. Site map (sitemap.xml)

The sitemap.xml file, in its standardized format, provides search engines with a list of pages to be indexed. A description of the protocol is available on the official site.

Important points:

  • The sitemap.xml file must have exactly this name and must be encoded in UTF-8.
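  • A single sitemap.xml file must not exceed 50 MB uncompressed. If the file is larger than 50 MB, you must split it into several sitemap files and list them in a sitemap index file. A sitemap file may be compressed with gzip (sitemap.xml.gz) to save bandwidth, but the 50 MB limit applies to the uncompressed file.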
  • A single sitemap.xml file should contain no more than 50000 links.
  • The sitemap.xml file must be placed in the site root directory and be accessible via a browser at a URL of the form http://www.example.com/sitemap.xml.
  • All links in the site map must be absolute (in the format http://www.example.com/).
  • The site map must meet the requirements of the relevant search engine crawler, as some of them have specific conditions for using this file.
  • Search engine crawlers treat the site map as a recommendation only. They may ignore it if the site map contains errors or for other reasons of their own.
  • Certain special characters must be escaped.
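Several of the points above can be checked automatically before a site map is published. The following is a minimal sketch, not an official validator: the function name check_sitemap_limits and its messages are made up for this illustration, and it only covers the size, link count, absolute-link and UTF-8 requirements.

```python
from urllib.parse import urlsplit

MAX_BYTES = 50 * 1024 * 1024   # 50 MB size limit from the list above
MAX_URLS = 50000               # link limit from the list above


def check_sitemap_limits(xml_bytes, urls):
    """Return a list of problems found; an empty list means none were found.

    xml_bytes: the uncompressed content of sitemap.xml
    urls: the links listed in that file
    (hypothetical helper, written only for this illustration)
    """
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file is larger than 50 MB")
    if len(urls) > MAX_URLS:
        problems.append("more than 50000 links")
    for url in urls:
        parts = urlsplit(url)
        # An absolute link needs both a scheme and a host name.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append("link is not absolute: " + url)
    try:
        xml_bytes.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    return problems
```

For example, check_sitemap_limits(data, ["/about"]) reports the relative link /about, while a list that contains only http://www.example.com/ passes the link check.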

When creating a site map, you must follow a specific syntax. A basic site map with correct syntax looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://example.com/</loc>
   </url>
</urlset>

The following tags are used in the sitemap.xml file:

  • <?xml version="1.0" encoding="UTF-8"?> — prologue of an XML file. It specifies the XML version and encoding and must always be the first line of the file. Required tag
  • <urlset>...</urlset> — parent tag that contains all subsequent references to site pages using the <url> tags. Required tag
    The opening tag must specify the current protocol, like this:
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">...</urlset>
  • <url>...</url> — tag that contains the URL itself and information about it. Required tag
  • <loc></loc> — a tag that specifies a particular URL. Required tag
  • <lastmod></lastmod> — date of last modification. Optional tag
  • <changefreq></changefreq> — expected frequency of changes to this page. This tag is for informational purposes only. Optional tag
    Valid values:
    • always — the page changes every time it is accessed.
    • hourly / daily / weekly / monthly / yearly — the page changes roughly once every hour / day / week / month / year.
    • never — the page does not change (for example, archived pages).
  • <priority></priority> — the priority of a URL relative to other URLs listed in the site map. The value ranges from 0.0 to 1.0; the default value for all URLs is 0.5. Optional tag

    Warning!

    The priority tag does not affect how pages appear in search results. It only tells crawlers which pages of the site you consider most important, which may influence the order in which they are crawled.
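The tags described above can also be generated from code. The sketch below uses Python's standard xml.etree.ElementTree module; the function name build_sitemap and the dictionary layout of a page are assumptions made for this example, not part of the protocol.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap.xml content as bytes.

    pages: iterable of dicts with a required "loc" key and optional
    "lastmod", "changefreq" and "priority" keys (hypothetical layout).
    """
    # The xmlns attribute declares the protocol on the opening <urlset> tag.
    urlset = ET.Element("urlset", xmlns=NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # <loc> is required; the other three tags are optional.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    # xml_declaration=True writes the XML prologue as the first line.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```

A call such as build_sitemap([{"loc": "http://example.com/", "changefreq": "weekly", "priority": 0.8}]) returns the bytes of a one-page site map. ElementTree escapes special characters in the text of each tag on its own, so values should be passed unescaped.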

In XML files, for all data (including URLs), the characters & (ampersand), ' (single quote), " (double quote), < (less-than), and > (greater-than) must be written as entity escape codes (starting with &): &amp;, &apos;, &quot;, &lt;, and &gt; respectively.
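When the XML is assembled by hand, as plain strings, this escaping can be delegated to the standard library. The following sketch relies on xml.sax.saxutils.escape, which handles &, < and > by default; the wrapper name escape_for_sitemap is an assumption made for this example.

```python
from xml.sax.saxutils import escape

# escape() replaces &, < and > by default; the two quote characters
# are added through its optional entities mapping.
QUOTES = {"'": "&apos;", '"': "&quot;"}


def escape_for_sitemap(value):
    """Escape the five special characters in a value placed inside a tag."""
    return escape(value, QUOTES)


url = "http://www.example.com/catalog?item=12&desc=vacation"
print("<loc>" + escape_for_sitemap(url) + "</loc>")
```

The printed line contains &amp; in place of the bare & in the query string. A value should be escaped exactly once: escaping an already escaped string would turn &amp; into &amp;amp;.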

If the sitemap.xml file is larger than 50 MB or contains more than 50000 links, you should split it into multiple files and create a sitemap.xml file that links to the other sitemap files.

Example of a sitemap index file:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap1.xml</loc>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap2.xml</loc>
   </sitemap>
</sitemapindex>

The sitemap index file has the following syntax:

  • <?xml version="1.0" encoding="UTF-8"?> — prologue of an XML file. It specifies the XML version and encoding and must always be the first line of the file. Required tag
  • <sitemapindex>...</sitemapindex> — parent tag that contains all subsequent references to site map files. Required tag
  • <sitemap>...</sitemap> — tag that contains the URL pointing to the sitemap file and information about it. Required tag
  • <loc></loc> — tag that specifies a specific URL for the sitemap file. Required tag
  • <lastmod></lastmod> — date of last modification. Optional tag
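Splitting a large link list and producing the index file can be sketched as follows. This is an illustration under stated assumptions: the file names sitemap1.xml, sitemap2.xml and so on follow the example above, the function names split_urls and build_index are made up, and only the 50000-link limit is handled (the 50 MB size limit would need a separate check).

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # link limit for a single sitemap file


def split_urls(urls, limit=MAX_URLS):
    """Cut the full list of links into chunks of at most `limit` links."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def build_index(base_url, chunk_count):
    """Build a sitemap index that points at sitemap1.xml ... sitemapN.xml."""
    index = ET.Element("sitemapindex", xmlns=NS)
    for n in range(1, chunk_count + 1):
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base_url}/sitemap{n}.xml"
    # xml_declaration=True writes the XML prologue as the first line.
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)
```

With 120000 links, split_urls returns three chunks (50000, 50000 and 20000 links), and build_index("http://www.example.com", 3) lists three sitemap files. Each chunk is then written to its own file in the ordinary urlset format.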

Examples of services for generating and validating sitemap files.

Attention!

Hosting Ukraine is not affiliated with the services listed and cannot recommend a specific tool for performing any particular actions.