Image indexation and other new features of Simple XML sitemap 2.10

26 Sep 2017
sitemap.xml

This is a technical description of the 2.x branch of the module. For the newer 3.x branch see this article; for 4.x, see this article.


New features of Simple XML sitemap

Version 2.10 of Simple XML sitemap is mainly a feature release with only a few minor bugs fixed. The new features are

  • the implementation of the changefreq parameter
  • the ability to set an interval at which to regenerate the sitemap
  • the ability to customize XML output
  • the ability to add arbitrary links to the sitemap
  • image indexation

See the 8.x-2.10 release page for details.
A new version has been released; please make sure to visit the project page.

Image indexation

Simple XML sitemap can now create Google image sitemaps by indexing all images attached to entities. This includes images uploaded through the image field as well as inline images uploaded through the WYSIWYG editor. The inclusion of images can be set at the entity type and bundle level and overridden on a per-entity basis, which gives you full flexibility.

Please bear in mind that all images attached to entities get indexed regardless of their file system location (public/private). Also note that the original images get indexed, not the derived image styles. Consider this before indexing entities with many high-resolution images, as doing so could increase traffic.

Indexation of custom link images has not made it into this release, but the feature is already available in the development version of the module.

Adding arbitrary links to the sitemap

Most use cases call for the inclusion of internal links, which can be achieved by adding entity links to the index. For non-entity pages like views, it has been possible to add custom links through the UI or the API. In both cases, however, the system only allows internal links which are accessible to anonymous users. The new version of the module provides a way to add any link to the index, even ones Drupal does not know about:

    /**
     * Use this hook to add arbitrary links to the sitemap.
     *
     * @param array &$arbitrary_links
     */
    function hook_simple_sitemap_arbitrary_links_alter(&$arbitrary_links) {

      // Add an arbitrary link.
      $arbitrary_links[] = [
        'url' => 'http://example.com',
        'priority' => '0.5',
        'lastmod' => '2012-10-12T17:40:30+02:00',
        'changefreq' => 'weekly',
        'images' => [
          ['path' => 'http://path-to-image.png'],
        ],
      ];
    }

As the example shows, all properties of the link like priority/lastmod/changefreq can be defined as well.
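The lastmod value in the example is a W3C datetime string with a timezone offset. Below is a minimal sketch of producing such a value from a Unix timestamp in plain PHP. The module name `mymodule`, the helper `mymodule_sitemap_lastmod()` and the URL are placeholders invented for illustration; they are not part of Simple XML sitemap.

```php
<?php

/**
 * Formats a Unix timestamp the way the 'lastmod' value above is written.
 *
 * Hypothetical helper, not part of Simple XML sitemap.
 */
function mymodule_sitemap_lastmod(int $timestamp, DateTimeZone $timezone): string {
  // '@<timestamp>' always creates a UTC date, so shift it to the wanted zone.
  return (new DateTimeImmutable('@' . $timestamp))
    ->setTimezone($timezone)
    ->format(DATE_ATOM);
}

/**
 * Implements hook_simple_sitemap_arbitrary_links_alter().
 */
function mymodule_simple_sitemap_arbitrary_links_alter(&$arbitrary_links) {
  // Placeholder link; 'lastmod' is set to the current time in UTC.
  $arbitrary_links[] = [
    'url' => 'http://example.com/changelog',
    'priority' => '0.5',
    'lastmod' => mymodule_sitemap_lastmod(time(), new DateTimeZone('UTC')),
    'changefreq' => 'weekly',
  ];
}
```

In a real module the timestamp would more likely come from the changed time of whatever the link points to than from time().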

To alter links shortly before they are transformed into XML output, you can still use the following hook:

    /**
     * Alter the generated link data before the sitemap is saved.
     * This hook gets invoked for every sitemap chunk generated.
     *
     * @param array &$links
     *   Array containing multilingual links generated for each path to be indexed.
     */
    function hook_simple_sitemap_links_alter(&$links) {

      // Remove German URL for a certain path in the hreflang sitemap.
      foreach ($links as $key => $link) {
        if ($link['path'] === 'node/1') {
          // Remove 'loc' URL if it points to a German site.
          if ($link['langcode'] === 'de') {
            unset($links[$key]);
          }
          // If this 'loc' URL points to a non-German site, make sure to remove
          // its German alternate URL. isset() avoids a notice when there is none.
          elseif (isset($link['alternate_urls']['de'])) {
            unset($links[$key]['alternate_urls']['de']);
          }
        }
      }
    }

Basic alteration of the XML output

The following two new hooks can now be used to alter the XML output:

    /**
     * Alters the sitemap attributes shortly before XML document generation.
     * Attributes can be added, changed and removed.
     *
     * @param array &$attributes
     */
    function hook_simple_sitemap_attributes_alter(&$attributes) {

      // Remove the xhtml attribute e.g. if no xhtml sitemap elements are present.
      unset($attributes['xmlns:xhtml']);
    }

    /**
     * Alters attributes of the sitemap index shortly before XML document generation.
     * Attributes can be added, changed and removed.
     *
     * @param array &$index_attributes
     */
    function hook_simple_sitemap_index_attributes_alter(&$index_attributes) {

      // Add some attribute to the sitemap index.
      $index_attributes['name'] = 'value';
    }

Other API changes

The API is now more forgiving: some of its inclusion-altering methods accept calls with missing link setting arguments. Here is an example of the simple_sitemap.generator API in action:

    \Drupal::service('simple_sitemap.generator')
      ->saveSetting('remove_duplicates', TRUE)
      ->enableEntityType('node')
      ->setBundleSettings('node', 'page', ['index' => TRUE, 'priority' => 0.5])
      ->removeCustomLinks()
      ->addCustomLink('/some/view/page', ['priority' => 0.5])
      ->generateSitemap();

More documentation can be found here. I hope the new version of this module will be of great use to you!

All info about the project

Comments

One question, if you don't mind:

You say "add any link to the index, even ones Drupal does not know about", but then suggest using a hook. As a result, I'm a bit confused.

I use a directory on my website as a "personal Imgur" - i.e. upload images there through SFTP to post them somewhere, e.g. on forums.

Those are the images "Drupal does not know about", but I would like to have them indexed, as the content is usually rather interesting, unique and on topic (unlike memes on Imgur).

What do I need to do to make Drupal include those images in the sitemap? In most cases those images aren't displayed on my site (i.e. aren't used in the content of any node).

Thanks,
Dmitri

Usually you would not index images by themselves; instead you would index images as part of a webpage. For this page, it looks like this:
 

    <url>
      <loc>http://gbyte.co/blog/image-indexation-new-features-simple-xml-sitemap-2.10</loc>
      <xhtml:link href="http://gbyte.co/blog/image-indexation-new-features-simple-xml-sitemap-2.10" hreflang="en" rel="alternate"/>
      <lastmod>2017-12-12T22:36:26+01:00</lastmod>
      <changefreq>weekly</changefreq>
      <priority>0.7</priority>
      <image:image>
        <image:loc>http://gbyte.co/sites/default/files/public/images/blog/sitemap_8_0_0.png</image:loc>
      </image:image>
      <image:image>
        <image:loc>http://gbyte.co/sites/default/files/public/inline-images/bundle_settings_2_0.png</image:loc>
      </image:image>
    </url>

So if you have an accessible index page of these images, you can use hook_simple_sitemap_arbitrary_links_alter to add that page and its images as shown above, or, if it is a routed page, just add it to the index and use hook_simple_sitemap_links_alter to add images to it.
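For the image directory described in the question, the first option could look roughly like the sketch below. It assumes an index page at a made-up URL, a made-up base URL under which the uploaded files are served, and a hard-coded list of file names. The module name `imagedrop` and the helper function are hypothetical. The 'url', 'images' and 'path' keys are the ones used in the hook example in the article.

```php
<?php

/**
 * Builds one arbitrary link entry for a page plus the images shown on it.
 *
 * Hypothetical helper; the array keys follow the hook example in the article.
 */
function imagedrop_build_page_link(string $page_url, string $image_base_url, array $file_names): array {
  $images = [];
  foreach ($file_names as $file_name) {
    // Only keep common image extensions; anything else is skipped.
    if (preg_match('/\.(png|jpe?g|gif)$/i', $file_name)) {
      $images[] = ['path' => rtrim($image_base_url, '/') . '/' . rawurlencode($file_name)];
    }
  }
  return ['url' => $page_url, 'changefreq' => 'weekly', 'images' => $images];
}

/**
 * Implements hook_simple_sitemap_arbitrary_links_alter().
 */
function imagedrop_simple_sitemap_arbitrary_links_alter(&$arbitrary_links) {
  // Placeholder values; a real module might read the directory with scandir().
  $file_names = ['board layout.png', 'notes.txt', 'scope-trace.JPG'];
  $arbitrary_links[] = imagedrop_build_page_link(
    'http://example.com/image-drop',
    'http://example.com/drop',
    $file_names
  );
}
```

File names are passed through rawurlencode() so that spaces and similar characters end up as valid URL segments.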

But if you are serious about the image drop functionality, the best thing would be to build it in a way that makes Drupal aware of the images. I implemented this functionality on gbyte.co to share documents and images with my clients. There is some custom code involved, but most of the work is done by these modules:

  • ACL
  • Content Access
  • Download
  • User Protect

ACL and Content Access handle the permissions, Download makes it possible to download all the files attached to an entity. I also implemented group accounts where many people can use the same credentials to log in to their files. To prevent them from editing the account, the User Protect module can be utilized.

I would like to know exactly how XML sitemaps work. I am a .NET developer, but I am confused about how the XML side of things works exactly.

Thanks for the information.

Can anyone suggest how to add extra elements to the sitemap? For example, I want to add an additional element inside the url tag:

    <url>
      <mobile:mobile type="pc"/>
      <loc>https://abc/</loc>
      <xhtml:link href="https://abc/in" hreflang="en-in" rel="alternate"/>
      <xhtml:link href="https://abc/cn" hreflang="zh-hans" rel="alternate"/>
      <priority>1.0</priority>
    </url>

 

This is possible with the 8.x-3.x branch.

The easiest way to do it would be to implement your own sitemap generator plugin by extending Drupal\simple_sitemap\Plugin\simple_sitemap\SitemapGenerator\DefaultSitemapGenerator. You can then tweak the sitemap structure to your liking.

If you need URLs to also inject different data into the sitemap structure, implement a UrlGenerator plugin by extending the various URL generators, like Drupal\simple_sitemap\Plugin\simple_sitemap\UrlGenerator\EntityUrlGenerator.

In order to replace the old sitemap and URL generator plugins with the new ones, either use the API method to alter the default or add a new sitemap type definition:

    \Drupal::service('simple_sitemap.generator')
      ->setSitemapTypeDefinition('default_hreflang', $definition);

... or use a hook to alter the definitions at runtime, as this API example shows:

    function hook_simple_sitemap_types_alter(array &$sitemap_types) {

      // Remove the custom links generator from the default sitemap type definition.
      $key = array_search('custom', $sitemap_types['default_hreflang']['url_generators']);
      unset($sitemap_types['default_hreflang']['url_generators'][$key]);

      // Define a new sitemap type to be generated with the default sitemap generator.
      // Make it use only the custom and arbitrary link generators.
      $sitemap_types['fight_club_sitemap_type'] = [
        'label' => t('Fight Club Sitemap'),
        'description' => t('The second rule of Fight Club is...'),
        'sitemap_generator' => 'default',
        'url_generators' => [
          'custom',
          'arbitrary',
        ],
      ];
    }

Hit me with any questions you might have. I will be publishing a blog post about all new features of 8.x-3.1 as soon as 3.1 arrives.

How can I use Simple XML sitemap to automatically generate a sitemap.xml with pages generated by views?
I know that I can add them manually through the UI, but that is a little annoying when the pages change often.

Thanks Pawel. I will check this.

Moreover, is there a way to generate a different sitemap for each domain?

Generally the sitemap will show the URLs of all the different languages, like English, French and Chinese, whereas my client wants to show only English URLs on the English domain and Chinese URLs on the Chinese domain.

Note: It's not a multisite. It's a multilingual site.

@Deepika The primary function of this module is to generate sitemaps according to the new Google so-called hreflang standard. AFAIK hreflang sitemaps should be understood by all major search engines. You need to tell your client. :)

But with 3.x there is nothing stopping you from creating your own sitemap type by writing your own sitemap and URL generators. Then you could add, say, 3 sitemap variants of that new type, which you would name after languages. This could be done dynamically via a hook to automatically determine active languages. There you go, you have a traditional sitemap.

An easier approach would be to use xmlsitemap which generates sitemaps according to the older standards, but I haven't been able to use that module because of its unfinished/buggy state.

@Tomasz Is your intent to generate a sitemap which holds links to your views pages? Or do you mean links to entities from views results?

If it is the former, there is no automated way of adding all views, but

  • you could write a link generator plugin that does exactly this (cleanest option).
  • Alternatively, use one of the hooks, to add links to views automatically. You could use hook_simple_sitemap_arbitrary_links_alter() or hook_simple_sitemap_links_alter() to add all views links automatically.
  • The third option would be to use the API to add links to all views on every cron run.
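The second option could be sketched as follows. This is only an illustration under stated assumptions: the module name `viewslinks`, the helper `viewslinks_page_paths()` with its hard-coded paths, and the base URL are all made up. A real module would collect the paths from the site's views page displays instead.

```php
<?php

/**
 * Returns the paths of views pages to index.
 *
 * Hypothetical stand-in: hard-coded here, a real module would collect these
 * from the site's views page displays.
 */
function viewslinks_page_paths(): array {
  return ['/blog', '/projects', '/blog'];
}

/**
 * Implements hook_simple_sitemap_arbitrary_links_alter().
 */
function viewslinks_simple_sitemap_arbitrary_links_alter(&$arbitrary_links) {
  // Placeholder base URL.
  $base_url = 'http://example.com';

  // Remember URLs that are already queued so nothing is added twice.
  $seen = array_column($arbitrary_links, 'url');

  foreach (viewslinks_page_paths() as $path) {
    $url = $base_url . $path;
    if (!in_array($url, $seen, TRUE)) {
      $arbitrary_links[] = ['url' => $url, 'priority' => '0.5', 'changefreq' => 'daily'];
      $seen[] = $url;
    }
  }
}
```

Because the hook runs on every sitemap generation, the list of views pages stays current without touching the UI.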

I don't think that a script that simple could rival the integration the Drupal module provides. But why not just try it?

So you want to generate an XML sitemap with links taken from views results?

In this case I would write a URL generator plugin for simple_sitemap that loads the views' results programmatically. If you would like support with this feature, drop me a line through the contact page so we can discuss it privately.
