  • Juan Ayala

Apache Sling Sitemap in AEM 6.5 & AEMaaCS

On almost every AEM project I've worked on, there has been a need to generate a Google sitemap. It is standard SEO practice. The first time, I wrote one using JSP and a resource visitor. On the last couple of projects, I used the ACS Commons Sitemap Generator.


Recently I found out that the ACS generator is now deprecated. Its recommended replacement is the AEM WCM Core Components Sitemap feature, which is available on AEMaaCS. For 6.5.10.0 and below there was no such feature. That changed with the release of 6.5.11.0.


This one feature is no reason to rush to upgrade. Keep it in mind should you find yourself working with that version, and definitely if you are working on AEMaaCS.


The video on the tutorial page covers the configuration. You will generate sitemaps in no time. I will sum up those steps here. What that video does not cover is customization. Skip to the end if that is what you are looking for.


Prerequisites


If you are on AEM 6.5, make sure you have service pack 11 or greater. Once you have that, you will also need the SEO Index Package from the Software Distribution site. That is an oak:index definition that will index the sling:sitemapRoot property.


If you are on AEMaaCS, you are ready to go.


Setting Up on Author


I set up a new 6.5.11.0 instance and generated a project with the Maven archetype. You will need to do two things on the author instance:


First, configure the Apache Sling Sitemap - Sitemap Generator Manager to work on-demand. Create the file org.apache.sling.sitemap.impl.SitemapGeneratorManagerImpl.cfg.json in the config.author runmode folder:

{
  "allOnDemand": true
}

Then set the page property sling:sitemapRoot to true on your target root, e.g. /content/mysite/us. That's it! Access http://localhost:4502/content/mysite/us.sitemap.xml and you are done. The on-demand option is a good way to debug and preview.
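
When previewing, it can help to list exactly which URLs ended up in the XML. Here is a minimal sketch that pulls the <loc> entries out of a sitemap document using only the JDK. SitemapPreview and extractLocations are hypothetical names of my own, not part of Sling or AEM, and the sketch assumes you have already fetched the response body yourself (for example with your author credentials).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical debugging helper; not part of Sling or AEM. */
final class SitemapPreview {

    /** Namespace defined by the sitemaps.org protocol. */
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    private SitemapPreview() {
    }

    /** Returns the text of every loc element in the given sitemap XML, in document order. */
    static List<String> extractLocations(final String sitemapXml) {
        try {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // A sitemap never needs a DOCTYPE, so reject it to avoid XXE surprises.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            final Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(sitemapXml)));
            final NodeList nodes = document.getElementsByTagNameNS(SITEMAP_NS, "loc");
            final List<String> locations = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                locations.add(nodes.item(i).getTextContent().trim());
            }
            return locations;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse sitemap XML", e);
        }
    }
}
```

Pass it the body returned by the .sitemap.xml URL. An empty list means the generator found no pages to include.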

⚠️ If you created a brand new instance and deployed a new archetype project, you may see an empty sitemap. You need to publish /content/mysite/us/en first. When running on the author, the sitemap generator will only consider published pages.

Setting Up on Publish


On the publish instance, we are going to configure an instance of the Apache Sling Sitemap - Scheduler. The scheduler generates the sitemap in the background and stores it under /var/sitemaps. Instead of being built on demand, the sitemap is served from this stored copy.


Create the file org.apache.sling.sitemap.impl.SitemapScheduler~jm.cfg.json in the config.publish runmode folder. The cron expression below runs the job once a day at 2:00 AM:

{
  "scheduler.name": "My Daily Sitemap Scheduler",
  "scheduler.expression": "0 0 2 1/1 * ? *",
  "searchPath": "/content/mysite"
}

On either instance, the Sitemap Servlet serves the XML. It has an extra configuration you can tweak. By default, wcm/foundation/components/basicpage/v1/basicpage is the root resource type.

Sitemap Servlet OSGi Config

If that is not in the inheritance tree of your page component, you'll need to add your resource type here.


Customization


There are two bundles involved. Open the Felix Console and locate the Apache Sling Sitemap (org.apache.sling.sitemap) v1.0.4. That is an open-source project. As its README.md states, you have to do a few things to get started. Adobe has already done them all.


The second bundle belongs to Adobe. Open the Felix Console and locate Adobe AEM - SEO Implementation (com.adobe.cq.wcm.com.adobe.aem.wcm.seo.impl) v1.0.6. This is not open-source. In the manifest, you will see the SitemapGenerator implementation com.adobe.aem.wcm.seo.impl.sitemap.PageTreeSitemapGeneratorImpl. There is also an implementation of the SitemapLinkExternalizer service.


Adobe's implementation will use the resource resolver mappings to shorten the URLs. The archetype project is set up to map /content/mysite/us/en/mypage to /en/mypage. What Adobe's generator lacks is an option to remove the .html extension.


Unfortunately, Adobe's implementation is not extensible. You will need to create your own generator and give it a higher service ranking. Locate the PageTreeSitemapGeneratorImpl in the components console. Its ranking is 10. Here I am creating MySitemapGenerator, which extends the abstract class ResourceTreeSitemapGenerator:


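import java.util.Calendar;
import java.util.Optional;

import org.apache.sling.api.resource.Resource;
import org.apache.sling.sitemap.SitemapException;
import org.apache.sling.sitemap.builder.Sitemap;
import org.apache.sling.sitemap.builder.Url;
import org.apache.sling.sitemap.spi.common.SitemapLinkExternalizer;
import org.apache.sling.sitemap.spi.generator.ResourceTreeSitemapGenerator;
import org.apache.sling.sitemap.spi.generator.SitemapGenerator;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.osgi.service.component.propertytypes.ServiceRanking;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.day.cq.commons.jcr.JcrConstants;
import com.day.cq.wcm.api.Page;

@Component(service = SitemapGenerator.class)
@ServiceRanking(20) // higher than the ranking of 10 on Adobe's PageTreeSitemapGeneratorImpl
public class MySitemapGenerator extends ResourceTreeSitemapGenerator {

    private static final Logger log = LoggerFactory.getLogger(MySitemapGenerator.class);

    // Resolved to Adobe's implementation, which applies the resource resolver mappings
    @Reference
    private SitemapLinkExternalizer externalizer;

    @Override
    protected void addResource(final String name, final Sitemap sitemap, final Resource resource)
            throws SitemapException {

        final Page page = resource.adaptTo(Page.class);
        if (page == null) {
            log.warn("{} is not a page", resource.getPath());
            return;
        }
        final String location = this.externalizer.externalize(resource);
        if (location == null) {
            log.warn("{} could not be externalized", resource.getPath());
            return;
        }
        final Url url = sitemap.addUrl(location);
        // Prefer the last-modified date; fall back to jcr:created, tolerating a missing jcr:content
        final Calendar lastmod = Optional.ofNullable(page.getLastModified())
                .orElseGet(() -> Optional.ofNullable(page.getContentResource())
                        .map(content -> content.getValueMap().get(JcrConstants.JCR_CREATED, Calendar.class))
                        .orElse(null));
        if (lastmod != null) {
            url.setLastModified(lastmod.toInstant());
        }
        log.debug("added {} to sitemap", location);
    }
}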

I am using the SitemapLinkExternalizer implemented by Adobe. This is the barest of implementations. Whatever logic was in Adobe's generator is now lost, and I would need to re-create it in my implementation.
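
Note that the generator above still emits URLs ending in .html. Here is a minimal sketch of how the extension could be stripped from the externalized location before it is added to the sitemap. SitemapUrls and stripHtmlExtension are hypothetical names of my own, not part of Sling or AEM. The sketch also assumes your dispatcher or Sling mappings can resolve the extensionless URLs.

```java
/** Hypothetical helper; not part of Sling or AEM. */
final class SitemapUrls {

    private static final String HTML_EXTENSION = ".html";

    private SitemapUrls() {
    }

    /**
     * Removes a trailing ".html" from the path portion of a URL.
     * Any query string or fragment is left untouched.
     */
    static String stripHtmlExtension(final String location) {
        if (location == null) {
            return null;
        }
        // The path ends at the first '?' or '#', whichever comes first.
        int pathEnd = location.length();
        for (final char marker : new char[] {'?', '#'}) {
            final int index = location.indexOf(marker);
            if (index >= 0 && index < pathEnd) {
                pathEnd = index;
            }
        }
        final String path = location.substring(0, pathEnd);
        if (!path.endsWith(HTML_EXTENSION)) {
            return location;
        }
        return path.substring(0, path.length() - HTML_EXTENSION.length())
                + location.substring(pathEnd);
    }
}
```

In addResource, that would amount to calling sitemap.addUrl(SitemapUrls.stripHtmlExtension(location)) instead of passing the location as-is.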


The ranking is also a brute-force approach. In a multi-site environment, it affects every site. A better approach is to remove the ranking so that Adobe's implementation takes precedence by default. Then, in the scheduler settings, you can specify your generator and the content path it should apply to.
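
As a rough sketch, a site-specific scheduler configuration could look like the following. Treat it as a starting point under stated assumptions. The includeGenerators property name is my recollection of the scheduler's OSGi metatype, so confirm it in the configuration console. The file name suffix, the scheduler name, and the package com.mysite.core.sitemap are placeholders for illustration.

```json
// Hypothetical file: org.apache.sling.sitemap.impl.SitemapScheduler~mysite.cfg.json
{
  "scheduler.name": "My Site Sitemap Scheduler",
  "scheduler.expression": "0 0 2 1/1 * ? *",
  "searchPath": "/content/mysite",
  // Assumption: fully qualified class name of the custom generator; adjust to your project
  "includeGenerators": [
    "com.mysite.core.sitemap.MySitemapGenerator"
  ]
}
```

If your tooling rejects comments in .cfg.json files, remove them before deploying.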

Conclusion


The Apache Sling Sitemap bundle is open-source. Adobe's implementation of the SitemapGenerator is not. Adobe's generator should work for almost everyone. In case it does not, you can create your own. Since their generator is not extensible, you will lose its functionality.


Think before you create your own. In my example above, one could argue that the .html extension has no impact on SEO ranking and that removing it is purely aesthetic. I happen to agree with that. But I needed an example when I was writing this 😉
