Ben Szymanski

Software Engineer • 🇺🇸 & 🐊

Backstory

A few months ago, I was tasked with adding custom search functionality to an Umbraco-powered website. Umbraco comes bundled (by default, I believe) with a pretty nifty search indexer and API called Examine. Examine provides pretty much everything you need: facilities to index CMS content, malleable configuration that lets you customize your indexes, hooks to control and affect indexing behavior, and a pretty straightforward API for querying the generated indexes.

While it is easy (and well documented) to create a single Examine indexer that indexes two or more Umbraco document types, and to control which properties (in aggregate) are available in that index, it's a bit harder if you need to combine the results of two or more indexes into a single list of search results.

I have seen this done in a few ways. The first and most obvious is to query one index, query another, combine the two result lists, and then sort them in descending order on each result's Score property (a floating-point relevance value). The major problem with this approach is that it assumes each index scores its results consistently with the others, so that scores from different indexes can be compared directly.

To put it another way: say Index1 has 1,000 documents out of 3,000 total that mention the word "Apple." Say another index, Index2, has 3 documents out of 50 total that mention the word "Apple." The relevancy score for the term "Apple" can differ between the two indexes, because each index weighs a term by how rare it is within that index alone. Thus, a largely irrelevant item mentioning "Apple" in Index2 could be pushed to the top of the combined results because it scores highly relative to Index2, even though it would score poorly relative to Index1.
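To make the "Apple" example concrete, here is a small sketch of the naive merge. It assumes Examine is scoring with Lucene's classic DefaultSimilarity, whose inverse document frequency is idf = 1 + ln(numDocs / (docFreq + 1)), and it simplifies heavily by giving both hits the same term frequency and field length, so each score is just its own index's idf. The titles are made up. With the numbers above, Index1 gives "Apple" an idf of roughly 2.10 while Index2 gives it roughly 3.53, so concatenating and sorting on raw scores puts the Index2 hit first no matter which one a reader would actually find more useful.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Lucene's classic DefaultSimilarity weighs a term by its inverse document
// frequency *within one index*: idf = 1 + ln(numDocs / (docFreq + 1)).
static double Idf(int docFreq, int numDocs) =>
    1.0 + Math.Log(numDocs / (double)(docFreq + 1));

// The "Apple" numbers from the example above.
double index1Idf = Idf(docFreq: 1000, numDocs: 3000); // common term, low weight
double index2Idf = Idf(docFreq: 3, numDocs: 50);      // rare term, high weight

// Simplification (an assumption for illustration): both hits mention "Apple"
// equally often in equally long fields, so their scores differ only by idf.
var index1Hits = new List<(string Index, string Title, double Score)>
{
    ("Index1", "Apple II product page", index1Idf),
};
var index2Hits = new List<(string Index, string Title, double Score)>
{
    ("Index2", "Passing mention of Apple in a footnote", index2Idf),
};

// The naive merge: concatenate both result lists and sort on the raw score.
var merged = index1Hits.Concat(index2Hits)
                       .OrderByDescending(hit => hit.Score)
                       .ToList();

foreach (var hit in merged)
{
    Console.WriteLine($"{hit.Score:F3}  [{hit.Index}] {hit.Title}");
}
```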

Cognitive dissonance is setting in, as we are now left to figure out how to solve this new problem.

While I was working on this search feature, I stumbled across a mention of a little-known and largely undocumented Examine search provider named "MultiIndexSearcher."

"Just created a multi-index search provider for #Examine... new service release coming out this week. #umbraco"

- Shannon Deminick (@Shazwazza)

(I hope I am not revealing a great secret of the universe by posting about it here, but I thought it'd be helpful to have something on the internet about it.)

Like the rest of Examine, MultiIndexSearcher is pretty easy to set up. But it differs from the way other searchers are created.

Typically, other searchers are created using something like:

var searcher = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];

It looks like we've got a little singleton action going on: ExamineManager.Instance acts as a factory that hands out references to a pool of searchers. But for some reason, MultiIndexSearcher is not available if you try to get one through this property.

Instead, we need to create MultiIndexSearcher by hand, and thankfully it's not that difficult.

Example: allthingsapple.com

Let's suppose I am making a site that contains information about every Apple computer ever released. (This is a hypothetical website; please do not go to allthingsapple.com, I have no idea what you will find!) My website has a mix of Umbraco document types and a collection of informational PDFs about Apple computers. I'd like to provide search functionality that merges results from the ComputerIndexSet with results from the PDFIndexSet.

I have installed the UmbracoCms.UmbracoExamine.PDF package using NuGet, created my indexes in the Examine config files (see addendum), and I'm ready to go! But how can I leverage the amazing power of the MultiIndexSearcher provider?

What I have come up with is something like this:

using System.Collections.Specialized;
using System.Linq;
using System.Web.Mvc;
using Examine.SearchCriteria;

public ActionResult AppleSearcher(Models.SearchResultsModel model)
{
  // Build the provider configuration: the "indexSets" key lists the name
  // of each Examine index set (as defined in ExamineIndex.config) that
  // this searcher should query.
  var searcherNamedCollection = new NameValueCollection();
  searcherNamedCollection.Add("indexSets", "ComputerIndexSet");
  searcherNamedCollection.Add("indexSets", "PDFIndexSet");

  var searchTerm = Request.QueryString.Get("q");

  // Create the search provider by hand, calling Initialize() with
  // an arbitrary provider name and the NameValueCollection from above.
  var searchProvider = new Examine.LuceneEngine.Providers.MultiIndexSearcher();
  searchProvider.Initialize("computerSearcher", searcherNamedCollection);

  // Create the search criteria, exactly as in any other Examine search.
  var searchCriteria = searchProvider.CreateSearchCriteria(BooleanOperation.Or);

  // Build the query, then run it across both index sets, keeping the
  // top 10 hits ordered by relevance score.
  var searchQuery = searchCriteria
    .Field("nodeName", searchTerm)
    .Or()
    .Field("computerModelName", searchTerm)
    .Compile();
  var searchResults = searchProvider.Search(searchQuery, 10).OrderByDescending(x => x.Score);

  var viewModel = new Models.SearchResultsModel();
  viewModel.searchTerm = searchTerm;
  viewModel.searchResultList = searchResults;

  return CurrentTemplate(viewModel);
}

The entire operation is wrapped in an MVC controller method that returns a standard ActionResult. (Thank you, Umbraco devs, for making things so consistent and clean!) The code and its comments should be fairly self-explanatory, so I won't walk through them further here. But essentially, this is all there is to it. Amazing!
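Why do two Add() calls under the same "indexSets" key work? NameValueCollection keeps every value added under a key rather than overwriting it, and its string indexer returns all of them joined by commas. The standalone sketch below shows just that .NET behavior. It is an assumption here, not something confirmed by Examine's documentation, that MultiIndexSearcher reads the "indexSets" setting and splits it on commas; if so, a single "ComputerIndexSet,PDFIndexSet" entry should behave the same as the two Add() calls.

```csharp
using System;
using System.Collections.Specialized;

// Same setup as the controller: two Add() calls under one key.
var config = new NameValueCollection();
config.Add("indexSets", "ComputerIndexSet");
config.Add("indexSets", "PDFIndexSet");

// Add() appends instead of overwriting, so both names are kept...
string[] names = config.GetValues("indexSets");

// ...and the string indexer hands them back as one comma-separated value.
string joined = config["indexSets"];

Console.WriteLine($"{names.Length} values: {joined}");
```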

And to display a list, I pass my view model into a Razor template file and generate a nice listing like so:

@using Umbraco.Core.Models
@using Umbraco.Web
@model MultiIndexer.Models.SearchResultsModel

@{
  ViewBag.Title = "Search Results";
  var umbracoHelper = new Umbraco.Web.UmbracoHelper(Umbraco.Web.UmbracoContext.Current);
}
<ul class="search-results-ul">
  @foreach (var result in Model.searchResultList)
  {
      // Resolve each hit to a published item, using the index type Examine
      // stores with it to choose between content and media lookups.
      IPublishedContent cmsItem = null;
      string indexType;
      result.Fields.TryGetValue("__IndexType", out indexType);

      if (indexType == "content")
      {
          cmsItem = umbracoHelper.TypedContent(result.Id);
      }
      else if (indexType == "media")
      {
          cmsItem = umbracoHelper.TypedMedia(result.Id);
      }

      // Skip hits that no longer resolve to a published item.
      if (cmsItem == null)
      {
          continue;
      }

      <li>
          @if (cmsItem.DocumentTypeAlias == "appleDocumentType")
          {
              var title = string.Format("{0} {1}", cmsItem.GetPropertyValue("computerModelName"), cmsItem.GetPropertyValue("computerSeries"));

              <a href="@cmsItem.Url">
                  <h3>@title</h3>
              </a>

              <p><b>[Article]</b> @cmsItem.GetPropertyValue("notes")</p>
          }
          else if (cmsItem.DocumentTypeAlias == "File")
          {
              <a href="@cmsItem.Url">
                  <h3>@cmsItem.Name</h3>
              </a>

              <p><b>[PDF]</b></p>
          }
          else
          {
              <h3>@cmsItem.Name</h3>

              <p>[Non-Content]</p>
          }
      </li>
  }
</ul>

That's all there is to it, really. It's simple once you know that MultiIndexSearcher exists and have studied up on how to work with it.

