When this article was first published, we used the example of the top comment per post rather than the author. However, Dave Burles pointed out a problem with our code’s queries. In order to fix the problem and retain the same functionality, the publications became too complex and obscured the point of the post.

So we chose a slightly simpler example that still demonstrates the same issues about joins that the original article made.

One of the benefits of building Meteor applications is that you don’t have to think about client-server APIs anymore, since Meteor’s built-in collections handle synchronizing data to the browser for you.

But this new paradigm comes with its own requirements: to limit the amount of data and maintain privacy, any real-world app must also include a set of publications and subscriptions that make sure the right data is transmitted to the right users.

Consequently, publication design is a very important part of building a large scale Meteor app, and it’s often a new challenge for even the most experienced of web developers. So today, we’ll take a look at one of the trickiest problems of publishing data with Meteor: joining across collections.

If you’re not familiar with the way publications and subscriptions work in Meteor, we recommend you check out our previous article on the topic before reading on.

Top Posts & Authors

Although Meteor uses MongoDB, a NoSQL database which encourages denormalization and using a limited number of collections, there are good reasons for splitting your data model up into more collections (we also cover this topic in more detail in the Denormalization chapter of Discover Meteor).

As a concrete example, let’s consider Microscope, the social news application that we’re building in the book. Microscope is a simple Reddit-style app where users can create posts, vote on them, and comment on them.

On the front page of the site, suppose we wanted to display the top 30 posts. Although in the book we denormalize the author’s name onto the post for simplicity, as things get more complex we may want to start displaying things like the author’s avatar next to the post. At a certain point, it’s going to be easier to publish the 30 (or fewer) authors that are attached to the top 30 posts than to keep denormalizing onto the posts.

In Microscope, posts and users live in two separate Posts and Meteor.users collections. So how can we ensure that the 30 posts and (up to) 30 users we want are published to each user as they land on the front page?

The Naive Approach

Although most publications you’ll come across will return a single cursor, it turns out you can just as well return an array of cursors. So let’s try returning both a Posts cursor and a Meteor.users cursor in the same publication:

Meteor.publish('topPostsWithAuthors', function() {
  // first, get the top 30 posts
  var topPostsCursor = Posts.find({}, {sort: {score: -1}, limit: 30});
  // then extract those posts' userIds
  var userIds = topPostsCursor.map(function(p) { return p.userId; });

  // then return an array containing both the posts, and their corresponding authors
  return [
    topPostsCursor,
    Meteor.users.find({_id: {$in: userIds}})
  ];
});
A possible implementation of our publish function.

At first, this looks like it might work. But it turns out this implementation has a subtle flaw, due to the way Meteor handles reactivity on the server.

With the current code, when a post’s author’s avatar changes you’ll see the author’s new info appear in the browser instantly, as it should.

However, when a new post is added to the top post list, although that post will pop in as expected it won’t have any author associated with it! So what exactly is going on?

“Cursor”?

A database cursor is a control structure that enables traversal over the records in a database.

More specifically, in MongoDB (and by extension, in Meteor apps) a cursor is a pointer to the result set of a Collection.find() query.

Client & Server: Two Flavors of Reactivity

The key thing to understand is that the term “reactivity” actually covers different behaviors on the client and the server.

On the client, reactivity is the “Meteor magic” you are used to: when data changes, your code will re-run as necessary to ensure the user interface layer reflects the data changes.

So let’s go back to our Meteor.publish() function from earlier:

Meteor.publish('topPostsWithAuthors', function() {
  var topPostsCursor = Posts.find({}, {sort: {score: -1}, limit: 30});
  var userIds = topPostsCursor.map(function(p) { return p.userId; });

  return [
    topPostsCursor,
    Meteor.users.find({_id: {$in: userIds}})
  ];
});

If this was client code (ignoring for a moment the fact that Meteor.publish() doesn’t exist on the client), you would expect this block of code to re-run whenever the list of posts changes, since it contains Collection.find() calls.

However on the server, Meteor’s reactivity is limited to cursors returned by Meteor.publish() functions. The direct consequence of this is that unlike on the client, code will not magically re-run whenever data changes. It will soon become apparent why that’s a problem in our case.

The Problem

When a new post enters the top 30 list, two things need to happen:

  • The server needs to send the new post to the client.
  • The server needs to send that post’s author to the client.

Meteor is observing the Posts cursor returned on line 6, and so will send the new post down as soon as it’s added, ensuring the client will receive the new post straight away.

However, consider the Meteor.users cursor returned on line 7. Even if the cursor itself is reactive, it’s now using an outdated value for the userIds array (which is a plain old non-reactive variable), which means its result set will be out of date as well.

This is why, as far as that cursor is concerned, there is no need to re-run the query, and Meteor will happily continue to publish the same 30 authors for the original 30 top posts ad infinitum.

So unless the whole code of the publication runs again (to construct a new list of userIds), the cursor is no longer going to return the correct information.
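The stale-snapshot effect can be reproduced without Meteor at all. What follows is a plain JavaScript toy, not Meteor code: the arrays and the usersCursor helper are made up for illustration and only stand in for the two collections and the Meteor.users cursor.

```javascript
// Toy model (not Meteor code): arrays stand in for the two collections.
var posts = [
  {_id: 'p1', userId: 'u1'},
  {_id: 'p2', userId: 'u2'}
];
var users = [{_id: 'u1'}, {_id: 'u2'}, {_id: 'u3'}];

// Hypothetical stand-in for a cursor: the query itself is live, but its
// selector (the list of ids) is fixed once, when the "cursor" is created.
function usersCursor(ids) {
  return {
    fetch: function () {
      return users.filter(function (u) { return ids.indexOf(u._id) !== -1; });
    }
  };
}

// This part runs once, like the body of the publish function.
var userIds = posts.map(function (p) { return p.userId; });
var authors = usersCursor(userIds);

// Later, a post by a new author enters the list.
posts.push({_id: 'p3', userId: 'u3'});

var published = authors.fetch().map(function (u) { return u._id; });
console.log(published); // u3 is missing: userIds was never recomputed
```

The post p3 is in the list, but its author never makes it into the result, which is the same symptom we see in the browser.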

Now that we’ve established the problem, let’s explore a couple ways of dealing with joins in Meteor.

1. Overpublishing

The simplest way to deal with joins is getting rid of them altogether!

One way to achieve this is to overpublish, in other words to publish all the documents of the collection we want to join on.

For example, if we were dealing with a blog that has a limited number of authors, it would make sense to simply publish all authors and be done with it.
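Here is a minimal sketch of what overpublishing might look like. The collection is passed in as a parameter only so that the function can be tried outside Meteor (in an app you would pass Meteor.users), and the profile field is an assumption about where the author’s public data lives. Restricting the published fields matters here, since user documents also hold private data such as login services.

```javascript
// Sketch: returns a cursor over every user, limited to public fields.
// `users` is expected to be Meteor.users; `profile` is an assumed field.
function allAuthorsCursor(users) {
  return users.find({}, {fields: {profile: 1}});
}

// In a Meteor app this could be wired up as:
//   Meteor.publish('allAuthors', function () {
//     return allAuthorsCursor(Meteor.users);
//   });
```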

2. Denormalization

Another method of avoiding the need for joins is to denormalize, i.e. include the author as part of the post document. That’s where we started in this case, and perhaps we’ll decide it’s easier to keep the avatar embedded in each post rather than go to all this trouble.

The hard part of denormalization is keeping data up to date. In our author example, we would need to figure out a way to make sure the avatar embedded in the post document is always correct, and that’s probably easier said than done.

Denormalization can still be a good option for data that doesn’t need to change much, or a scenario where we don’t really care if we see a user’s old avatar.
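To give an idea of what keeping the data up to date involves, here is a small sketch of a helper that copies a new avatar onto all of a user’s posts. The authorAvatar field name is hypothetical, and the collection is passed in as a parameter so the function can be tried outside Meteor (in an app you would pass Posts).

```javascript
// Sketch: copy a user's new avatar onto every post they wrote.
// `posts` is expected to be the Posts collection; `authorAvatar` is a
// hypothetical name for the denormalized field.
function syncAuthorAvatar(posts, userId, avatarUrl) {
  return posts.update(
    {userId: userId},
    {$set: {authorAvatar: avatarUrl}},
    {multi: true} // without this, only the first matching post is updated
  );
}
```

The helper itself is short. The hard part is making sure that every code path that changes an avatar actually calls it.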

3. Non-Reactive Joins

Sometimes it’s actually acceptable for our joins to remain non-reactive (i.e. our “naive approach”). In this case, maybe our user interface can handle some posts simply not having authors associated with them.

This might not work in most cases, but as long as you’re conscious of the drawbacks it’s still an option to consider.

4. Joining On The Client

We said earlier that we can’t use “magic” reactivity on the server… but we can on the client!

So this gives us another way around this problem: rather than try to maintain the correct userIds list on the server, let’s do it on the client where userIds can be reactively recalculated as needed!

Doing this will first require setting up two distinct publications on the server:

Meteor.publish('topPosts', function() {
  return Posts.find({}, {sort: {score: -1}, limit: 30});
});

Meteor.publish('authors', function(userIds) {
  return Meteor.users.find({_id: {$in: userIds}});
});
The publications code (on the server)

We will then handle the job of creating the userIds array on the client. Here’s what that would look like using Iron Router to handle our subscriptions:

Router.map(function() {
  this.route('topPosts', {
    waitOn: function() {
      // tell the router to wait until topPosts's data is available to load the route
      return Meteor.subscribe('topPosts');
    },
    data: function() {
      // return all posts currently available on the client as the route's data context
      return Posts.find();
    },
    before: function() {
      // let's make sure that the topPosts subscription is ready and the posts are loaded
      if (this.data()) {
        // we can then extract the userIds of the authors
        var userIds = this.data().map(function(p) { return p.userId });
        // and add the authors subscription to the route's waiting list as well
        this.subscribe('authors', userIds).wait();
      }
    }
  });
});
The router code (on the client)

We’re making use of Iron Router’s nifty waitOn feature to tell our app to wait until a certain dataset has been loaded before loading a route’s templates.

So the whole process looks something like this:

  • The user triggers the topPosts route.
  • The client subscribes to the topPosts subscription.
  • The client executes the before function but nothing happens because this.data() is still empty.
  • The server returns a list of posts and the topPosts subscription is now loaded.
  • As is the case whenever a waited on subscription becomes available, the router reruns all filters.
  • This time, this.data() returns something, so the client subscribes to (and waits on) the authors subscription.
  • The server returns the authors subscription.
  • With all the necessary data loaded, the router can now move on to loading the appropriate template.

The problem is the double latency, in other words the fact that we’re subscribing and returning data twice. Why make two round trips to the server when it could very well know (based on the posts it’s publishing) which authors need to be published?

So although this solution will guarantee the correct results and behavior and will work perfectly fine in many situations, it’s not the most efficient technique.
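One detail to consider with the client-side join: several of the top posts may share an author, so the userIds array passed to the authors subscription can contain duplicates, and its order changes whenever the ranking does. A small helper (hypothetical, not part of Meteor or Iron Router) that dedupes and sorts the ids keeps the subscription argument identical as long as the set of authors stays the same.

```javascript
// Sketch: build a stable, duplicate-free list of author ids.
// Only relies on forEach, so an array or a cursor can be passed in.
function uniqueUserIds(posts) {
  var seen = {};
  posts.forEach(function (p) {
    if (p.userId) seen[p.userId] = true;
  });
  return Object.keys(seen).sort();
}

console.log(uniqueUserIds([
  {_id: 'p1', userId: 'u2'},
  {_id: 'p2', userId: 'u1'},
  {_id: 'p3', userId: 'u2'}
]));
```

In the router code, this would take the place of the this.data().map(...) call.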

You can also check out a different take on this approach in this Evented Mind video.

5. Reactive Publish

So what if we want to avoid this extra round trip and join on the server? Well, this is where it gets (even more) complicated. Recently, two packages have appeared to tackle just this problem.

Reactive Publish enables reactivity within publish functions by observing any cursors created within a publication, and making the entire publication block re-run whenever they change in any way.

So in our case, any change to topPostsCursor would also trigger the reevaluation of the whole topPostsWithAuthors block. This can seem like the ideal solution, yet it turns out there are some good reasons why Meteor isn’t reactive on the server.

On the client, Minimongo queries are essentially free since they all happen within a browser’s memory. So there’s no big downside in re-running code reactively in a fairly loose fashion. But on the server, queries against collections actually hit the database, and we need to be more controlled about when they run.

So while Reactive Publish’s approach works well, it also comes with its costs in terms of extra database access. So be careful that your app’s performance doesn’t end up paying the price.
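To make that cost concrete, here is a plain JavaScript toy that counts queries when the whole publication body re-runs on every change. It is only a model of the general idea: it does not use the Reactive Publish package and says nothing about how that package is implemented, and all the names in it are made up.

```javascript
// Toy model (not the Reactive Publish package): count "database" queries
// when the whole publish body re-runs on every change to the posts.
var queryCount = 0;
var posts = [{_id: 'p1', userId: 'u1'}];
var users = [{_id: 'u1'}, {_id: 'u2'}];

function findPosts() {
  queryCount += 1;
  return posts.slice();
}

function findUsers(ids) {
  queryCount += 1;
  return users.filter(function (u) { return ids.indexOf(u._id) !== -1; });
}

// Stand-in for the body of the publication.
function runPublication() {
  var userIds = findPosts().map(function (p) { return p.userId; });
  return findUsers(userIds).map(function (u) { return u._id; });
}

var published = runPublication(); // initial run

// Every change to the posts triggers a full re-run.
function onPostAdded(post) {
  posts.push(post);
  published = runPublication();
}

onPostAdded({_id: 'p2', userId: 'u2'});
onPostAdded({_id: 'p3', userId: 'u1'});

console.log(published);  // both authors are now included
console.log(queryCount); // two queries per run, three runs
```

The join is now correct, but even the third post, whose author was already published, cost two more queries.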

6. Publish With Relations

Another way to achieve properly reactive publications is to take a page out of livedata’s book: set up your own observeChanges() callbacks, and use them to pipe data over the client-server connection.

We outlined this approach in the Advanced Publications chapter of Discover Meteor, and Vitaly Sorokin turned it into the Publish With Relations package.

Although Publish With Relations uses a slightly more cumbersome syntax to outline the relations between collections, it does use the minimum number of MongoDB queries to publish all the needed documents.
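To give a feel for the underlying technique, here is a rough sketch of a hand-rolled reactive join built on observeChanges(). This is not the Publish With Relations code. The factory name, the 'posts' collection name and the profile field are assumptions, and the sketch relies on the server-side observeChanges() calling its added callbacks synchronously. It also ignores the case where a post’s userId changes.

```javascript
// Sketch: publish the top posts plus one observer per distinct author.
// Posts and Users are parameters so the handler can be tried outside Meteor.
function makeTopPostsWithAuthors(Posts, Users) {
  return function () {
    var sub = this;
    var authors = {};    // userId -> {count, handle, published}
    var postAuthor = {}; // postId -> userId

    function retainAuthor(userId) {
      var entry = authors[userId];
      if (entry) {
        entry.count += 1; // author already observed for another post
        return;
      }
      entry = authors[userId] = {count: 1, handle: null, published: false};
      entry.handle = Users.find({_id: userId}, {fields: {profile: 1}})
        .observeChanges({
          added: function (id, fields) {
            entry.published = true;
            sub.added('users', id, fields);
          },
          changed: function (id, fields) { sub.changed('users', id, fields); },
          removed: function (id) {
            entry.published = false;
            sub.removed('users', id);
          }
        });
    }

    function releaseAuthor(userId) {
      var entry = authors[userId];
      if (!entry) return;
      entry.count -= 1;
      if (entry.count > 0) return; // still needed by another post
      entry.handle.stop();
      if (entry.published) sub.removed('users', userId);
      delete authors[userId];
    }

    var postsHandle = Posts.find({}, {sort: {score: -1}, limit: 30})
      .observeChanges({
        added: function (id, fields) {
          sub.added('posts', id, fields);
          postAuthor[id] = fields.userId;
          if (fields.userId) retainAuthor(fields.userId);
        },
        changed: function (id, fields) { sub.changed('posts', id, fields); },
        removed: function (id) {
          sub.removed('posts', id);
          releaseAuthor(postAuthor[id]);
          delete postAuthor[id];
        }
      });

    sub.ready();
    sub.onStop(function () {
      postsHandle.stop();
      Object.keys(authors).forEach(function (userId) {
        authors[userId].handle.stop();
      });
    });
  };
}
```

In an app, this could be registered with Meteor.publish('topPostsWithAuthors', makeTopPostsWithAuthors(Posts, Meteor.users)). Note how much bookkeeping is needed just to count references and stop observers, which is exactly the kind of work a package can take off your hands.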

Conclusion

The various techniques explained here all have their pros and cons, but a word of caution: publications are so integral to the performance and scaling of your application that it’s definitely something you don’t want to get wrong.

While the packages we mentioned both work as advertised, we strongly feel that something this crucial should ideally be written by the core Meteor team and become part of the Meteor core.

So let’s hope that this post will soon be obsolete and Meteor will support one of the above approaches natively. Until then, hopefully we’ve helped you understand the issues involved and enabled you to make the best decision for your own application.