Monday, 23 September 2013

Field Complete and Profile2

Yesterday I decided to promote the latest dev version of my Field Complete module to a release candidate. There were two reasons for this: (a) it's pretty solid, and (b) people are more likely to use it if it's out of dev.

It wasn't a mistake. Within a couple of hours I had a new bug report: "Doesn't work with Profile2".

Which surprised me a lot: why on earth would it not work with a popular entity-based module? My code is very standard and does nothing naughty.

The answer, of course, is that Profile2 is naughty. Or at least it's non-standard. I'm not going to go into detail, but it bypasses standard edit-form processing and does odd, awkward things with the display URL. So my form interceptions never get a chance to run, and the completeness progress bar (and the Incomplete fields block) can't work.

What to do?

On the one hand I don't see why I should hack my code to work with something non-standard. On the other, Profile2 is a very popular module and using Field Complete with it is a natural thing to do. And I want people to use my module.

So I spent a couple of hours figuring out the nasty that Profile2 does and modifying my code to cope with it. It wasn't too horrible, but I can't say I enjoyed it as a process. My nice clean code now has hacks in it.

Oh well.

Friday, 9 August 2013

The History of the Field Complete module

A couple of months ago I was contracted to work on a project for a large UK organisation that provides accreditation for university and other further-education courses of a certain type. The site had already been mostly designed and built, so I had no say in its overall structure. Suffice to say: I wouldn't have done it like that.

Still, we play the hand we're dealt.

The nature of the project meant that applicants had to fill in absolutely massive forms and provide huge amounts of evidence to show how their course delivered to the standards required for accreditation. The form could not be submitted for review until it was complete, yet the forms are so huge that nobody is going to be able to fill them in in one sitting. In some cases it was expected to take weeks.

The original developers had decided (sensibly) to use the Content Complete module, which is a watered-down version of making a field "required" - you specify which fields need to be complete and then you can check every time the form is saved. Which was all well and good except for one tiny thing: these forms used Field Collections and Content Complete cannot cope with Field Collections.

The first task on my list was: get Content Complete working with the Field Collections.

I laughed til I stopped.

I did have a look, but Content Complete is structurally incapable of handling Field Collections without a major rewrite. And it was worse than that: I also had to create new forms that linked to other entities (via Entity Reference), and those needed to be checked for completeness as well.

Now, there was a discussion in the Content Complete issue queue about developing a new version called Entity Complete. However, the ideas were overly complex, they wanted to carry on using the same interface (which I don't like), no actual work had been done, and the discussion had dried up months ago. Clearly it wasn't happening.

So I had a choice: somehow fix Content Complete, or start again and write the Entity Complete module from scratch myself. I chose the latter. Essentially it intercepts just two hooks to achieve the required result, but there are plugins for specialised field types. And lots of frills.

You can find the module itself here. You'll notice it's called Field Complete instead of Entity Complete because someone had already grabbed the 'ec' short title.

And there is lots of lovely documentation with many pictures here.

Enjoy.

Thursday, 11 July 2013

Export UI, Features and Taxonomy

Here's a quicky. Let's say you've created some exportable content (using CTools) which references a term ID and you have to go from your dev site to the production site. And your taxonomy is also being exported using UUID.

Somehow you have to tie them together because sure as eggs is eggs the term IDs created on the production site are not going to be the same as the local ones. That's why you used UUID in the first place. Right?

Here's what you do: in your local exportable you will have a column for the term ID so include a column for the term UUID as well.

In your add/edit form for the exportable you'll have to include some code to add the selected term's UUID automatically; I did it in the form validation. Basically, you read the selected term ID and load the term, which will have the UUID in it (because UUID adds a column to the base table), then set the term's UUID value in $form_state['values']. Assuming you're using CTools Export UI, the UUID will be saved automatically. A sketch of the validation step follows.
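Here's a minimal sketch of that step, assuming the exportable's columns are called 'tid' and 'uuid' (as in the alter hook further down) and that the form element holding the term ID is also called 'tid' - adjust the names to match your own module:

/**
 * Form validation callback for the exportable's add/edit form.
 *
 * Copies the selected term's UUID into the submitted values so that
 * CTools Export UI saves it alongside the local term ID.
 */
function mymodule_exportable_form_validate($form, &$form_state) {
  $tid = $form_state['values']['tid'];
  if ($tid && ($term = taxonomy_term_load($tid))) {
    // The UUID module adds a 'uuid' column to taxonomy_term_data,
    // so the loaded term already carries it.
    $form_state['values']['uuid'] = $term->uuid;
  }
}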

Also, in the module's install file, add 'no export' => TRUE to the TID field in the schema, so that Export UI does not include it in the export. (This works for Features; it doesn't work for single imports. I'll leave that as an exercise for the reader - hint: you can specify an "import callback".)
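For what it's worth, the relevant part of hook_schema() might look something like this. The table name, export settings and field names are illustrative only; the interesting bits are the 'no export' flag on the TID column and the UUID column that does get exported:

/**
 * Implements hook_schema().
 */
function mymodule_schema() {
  $schema['mymodule_exportable'] = array(
    'description' => 'CTools-exportable things that reference a taxonomy term.',
    'export' => array(
      'key' => 'name',
      'identifier' => 'exportable',
      'default hook' => 'mymodule_my_default_hook',
      'api' => array(
        'owner' => 'mymodule',
        'api' => 'mymodule_exportables',
        'minimum_version' => 1,
        'current_version' => 1,
      ),
    ),
    'fields' => array(
      'name' => array('type' => 'varchar', 'length' => 255, 'not null' => TRUE),
      // Local term ID: not exported because it differs from site to site.
      'tid' => array('type' => 'int', 'not null' => TRUE, 'default' => 0, 'no export' => TRUE),
      // The term's UUID: this is what actually travels in the export.
      'uuid' => array('type' => 'varchar', 'length' => 36, 'not null' => TRUE, 'default' => ''),
    ),
    'primary key' => array('name'),
  );
  return $schema;
}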

That's the easy bit, when you export your content it will be saved with the term's UUID and not the term's ID. The difficult bit is how to link the UUID of exported content when Features loads it into the new site.

Except it's not hard at all. In your export specification, in the schema, you have the "default hook"; well, CTools Export UI very kindly calls a drupal_alter() on the default items after it has loaded them. So we can do this:

/**
 * Implements hook_DEFAULT_HOOK_alter().
 *
 * This intercepts any defaults picked up from code and converts
 * their UUID category into the local TID (which might be different
 * on every site).
 *
 */
function mymodule_my_default_hook_alter(&$items) {
  $uuids = db_select('taxonomy_term_data', 't')
      ->fields('t', array('uuid', 'tid'))
      ->execute()->fetchAllKeyed();

  foreach ($items as $item) {
    if (empty($item->tid) && !empty($uuids[$item->uuid])) {
      $item->tid = $uuids[$item->uuid];
    }
  }
}

The database call creates an array which maps all UUIDs to TIDs in one go. If your site uses a lot of taxonomy terms - perhaps you have user tagging - you might want to restrict this call to a specific vocabulary.
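If you do want to restrict it, something along these lines should work (the vocabulary machine name here is made up):

  // Map UUIDs to TIDs for a single vocabulary only.
  $query = db_select('taxonomy_term_data', 't');
  $query->join('taxonomy_vocabulary', 'v', 'v.vid = t.vid');
  $uuids = $query->fields('t', array('uuid', 'tid'))
    ->condition('v.machine_name', 'my_categories')
    ->execute()->fetchAllKeyed();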

The exact item property names will depend on what you set up in your schema.

Sorted.

Friday, 5 July 2013

Entity Reference Views Widget plus Organic Groups Nightmare

Let's suppose you are using the Entity Reference Views Widget to display items to be included in an Entity Reference field. This is actually quite a cool module, if a little awkward to use: essentially it gives you a Views listing of candidate entities to add, each with an AJAX-driven checkbox; click the box and the entity gets moved to the list on the left showing what's been chosen.

Which is all fine.

The project I'm currently working on uses Organic Groups to group certain users within an organisation, allowing them to work on very specific types of node content. I had to create a completely new entity type (though that's not important) so they could add one or more of these entities to one or more items of their special content.

The new entity was made subject to the OG, and that too was fine.

So then I came to build the Entity Reference Views Widget view so that it displayed only the new entities belonging to the current user's OG.

Meltdown. Either I listed everything, or nothing. Filtering by OG is not the easiest thing in the world: you have to create an OG relationship for the entity in question (easy) and then add an argument which, if it has no value (the desired state), uses the OG Context module to figure out which OGs are available.

Weird fact #87654: the OG Context module allows you to base the context on the current node, or on a user currently being viewed or edited, but not on the current user. So I had to build a quick context handler for that:

/**
 * Implements hook_og_context_negotiation_info().
 */
function mymodule_og_context_negotiation_info() {
  return array(
    'user' => array(
      'name' => t('User'),
      'description' => t("Determine context by finding the current user's OG (if any)."),
      'callback' => 'mymodule_context_handler_user',
    ),
  );
}

/**
 * Implements hook_og_context_negotiation_info_alter().
 */
function mymodule_og_context_negotiation_info_alter(&$contexts) {
  $contexts['node']['menu path'][] = 'node/%/edit';
}

/**
 * OG context handler: determine group context from the logged-in user.
 */
function mymodule_context_handler_user() {
  global $user;
  $account = clone $user;
  $contexts = _group_context_handler_entity('user', $account);
  return $contexts;
}

In fact this does two things: it expands the context checking for nodes to include nodes being edited and adds a context that looks at the current user.

Okay. Next factor: Entity Reference Views Widget has this neat facility for feeding the entity IDs of entities already selected back into the view and excluding them. This is great and it also uses an argument, which needs to be the first argument.

However, and this is the nastiness, if the ERVW argument does not exist (i.e. no entities have yet been selected for the field) the second argument fails to fire and you see all the entities without any OG filtering.

The solution is thankfully quite simple: edit the ERVW argument so that it has a default value of "all". That means it always exists, so the second argument does fire, figures out the correct OG context, puts it into the query, and the filtering actually works.
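If you export the view, the fix shows up roughly like this; the argument's field name ('entity_id' here) is a placeholder, and the fixed "all" default is the important part:

// Excerpt from a hypothetical exported view: the first (ERVW) argument
// gets a fixed default of "all" so it always exists.
$handler->display->display_options['arguments']['entity_id']['default_action'] = 'default';
$handler->display->display_options['arguments']['entity_id']['default_argument_type'] = 'fixed';
$handler->display->display_options['arguments']['entity_id']['default_argument_options']['argument'] = 'all';
// The OG argument comes second and only fires once this one exists;
// its default comes from OG context as described above.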

Obviously I went through various stages of thinking each handler was broken or that there was something weird about my newly created entity. But none of those things were true. The problem was simply one of configuration.

Hope that helps.

Saturday, 13 April 2013

Feed Sources and Self Nodes and Bears (oh my)

Seems it's been nearly a year since I last posted something. I can only say that I have been tied up with Drupal 6 projects and uninspiring Drupal 7 ones. However something interesting came up with a project of my own over the last day or so. So here we are:

The project in question involves creating a meta-search site on a specific subject with data gathered from various sites. Some of those sites have RSS feeds, some do not.

The Feeds module is fairly awesome if you need to get structured data from somewhere and turn it into nodes, or other entities. If you're reading from, say, an RSS feed, it works out of the box, and I must admit I was very impressed with what it can do. There's even a Feeds Crawler module that lets you scrape data.

I appreciate that may make some people feel awkward but this site does nothing except fetch information, allow someone to search and then direct the viewer to the original site. No more than Google or any other search engine.

However for this project there was an issue: even the RSS does not contain all the required information to fill the nodes created. To do that properly we have to go to the original page and scrape the relevant sections - for example, to get the full description and the tags that have been used.

There is another module, Feeds Self Node Processor, which allows a node to become its own feed: you create an importer for the initial information and create the nodes, and then each of these new nodes can fetch its own specific information from the target URL.

I'm leaving out a lot of detail here but I hope this is enough to be understood.

Fetches can be scheduled to be performed during cron jobs or switched off completely so that's all fine. Except for one little thing:

There is no option to do a once-only fetch. Now imagine you've imported 10,000 data items: the feeds_source table is filled with 10,000 self-node rows. If you set the fetch frequency to the maximum (4 weeks), the first self-node fetch won't happen for 4 weeks. But worse: if you set it to "as often as possible", cron starts cycling through the existing 10,000 nodes but (as far as I can tell) doesn't do it right and keeps repeating the same content. And new content gets ignored.

Oh dear. The initial joy of getting the feeds process functioning for three different sites was slowly eroded by the realisation that this was just not going to work.

I spent two days playing with various options. The Feeds module provides a wide array of hooks and lots of potential for customisation; however, none of them helped.

Of course I would not be writing this unless I had found a solution.

It became clear that what was needed was to delete the relevant row from the feeds_source table. There's a nice class wrapper and method to achieve this, which ensures all related information is also deleted (like the entry in the job_scheduler table).

And there's a "post import" hook. Theoretically just performing $source->delete() should do the job, unfortunately doing that in the "post import" hook doesn't work, the entries are simply recreated because the data is saved after the hook is run. And you can't touch the "last imported" time stamp to set it into the distant future. Simply deleting the relevant job_scheduler row is equally futile.

What was needed was a method of deleting the feeds_source row at some point after everything else had been done. The next page load was an attractive choice initially - I even toyed with the thought of using $_SESSION, but only for a few seconds.

The answer is that under-used hardly-mentioned-anywhere feature of Drupal 7: Queues. And here it is:


/**
 * Implements hook_feeds_after_import().
 *
 * @param $source
 *  FeedsSource object that describes the source that has been imported.
 */
function YOURMODULE_feeds_after_import(FeedsSource $source) {
  // Only act on importers that follow the self-node naming convention
  // (all mine are named "reprocess_[something]" - see below).
  if (strpos($source->id, 'reprocess_') === 0) {
    // Get the queue (create if not existent)
    $queue = DrupalQueue::get('killJobQueue');

    // Build the required job data
    $job = array(
      'type' => $source->id,
      'id' => $source->feed_nid,
    );

    // And put it in the queue
    $queue->createItem($job);
  }
}

/**
 * Implements hook_cron_queue_info().
 */
function YOURMODULE_cron_queue_info() {
  return array(
    'killJobQueue' => array(
      'worker callback' => '_YOURMODULE_kill_source',
      'time' => 5,
    ),
  );
}

function _YOURMODULE_kill_source($job) {
  feeds_source($job['type'], $job['id'])->delete();
}

We intercept the hook after the import and determine whether this is a feeds source we want to kill. I did this by using a naming convention: all my self-node importers are named "reprocess_[something]". We build the job data and add it to the queue.

It's assumed that many queue-using applications will want to process the queue during hook_cron() so the Queue API provides the functionality for you - apart from the bit that does the actual work.

Now it doesn't matter whether my cron function runs before or after the feeds cron, because somewhere between crons the feeds source rows will be correctly deleted. And it works. Apart from anything else it prevents the feeds_source and job_scheduler tables from getting clogged up with useless data.