
Tag: MediaWiki

Updated Dynamic Questy Captchas

A little over a year ago I shared a method of generating dynamic Questy Captchas for the MediaWiki ConfirmEdit extension. This method has been awesome for stopping registration spam on the thingelstad.com wiki farm, and many other wiki admins have used it with success. Unfortunately it was more useful in its novelty than in its difficulty to solve, and eventually some spammers wrote the logic to break it and the registration spam started flooding in.

I decided to put a new method in place that is based on the same kind of question. The previous question generated 8 characters and asked the user to provide one of them based on a random index. I’ve now changed this to generate a number between 100,000,000 and 999,999,999, turn that into spelled-out words, and then ask the user to identify one digit. It looks like this:

What is the sixth digit of the number nine hundred fifty-one million eight hundred ninety-eight thousand four hundred twenty-seven?

That turns out to be a somewhat hard question for a human too. I find I typically have to type out the number as I read it. The benefit of this is that the solution isn’t in the text of the page. And while I’m sure there are great libraries for turning spelled-out numbers back into digits, the solution isn’t immediately obvious to a spammer’s script.
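The real logic lives in the QuestyCaptcha configuration in LocalSettings.php, but here is a minimal Python sketch of the idea, using the num2words library as a stand-in number-to-words routine (any equivalent would do):

import random
from num2words import num2words  # assumed library; any number-to-words routine works

ORDINALS = ["first", "second", "third", "fourth", "fifth",
            "sixth", "seventh", "eighth", "ninth"]

def make_captcha():
    # Pick a nine-digit number and spell it out in words.
    number = random.randint(100_000_000, 999_999_999)
    spelled = num2words(number).replace(",", "")
    # Ask for one digit by position; the digits themselves never appear on the page.
    index = random.randint(0, 8)
    question = "What is the %s digit of the number %s?" % (ORDINALS[index], spelled)
    answer = str(number)[index]
    return question, answer

The important property is the last step: the answer is derived from the number, and the digits never appear in the page text.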

Continue reading

MediaWiki LocalSettings for Farmers

I’ve been running a MediaWiki farm at thingelstad.com for a couple of years now, hosting about a dozen wikis ranging from small to very large. Running a MediaWiki farm is a bit complicated and you can approach it a number of different ways. I recently pushed the settings that I use to run my farm to GitHub so others can see how I do it. The next step will be to push up the scripts that I use as well, but those will be kept in a separate repository.

Hopefully this proves useful to others. It’s useful for me to finally have these very complicated settings (really code!) under version control.

Wanted: Dash Docset for MediaWiki

I really wish there were a Dash Docset for MediaWiki. The directions for creating docsets are pretty straightforward. It also strikes me that by using pandoc to convert wikitext to HTML, you could turn the existing help pages into HTML and build the docset.
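As a rough sketch of that conversion step, assuming pandoc is installed and the help pages have been saved locally as wikitext files (the directory names here are hypothetical):

import subprocess
from pathlib import Path

# Convert each saved help page from wikitext to HTML with pandoc.
Path("docset-html").mkdir(exist_ok=True)
for page in Path("help-pages").glob("*.wiki"):
    out = Path("docset-html") / (page.stem + ".html")
    subprocess.run(
        ["pandoc", "--from=mediawiki", "--to=html", str(page), "-o", str(out)],
        check=True,
    )

From there it would just be a matter of generating the SQLite search index that the docset format requires.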

Maybe someone else will read this and think it is a great idea and do it.

Keybase and MediaWiki

I’m really intrigued by what Keybase.io is doing with identity. The ability to cryptographically prove your identity on the web without a centralized party like Twitter or Facebook owning the approval process is a needed function. I set up my profile and you can now prove on your own that I am who I say I am on Twitter, GitHub and five of my domains.

I’m trying to figure out how this could be extended to MediaWiki. I would love to be able to prove that my user account on a given wiki is me.

This seems really hard. The method used on GitHub is to publish a Gist, and you could certainly have a wiki user publish something on their user page, but a user page can also be edited by anybody.

But if we could do this, it would be a great way to allow Wikipedia editors to claim ownership with proof of their identities (if they wish) and would benefit thousands of self-hosted MediaWiki websites.

I think something more like the Twitter proof could work. How?

  1. Have the user in question edit their User page. The contents of the edit don’t matter; the Summary field (limited to 255 characters) is what will be looked at.
  2. MediaWiki websites have a permanent revision history attached to each page. (This would be like a Tweet; see my change on May 11, 2014.)
  3. This information is accessible via the MediaWiki API, and there is a revid attached to each revision.
  4. This revision ID can be used to pull the proof forward for keybase.io.

Here are the last 5 revisions for my User page on WikiApiary (API call).

{
  "query": {
    "pages": {
      "43": {
        "ns": 2,
        "pageid": 43,
        "revisions": [
          {
            "comment": "This is my message for keybase.io to prove I am who I am!",
            "parentid": 884728,
            "revid": 904406,
            "timestamp": "2014-05-11T13:10:17Z",
            "user": "Thingles"
          },
          {
            "comment": "remove gittip button",
            "parentid": 569866,
            "revid": 884728,
            "timestamp": "2014-04-30T02:55:26Z",
            "user": "Thingles"
          },
          {
            "comment": "",
            "parentid": 569864,
            "revid": 569866,
            "timestamp": "2014-02-21T01:56:28Z",
            "user": "Thingles"
          },
          {
            "comment": "added badge",
            "parentid": 552628,
            "revid": 569864,
            "timestamp": "2014-02-21T01:54:59Z",
            "user": "Thingles"
          },
          {
            "comment": "add babel box (needs templates and styles)",
            "parentid": 528928,
            "revid": 552628,
            "timestamp": "2014-02-15T19:36:51Z",
            "user": "Thingles"
          }
        ],
        "title": "User:Thingles"
      }
    }
  },
  "query-continue": {
    "revisions": {
      "rvcontinue": 528928
    }
  }
}

Now that we have the revid, we can look up that single revision to complete the proof (API call). The key is that the title of the page “User:Thingles” matches the user that made the change, “Thingles”.

{
  "query": {
    "pages": {
      "43": {
        "ns": 2,
        "pageid": 43,
        "revisions": [
          {
            "comment": "This is my message for keybase.io to prove I am who I am!",
            "parentid": 884728,
            "revid": 904406,
            "timestamp": "2014-05-11T13:10:17Z",
            "user": "Thingles"
          }
        ],
        "title": "User:Thingles"
      }
    }
  },
  "query-continue": {
    "revisions": {
      "rvcontinue": 884728
    }
  }
}

And now I’ve proven my identity. The only trick is that the comment must contain all the data, and that should be easy since Keybase already does something similar for Twitter.
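To make that concrete, here is a minimal Python sketch of what a verifier could do with the MediaWiki API. The proof string is an illustrative stand-in for whatever Keybase would actually embed:

import requests

def find_proof(api_url, username, proof_text):
    # Look for proof_text in the revision comments of User:<username>.
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": "User:" + username,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": 50,
        "format": "json",
    }
    data = requests.get(api_url, params=params).json()
    for page in data["query"]["pages"].values():
        for rev in page.get("revisions", []):
            # Both conditions matter: the comment carries the proof and
            # the revision author matches the page's user name.
            if rev["user"] == username and proof_text in rev["comment"]:
                return rev["revid"]
    return None

print(find_proof("https://wikiapiary.com/w/api.php", "Thingles",
                 "This is my message for keybase.io"))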

And here is another example of me making a proven comment on my user page on Wikipedia (en) (API call).

{
  "query": {
    "pages": {
      "14604697": {
        "ns": 2,
        "pageid": 14604697,
        "revisions": [
          {
            "comment": "Here is my message for Keybase from Wikipedia.",
            "parentid": 551074105,
            "revid": 608054849,
            "timestamp": "2014-05-11T13:56:25Z",
            "user": "Thingles"
          }
        ],
        "title": "User:Thingles"
      }
    }
  },
  "query-continue": {
    "revisions": {
      "rvcontinue": 551074105
    }
  }
}

Stopping MediaWiki Spam with Dynamic Questy Captchas

This method of using Questy Captcha has been defeated by some spammers. Please check out my updated dynamic Questy Captcha method.

MediaWiki websites are often plagued by spammers. It’s annoying in the extreme. If you set up a blank MediaWiki website and do nothing, it is likely that within a couple of weeks your site will be found, and in a matter of days you will have thousands of spam user accounts and tens of thousands of pages of spam. There are a number of ways to stop wikispam. I tried using reCAPTCHA to little benefit; I still got a large number of spam registrations on my publicly available wikis. I’ve found the combination below to be incredibly effective.

Continue reading

Display MediaWiki job queue size inside your wiki

I wanted to find an easy way to show the current size of the MediaWiki job queue on WikiApiary. When you make changes to templates that are used on thousands of pages the queue can get backed up and it’s nice to have an easy way to keep an eye on this. The job queue is even one of the data points WikiApiary tracks and graphs. But I wanted something that was as close to realtime as possible. It wasn’t hard to do. This solution uses the External Data (see WikiApiary usage page) extension.

First we need to get the data. Let’s start by calling the siteinfo API method. The magic words are used to make this generic, and this should result in the right URL. If you are using a protocol-relative server setting you will have to modify this.

{{#get_web_data: url={{SERVER}}{{SCRIPTPATH}}/api.php?action=query&meta=siteinfo&siprop=statistics&format=json
  | format=JSON
  | data=jobs=jobs}}

External Data has now done the work and stored the value for us. We retrieve it by simply calling:

{{#external_value:jobs}}

I like to put that in a style so it’s big and obvious.

If you’re thinking ahead you’ll now be saying, “Yeah, that’s neat, but it will be cached in MediaWiki for hours!” Yes, it will be, unless you add a __NOCACHE__ directive to the page and use the MagicNoCache extension. This extension allows you to disable the MediaWiki cache on a page-by-page basis, which is very handy.

If you wanted to use this in multiple places you could even put an <onlyinclude> around it and transclude the job queue size into other pages; however, I would be cautious about that if you are also using the __NOCACHE__ directive.
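The same statistics are available from outside the wiki with the identical API call; a quick Python sketch (the URL is illustrative):

import requests

# Query the same siteinfo statistics the wiki template uses.
params = {
    "action": "query",
    "meta": "siteinfo",
    "siprop": "statistics",
    "format": "json",
}
stats = requests.get("https://wikiapiary.com/w/api.php", params=params).json()
print("Job queue size:", stats["query"]["statistics"]["jobs"])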

MediaWiki Template Filter Title

I was recently doing some cleaning on our Read/Write Book Club website and ran into an interesting challenge. All of the books in the wiki are in a couple of categories, but I wanted them sorted correctly, ignoring A, An and The at the beginning of the title. MediaWiki supports this in the category tag, allowing you to specify [[Category:Book|Sort Title]], and early on in the wiki I had a second field in the form for Sort Title asking the person editing the book to fill it in.

The result was that nobody did it, and all the books with “The” at the beginning of the title ended up under T. Shouldn’t this be easy to deal with in the wiki itself?

Well, it turned out to be much harder than you would think, in large part because MediaWiki doesn’t honor spaces in template arguments. My first attempt was rather brute force: simply look for the three cases I want to get rid of at the start of the title and chop them off.

<includeonly>
{{#if:{{{1|}}} | {{#vardefine:title_filter_temp|{{{1}}} }}
{{#if: {{#pos:{{#var:title_filter_temp}}|The }} | {{#ifexpr: {{#pos:{{#var:title_filter_temp}}|The }} = 0 | {{#vardefine:title_filter_temp| {{#sub:{{#var:title_filter_temp}}|4}} }} }} }}
{{#if: {{#pos:{{#var:title_filter_temp}}|A }} | {{#ifexpr: {{#pos:{{#var:title_filter_temp}}|A }} = 0 | {{#vardefine:title_filter_temp| {{#sub:{{#var:title_filter_temp}}|2}} }} }} }}
{{#if: {{#pos:{{#var:title_filter_temp}}|An }} | {{#ifexpr: {{#pos:{{#var:title_filter_temp}}|An }} = 0 | {{#vardefine:title_filter_temp| {{#sub:{{#var:title_filter_temp}}|3}} }} }} }}
{{#var:title_filter_temp}}
| No parameter passed to [[Template:Filter title]]. }}</includeonly>

This worked in many cases, but not all. A book like Antifragile got in trouble with this approach: since “An” matched the start of the title, it was chopped to “tifragile” and sorted under T. You would think this would be an easy fix, right? Don’t look for “An” but instead for “An ”, including the space in the match. Unfortunately, it is nearly impossible to pass a space into a MediaWiki template. MediaWiki trims all template inputs of whitespace, so a space by itself effectively becomes null. A different approach was needed.

After some consideration I came up with this approach that uses the Arrays extension. I like it a lot more than the first attempt! The basic idea is to break the title into an array of strings on the space (note that #arraydefine allowed me to use a regex pattern to avoid the problem of not being able to pass in a space). I then check if the first element in that array matches a set of targets (in the #switch statement). If it does, set the index to 1, otherwise 0, and build a new array from that index offset. Like this:

<includeonly>{{
#arraydefine:filter_title_temp|{{{1|No title was provided}}}|/\s/}}{{
#switch: {{#arrayindex:filter_title_temp|0}}
 | A | An | The = {{#vardefine:filter_title_i|1}}
 | #default = {{#vardefine:filter_title_i|0}}
}}{{
#arrayslice: filter_title_new | filter_title_temp | {{#var:filter_title_i}} }}{{
#arrayprint: filter_title_new | _ | @@@@ | @@@@ }}{{
#arrayreset:filter_title_temp|filter_title_new}}</includeonly>

This works great with one exception: I still get confounded by the space problem when assembling the new title in the #arrayprint call. I decided to print the new title with underscores where the spaces were. Since this is only used as the sort key, that is fine; the end user never sees it, and the wiki will sort correctly if given Title_of_the_Book.
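For comparison, here is the same logic as a short Python sketch; it shows the intent more directly than the wikitext can:

def sort_title(title):
    # Split on whitespace, drop a leading article if present,
    # and rejoin with underscores (matching the template's output).
    words = title.split()
    if words and words[0] in ("A", "An", "The"):
        words = words[1:]
    return "_".join(words)

assert sort_title("The Name of the Wind") == "Name_of_the_Wind"
assert sort_title("Antifragile") == "Antifragile"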

Now the sortable titles are all generated and the Book Category page looks awesome.

Bookmarking with Semantic MediaWiki

I have been doing a lot of exploration using MediaWiki and the Semantic MediaWiki suite of extensions. I’ve deployed a number of wikis doing a wide variety of things. For a few months I had been pondering the idea of hosting my own bookmarking site using Semantic MediaWiki. I decided to give it a try and put together links_thing.

First, a quick primer. Semantic MediaWiki is an extension that lets you store and query data in wiki pages. Wikis have been awesome at dealing with documents and text for a long time, but if you wanted to put a table of data in a wiki, that didn’t work very well. And if you wanted to query that table of data? Well, that was just crazy. Semantic MediaWiki gives you the ability to associate properties with pages and then query them. So, for my bookmarks, each wiki page in the bookmark category is a bookmark and has a number of properties, including things like Has URL, Created at and Has excerpt. You get the idea. You put all this logic into the templates that the wiki uses, making them into semantic templates, and even the data entry can be made user-friendly by using Semantic Forms to create fancy forms with a variety of standard controls.

Making it

Building this wiki was pretty easy. I mostly just thought for a while about the properties that a bookmark has. I wanted to get it right since I could use the scaffolding that Semantic MediaWiki has to create a “class” and template out all the basic stuff; it’s easy enough to add things after the fact too. After making the class for bookmarks there was only one real thing I needed to prove: I had to be able to have a bookmarklet that would automatically populate the URL, Title and Excerpt for a bookmark. Of course the timestamp needed to be set too, but I knew how to do that.

After some digging I figured out how to pass parameters into the forms to pre-populate fields, and also how to tell Semantic Forms to name the page based on a field in the form (namely, the Title of the bookmark). After proving that out I was ready to go.

Importing

I wasn’t willing to lose any data, and I knew it was just a matter of shoveling. I used Pinboard’s JSON export and then whipped up a little Python program to turn that into a CSV that could be imported using the Data Transfer extension. I easily imported just under 4,000 links and had all my data there.
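The conversion program was only a few lines. A sketch along these lines would do it; the template and field names in the CSV header are assumptions modeled on my bookmark properties:

import csv
import json

# Pinboard's JSON export is a list of objects with keys like
# "href", "description", "extended" and "time".
with open("pinboard_export.json") as f:
    bookmarks = json.load(f)

# Data Transfer reads a Title column plus Template[field] columns;
# "Bookmark" and its fields here are hypothetical names.
with open("bookmarks.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Bookmark[Has URL]",
                     "Bookmark[Created at]", "Bookmark[Has excerpt]"])
    for b in bookmarks:
        writer.writerow([b["description"], b["href"],
                         b["time"], b.get("extended", "")])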

Fun Stuff

I’ve been using my new Bookmark wiki exclusively for the last few weeks and I’m absolutely loving it. Here are just some of the reasons why:

  1. It is mine. Put simply, I don’t need to rely on anyone else to keep it working for me. For an archive, this feels reassuring.
  2. This seems simple, but it’s so helpful to be able to do regular-expression-driven find and replace through all my bookmarks. I’ve probably done 50 of these while cleaning things up. For example, I didn’t like that a lot of bookmarks had a title that ended in ” – Home” or ” – My Super Cool Blog”. A quick search and replace and they are gone.
  3. I thought it would be interesting to see my bookmarks on a calendar. Seems like a simple thing but I don’t think any bookmarking service does it. So I made a calendar view.
  4. Wouldn’t it be nice to be able to see YouTube and Vimeo videos I bookmark without having to go to the video pages one at a time? I made a video view.
  5. I really want my bookmarking tool to have URL checking. I hate short URLs because I suspect they will go away. I also don’t like analytics tags being bugged into my URLs. I have a Check URL template that checks for these in my wiki, and a bot that cleans them up.
  6. I thought it would be cool to see statistics on my bookmarks so I created a Bookmark Statistics view.

This is just the beginning. I’m sure I’ll be adding a lot of other tweaks over time.

What’s next?

I’m now building a little Python application called LinkBot. LinkBot runs on a schedule and validates URLs for me. I’ll write about that application separately.
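I’ll save the details for that write-up, but the core of the URL check is not much more than this sketch (the function name is hypothetical):

import requests

def check_url(url, timeout=10):
    # Follow redirects and report the final status code and URL,
    # which also surfaces short links that expand elsewhere.
    try:
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        return resp.status_code, resp.url
    except requests.RequestException as exc:
        return None, str(exc)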

I would love to share this suite of templates and properties with anyone else. It’s easy to export the pages from my wiki and import them into your own. If you are interested in doing so, I have cloning information; feel free to comment here and we’ll connect.

MediaWiki Template Get Hostname

I was working on a template for one of my personal wikis and needed to get the hostname for a given URL. Using the capabilities of the Parser Functions extension for MediaWiki, I whipped up this template. I figured others may find it useful, so here it is. The first version has a bunch of spaces and newlines added to make it more readable.

{{#vardefine: hoststart | {{#expr: {{#pos: {{{1|}}} | // }} + 2 }} }}
{{#vardefine: hostend | {{#pos: {{{1|}}} | / | {{#expr: {{#pos: {{{1|}}} | // }} + 2 }} }} }}
{{#vardefine: hostlen | {{#expr: {{#var: hostend }} - {{#var: hoststart }} }} }}
{{#sub: {{{1|}}} | {{#var: hoststart}} | {{#var: hostlen}} }}

To put it in your own MediaWiki, copy this version that removes the spaces and newlines.

To use this template put it on a page like Template:Get hostname and then call it in your pages as

{{Get hostname|http://thingelstad.com/another-reason-you-need-to-use-a-password-manager/}}

which will return thingelstad.com. You can also find this template on MediaWiki Cookbook.

WeSolver

I’ve been having a lot of fun working with MediaWiki and particularly the Semantic MediaWiki extensions. A few months ago a friend from when I worked at Dow Jones, Armistral, asked me for some input on how he could build a website he was working on. He wanted to create a site where people could work together to solve problems. Thus WeSolver was born. I strongly recommended that he use MediaWiki and he ran with it. The site is now live and he’s done a nice job setting it all up. Check it out, and if you’re so inspired, see what you can do to help with a solution!
