Logic

Logic, language and code

Today, I was doing a bit of hacking on this site's integration with Identica, which is written in PHP. Now, it's possible from this post that I'll reveal myself as either a lazy programmer or an ignorant computer scientist! But I'll continue nevertheless.

The task was this: I have a string, the post, which I want to categorise using Drupal's taxonomy system. In order to do this, I need a string of comma separated tags for each post. For the purposes of this, I'm defining a tag as either a hashtag (e.g. #topic), or a group (e.g. !topic). For example, the post:

I'm watching the #snooker on my computer using !linux.

should yield

snooker,linux

Make sense?

How I approached this is firstly to convert the string to an array of words as follows:

$words = preg_split('/ |,|\.|\?|\!|\)|\:|\;/', $post);

Then the task is to check which of these words is a tag, so that the entry can be categorised in Drupal's taxonomy. So, we iterate through the words, checking if the '#' or '!' is present. If so, then it's added to the taxonomy string.

$tagstring = "";

foreach ($words as $t) {

   // If this word contains a # or ! (but doesn't end in it).

   if (preg_match('/#(\w*)|\!(\w*)/', $t)) {

     // Then remove the # or ! and add a , at the end

     $match = substr($t, 1).",";

     // And add it to the current string of tags

     $tagstring = $tagstring.$match;

   }

}

Okay, so that's easy. But then let's consider a post like this:

I'm watching TV on !Linux. (#snooker is great)

The above would give us:

!Linux,(#snooker

which is obviously not what we want.

So, I used a 'while' loop to check that the '#' or '!' was the first character in the string before removing it and adding it to the tagstring. The 'while' loop, inserted straight after the 'if' statement should keep removing the first character in the string until it is either a '#' or a '!'. Still with me?

This is when the logic part of this post arrives. Now, when coding up things like this, I tend to rely on intuition quite a bit, and maybe it's due to a lack of recent practice (too much time spent with mathematics rather than code), but I wrote this:

// This loop makes sure that the # or ! is at the start of the tag, for example if it was in brackets, e.g. (#tag)

while ((substr($t, 0, 1) != "#") || (substr($t, 0, 1) != "!")):

   $t = substr($t, 1);

endwhile;

Look good? Well it's wrong - my intuition failed me, and this is where taking note of the precision of the English which should be used is important. In English, I'd say "keep looping while the first character is not a '#' or a '!'. Or, "keep looping while X is neither A nor B". As a fast coder, I used this intuition to come up with what's above and continued coding - until I ran it and it didn't work.

Revising this later, of course I realised that I needed an AND operator rather than an OR operator, as in this:

// This loop makes sure that the # or ! is at the start of the tag, for example if it was in brackets, e.g. (#tag)

while ((substr($t, 0, 1) != "#") && (substr($t, 0, 1) != "!")):

   $t = substr($t, 1);

endwhile;

This is basically saying "keep looping while the first character is not a '#' and the first character is not a '!'", or "keep looping while X is not A and X is not B". Clearly, the first character cannot be a '#' and a '!' at the same time, but English, or my everyday English at least, tends to be a little sloppy when it comes to this kind of precision! The tricky thing to get your head around here is that we're operating on negated values. It's perhaps easier to get to grips with using this 'if' statement, which is, as was my original 'while' loop, wrong:

if ((substr($t, 0, 1) != "#") || (substr($t, 0, 1) != "!"))

   // do something...

In fact, it is precisely NOT the above. We might say that the 'if' statement here is:

(!a || !b)

and we want the NOT of this, i.e.

!(!a || !b)

which I could just use directly. But, this is a little overcomplicated. Anyway, is this not just the equivalent of what I wrote in the original 'while' loop anyway? Again, bear in mind the effect that the negation has.

The question basically comes down to whether this is true:

!(!a || !b)) -> (a || b)

The first case is what the NOT of the 'if' statement would be, as we discussed. And doesn't it seem to make sense that those NOTs could be cancelled out, leaving us with an OR in the while?

Well, not if the working code above is to believed. At this point, I was trying hard to remember my predicate logic from my undergraduate days, and asked for a bit of clarification. Luckily, Simon Stanford and Matt Smart popped up on Twitter. On his blog here, Simon provided me with this truth table:

a

b

a || b

!a

!b

(!a || !b)

!(!a || !b)

FALSE

FALSE

FALSE

TRUE

TRUE

TRUE

FALSE

FALSE

TRUE

TRUE

TRUE

FALSE

TRUE

FALSE

TRUE

FALSE

TRUE

FALSE

TRUE

TRUE

FALSE

TRUE

TRUE

TRUE

FALSE

FALSE

FALSE

TRUE

As Simon points out, the colums a || b and !(!a || !b) are not equivalent, and the above implication does not hold. This is why the OR operator in the 'while' statement does not work. In fact, Matt pointed out that:

(a || b -> !(!a || !b)) -> (a || b -> a && b)

which is clearly crazy. But, this does provide us with the answer. Remember, I found that using the AND operator in the 'while' loop appeared to work. Let's consider now the following proposal instead:

!(!a || !b)) -> (a && b)

The truth table for this is:

a

b

a && b

!a

!b

(!a || !b)

!(!a || !b)

FALSE

FALSE

FALSE

TRUE

TRUE

TRUE

FALSE

FALSE

TRUE

FALSE

TRUE

FALSE

TRUE

FALSE

TRUE

FALSE

FALSE

FALSE

TRUE

TRUE

FALSE

TRUE

TRUE

TRUE

FALSE

FALSE

FALSE

TRUE

in which a && b and !(!a || !b) are clearly the same, which agrees with what Matt said.

Phew, so that's why my 'while' loop now works, and why the intuition gained from using sloppy English often doesn't provide you with the code you expected! I should point out that I am only learning PHP right now, and I'm sure that there's a better way of achieving all this (leave a comment below, if you have a suggestion). I've also managed to get most of the way through my PhD in Computer Science without touching first order logic, so I'm (obviously) a little rusty at this too. But, I'd just thought I'd write this up to get it straight, and potentially help anyone else trying to get their head around this for the first time.

Finally, thanks to Matt and Simon for helping me get to grips with my code!

Comments

Hi Pete, Nice article. I've

Hi Pete,

Nice article. I've fallen for the same trap more than once. It's just the way humans interpret things "if it's not this or that then do the other" sounds like just what you want and tell a human that and you'll prolly get the result you want but a to a computer you need to say " if it is not this and it is not that then do the other".

Simon x

Interesting stuff, Pete! I

Interesting stuff, Pete! I had no idea you were using our logic skills to a non-research end ;) You should link to our twitter pages!
Happy to help anyway...

p.s.: liking the new blog layout. Colours _much_ better!

An interesting post. I will

An interesting post. I will admit that I have fallen into exactly the same trap of incorrectly using an "or" from the English phrase in my head in the code.

As far as De Morgan's Laws go, it seemed easiest to convince myself of the truth of them once, and then memorise them. In order to think about this stuff I found Johnston diagrams helpful (actually I thought I was using Venn diagrams, but I just found that there is a name for the specialization). Somehow I find it easier to internalise from that visualization than from truth tables.

I hope that this can be

I hope that this can be considered relevant, but please zap it if you think it is not!

A logic problem I had a while ago comes from the CAM system I work on. Our end result is a series of points written into a file that describes the motion of a cutting tool. Our users program the path assuming a certain nominal size of cutter, but the real cutter they find they have to hand can be marginally undersize (due to wear on the cutter) or more rarely, oversize. This is traditionally dealt with using a crude mechanism that the machine tools have for shifting the path left, or right. Whether to shift left or right depends on two things. The first of these is whether
the cutter is oversize, or undersize. The second consideration is whether the path is climb cutting or conventional cutting (industry terms that mean that the material to be cut is to the right of the cutter, or to the left of the the cutter). Neither of these conditions on their own are enough to determine left or right, but it's always the case that an oversize tool and an undersize tool would have opposite shifts, all else being equal. The same goes for climb and conventional paths. I messed this up in my initial attempt (I'm far too embarrassed to show that!), and so I cautiously wrote out my condition "longhand" as:

// we know that we shift left if we have both climb cutting and an
// oversize cutter
bool shift_left = climb_cutting && oversize_tool;
// if we are not climb cutting, we "swap" our state once from the
// above "model"
if (!climb_cutting) {
shift_left = !shift_left;
}
// also, if we do not have an oversize cutter, we "swap" our state once
if (!over_size_tool) {
shift_left = !shift_left;
}

Although that was correct, I found it rather verbose and clumsy, and wanted it expressed more succinctly (whether or not that was a Good Thing or not is debatable). For the life of me I just couldn't figure out how to do it, so I ended up resorting to truth tables. I had:

OVERSIZE CLIMB LEFT
T T T
T F F
F T F
F F T

I then went and looked up truth tables to see what that was,
and found that that was the truth table for "<=>" ("if and only if").
There's a tautology

(A <=> B) <=> ( (A && B) || (!A && !B) )

From that I could finally see that my expression should be:

bool shift_left = (climb_cutting && oversize_tool) ||
(!climb_cutting && !oversize_tool);

Once I had the expression it was easy enough to see that it was correct. It then seemed very obvious, and I felt rather silly. However, I had had a real blind spot in working the expression out without constructing it from truth tables and tautologies.

That probably says more about my failings than anything else!

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Allowed HTML tags: <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.
  • Web page addresses and e-mail addresses turn into links automatically.

More information about formatting options

CAPTCHA
To help prevent spam, please answer the following question.
D
3
P
B
x
7
Enter the code without spaces.
Syndicate content