Tutorial: Doing multiple replacements on a string

A few days ago I helped a very good friend of mine with some Python questions, and one of his questions gave a very good entry point to some Python basics, so I thought I’d wrap it all up in a small tutorial.

The question my friend asked was something along the lines of:

I’m processing some data samples, which all have a title. I want to use these titles as file names for the results, but they sometimes contain some illegal characters, which I need to replace first. How do I define a set of replacement rules and apply them all in a readable way?

This basically boils down to: How do I do multiple string replacements on a string?

Let’s start with some data. Let’s assume that we have a title string with the value '12" HPHT 2012-08-07 12:37:26'. This title string contains two types of illegal characters: '"' and ':'. We want to replace '"' with 'inch' and ':' with '.'.

We can define those replacement rules as tuples in a list.
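Something like the following should do. The variable names title and replacements are just illustrative choices:

```python
# The title from the example above
title = '12" HPHT 2012-08-07 12:37:26'

# Each rule is a (search, replacement) tuple
replacements = [
    ('"', 'inch'),
    (':', '.'),
]
```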

Next, we want to apply those replacement rules to the title. We can do this in a simple for-loop:
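A minimal sketch of such a loop, assuming the title and the rules are stored in variables named title and replacements (names chosen for illustration):

```python
title = '12" HPHT 2012-08-07 12:37:26'
replacements = [('"', 'inch'), (':', '.')]

# Apply each rule in turn; str.replace() returns a new string,
# so we rebind title on every iteration.
for search, replacement in replacements:
    title = title.replace(search, replacement)

print(title)  # 12inch HPHT 2012-08-07 12.37.26
```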

Notice how we unpack the two elements of the tuple and assign them to search and replacement directly in the for statement.

Of course, we might need this functionality in several places, so it’s probably better to wrap it up in a function. Here’s the complete code:
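A sketch of how the complete code could look. The function name replace_all is only an illustrative choice:

```python
def replace_all(text, replacements):
    """Apply every (search, replacement) rule to text, in order."""
    for search, replacement in replacements:
        text = text.replace(search, replacement)
    return text


replacements = [('"', 'inch'), (':', '.')]
title = '12" HPHT 2012-08-07 12:37:26'

print(replace_all(title, replacements))  # 12inch HPHT 2012-08-07 12.37.26
```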

Now, while the above works and is very readable, we could also have avoided the for-loop by using a bit of functional programming. So let’s dive into that.

Instead of the for-loop, we could have used the reduce() function (in Python 3 it has to be imported from the functools module).

The reduce() function reduces a list to a single result by iterating over the list and, for each element, calculating an intermediate value from the element and the previous intermediate value. The start value is given as an argument to reduce(), and the last intermediate value becomes the final result. Thus
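a call like this one, where the function f, the list [1, 2, 3] and the start value 0 are arbitrary placeholders,

```python
from functools import reduce  # needed in Python 3; a built-in in Python 2


def f(intermediate, element):
    # Placeholder operation: add the element to the running value
    return intermediate + element


result = reduce(f, [1, 2, 3], 0)
print(result)  # 6
```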

is the same as
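the nested calls written out by hand. Here f is an arbitrary two-argument placeholder function, 0 is the start value and 1, 2, 3 are the list elements:

```python
def f(intermediate, element):
    # Placeholder operation: add the element to the running value
    return intermediate + element


# The start value goes into the innermost call, and each result
# feeds the next call together with the next list element.
result = f(f(f(0, 1), 2), 3)
print(result)  # 6
```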

Using this we could rewrite our previous example to avoid the for-loop like this:
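One way this could look is sketched below. The name replace_all is an illustrative choice, and _replace_helper() is assumed to take the intermediate string and one rule tuple:

```python
from functools import reduce  # needed in Python 3


def _replace_helper(text, rule):
    # rule is a (search, replacement) tuple
    return text.replace(rule[0], rule[1])


def replace_all(text, replacements):
    # text is the start value; each rule transforms the previous result
    return reduce(_replace_helper, replacements, text)


replacements = [('"', 'inch'), (':', '.')]
print(replace_all('12" HPHT 2012-08-07 12:37:26', replacements))
# 12inch HPHT 2012-08-07 12.37.26
```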

Of course, this gave us more code than before, but we can apply a few tricks to deal with that. The first trick is using the syntax for unpacking argument lists, so we don’t have to explicitly reference each item in the rule tuple. We simply ask Python to unpack the tuple for us and use each element as an argument:

Finally, we can avoid the helper function altogether by replacing it with a lambda function.

A lambda function is just a very simple function, which doesn’t have a name and whose body is a single expression; the value of that expression is always returned as the result. Thus
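an ordinary function such as this one, where add is an arbitrary placeholder example,

```python
def add(a, b):
    return a + b


print(add(2, 3))  # 5
```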

is the same as the lambda function
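For a placeholder function that adds two numbers, the lambda form could be written like this. The lambda is bound to the name add only so that it can be called in the example:

```python
# The expression after the colon is the return value
add = lambda a, b: a + b

print(add(2, 3))  # 5
```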

We can rewrite _replace_helper() as a lambda function the same way. The final code looks like this:
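A sketch of what the final version could look like, again using the illustrative function name replace_all:

```python
from functools import reduce  # needed in Python 3


def replace_all(text, replacements):
    # The lambda takes the intermediate string and one rule tuple,
    # and unpacks the tuple into the arguments of str.replace().
    return reduce(lambda text, rule: text.replace(*rule), replacements, text)


replacements = [('"', 'inch'), (':', '.')]
print(replace_all('12" HPHT 2012-08-07 12:37:26', replacements))
# 12inch HPHT 2012-08-07 12.37.26
```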

A final word of advice, though.
While functional programming elements like reduce() can be very powerful, they can also hurt the readability of the code. The general rule of thumb is to always write code in the most readable way, and in this case the first for-loop is probably the most readable version. However, there are also situations where you want to do a small data transformation that is easily expressed in a functional way, and where a plain for-loop ends up looking bloated.
So, if you’re in doubt, try writing the transformation in both ways and compare them. In the end, it’s up to you to write your code in the most readable manner.

That’s it! I know this is some very basic Python, but since there is always someone out there who is just learning the basics of Python, it might be useful to someone else too.


Another reboot

So, once again it’s time for my bi-annual blog reboot. I’ve trashed the old (and hacked!) blog and installed a brand new WordPress. I’ll get back to configuring the look’n’feel in a few days.

 
