Efficient algorithm for finding incorrect transactions in double-entry bookkeeping

I wanted to find an efficient algorithm for finding missing or problematic transactions with YNAB Debt Sync. The naïve O(n²) approach of comparing all transactions from both budgets would obviously not scale. This post explains how I achieved an average time complexity of O(n·log(n)).

Thinking about the problem, it occurred to me that I could use the mirrored nature of double-entry bookkeeping to my advantage by sorting each budget's transactions by their amount. The trick is to sort the second budget's transactions in opposite order to the first budget's transactions and, for each pair of transactions, compare the first budget's transaction amount with the inverse of the second budget's transaction amount. For the best-case scenario where the budgets tally up, this results in linear time complexity.

You can see an example in the next figure showing two budgets, called "This budget" and "Other budget", whose transactions are sorted in this mirrored fashion. This is the best-case scenario: both budgets have the same number of transactions, and each transaction in one budget has its opposite transaction in the other budget.

Using a common sort algorithm, like quicksort, the complexity for the algorithm's best-case scenario is O(2(n·log(n)) + n), or O(n·log(n)).

The best-case scenario isn't of much interest, though, as the use-case for the algorithm is precisely when the budgets don't tally up. If one (or more) transaction(s) is missing from one of the budgets, the previous comparisons will fail at some point, due to the transaction amounts in the incorrect budget "skipping" past the expected transaction amount. You can see this in the next figure, where "Other budget" is missing the transaction of amount -9,80.

The first comparison is successful: This budget's first transaction of amount 10 is mirrored by Other budget's first transaction of amount -10.  The comparison fails for the second pair though: This budget's second transaction of amount 9,80 is not mirrored by Other budget's second transaction of amount -6,27. Looking at the figure, it's easy to see that Other budget is the one missing a transaction. Specifically, the mirror transaction to This budget's 9,80 transaction. Programmatically deciding which budget is missing a transaction is a bit more complicated, and is documented in YnabBudgetComparer's get_missing_transactions method. The cases depend on the combination of the transaction amount signs and on the absolute value comparison of the amounts.

In this case, because This budget's transaction is greater than zero and Other budget's transaction is less than zero, coupled with This budget's transaction's absolute value being greater than Other budget's transaction's absolute value, we know that Other budget is missing a transaction to mirror the transaction of amount 9,80 from This budget.

Information such as Payee or Memo (in the case of YNAB transactions) will help the user to understand the discrepancy between the budgets, but This budget may have multiple transactions of amount 9,80, each with different information. The next step is to find the best candidate transaction.

To do so, first the subset of transactions of amount 9,80 from This budget is extracted and sorted by date. The same is done for transactions of amount -9,80 from Other budget. If there are m transactions of this amount, this is O(m·log(m)) operations. It is common that, for n total transactions, n ≫ m, as there are relatively few transactions that share the exact same amount.

A transaction subset comparison is triggered each time there is a mismatch in a transaction pair's amounts so, for p skips, the time complexity of this operation is O(pm·log(m)). The value of p depends entirely on how thorough the users have been at recording transactions. In my experience, there is roughly one skip for every ten transactions, so we can also consider n ≫ p, giving n ~ pm. The loosely average time complexity is then O(n·log(n) + n·log(n)), or O(n·log(n)).

Once the two subsets have been created, the relative complement of Other budget transactions in This budget transactions is then calculated, i.e. which transactions exist in This budget but not in Other budget. This is a linear operation of, at worst, O(n-1), where n is the amount of transactions from This budget. It is n-1 because if Other budget has no transactions of amount -9,80, the whole set of This budget's transactions of amount 9,80 is returned, and is a constant-time O(1) operation. The order of magnitude of this operation doesn't affect the previous time complexity of O(n·log(n)).

For the budgets in this example, the subset from This budget contains one transaction of amount 9,80 with date 2015-08-16. The subset from Other budget contains zero mirror transactions of amount -9,80, as you can see in the following figure. The transaction from This budget is returned as Other budget's missing transaction, required for both budgets to tally up.

For extreme cases the algorithm could degenerate to a worst-case scenario of O(mn·log(n)) complexity, for example if none of the transactions were mirrored: each budget containing only its own side of transactions.

Posted in Uncategorized | Leave a comment

Managing short-term lending with two-way YNAB

This post explains the reason behind my latest personal project, YNAB Debt Sync.

Shortly after I started using YNAB I ran into a problem. How was I meant to borrow cash from —or spot cash to— family and friends? The first few times a friend asked me to spot them some cash to pay for lunch or a beer, I just assigned it to whatever category the activity was related to. That posed a problem though, I wasn't actually spending that money on that category, so my budget wasn't truly reflecting my spending. In any case, I'd (hopefully) be getting that money back the next time I saw them.

The solution I came up with was to create a category for each person. If I loaned a person 5€, I'd create an outflow transaction and assign it to that person's category. Starting from a zero balance, the category would show a -5€ balance, meaning they owed me that amount.

If, another day, I borrowed 7€ from them, I'd record two cash transactions. One inflow of 7€ assigned to that person's category, which would update the balance to +2€ (i.e. I owe them money). The other transaction would be an outflow of 7€, assigned to whatever category made sense (e.g. Lunch), decreasing that category's balance. This way my budget would be accurate, as I had actually spent 7€ on lunch. As the two transactions have the same amount, but are opposites of each other (inflow vs. outflow), the balance for my cash account would remain unchanged (usually close to zero, in this case :)

To avoid any outstanding loans subtracting from my "available to budget" amount when the loans span two months, for example if I were to borrow money on the last Sunday of a month and I wouldn't be able to repay until the following Sunday, I "turned the arrow right", as mentioned by this post on YNAB's forums.

This was great for a while. I could lend and borrow money for short periods of time while being confident my actual spending categories were accurate, and also keeping track of how much I (or they!) owed.

Then I managed to convince some of these friends and family to use YNAB. Suddenly, my budget wasn't the sole source of information. I started being a bit lazier when recording money I borrowed; I knew the lender would record the transaction to ensure their budget was up-to-date. The opposite also happened, if I loaned somebody some money, they knew I'd record it and chase them after it. Especially with family, where loans are quite frequent.

It got to the point where, when checking my category balances against the other people's budgets, quite often one or two categories would be out of sync. We'd then have to manually go through both budgets' transactions and look for the missing ones, which was boring and time-consuming. Wanting to see if I could automate the process, I dug through YNAB's budget files and realised it was possible. Being a bit of an enterprising individual, I wrote YNAB Debt Sync.

Notes

The YNAB link uses my referral code, which gives you a $6 discount if you buy a licence, and $6 for me.

This system is similar to YNAB's own reccommendation for loans, except creating a category per person instead of per loan and not budgeting for the categories, just taking advantage of "turning the arrow right" to keep a running total. Luckily this type of situation happens with a small group of people, so I only have a few categories. For cases when there are lots of different people involved, the strategy of having a catch-all category and using the memo field for the person's name, described in the YNAB forum post, is more manageable.

While writing this post I realised this system is basically a reinvention of double-entry bookkeeping. For every loan (credit) transaction in one person's budget there is a matching, opposite,  borrow (debit) transaction in the other person's budget. Tallying up the transactions in each budget should result in the same balance; one tally will be positive and the other will be negative. If the budgets don't tally up to a mirrored balance, there is a mistake somewhere; either missing transaction(s) or mistyped amount(s).

Due to "turning the arrow right" and not budgeting the loan categories, it's possible to have enough outstanding loans that could wreak havoc with your cash flow if you're not at Rule 4 yet.

 

Posted in Uncategorized | Leave a comment

How to build PeaZip installer for Windows

Some notes on how I built PeaZip from source, including the working x64 installer for Windows.

Preparation

Before building PeaZip you need:

  • PeaZip source code
  • Missing binaries (7zip, etc)
  • Missing files
  • Lazarus, an open source Free Pascal IDE
  • Inno Setup

The source code is available as a zip download on SourceForge. Choose a version of PeaZip and look for the *.src.zip file (for example, v5.4.1). Download and extract somewhere.

As the readme.txt mentions, the source code doesn't include the binaries for the extraction utilities, like 7zip or unACE. I got them by downloading the PeaZip installer from SourceForge, installing PeaZip and copying the res\ directory from C:\Program Files\PeaZip to the directory where I'd extracted the source code. From a quick glance, it seems to be a licensing issue that prevents these files being in the source download.

There are two files missing that break compilation of the installer script. The first is peazip_help.pdf. I placed it in the directory where I extracted the source code. The second file is SendTo.7z, also available in the root directory. I saw that SendTo.zip already existed in the source download so I extracted it and repackaged as .7z.

The Lazarus IDE installer (I used v1.2.4) can be downloaded from the project's homepage. Download and install it.

Inno Setup can be downloaded from the project's homepage. I used v5.5.5.

Compiling the source code

First the executables have to be compiled. Open Lazarus and then open project_pea.lpi, select the Run menu → Build. After it's done, open project_peach.lpi and do the same.

Building the installer

Next the installer build needs configuring. For x64 there are two Inno scripts, found in the installer\ directory. peazip-setup_script_WIN64.iss needs editing to work, I also edited peazip-setup_script_WIN64-configure.iss to make life easier.

peazip-setup_script_WIN64-configure.iss (optional)

I haven't found any documentation as to what the purpose of this installer is. Once built, it has to be copied to the res\ directory; but, as we've copied the res\ directory from a working installation, we already have this file. From what I can tell, it is used to configure file associations and context menu entries for an already existing PeaZip installation.

I edited the OutputDir key to point to the Output directory of PeaZip's source code, just to keep everything in one place. I then copied the .exe to the res\ directory.

peazip-setup_script_WIN64.iss

Replace all instances of "C:\input\peazip-5.4.1.WIN64" with the path to where you've extracted the PeaZip source. In my case it was "C:\Users\john\dev\PeaZip\src".

Optionally, change the value of the "OutputDir" key from "C:\output\" to something else. I chose to point it to "C:\Users\john\dev\PeaZip\src\Output".

The installer can now be built by opening the file in Inno Setup Compiler and choosing the Build menu → Compile. It's important to have copied the res\ directory from a working installation, otherwise the installer will result in a broken PeaZip installation. The program will just crash when extracting files.

After following these steps, I ended up with a working PeaZip installer for Windows/x64. The purpose was to fix a pet peeve of mine related to archives: some archives contain a single directory in the root, with the contents inside while other archives contain the content in the root.

To avoid mixing up files from different archives I always use "Extract here (new folder)", but it's annoying when the original archive is the type that contains a single directory. I then have to navigate two folders deep, move the content up one level and delete the now-empty original directory.

peazip-uniform-subdirectory-extraction handsomely solves the issue by doing that work for me.

Posted in Uncategorized | Leave a comment

git error: unable to create file (Invalid argument)

A file was added to a project's git repository at work with an illegal filename (under Windows). The filename had an invalid character in the extension: a question mark.

When I tried fetching/rebasing with TortoiseGit an error message popped up which contained "error: unable to create file /path/to/file.ext? (Invalid argument)". Confused, I looked at the upstream's log and, effectively, a file with an invalid character in the extension had been added.

To solve the issue I dropped down to git's command line:

$ git fetch
$ git rebase
First, rewinding head to replay your work on top of it...
error: unable to create file Path/file.ext? (Invalid argument)
Applying: My local commit. 

The problematic file wasn't created in my working directory, but the rebase worked correctly. All that was needed was to delete the file with the invalid character in the extension from git's index (as it wasn't in my working directory), commit and push so we could continue working.

$ git rm --cached Path/file.ext?
rm 'Path/file.ext?'
$ git commit -m "Delete problematic file with invalid character."
[develop bdf3b40] Delete problematic file with invalid character.
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 Path/file.ext? 

Then I pushed and all was right again. The important part was to remove the file from git's index, leaving the working tree alone. Attempting to just git rm the file would result in

$git rm Path/file.ext?
fatal: pathspec 'Path/file.ext?' did not match any files 

I was able to do this so quickly thanks to an article on git reset I'd just read the other day that explains git's three different trees.

Posted in Uncategorized | Leave a comment

Moodle sending text files with incorrect text/html MIME type

Our Moodle installation was serving .txt files via file.php with an incorrect text/html MIME type instead of the correct text/plain MIME type. This was breaking the exercises from a new authoring tool we're trying out.

Adding .htaccess directives wasn't an option as the file request isn't handled by Apache. Moodle stores course files in a non-public directory, using file.php for access control.

After poking around the source code I got to line 829 of lib/filelib.php, where I saw that .txt files' MIME types are rewritten from text/plain to text/html when "filter all files" is  active. A quick visit to Administration → Modules → Filters → Manage filters and changing the "Filter uploaded files" option from "All files" to "HTML files only" fixed the issue.

This fixed the incorrect .txt MIME type

Don't forget to clear the browser's cache.

Posted in Uncategorized | Leave a comment

Grub2: Boot Windows by default

Grub2's default boot setting configuration is different from regular Grub. Whenever I search for it on the internet I have to sift through tonnes of advice regarding the menu.lst file.

Here are the steps for setting Windows as the default OS to boot in Grub2:

  1. Find out the name of Windows' boot entry. There are two ways
    1. Look at the Grub2 boot menu:
    2. Execute update-grub and check the output
  2. Edit /etc/default/grub, specifically the "GRUB_DEFAULT=", adding the name of the Windows boot entry. In this example it would be GRUB_DEFAULT="Windows 7 (loader) (on /dev/sda2)" (inverted commas included)
  3. Run update-grub

    In principle the Windows entry should show up in update-grub's output. It worked on my friend's computer a few minutes ago. I hope I haven't broken my grub configuration.

And that's it. After rebooting the Windows entry will be selected and booted by default.

Posted in Uncategorized | Leave a comment

Setting contact's group in Android 2.3 Gingerbread

A huge issue I have with Android is its contact management. I have a lot of contacts, many of whom I only have an email address for. I have created a "Mobile" group for those contacts I do actually want on my phone, which is great.

Except when I need to add a new contact. For some reason there isn't a way to set the contact's group in Gingerbread and previous versions. After I add a new contact on my phone they get lost in limbo and I have to remember to set their group the next time I'm on a computer. It's been fixed for Ice Cream Sandwich, but my phone isn't going to be getting that upgrade.

I managed to find a way to set contacts' group in Gingerbread with an application from  Android MarketGoogle Play: DW Contacts. I often forget the exact steps to finding the groupless contacts though, so I'm writing them down here.

Instructions

First the contact has to be found. The search option in DW contact's Toolbox tab searches through all contacts, not just those on the phone. Once you find the contact, long press their name and select the "Set group for contact" option. Finally check the appropriate groups and presto. How it's taken until Honeycomb/ICS for this to be a native feature of Android's contact management is beyond me.

 

Posted in Uncategorized | 1 Comment

Irrefutable proof: two spaces after a period on the web is evil

I noticed a few years ago that Americans often type two spaces after a "period". I've always thought it was a practice that should have died years ago; valid when all that was available was a typewriter's monospaced typeface, but horribly outdated as soon as proportional typefaces became available. A similar habit used to be common in Spain, where, in an upper case word or sentence, any accented letters were written in lower case, e.g. CAFé. Even when handwritten. Seemingly it had started with anglo-centric typewriters' lack of ability to move the acute accent upwards when typing in upper-case. This pragmatic—albeit ugly—solution somehow filtered into handwriting, becoming the "correct" way to write uppercase accented letters. Luckily it seems to have fallen out of use—I've not seen it in the last 5 years.

Back to the two space after a period issue. After "discovering" this habit I read up a little and was surprised to find there are still proponents of it. I chalked up my dislike of it to just being another pet peeve of mine; double spaces after a period don't really affect text, after all. Then the other day I noticed this:

From the Diaspora* blog

Take a look at that third line. It looks horrendous and is going to be the irrefutable proof that typing two spaces after a period is evil which I will refer to forever more. In the meantime I'll get back to sperging over other typographic crimes.

Kerning

I need to think up a post just so I can link XKCD's great regex strip.

Posted in Uncategorized | Leave a comment

Windows-7–like window management with Ubuntu (and Compiz)

Windows 7 introduced great window management keyboard shortcuts. Pressing (and holding) the Windows key and then tapping the arrow keys allows the window to quickly be maximised, restored and docked to the left/right half of the screen.

One of the first things I missed when I started to use Ubuntu was this window management. After some trial and error, trying out different Compiz plugins, I managed to simulate the Windows experience.

The Compiz plugins I use are "place" and "grid". These are the settings for grid. Hopefully I won't have too much trouble getting it to work again when I inevitably lose my configuration after updating Ubuntu some day.

Posted in Uncategorized | Leave a comment

Check HTTP headers (specifically MIME type) for file downloads

A client called us up saying they had problems downloading files from our e-learning portal. After some back and forth on the phone I found out the problem was their browser and/or the files' MIME types. Either the MIME type wasn't being set correctly or their browser was ignoring it. In any case they were seeing a bunch of "Greek numbers" on their screen; the file contents were just being printed on-screen.

I wanted to check the HTTP headers the URL was returning, to make sure we were in fact setting the MIME type. Firebug shows the headers in the console and net tabs. In this case they weren't being displayed though, as the URL was initiating a download. Slightly stumped, I thought for a minute and then remembered cURL.

I've never used cURL, but have read much of it. I guessed it could maybe do the trick, so I set about finding out. After a cursory search I found out that

 $ curl -I -H http://URL

Would return the headers for the requested URL. The URL I was trying requires cookie-based authentication, though. After reading the manual I managed to log in via cURL, save the cookies in a text file, then request the file using the cookies to authenticate:

First I POSTed to the login form's action URL with the necessary parameters, while saving the generated cookies to a text file. Then, using the text file as a source for cookies, I requested the file URL and, voilà, got what I needed.

$ curl -c cookies.txt -L -d "username=<USER>&pass=<PASS>" http://URL/login.php
$ curl -c cookies.txt -b cookies.txt -L -I http://URL/file.pdf

My reaction. There were fireworks and bangs and colours and it was just amazing.

Resources

Posted in Uncategorized | Tagged | Leave a comment