Category Archives: Code

Bits and pieces of code

Why lists are amazing

When I was working on my linguistics project, I ran into a wall: Perl, at least under "use strict", doesn’t let you create arrays whose names are built at run time. I had used this feature/bug of the Bash shell during an internship at a theoretical chemistry lab, where it let me build an array whose elements were themselves dynamically created arrays, each named after whatever variable I felt like.

For example, for a cyclohexane molecule, I could test a condition (is $atom equal to ‘C’?) and then create my array as follows:

eval "Carbon${number}=(\"\$x\" \"\$y\" \"\$z\")"
number=$((number + 1))

In Perl, of course, the solution was to feed my input text to a function and build a hash table keyed by word, with the word count as the value. However, as it turns out, there is an even more elegant solution: lists.
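A minimal sketch of how the Bash naming trick might translate to Perl, assuming a hash whose keys are the computed names and whose values are array references. The names %atoms and @input and the coordinates are made up for illustration; they are not from the original project.

```perl
use strict;
use warnings;

# Hypothetical input: element symbol followed by x, y, z coordinates.
my @input = (
    [ 'C', 0.000, 1.396, 0.000 ],
    [ 'H', 0.000, 2.479, 0.000 ],
    [ 'C', 1.209, 0.698, 0.000 ],
);

my %atoms;       # computed name => reference to [x, y, z]
my $number = 1;  # running counter for carbon atoms

for my $row (@input) {
    my ($atom, @xyz) = @$row;
    next unless $atom eq 'C';
    $atoms{"Carbon$number"} = [@xyz];   # the key plays the role of the array name
    $number++;
}

# Print each computed name with its coordinates.
print "$_: @{ $atoms{$_} }\n" for sort keys %atoms;
```

The hash key stands in for the dynamically built variable name, so the code keeps working under "use strict".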

  • Lists can be accessed as arrays (check)
  • Elements can be added to lists as arrays (check)
  • Elements can themselves be arrays, stored as references (check)
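The three points above can be sketched as follows. This is only an illustration with made-up data; nested arrays are stored here as array references, which is the usual Perl way of putting one array inside another.

```perl
use strict;
use warnings;

my @words = ('cow', 'horse');        # create an array from a list
push @words, 'sheep';                # add an element at the end
my $second = $words[1];              # access by index

# An element can hold another array through a reference.
push @words, [ 'cat', 'dog' ];
my $nested = $words[3][1];           # second element of the nested array

print "second = $second, nested = $nested, size = ", scalar(@words), "\n";
# prints "second = horse, nested = dog, size = 4"
```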

Of course, for the use I intended, lists are absolutely useless: I cannot really walk through an array writing $words[$cow[$xvar]] (well, it’s feasible, but not very practical). On the other hand, for a dictionary spell-checker (which is a word LIST), a list should consume less memory than a hash, since no value is stored alongside each entry (useful for memory-intensive lists on older systems).

Lists are fairly awesome in their capacity. I have not yet tested whether they can replace the struct type from C, but if they can, then that is awesome.

Language Statistical Analysis

For one of my IT-based courses, I was required to do a project involving computers and language. The requirements were very vague, since the course was aimed at people with little to no IT background (and it wasn’t a programming course, which made them vaguer still!). I decided to enjoy it, and therefore to write some code.

My initial project was to take various texts from 3 different time periods (all texts in English) and attempt to find word variations through time. The main challenge here would be finding the algorithm to decide how one word becomes another. A friend told me he had a C++ library which would be suitable for this, so I attempted some OO programming (it did not go well). As it turns out, the library he had thought of was useless to me, so I decided to go functional and write everything in Perl.

This was my first time working with Perl (my previous experience was limited to adding “echo”-style statements to existing scripts), so I first had to decide what I wanted to do and what I could do:

  1. Take text from input file and make everything lowercase
  2. Create a table containing each word, the number of times it appears and other statistical elements.
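A minimal sketch of these two steps, assuming a hard-coded sample string in place of the input file and a very simple notion of "word" (runs of letters and apostrophes). The variable names and the regex are illustrative choices, not the original script.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the input file.
my $text = "The cat saw the dog. The dog ran!";

# Step 1: lowercase everything. Step 2: count each word in a hash.
my %count;   # word => number of occurrences
for my $word (lc($text) =~ /[a-z']+/g) {
    $count{$word}++;
}

# Most frequent words first, ties broken alphabetically.
for my $word (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$word\t$count{$word}\n";
}
```

Other statistics can then be derived from %count, for example the total number of tokens (the sum of the values) or the number of distinct words (the number of keys).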

Unlike Bash, Perl (under "use strict") doesn’t allow tables to be created under dynamically built names, but hash tables resolve this. I therefore ended up with a hash table mapping each word to its number of occurrences.

The next step was to use the numbers. At first I attempted writing subroutines, but when it took me an hour to write a simple function, I decided to look up packages, and to my surprise CPAN had a discrete statistics package. The joy! I no longer needed to learn how to do OO (the package did everything for me), and I had no need to write subroutines, as the package gave me all the necessary tools.

After some 20 hours of coding (for only 100 lines of code), my conclusion was that the language’s statistics don’t change significantly over time, and that the main changes are:

  • New vocabulary replacing old (computer vs typewriter)
  • Manner of communication (Twitter has a narrower vocabulary range than Shakespeare’s works when comparing type-token ratios, i.e. distinct words divided by total words).
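For reference, a type-token ratio can be computed roughly like this. It is a sketch under the same simple tokenisation assumption as before (lowercased runs of letters and apostrophes); the subroutine name type_token_ratio and the sample sentence are made up for illustration.

```perl
use strict;
use warnings;

# Type-token ratio: distinct words (types) divided by total words (tokens).
sub type_token_ratio {
    my ($text) = @_;
    my @tokens = lc($text) =~ /[a-z']+/g;
    return 0 unless @tokens;             # avoid dividing by zero on empty text

    my %types;
    $types{$_} = 1 for @tokens;          # each distinct word is recorded once

    return scalar(keys %types) / scalar(@tokens);
}

my $ttr = type_token_ratio("to be or not to be that is the question");
printf "TTR = %.2f\n", $ttr;             # 8 types / 10 tokens, prints "TTR = 0.80"
```

Note that the ratio drops as texts get longer, so comparisons are only fair between samples of similar size.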

My Perl verdict: definitely a fun language, although it requires a large amount of commenting because of its loose variable handling and its regular expressions. I will definitely continue to explore this language.

Much help obtained from PerlMonks