tbl: a slice of cake
A colleague of mine, Dave Marsee, talks about building a “slice of cake.” When building out a prototype, think about the minimum vertical that you need to demonstrate your ideas, and do that. Now, I’ve already “built one to throw away” (in another language) with regards to
tbl, but a slice of cake is still a good idea here.
What I know about
tbl is that I want it to be an abstraction for working with data. There are fundamental operations I want to be able to do on a
tbl, and my goal is for those to be conceptual in nature. That is, I want a programmer to be able to say “give me all of the data in this set where a person’s age is greater than 18.” I don’t, for the moment, want them to be thinking about arrays, or SQL, or anything else… just the idea of filtering data.
And, in going back to some of my thoughts on this prior, I’m reminded that I imagined this as a very functional library. I imagined having:
- functions that consume
tbls and produce
- functions that consume
tbls and produce lists
- functions that consume
tbls and produce values
I’m already, in my first exploration, heading down an OO path, where the programmer creates a
tbl structure, and then manipulates it. Hence, I need Dave’s “slice of cake.” I’ve set up my project directory, I’ve written some tests, but now I need to do a deep dive down one implementation pathway, and see where it takes me.
I could design this out from the top down, but that really isn’t the purpose of the exercise. Instead, I want to do a series of structured explorations, and in doing so, demonstrate how to explore software development in an “agile” manner. It means developing stories, prototypes, and tests, and being willing to walk ideas back when they don’t work. (There’s more to it than that, but this is a small exploration for now, and I think that story about agile will suffice.)
all about the lobsters
I live in Maine at the moment, so my running example is going to be about lobsters. It’s either that, or tourists from Massachusetts… so lobsters it is.
import tbl pets_url = "https://docs.google.com/spreadsheets/d/..." a_tbl = tbl.tbl(url=pets_url) a_tbl.show_columns()
First, I want to imagine my code differently. I think I’d like it to look like this:
from tbl import tbl, show_columns, keep_rows pets_url = "https://docs.google.com/spreadsheets/d/..." a_tbl = tbl(pets_url) show_columns(tbl) new_tbl = keep_rows(tbl, weight > 0.8)
I’ve done a few things here:
- I don’t want keyword parameters for common cases. Keyword parameters have “weird” scoping semantics in different languages (e.g. Python, R), and if they aren’t necessary for the common case, then they shouldn’t be used.
- I’d like to have an expression syntax for filtering queries, so that the idea of keeping rows where “weight > 0.8” is easy. A “say what you mean” (or SWYM?) principle of sorts is at work here. I was able to do this in the Racket version with macros, but the Python case is a bit more subtle.
an hour later
i really want macros
This is the side effect of spending 20 years of working in a language like Racket: I want to re-invent my language from within my language. This is a classic case where I’d love to be able to design a small query language to simplify the expressions that novices could write, and then walk the AST and “do the right thing” for them. A Scheme- or LISP-based language would make this easy-peasy.
But, that’s a bit trickier to do in Python (not impossible… just tricky). I could do this:
T = tbl(url = pets_url) show_tbl(T) newT = T |keep_rows_where| (T.weight |gt| 0.8) show_tbl(newT)
because I can override the
| operator. In doing so, I can introduce infix “operators” that are really just functions being applied left-to-right. This has some serious pitfalls waiting in the wings, but for the moment, I’m going to play with it in the “slice of cake.”
The work is going to be done in the left-most operator. In this case, it is
|keep_rows_where|. It will make sure that the LHS is a
tbl, and the RHS is an expression of some sort. I’m essentially giong to be building a little language and interpreter. The language will be made up of “infix operators” that return data structures, and then functions like
keep_rows_where will interpret those structures.
The first implementation will have no error checking. We’re going for a “slice of cake.”
from tbl import tbl, show_columns, show_tbl from tbl.queries import gt, keep_rows_where pets_url = "http://bit.ly/2IzVqoV" print("The original table:") T = tbl(url = pets_url) show_tbl(T) print() print("The new table:") newT = T |keep_rows_where| (T.weight |gt| 0.8) show_tbl(newT)
This works. It works because I implemented the function
gt as follows:
GT = NT("GT", ["lhs", "rhs"]) @Infix def gt(lhs, rhs): return GT(lhs, rhs)
All the function does is build a GT data structure, which has two fields: a left-hand side, and a right-hand side. I’m doing no checking. It just shoves data into the fields of the structure.
keep_rows_where looks like:
@Infix def keep_rows_where (lhs, rhs): T = lhs # The LHS needs to be a tbl, the RHS if not (type(T) is tbl): print("The left-hand side of |keep_rows_where| must be a tbl.") newT = copy.deepcopy(T) # Get the row index based on the field, # which will be a Column() structure. col_index = T._get_column_index(rhs.lhs) new_rows = list() for r in T.fields.rows: # FIXME: I want to import values appropriately, and I want # to have checking integrated somewhere up the chain so # that 'gt' comparisons don't happen on the # wrong types of data. if float(r[col_index]) > float(rhs.rhs): new_rows.append(r) newT._set_rows(new_rows) return newT
Essentially, I’m writing functions to build up and/or interpret abstract syntax trees.
gt is a function that builds a single node of an AST, which represents the “greater than” operator. The
keep_nodes_where function takes a
tbl and an AST, and interprets the AST in light of the
This slice of cake, when executed, outputs the following:
The original table: Name Weight Color Bart 0.75 Muddy Flick 1 Muddy Bubbles 1.2 Blue Crabby 0.5 Muddy The new table: Name Weight Color Flick 1 Muddy Bubbles 1.2 Blue
The new table is generated functionally via this expression:
newT = T |keep_rows_where| (T.weight |gt| 0.8)
Do I like this? I don’t know. Have I thoroughly explained this slice of cake? No. I have spent many, many years writing linkers, interpreters, compilers, and transpilers for all manner of weird systems, and the idea of “creating a language” on-the-fly to solve a problem is natural to me. If I want to continue down this road, I have to think about the potential implications for novices. What kinds of errors might they discover with this approach? How can I support them in their learning? Will I need multiple approaches… that is, will this work for simple things, but not complex things?
The answer to the last question is almost always “yes.” Developing simple things that scale up in complexity gracefully is difficult, but arguably easier than developing a complex thing that also (just happens) to express simple things, simply. So, there is hope, but it will take more explorations.
Today, I explored in the “expressions” branch. I don’t know if I’ll keep this exploration, but for now, it’s in the repository.