Monthly Archives: January 2014

Web Development

So I finally caved. After almost a year of resisting, I inevitably jumped into the world of web-dev (well, more accurately, I have timidly stepped into it). I think what finally pushed me over the edge was that websites and apps are both useful and convenient ways of showcasing one’s work to the world (even if the main focus of the project is not web-related), and the ability to create a demonstrable final product provides the necessary motivation to actually see some of these projects through to completion.

For the time being, I am taking a fairly minimalist approach to web-dev, focusing primarily on Python-based libraries (Flask and SQLAlchemy), and only doing limited amounts of non-Python front-end work (HTML, CSS). Because web development isn’t my preferred domain, I feel that taking this minimalist approach will allow me to continue to increase my Python experience, while keeping the ‘web-only’ stuff to a minimum. And who knows, if I ever have a change of heart, I can always pick up JavaScript and more CSS later (so far it’s been fun).

I decided that my first project was going to be a web app that monitors Craigslist and alerts users about new posts that fall under particular search criteria. In the past year, I have written a number of web scrapers, and have also played around with the Twilio API to send text messages, so I figured this would be a good project to demonstrate some of that experience. Additionally, I had gone through some tutorials for Flask (a Python-based web framework) as well as done some work with SQLAlchemy (a database ORM), so hooking everything up seemed like it would be straightforward enough. But of course, there are always hiccups….
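To make the idea concrete, here is a minimal sketch of the "alert only on new posts that match the search criteria" step. It is not the code from the actual project: the `Post` fields, the `find_new_matches` function, and the keyword and price criteria are all hypothetical, and the scraping and text-message parts are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """A scraped listing. The fields are hypothetical, for illustration only."""
    post_id: str
    title: str
    price: int


def find_new_matches(posts, seen_ids, keywords, max_price):
    """Return unseen posts whose title contains any keyword and whose
    price is at most max_price.

    seen_ids is updated in place, so a later poll over the same
    listings does not trigger a second alert.
    """
    matches = []
    for post in posts:
        if post.post_id in seen_ids:
            continue  # already handled in an earlier poll
        seen_ids.add(post.post_id)
        title = post.title.lower()
        if post.price <= max_price and any(k.lower() in title for k in keywords):
            matches.append(post)
    return matches


if __name__ == "__main__":
    seen = set()
    batch = [
        Post("a1", "Road bike, 54cm", 300),
        Post("b2", "Sofa", 50),
        Post("c3", "Mountain BIKE", 900),
    ]
    for post in find_new_matches(batch, seen, ["bike"], max_price=500):
        print("would alert on:", post.title)
```

In a real app the set of seen IDs would presumably live in the database rather than in memory, and each match would be handed off to whatever sends the alert.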

One of the aspects of data analysis projects that I find appealing is that it is very easy (for me at least) to conceptualize the process that takes place when writing an analysis program. Even when using large, powerful libraries and tools to work with the data, I find it relatively easy to understand the transformations and operations that are taking place. This makes writing and debugging code fairly straightforward, as I know what I want to do and I can anticipate where problems might arise. One of the issues I have with modern web development is how much of the work is actually black-boxed. Over the last 10 years, web development libraries and tools have advanced tremendously, allowing developers to greatly increase their productivity, but the downside is that until you have a significant amount of experience under your belt, it can be very difficult to develop insight into the inner workings of the libraries you are using. So for someone like myself who has limited experience with these frameworks, running into bugs can be very frustrating, as I do not possess the intuition or insight to solve the problem quickly.

This brings me to the subject matter of my next post. In the process of setting up the web code for the Craigslist project, I ran into a frustrating issue that seemed to fall between the cracks of the available tutorials and Stack Overflow posts. The issue arose when I attempted to set up the Flask login manager to take care of authentication for the different web pages (essentially, it controls which users can see which pages). The available examples and tutorials demonstrated the use of the Flask login manager with a generic, non-specific ORM. For those well versed in how Flask interacts with ORMs this probably was the best way to present the information, but for me it resulted in some long, very frustrating hours trying to get the login manager to hook up to SQLAlchemy. I’ll save the details for the next post, but in the end, I decided that it would be a good idea to create a Flask-SQLAlchemy-Login template so that the next time I needed a login system for a project, I wouldn’t have to go through the painful process again.

GitHub as synced file storage

So it’s been a while since I have posted anything. It’s not for lack of material, but mostly for lack of time. I should be adding a few more posts in the next week, and I am going to start off this latest string of posts with a quick one that extends my previous discussion of thin clients.

In the last post, I talked about using boto and Python to push data around. I found boto to be very useful in streamlining the process of getting data on and off AWS instances, which can be a slow and tedious task when using a thin client setup and scp. While exploring other ways to make the thin client lifestyle easier to manage, I ‘discovered’ that GitHub is not only a valuable tool for version control, but also a fairly useful tool for synchronized file storage. In addition to moving a lot of data around, I was regularly using scp to push Python scripts and other files to and from different machines. The biggest pain point with scp is that you need to keep track of the IP address of each machine. Using GitHub, you simply need to remember the repo name (much easier to memorize than an IP address), and then clone the repo to the machine. And of course you have the added bonus of being able to commit any changes you make and push them back to the remote repo.

For seasoned veterans, this ‘revelation’ of mine probably seems rather obvious. But for those of us who are still learning the ropes, version control adds a lot of extra overhead to the already heavy mental load that comes with learning new programming languages and tools. So in hindsight, after using GitHub heavily for a few months, the idea that it could be used as a synced file storage system seems obvious, but at the time, I was quite happy to discover its secondary use case.