It used to be that even starting a tech business required expensive servers. Cloud computing solved this, and has since
refined the model: workloads can now be sliced into smaller and smaller pieces that run only on demand.
We bought into the first of these, AWS Lambda, quite some time ago. At the time, DynamoDB was the only persistence layer
available; however, AWS API Gateway made our Lambdas look, to all intents and purposes, like a real server. Since then,
on-demand relational databases have arrived, and their only real cost is a slight amount of latency while the resources
are provisioned.
This release refines our tagging API, fixes a few bugs, and surfaces tags in the user interface.
Tagging
Atomic changes of tag associations now work properly.
Tag names and tag category names are now indexed and searchable.
Tags are now visible and assignable via the UI.
Dataset Parsing
Fixed a major bug in our file type detection where the file reader exited prematurely, before it had properly detected the MIME type.
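The notes don’t say exactly what went wrong. As an illustration of one common way a reader gives up too early, here is a minimal Java sketch; the MimeSniffer class and its tiny magic-byte table are hypothetical, and a real detector such as Tika covers far more types. A single InputStream.read call may legally return fewer bytes than requested, so the header is read with readNBytes, which keeps reading until it has enough bytes or the stream ends.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: guess a MIME type from a file's leading "magic" bytes.
final class MimeSniffer {
    // Long enough for every signature checked below.
    private static final int HEADER_SIZE = 8;

    static String sniff(InputStream in) {
        byte[] header;
        try {
            // readNBytes loops until HEADER_SIZE bytes are read or the stream ends.
            // A single in.read(buffer) may return fewer bytes, which would make
            // detection run on a truncated header.
            header = in.readNBytes(HEADER_SIZE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (startsWith(header, new byte[] {0x1f, (byte) 0x8b})) {
            return "application/gzip";
        }
        if (startsWith(header, new byte[] {'P', 'K', 0x03, 0x04})) {
            return "application/zip";
        }
        if (startsWith(header, "%PDF".getBytes(StandardCharsets.US_ASCII))) {
            return "application/pdf";
        }
        return "application/octet-stream"; // unknown: treat as generic binary
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }
}
```

Falling back to application/octet-stream keeps an unrecognized file usable instead of failing detection outright.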
At the moment, most of our tags and tag categories are still entered manually, so if you need a particular new set of tags, please contact us. We also expect tags to play a more substantial role in the upcoming Search UI, so stay tuned for those changes.
This release, which includes version 0.1.78, introduces the beginnings of our tagging functionality.
Tagging API
(API Only) Tags and tag categories can now be created by admins.
(API Only) Tags can be assigned to users’ data sets.
Datasets may be searched and browsed by multiple tags. Tags are currently combined on an “AND” basis: a dataset must carry every selected tag to match.
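To make the “AND” semantics concrete, here is a minimal Java sketch over an in-memory map from dataset name to its tags. The map stands in for our search index, and TagSearch and the sample tags are hypothetical: a dataset matches only if its tag set contains every tag in the query.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of "AND" tag filtering; not our actual search index.
final class TagSearch {
    // Keep a dataset only if it carries every required tag.
    // Note that an empty query matches every dataset.
    static List<String> searchAll(Map<String, Set<String>> tagsByDataset, Set<String> requiredTags) {
        return tagsByDataset.entrySet().stream()
            .filter(e -> e.getValue().containsAll(requiredTags))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```

Under this rule each extra tag can only narrow the results; an “OR” basis would instead keep any dataset sharing at least one tag with the query.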
Tagging UI
The admin user interface now permits the creation of tags and tag categories. It does not yet permit editing or deleting tags or categories.
In this Monday’s release, we’ve added very simple support for publishing a dataset to the rest of the world.
Publishing
Users may now publish their datasets and make them available to the rest of the world.
The dataset search is now public: everyone can search our data.
Downloading and uploading now show an explicit popup informing the user that they must authenticate to perform these actions.
Search Index Fixes
Our search indexes were corrected to properly filter by publication status. The API also supports filtering by user, though that has not yet been exposed in the UI.
Upcoming features
Next on our docket is multidimensional search, which is probably easier to understand as “tags”. We’re also building a new search user interface that will expose all of these filters; however, that work is happening independently and is likely to take a little longer.
This release completely replaces our data storage layer: rather than storing data in Cassandra, we’ve moved everything over to HBase, which gives us a couple of very significant benefits, the most significant being the ability to actually order our rows. As you will notice, both the rows and columns of your data sets are now stored and downloaded in the order they were uploaded. Nothing else of note changed, so we’re going to forgo the regular bullet list this time.
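HBase keeps rows sorted by row key, comparing keys byte by byte as unsigned values, which is what makes ordered storage possible. The notes don’t describe our key scheme, so this Java sketch assumes, hypothetically, that each row is keyed by its numeric position in the upload; the RowKeys helper is illustrative only. It shows why a fixed-width big-endian key preserves upload order while a plain decimal string would not.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical row-key helper; assumes rows are keyed by upload position.
final class RowKeys {
    // Fixed-width, big-endian encoding: for non-negative indexes, unsigned
    // byte-by-byte order matches numeric order.
    static byte[] rowKey(long rowIndex) {
        if (rowIndex < 0) {
            throw new IllegalArgumentException("row index must be non-negative");
        }
        return ByteBuffer.allocate(Long.BYTES).putLong(rowIndex).array();
    }

    // Naive encoding for comparison: decimal text does NOT sort numerically ("10" < "2").
    static byte[] textKey(long rowIndex) {
        return Long.toString(rowIndex).getBytes(StandardCharsets.US_ASCII);
    }

    // Sorts keys the way HBase orders row keys: unsigned lexicographic byte comparison.
    static List<byte[]> sortLikeHBase(List<byte[]> keys) {
        List<byte[]> sorted = new ArrayList<>(keys);
        sorted.sort(Arrays::compareUnsigned);
        return sorted;
    }
}
```

The same idea would apply to column qualifiers, which HBase also sorts by their bytes within a column family.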
This release (yes, we skipped a few versions in staging) is paired with the release of our data parsing framework, and now automatically detects and parses text files in many different character encodings. We’ve used the Tika framework to help detect the character encoding of our text files, and can now reliably support most character sets listed here.
Upload Fixes
Uploaded files are now analyzed for text encoding and parsed accordingly.
Errors in parsing are passed to our engineering team for further testing.
The UI now indicates the current parsing state of a data set.
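Tika’s charset detection is statistical and covers many encodings. As a much simpler stand-in (not Tika, and not our production code), here is a stdlib-only Java sketch that tries a strict UTF-8 decode and falls back to windows-1252, a common Windows default. EncodingGuess is a hypothetical name, and a two-way guess is an assumption made for brevity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical two-way encoding guess; a simplified stand-in for Tika's detector.
final class EncodingGuess {
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // UTF-8 if the bytes decode cleanly as UTF-8, otherwise windows-1252.
    static Charset guess(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)      // fail on invalid sequences
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return WINDOWS_1252;
        }
    }

    static String decode(byte[] data) {
        return new String(data, guess(data));
    }
}
```

The order matters: windows-1252 accepts almost any byte sequence, so it only works as a fallback after the stricter UTF-8 check.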
Infrastructure Updates
Our development Vagrant box is now shared, which cuts about 15 minutes off our deployment. It might not seem like much to you, but for us it’s a lovely breath of fresh air 🙂
DFR Updates
The stream declarations have been removed from the IDataEncoder and IDataDecoder interfaces and moved into the new IStreamEncoder and IStreamDecoder interfaces.
New ITextEncoder and ITextDecoder interfaces have been added to handle character-stream-based file processing.
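The notes only name the interfaces. As a rough illustration of why splitting byte streams from character streams helps, here is a Java sketch in which every method signature, plus SimpleCsvDecoder and TextToStreamDecoder, is hypothetical. Only the decoder side is shown; encoders would mirror it with OutputStream and Writer. Text decoders receive a Reader, so the character encoding is applied once, upstream, rather than inside every format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical signatures: only the interface names come from the release notes.
interface IStreamDecoder {
    List<List<String>> decode(InputStream in) throws IOException; // raw bytes
}

interface ITextDecoder {
    List<List<String>> decode(Reader in) throws IOException; // charset already applied
}

// Example text format: naive comma-separated rows (no quoting support).
final class SimpleCsvDecoder implements ITextDecoder {
    @Override
    public List<List<String>> decode(Reader in) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            rows.add(Arrays.asList(line.split(",", -1)));
        }
        return rows;
    }
}

// Adapter: any text decoder becomes a byte-stream decoder once the charset is known.
final class TextToStreamDecoder implements IStreamDecoder {
    private final ITextDecoder text;
    private final Charset charset;

    TextToStreamDecoder(ITextDecoder text, Charset charset) {
        this.text = text;
        this.charset = charset;
    }

    @Override
    public List<List<String>> decode(InputStream in) throws IOException {
        return text.decode(new InputStreamReader(in, charset));
    }
}
```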
With the 0.1.72 release, we are moving to a more regular release cadence: since our office hours are Monday and Thursday evenings, we’ll push releases after those hours end. Last night’s updates are as follows:
Fixes to Upload
Licenses, attribution, and the shared flag are no longer required.
Sharing data sets has been disabled; you can now see only your own data sets (don’t worry, sharing is coming back).
The dataset button is now available only to logged-in users.
Uploading in Firefox now works.
Miscellaneous Updates
The RSS feed is now in the application header.
The pricing page has been removed.
The login text in the side menu has been renamed.
The application byline has been updated to match the blog.
Dead code has been removed.
The next update will focus on properly handling different file encodings: we currently support only UTF-8 and US-ASCII encoded files, which leaves the entire Windows world unsupported. This is likely to take a bit longer, because we need to build an intermediate caching mechanism for data files while we process them; however, it should be available by next week.
With the 0.1.71 release, we are starting to address some of our user interface bugs, as well as adding some features that should simplify interacting with social websites.
UI Updates
New linking for the favicon. Yay logos!
Unwanted scaling used to occur when viewing the app on mobile. While we can’t control user settings in mobile browsers, we can at least ask those browsers to let us handle our own layout.
Our contact-us form was sending messages to the person submitting the form rather than to our support team. Oops.
The column in the download data modal now stretches the entire width of the modal.
The dataset menu in the left navigation bar now highlights when you have a dataset selected.
Open Graph tagging!
Infrastructure Updates
Our API’s Cassandra configuration now lists more than one Cassandra node to connect to, reducing single points of failure.
We no longer use protobuf as our message encoding, opting for JSON instead. This reduces the complexity of our internal serialization logic rather dramatically, as Jackson attaches its own type discovery information (so we don’t have to deserialize the data manually).
Deprecated the old password-based authentication module and cleaned up the database.
Deprecated and removed the old VFS2 filesystem handling, as data is now stored in Cassandra.
All of the passwords used for development accounts have been (mostly) removed from our source code, except for the Vagrant VM passwords. Developers now have to use their own SMTP accounts to send email.
This evening’s release comes with a small selection of UI improvements, and one big new feature!
New Feature: Download Data
It’s somewhat funny that until now, you couldn’t download the data you uploaded. This has now been corrected: you may download any publicly shared data set in any of the formats we support. You can also select columns on the fly, so you don’t download anything you don’t need.
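Selecting columns on the fly amounts to a projection over the stored rows. Here is a minimal Java sketch, assuming a data set is held as a header plus rows of strings; ColumnProjection and its naive CSV output are hypothetical, not our download code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical column projection for downloads.
final class ColumnProjection {
    // Keeps only the requested columns, in the order they were requested.
    static List<List<String>> select(List<String> header, List<List<String>> rows, List<String> wanted) {
        List<Integer> indexes = new ArrayList<>();
        for (String name : wanted) {
            int i = header.indexOf(name);
            if (i < 0) {
                throw new IllegalArgumentException("unknown column: " + name);
            }
            indexes.add(i);
        }
        List<List<String>> out = new ArrayList<>();
        out.add(new ArrayList<>(wanted)); // projected header row
        for (List<String> row : rows) {
            List<String> projected = new ArrayList<>(indexes.size());
            for (int i : indexes) {
                projected.add(row.get(i));
            }
            out.add(projected);
        }
        return out;
    }

    // Naive CSV writer (no quoting or escaping), enough to show the projected output.
    static String toCsv(List<List<String>> table) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : table) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }
}
```

For example, selecting `temp` and `id` from a table with columns `id,city,temp` yields a two-column file with `temp` first, and the `city` column is never written.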