The Fine Art of Cheap Infrastructure

Michael Krotscheck - - 4 mins read

It used to be that even starting a tech business required expensive servers. Cloud computing removed that barrier, and has since refined the model by slicing workloads into smaller and smaller pieces that run only on demand.

The project adopted Amazon Web Services Lambda quite some time ago. At the time, only DynamoDB was available as a persistence layer; however, Amazon API Gateway made the Lambdas look, to all intents and purposes, like a real server. Since then the industry has gained on-demand relational databases, whose only real cost is a small amount of latency while the resources are provisioned.
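To make the "looks like a real server" point concrete, here is a minimal sketch of a Lambda function behind API Gateway, assuming the proxy integration. The handler name, the `name` query parameter, and the greeting payload are illustrative assumptions, not the project's actual code.

```python
import json


def handler(event, context):
    """Entry point that Lambda invokes once per API Gateway request."""
    # With the proxy integration, query parameters arrive as a dict,
    # or as None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # hypothetical parameter

    # API Gateway turns this dict into an ordinary HTTP response.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}"}),
    }
```

From the client's side this behaves like any other HTTP endpoint, even though no process is running between requests.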

0.1.81 Released

Michael Krotscheck - - 1 min read

This release includes some refinement of the tagging API, a few bug fixes, and the surfacing of tags to the user interface.

Tagging

  • Atomic changes of tag associations now work properly.
  • Tag names and tag category names are now indexed and searchable.
  • Tags are now visible and assignable via the UI.

Dataset parsing

  • Fixed a major bug in file type detection, where the file reader would prematurely exit before properly detecting the mime type.

At the moment, most tags and tag categories are still entered manually, so if you need a particular new set of tags, please contact the team. The roadmap also anticipates giving tags a more substantial role in the upcoming Search UI, so stay tuned for those changes.

0.1.79 Released

Michael Krotscheck - - 1 min read

This release, which also covers version 0.1.78, introduces the beginnings of tagging capability.

Tagging API

  • API-only: administrators can now create tags and tag categories.
  • API-only: administrators can assign tags to users’ data sets.
  • Users can search and browse datasets by multiple tags. A search currently returns only datasets that carry all of the selected tags.
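The "all selected tags" behavior amounts to a subset test. A minimal sketch, assuming each dataset is a dict with a `tags` list (a hypothetical shape, not the real API model):

```python
def filter_by_tags(datasets, selected_tags):
    """Return the datasets that carry every selected tag (AND semantics)."""
    required = set(selected_tags)
    # A dataset matches when the required tags are a subset of its own tags.
    return [d for d in datasets if required <= set(d["tags"])]


datasets = [
    {"name": "rainfall", "tags": ["weather", "2013"]},
    {"name": "census", "tags": ["population", "2013"]},
]
matches = filter_by_tags(datasets, ["weather", "2013"])
```

An empty selection matches every dataset, since the empty set is a subset of any tag set.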

Tagging UI

  • The administrator user interface now permits the creation of tags and tag categories. It doesn’t yet permit editing or deleting either of them.

Curious? Go check it out.

0.1.77 Released

Michael Krotscheck - - 1 min read

This Monday’s release adds simple support for publishing a dataset to the rest of the world.

Publishing

  • Users may now publish their datasets and make them available to the rest of the world.
  • The dataset search is now public, so everyone can search the data.
  • Downloading and uploading now have an explicit popup that informs the user they have to authenticate to perform these actions.

Search index fixes

  • The search indexes now filter properly by published status. The API also supports filtering by user, though the UI doesn’t expose it yet.

Upcoming features

Next on the docket is multidimensional search, whose dimensions are easier to understand as “tags”. The team is also working on a new search user interface that lets the product expose all these filters. That effort is tracked independently and is likely to take a little longer.

0.1.76 Released

Michael Krotscheck - - 1 min read

This release includes a complete replacement of the data storage layer. Rather than storing things in Cassandra, the system now uses HBase because it provides a couple of significant benefits, most notably the ability to order rows. Both the rows and the columns of a dataset now download in the same order in which they were uploaded. No other major changes occurred, so this release skips the regular bullet list.
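HBase keeps rows sorted by the byte order of their row keys, so preserving upload order comes down to key design. A minimal sketch, assuming a key made of a dataset id plus a zero-padded row index; this key layout is an illustration, not necessarily the one the project uses:

```python
def row_key(dataset_id: str, row_index: int, width: int = 12) -> bytes:
    """Build a key whose byte order matches the numeric row order."""
    # Zero-padding matters: without it, "10" sorts before "2".
    return f"{dataset_id}:{row_index:0{width}d}".encode("utf-8")


# Keys created out of order still sort back into upload order.
keys = sorted(row_key("ds", i) for i in (10, 2, 1))
```

The fixed width caps the number of rows per dataset, so it needs to be chosen generously up front.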

Curious? Go check it out.

0.1.75 Released

Michael Krotscheck - - 1 min read

This release pairs with the release of the data parsing framework and now supports the automatic detection and parsing of text files in many different data encodings. The project uses the Tika framework to assist in detecting the character encoding of text files and now reliably supports most character sets listed here.

Upload fixes

  • Uploaded files are now analyzed for text encoding and parsed accordingly.
  • Parsing errors now go to the engineering team for further testing.
  • The UI now indicates the current state of a data set while it is being parsed.
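The general idea of detecting an encoding before parsing can be sketched with the standard library alone. This is not how Tika works internally (Tika uses statistical detection); it is a simplified fallback chain, and the function name and the choice of fallback encodings are assumptions:

```python
import codecs
from typing import Tuple

# Byte-order marks checked first. UTF-32 is deliberately left out to keep
# the sketch short (its little-endian mark begins with the UTF-16 one).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_text(raw: bytes, fallbacks=("utf-8", "cp1252")) -> Tuple[str, str]:
    """Return (decoded text, name of the encoding that worked)."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return raw.decode(name), name
    for name in fallbacks:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue  # try the next candidate
    # latin-1 maps every byte to a character, so this never fails.
    return raw.decode("latin-1"), "latin-1"
```

Trying UTF-8 before the Windows code page matters, because almost any byte sequence decodes as cp1252 while invalid UTF-8 is rejected early.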

Development environment updates

  • The development Vagrant box is now shared, which cuts about 15 minutes off deployment. It might not seem like much, but the time savings are welcome 🙂

Data file reader updates

  • The interfaces IDataEncoder and IDataDecoder now pass their stream declarations into the new interfaces IStreamEncoder and IStreamDecoder.
  • The release adds new interfaces named ITextEncoder and ITextDecoder to handle character stream based file processing.

Curious? Go check it out.

0.1.72 Released

Michael Krotscheck - - 1 min read

With the 0.1.72 release, the project moves into a more regular release cadence. Since office hours are Monday and Thursday evenings, releases go out after those hours conclude. Last night’s updates are as follows:

Fixes to upload

  • Licenses, attribution, and the shared flag are no longer required.
  • The update turns off data set sharing, so users can now only see their own data sets. It returns in a future update.
  • The dataset button is now log-in-only.
  • Uploading in Firefox now works.

Miscellaneous updates

  • The Really Simple Syndication feed now appears in the app header.
  • The update removes the pricing page.
  • The update renames the login string in the side menu.
  • The app byline now matches the blog.
  • The update removes dead code.

The next update focuses on properly handling file encodings of various types. The system currently supports only files encoded as UTF-8 or American Standard Code for Information Interchange (ASCII), which leaves the entire Windows world unsupported. This is likely to take a bit longer, because the team needs to build an intermediary caching mechanism for data files while they are processed; however, it should be available by next week.

0.1.71 Released

Michael Krotscheck - - 2 mins read

The 0.1.71 release starts addressing user interface bugs and adds features that simplify interacting with social websites.

UI updates

  • New linking for the favicon. Yay logos.
  • Undesired scaling tended to occur when viewing the app on mobile. Mobile browsers still control user settings, but the site now encourages those browsers to allow the layout to render as intended.
  • The contact form was sending messages to the submitter instead of the support team. Oops. That is now fixed.
  • The column in the download data modal now stretches the entire width of the modal.
  • The dataset menu in the left navigation bar now highlights when you have a dataset selected.
  • Open Graph tagging.

Infrastructure updates

  • The API Cassandra configuration now includes more than one Cassandra node to connect to, reducing points of failure.
  • The system no longer uses Protocol Buffers for message encoding and now uses JSON instead. This reduces the complexity of internal serialization logic because Jackson attaches its own type discovery parameters, so manual data decoding is no longer required.
  • Deprecated old password-based authentication module, and cleaned up the database.
  • Deprecated and removed old VFS2 filesystem handling, as data is now stored in Cassandra.
  • The cleanup removed most passwords used for development accounts from source, with the exception of passwords for the Vagrant VM. Developers now use their own Simple Mail Transfer Protocol accounts to send email.
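The switch to JSON relies on each message carrying enough type information to be decoded without hand-written logic. A minimal sketch of that idea, assuming an `@type` field and two made-up message classes; Jackson's actual type property and configuration may differ:

```python
import json
from dataclasses import asdict, dataclass

REGISTRY = {}  # maps a type name to its class


def message(cls):
    """Turn a class into a dataclass and register it by name."""
    cls = dataclass(cls)
    REGISTRY[cls.__name__] = cls
    return cls


@message
class UploadStarted:  # hypothetical message type
    dataset_id: str


@message
class ParseFailed:  # hypothetical message type
    dataset_id: str
    reason: str


def encode(msg) -> str:
    # Attach the type name so the receiver knows which class to build.
    return json.dumps({"@type": type(msg).__name__, **asdict(msg)})


def decode(text: str):
    payload = json.loads(text)
    cls = REGISTRY[payload.pop("@type")]
    return cls(**payload)
```

The receiver never switches on message kinds by hand: `decode(encode(msg))` rebuilds an equal object for any registered type.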

Curious? Go check it out.

0.1.70 Released

Michael Krotscheck - - 1 min read

A quick bug fix release this morning.

Bug fixes

  • The role dropdown in the user administrator UI is now unavailable when viewing your own account.
  • Downloading data now passes the correct mime_type query parameter, rather than mimeType.
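For illustration, a client might build the download request as follows. Only the `mime_type` parameter name comes from the release note; the base URL and path layout are hypothetical:

```python
from urllib.parse import urlencode


def download_url(base_url: str, dataset_id: str, mime_type: str) -> str:
    """Build a download URL using the snake_case mime_type parameter."""
    # urlencode escapes the slash in values such as "text/csv".
    query = urlencode({"mime_type": mime_type})
    return f"{base_url}/dataset/{dataset_id}/data?{query}"  # hypothetical path


url = download_url("https://api.example.com", "42", "text/csv")
```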

Curious? Go check it out.

0.1.69 Released

Michael Krotscheck - - 1 min read

This evening’s release comes with a small selection of UI improvements, and one big new feature.

Download data feature

It’s somewhat funny that up until now, you couldn’t download the data you uploaded. That gap is now closed, and users can download any publicly shared dataset in any of the formats that Dataplay supports. Users can also select columns on the fly, so there is no need to download anything extra.
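Column selection on download can be pictured as a projection over the stored rows. A minimal sketch using CSV as the output format; the function name is an assumption, and the real service presumably applies the selection on the server before encoding:

```python
import csv
import io


def select_columns(csv_text: str, columns) -> str:
    """Return CSV text containing only the requested columns, in that order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=columns,
        extrasaction="ignore",  # drop columns that were not requested
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


subset = select_columns("a,b,c\n1,2,3\n4,5,6\n", ["c", "a"])
```

The order of `columns` controls the order of the output columns, so a caller can reorder as well as drop.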