It used to be that even starting a tech business required expensive servers. The rise of cloud
computing solved this, and then refined it by slicing workloads into smaller and smaller pieces that
run only on demand.
The project adopted Amazon Web Services Lambda quite some time ago. At the time, DynamoDB was the
only available persistence layer; however, Amazon API Gateway made the Lambdas, to all intents and
purposes, look like a real server. Since then, on-demand relational databases have arrived, and their
only real cost is a small amount of latency while the resources are provisioned.
This release includes some refinement of the tagging API, a few bug fixes, and the surfacing of tags
to the user interface.
Tagging
Atomic changes to tag associations now work properly.
Tag names and tag category names are now indexed and searchable.
Tags are now visible and assignable via the UI.
Dataset parsing
Fixed a major bug in file type detection, where the file reader would exit before it had finished
detecting the MIME type.
At the moment, most tags and tag categories are still entered manually, so if you need a particular
new set of tags, please contact the team. The roadmap also anticipates making tags more prominent in
the upcoming Search UI, so stay tuned for those changes.
This release (version 0.1.78) includes the beginnings of tagging capability.
Tagging API
API-only: administrators can now create tags and tag categories.
API-only: administrators can assign tags to users’ data sets.
Users can search and browse datasets by multiple tags. A search currently returns only datasets that
carry all of the selected tags.
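The "all selected tags" rule is AND semantics: a dataset matches only if the selected tags form a subset of its own tags. The sketch below illustrates that rule in memory; the `Dataset` record and `search_by_tags` function are hypothetical names, and the real search runs against the project's indexes rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Dataset:
    # Hypothetical minimal record; the real dataset schema is not described here.
    name: str
    tags: Set[str] = field(default_factory=set)


def search_by_tags(datasets: List[Dataset], selected: Set[str]) -> List[Dataset]:
    """Return the datasets that carry every selected tag (AND semantics)."""
    # issubset() is True only when all selected tags appear on the dataset,
    # so an empty selection matches every dataset.
    return [d for d in datasets if selected.issubset(d.tags)]


catalog = [
    Dataset("rainfall", {"weather", "europe"}),
    Dataset("temperature", {"weather", "asia"}),
    Dataset("gdp", {"economics", "europe"}),
]
print([d.name for d in search_by_tags(catalog, {"weather", "europe"})])  # prints ['rainfall']
```

Adding a tag to the selection can only shrink the result set, which is the behavior described above.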
Tagging UI
The administrator user interface now permits the creation of tags and tag categories. It doesn’t
yet permit editing or deleting tags or categories.
This Monday’s release adds simple support for publishing a dataset to the rest of the world.
Publishing
Users may now publish their datasets and make them available to the rest of the world.
The dataset search is now public, so everyone can search the data.
Downloading and uploading now show an explicit popup informing users that they must authenticate
to perform these actions.
Search index fixes
The search indexes now filter properly by publication status. The API also supports filtering by
user, though the UI doesn’t expose it yet.
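As a rough illustration of the two filters described above, the sketch below keeps only published entries for the public search and optionally narrows them to one owner, as the API allows. The `IndexEntry` fields and the `search` function are assumptions made for the example; the actual index structure and query API are not described in these notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexEntry:
    # Hypothetical search-index fields, for illustration only.
    name: str
    owner: str
    published: bool


def search(entries: List[IndexEntry], owner: Optional[str] = None) -> List[IndexEntry]:
    """Public search: only published datasets, optionally narrowed to one user."""
    hits = [e for e in entries if e.published]
    if owner is not None:
        hits = [e for e in hits if e.owner == owner]
    return hits


index = [
    IndexEntry("a", "alice", True),
    IndexEntry("b", "alice", False),
    IndexEntry("c", "bob", True),
]
print([e.name for e in search(index)])           # prints ['a', 'c']
print([e.name for e in search(index, "alice")])  # prints ['a']
```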
Upcoming features
Next on the docket is adding multidimensional search, which is easier to understand as “tags”. The
team is also working on a new search user interface that lets the product expose all these filters.
That effort is tracked separately and is likely to take a little longer.
This release completely replaces the data storage layer. Rather than storing data in Cassandra, the
system now uses HBase, which provides a couple of significant benefits, most notably ordered rows.
Both the rows and the columns of a dataset now download in the same order in which they were
uploaded. No other major changes occurred, so this release skips the regular bullet list.
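HBase keeps rows sorted by the raw bytes of their row keys, so preserving upload order comes down to choosing keys whose byte order matches row order. A common technique is a fixed-width, big-endian integer suffix. The sketch below assumes a hypothetical `<dataset_id>:<8-byte index>` layout; the project's actual key design is not described here, and the same idea could apply to column qualifiers, which HBase also stores sorted.

```python
import struct


def row_key(dataset_id: str, row_index: int) -> bytes:
    """Hypothetical row-key layout: '<dataset_id>:' followed by an 8-byte index."""
    if row_index < 0:
        raise ValueError("row_index must be non-negative")
    # Big-endian, fixed-width unsigned 64-bit encoding: comparing two such
    # encodings byte by byte gives the same answer as comparing the integers.
    return dataset_id.encode("utf-8") + b":" + struct.pack(">Q", row_index)


def row_index(key: bytes) -> int:
    # The last 8 bytes hold the index.
    return struct.unpack(">Q", key[-8:])[0]


# Plain decimal strings sort lexicographically, which breaks numeric order.
print(sorted(["1", "2", "10"]))  # prints ['1', '10', '2']

# Simulate the store returning keys in byte order, whatever order they were written in.
keys = sorted(row_key("ds1", i) for i in [10, 2, 1])
print([row_index(k) for k in keys])  # prints [1, 2, 10]
```

The decimal-string case shows why a naive key would scramble row order once a dataset passes nine rows.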
This release pairs with the release of the data parsing framework and now supports the automatic
detection and parsing of text files in many different character encodings. The project uses the Tika
framework to help detect the character encoding of text files and now reliably supports most
character sets listed here.
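Tika's own encoding detection is considerably more sophisticated, but the basic shape of the problem can be shown with a much simpler heuristic. The sketch below checks for a byte-order mark, then tries strict UTF-8, and otherwise falls back to Windows-1252. It is not Tika and not the project's code; the cp1252 fallback and the function names are assumptions for illustration.

```python
import codecs

# Longer byte-order marks first: the UTF-32-LE BOM starts with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes) -> str:
    """Simplified stand-in for a real detector such as Tika's."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec  # these codecs consume the BOM when decoding
    try:
        data.decode("utf-8")  # strict: fails on any invalid UTF-8 sequence
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"  # assumed fallback for legacy Windows text


def decode_text(data: bytes) -> str:
    # cp1252 leaves a few byte values undefined, so replace rather than fail.
    return data.decode(guess_encoding(data), errors="replace")


print(decode_text("café".encode("utf-8")))   # prints café
print(decode_text("café".encode("cp1252")))  # prints café
```

A heuristic like this cannot tell apart the many single-byte code pages, which is where a statistical detector earns its keep.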
Upload fixes
Uploaded files are now analyzed for text encoding and parsed accordingly.
Parsing errors now go to the engineering team for further testing.
The UI now indicates the current parsing state of a data set.
Development environment updates
The development Vagrant box is now shared, which cuts about 15 minutes off deployment. It might
not seem like much, but the time savings are welcome 🙂
Data file reader updates
The stream declarations of the IDataEncoder and IDataDecoder interfaces now live in the new
IStreamEncoder and IStreamDecoder interfaces.
New interfaces, ITextEncoder and ITextDecoder, handle character-stream-based file processing.
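One way to picture this split is as three layers on the decoding side: a byte-stream interface, a character-stream interface, and a data-level interface built on both. The sketch below is a Python analogue of that idea under stated assumptions: the interface names come from these notes, but every method name, signature, and the inheritance relationship are hypothetical, and the encoder side would mirror it.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import BinaryIO, Iterator, List, TextIO


class IStreamDecoder(ABC):
    """Byte-stream concern: where the raw bytes come from (hypothetical method)."""

    @abstractmethod
    def open_stream(self) -> BinaryIO: ...


class ITextDecoder(ABC):
    """Character-stream concern: turning bytes into text for a given charset."""

    @abstractmethod
    def text_reader(self, raw: BinaryIO, charset: str) -> TextIO: ...


class IDataDecoder(IStreamDecoder, ITextDecoder):
    """Data concern: yielding rows, built on the two lower-level interfaces."""

    @abstractmethod
    def rows(self, charset: str) -> Iterator[List[str]]: ...


class InMemoryCsvDecoder(IDataDecoder):
    # Hypothetical concrete decoder reading CSV from an in-memory byte payload.
    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def open_stream(self) -> BinaryIO:
        return io.BytesIO(self._payload)

    def text_reader(self, raw: BinaryIO, charset: str) -> TextIO:
        # The csv module expects its input opened with newline="".
        return io.TextIOWrapper(raw, encoding=charset, newline="")

    def rows(self, charset: str) -> Iterator[List[str]]:
        return csv.reader(self.text_reader(self.open_stream(), charset))


decoder = InMemoryCsvDecoder("name,city\nRenée,Zürich\n".encode("cp1252"))
print(list(decoder.rows("cp1252")))  # prints [['name', 'city'], ['Renée', 'Zürich']]
```

Keeping the charset a parameter of the text layer lets a detected encoding be plugged in without touching the byte-stream code.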
With the 0.1.72 release, the project moves into a more regular release cadence. Since office hours
are Monday and Thursday evenings, releases go out after those hours conclude. Last night’s updates
are as follows:
Fixes to upload
Licenses, attribution, and the shared flag are no longer required.
Data set sharing is turned off for now, so users can see only their own data sets. Sharing will
return in a future update.
The dataset button now requires logging in.
Uploading in Firefox now works.
Miscellaneous updates
The Really Simple Syndication feed now appears in the app header.
The update removes the pricing page.
The update renames the login string in the side menu.
The app byline now matches the blog.
The update removes dead code.
The next update focuses on properly handling files in various encodings, as the system currently
supports only UTF-8 or ASCII-encoded files, which leaves the entire Windows world unsupported. This
is likely to take a bit longer, as the team needs to build an intermediary caching mechanism for data
files while processing them; however, it should be available by next week.
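The caching step matters because an upload usually arrives as a one-pass stream, while encoding detection needs to inspect the data before the real parse begins. A common pattern is to spool the upload into a seekable buffer, sniff a prefix, rewind, and decode. The sketch below uses Python's `tempfile.SpooledTemporaryFile`, which stays in memory until a size limit and then rolls over to disk; the helper names, the 4 KiB sniff size, and the cp1252 fallback are assumptions, not the team's design.

```python
import codecs
import tempfile
from typing import Iterable


def spool(chunks: Iterable[bytes], max_in_memory: int = 8 * 1024 * 1024):
    """Copy a one-pass upload into a seekable buffer (hypothetical helper)."""
    # Data stays in memory until max_size is exceeded, then moves to a temp file.
    buf = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    for chunk in chunks:
        buf.write(chunk)
    buf.seek(0)
    return buf


def sniff_encoding(sample: bytes) -> str:
    # final=False lets a multi-byte character cut off at the end of the
    # sample pass; only genuinely invalid UTF-8 raises.
    try:
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"  # assumed Windows fallback, for illustration


def read_upload(chunks: Iterable[bytes], sniff_bytes: int = 4096) -> str:
    with spool(chunks) as buf:
        encoding = sniff_encoding(buf.read(sniff_bytes))  # pass 1: sniff a prefix
        buf.seek(0)  # pass 2: rewind the cached copy and decode all of it
        return buf.read().decode(encoding, errors="replace")


print(read_upload([b"caf", "é".encode("utf-8")]))  # prints café
print(read_upload(["naïve".encode("cp1252")]))    # prints naïve
```

Sniffing only a prefix is a trade-off: a file whose non-UTF-8 bytes appear late would be decoded as UTF-8 with replacement characters.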
The 0.1.71 release starts addressing user interface bugs and adds features that simplify interacting
with social websites.
UI updates
New linking for the favicon. Yay logos.
Undesired scaling tended to occur when viewing the app on mobile. Mobile browsers still honor user
settings, but the site now encourages those browsers to render the layout as intended.
The contact form sent messages to the submitter instead of the support team. Oops.
The column in the download data modal now stretches the entire width of the modal.
The dataset menu in the left navigation bar now highlights when you have a dataset selected.
Pages now include Open Graph tags for richer link previews on social websites.
Infrastructure updates
The API Cassandra configuration now includes more than one Cassandra node to connect to, reducing
points of failure.
The system now uses JSON instead of Protocol Buffers for message encoding. This simplifies the
internal serialization logic: Jackson attaches its own type information to each message, so manual
data decoding is no longer required.
Deprecated the old password-based authentication module and cleaned up the database.
Deprecated and removed the old VFS2 filesystem handling, as data is now stored in Cassandra.
The cleanup removed most development-account passwords from the source, except those for the
Vagrant VM. Developers now use their own SMTP accounts to send email.
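The JSON change above relies on the serializer recording each message's type so the receiver can rebuild the right class without hand-written decoding. In Jackson this is done through its polymorphic type handling (for example the `@JsonTypeInfo` annotation). The Python sketch below shows the same idea with a type-name property and a registry; the message classes, the `@type` property name, and the registry are illustrative assumptions, not the project's message format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UploadStarted:
    # Hypothetical message type, for illustration only.
    dataset_id: str


@dataclass
class ParseFailed:
    # Hypothetical message type, for illustration only.
    dataset_id: str
    reason: str


# Registry from type name to message class.
MESSAGE_TYPES = {cls.__name__: cls for cls in (UploadStarted, ParseFailed)}


def encode(msg) -> str:
    # Store the type name next to the fields, similar in spirit to the
    # type property a polymorphic Jackson configuration writes.
    return json.dumps({"@type": type(msg).__name__, **asdict(msg)})


def decode(text: str):
    payload = json.loads(text)
    cls = MESSAGE_TYPES[payload.pop("@type")]  # KeyError for unknown types
    return cls(**payload)


wire = encode(ParseFailed("ds1", "bad header"))
print(wire)  # prints {"@type": "ParseFailed", "dataset_id": "ds1", "reason": "bad header"}
print(decode(wire) == ParseFailed("ds1", "bad header"))  # prints True
```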
This evening’s release comes with a small selection of UI improvements and one big new feature.
Download data feature
It’s somewhat funny that until now, you couldn’t download the data you uploaded. That gap is now
closed: users can download any public shared dataset in any of the formats that Dataplay supports.
Users can also select columns on the fly, so there is no need to download anything extra.
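Column selection amounts to a projection applied while the file is written: resolve the requested column names to positions once, then emit only those cells for each row. The sketch below assumes CSV output and a hypothetical `export_columns` helper; Dataplay's actual export formats and code paths are not described here.

```python
import csv
import io
from typing import List, Sequence


def export_columns(header: Sequence[str], rows: Sequence[Sequence[str]], wanted: Sequence[str]) -> str:
    """Write only the requested columns, in the requested order, as CSV."""
    # Resolve names to positions once; list.index raises ValueError for an unknown column.
    positions: List[int] = [list(header).index(name) for name in wanted]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(wanted)
    for row in rows:
        writer.writerow([row[i] for i in positions])
    return out.getvalue()


header = ["id", "name", "code"]
rows = [["1", "a", "x"], ["2", "b", "y"]]
print(export_columns(header, rows, ["code", "id"]), end="")
```

Because the projection happens row by row, the same approach works when rows are streamed from storage instead of held in memory.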