Figure 1: Google Colab Notebook with a Natural Language to SQL converter with query results displayed.

I recently came across an interesting paper entitled “Data Agnostic RoBERTa-based Natural Language to SQL Query Generation” by Debaditya Pal, Harsh Sharma and Kaustubh Chaudhuri, which promised SQL generation for any tabular dataset. In fact, the dataset is not even needed by the algorithm, which only takes as input a list…


http://synner.io/

Ever needed to generate synthetic data and felt that tools such as Faker don’t go far enough to match your requirements? Synner may be just what you need.

Synner is a tool developed by Azza Abouzied (@AzzaAbouzied) and Mino Mannino and documented in their UIST 2019 paper. A demo video…


Data Oriented Programming — Cover Image

Object-oriented programming is too complex, but what are the alternatives? Yehonathan Sharvit makes the case for data-oriented programming and completely succeeds by introducing and building on the following three fundamental principles: separate code from data; represent data with generic data structures; and enforce data immutability.
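As a rough illustration of the three principles, here is a minimal Python sketch. The record contents and the function names `with_year` and `describe` are hypothetical and are not taken from the book; the sketch only shows one way the ideas could look in practice.

```python
from types import MappingProxyType

# Data: a generic, read-only mapping rather than a custom class.
# The field values are made up for illustration.
book = MappingProxyType({"title": "Example Title", "year": 2020})


# Code: plain functions that receive data as arguments and hold no state.
def with_year(record, year):
    """Return a new record with a different year; the input is not modified."""
    return MappingProxyType({**record, "year": year})


def describe(record):
    """Format a record as a short human-readable string."""
    return f"{record['title']} ({record['year']})"


updated = with_year(book, 2022)
print(describe(book))
print(describe(updated))
```

Because `book` is wrapped in `MappingProxyType`, an attempt such as `book["year"] = 1999` raises a `TypeError`, so changes have to go through functions that return new records.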

The book is well…


I made my first H5P content type called Interactive Code. I’m pretty excited! The extension allows editable programming code to be displayed. Many programming languages including JavaScript, Python, Ruby, PHP, C++ and even SQL are supported, and no server-side libraries are required. I think the new content type has the…


I’m building a new GUI for Pandas, aptly called “Pandamonium”, as a Jupyter Notebook extension. I’ve not built a Jupyter Notebook extension before, and I spent this last weekend learning all about the extension ecosystem. I’m going to share what I learned in a series of blog posts, a great…


Use the Sentence-BERT library to create fixed-length sentence embeddings suitable for semantic search on a large corpus.

Image by PDPhotos and downloaded from Pixabay.

I’ve been working on a tool that needed to embed semantic search based on BERT. The idea was simple: get BERT encodings for each sentence in the corpus and then use cosine similarity to match to a query (either a word or another short sentence). Depending upon the size of…
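The matching step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the three-dimensional vectors are made-up stand-ins for real sentence embeddings, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical. In practice the vectors would come from a sentence encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for sentence embeddings (values are invented).
corpus_vecs = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [0.0, 1.0, 0.0]

ranking = rank_by_similarity(query_vec, corpus_vecs)
best_index, best_score = ranking[0]
print(best_index, round(best_score, 3))
```

A linear scan like this compares the query against every sentence, so its cost grows with the size of the corpus.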


2021 Update: If you are looking for a quick way to run Dask, please try SaturnCloud which even offers a free plan.

Dask is simply the most revolutionary tool for data processing that I have encountered. If you love Pandas and NumPy but have sometimes struggled with data that would…


Image from Pixabay

Note: The Google Colab Notebook has been updated for Python 3, includes pyLDAvis visualisations and has an improved raw display of the top words and top documents in a topic.

You can now find topics in text without installing any libraries or writing a line of Python code. All made possible by…


I could not believe the response I got for my previous blog post on learning maths for Machine Learning and Deep Learning. There are definitely lots of people like me who are interested in learning maths in greater depth. …


Word2Vec and GloVe are two popular algorithms that produce word vectors or word embeddings. Both algorithms work in an unsupervised manner to determine vector representations of words (i.e., they map words to a latent space). GloVe is trained using corpus word-word co-occurrence statistics. The Word2Vec skip-gram algorithm uses a log-linear…
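To make "word-word co-occurrence statistics" concrete, here is a minimal sketch that counts how often pairs of words appear within a fixed window of each other. It is a simplified, unweighted count for illustration, not the GloVe training procedure; the function name `cooccurrence_counts`, the whitespace tokenisation and the window size are all assumptions.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count (word, neighbour) pairs that fall within `window` tokens of each other."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenisation
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


counts = cooccurrence_counts(["the cat sat on the mat"], window=1)
print(counts[("cat", "sat")])
print(counts[("cat", "mat")])
```

With a window of 1 only adjacent words are counted, so "cat" and "sat" co-occur while "cat" and "mat" do not.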

Aneesha Bakharia

Data Science, Topic Modelling, Deep Learning, Algorithm Usability and Interpretation, Learning Analytics, Electronics — Brisbane, Australia
