Access macOS remotely with Jupyter and ngrok

Posted on April 13, 2019 in datascience • 1 min read

This is a step-by-step guide to remotely controlling a macOS laptop (host1) from another host (host2) using JupyterLab and ngrok. The two hosts may sit on different local networks behind firewalled NATs. Instructions for setting up JupyterLab in a clean Python 3.6.5 virtual environment are also included.

Steps for …


Continue reading

AWSFlow: From zero to Amazon EMR jobs and Lambda functions with Python

Posted on April 06, 2019 in devops • 1 min read

After lots of cleanup and refactoring, the AWSFlow project goes public! It lets you programmatically define workloads for AWS Elastic MapReduce (EMR) clusters and Lambda functions in Python, using a concise methodology aimed at fast prototyping.

The most interesting design choice is that the awsflow package itself gets deployed …


Continue reading

Reviewing Zeppelin and Jupyter notebooks

Posted on March 16, 2019 in datascience • 2 min read

I recently started a new project using Scala/Spark. The project runs on AWS EMR infrastructure, and data science investigations are performed in Zeppelin notebooks hosted on S3. We review all our data science deliverables, and it quickly became clear that reviewing notebooks is not as easy as reviewing …


Continue reading