Introduction
Pytcup is a Python 3 library interface to TCUP services. It lets users retrieve and query data from TCUP data stores such as Sensor Observation Service (SOS) and Data Lake Service (DLS). Pytcup also provides an interface for connecting to the TCUP Spark cluster for distributed computing, and it lets users upload algorithms developed in a Notebook as TCUP Tasks for scheduled orchestration.
The following is a brief list of features:
Get the list of sensors/features from TCUP Sensor Observation Service.
Get sensor metadata, sensor capabilities, and sensor observed properties from TCUP Sensor Observation Service.
Bulk-upload sensor observations in CSV format to TCUP Sensor Observation Service.
Get the Spark Context from the Spark cluster.
Run algorithms on the distributed Spark cluster using Spark DataFrames and RDDs.
Get sensor observations as Spark DataFrames for data analysis.
Get file URIs, with metadata, from TCUP Data Lake Service.
Get Spark DataFrames for given files from TCUP Data Lake Service for data analysis.
Write analyzed Spark DataFrames back to TCUP Data Lake Service.
Create projects and Tasks, deploy Tasks, get Task status, and download Task files from TCUP Task Service.
Deploy Python programs to TCUP Task Service using the Pytcup library.
Put finalized programs into production by uploading them as TCUP Tasks.
Easy enhancement of Python ML libraries (e.g. scikit-learn, Spark ML) in the Notebook.
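The bulk-upload feature listed above takes sensor observations in CSV format. The column layout that TCUP Sensor Observation Service expects is not described here, so the following is only a minimal sketch of preparing such a file with Python's standard `csv` module. The column names (`sensor_id`, `observed_property`, `timestamp`, `value`), the helper `observations_to_csv`, and the sample rows are all hypothetical, and the sketch does not call Pytcup itself.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical column layout; the real SOS bulk-upload schema may differ.
FIELDNAMES = ["sensor_id", "observed_property", "timestamp", "value"]


def observations_to_csv(observations):
    """Serialize a list of observation dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for obs in observations:
        writer.writerow(obs)
    return buffer.getvalue()


# Made-up sample observations, for illustration only.
sample = [
    {
        "sensor_id": "sensor-001",
        "observed_property": "temperature",
        "timestamp": datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
        "value": 21.5,
    },
    {
        "sensor_id": "sensor-001",
        "observed_property": "temperature",
        "timestamp": datetime(2020, 1, 1, 12, 5, tzinfo=timezone.utc).isoformat(),
        "value": 21.7,
    },
]

csv_text = observations_to_csv(sample)
print(csv_text)
```

The resulting text can be written to a `.csv` file and handed to whichever bulk-upload call the Pytcup tutorials in the Notebook container describe.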
Purpose/Usage
TCUP Notebook is a key component of the platform that provides a containerized, web-based workstation for data scientists. TCUP is a multitenant platform, so multiple data scientists can work on the same tenant's data set. The admin for a specific tenant can create or delete Notebook users under that tenant.
After a Notebook user is created, the data scientist can log in to their own Notebook container, use the Pytcup library, and connect to the Spark cluster from Jupyter Notebook. Users need to import the Pytcup library to access data from TCUP Sensor Observation Service, TCUP Data Lake Service, etc., and can then use it for data analysis. Pytcup tutorials are available in each Notebook container.
Login URL: https://domainname.tcupiot.com/ns/hub/login
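As a small illustration, the login URL above can be assembled from a tenant's domain name. This sketch assumes that `domainname` in the URL is a placeholder for a tenant-specific subdomain, which the text does not state explicitly. The function name `notebook_login_url` is hypothetical and is not part of Pytcup.

```python
def notebook_login_url(tenant_domain: str) -> str:
    """Build the Notebook login URL for a tenant's domain name.

    Assumes the tenant's domain name is the first label of the host,
    as suggested by the placeholder in the documented login URL.
    """
    name = tenant_domain.strip().lower()
    # Reject empty values and anything that is not a single host label.
    if not name or "/" in name or "." in name:
        raise ValueError(f"invalid tenant domain name: {tenant_domain!r}")
    return f"https://{name}.tcupiot.com/ns/hub/login"


# Using the literal placeholder reproduces the documented URL.
print(notebook_login_url("domainname"))
```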
Data scientists can start or stop their server and log out using the Jupyter Notebook GUI.