Spark - Install Spark on Windows (PySpark)




Welcome to another post on Spark. Sometimes we need a quick setup to try out a few things.


If you want to run some quick experiments with Apache Spark, follow the simple steps below to set up PySpark on Windows.

1. Download and install Java


Download Link-

https://java.com/en/download/


Path Setup -

Add the bin folder of your Java installation to the Path system environment variable, for example:

C:\Program Files\Java\jdk1.8.0_91\bin
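
Once the path is set, a new command prompt should be able to find Java. The following is a minimal Python sketch of that check, not part of the original setup: the helper name `java_version` is made up for illustration, and it only assumes that the Java launcher is called `java`.

```python
import shutil
import subprocess
from typing import Optional


def java_version(executable: str = "java") -> Optional[str]:
    """Return the text printed by `java -version`, or None if Java is not on PATH."""
    java_path = shutil.which(executable)
    if java_path is None:
        return None
    # `java -version` writes its report to stderr, so capture both streams.
    result = subprocess.run(
        [java_path, "-version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return (result.stderr or result.stdout).strip()


report = java_version()
print(report if report is not None else "java was not found on PATH")
```

If the script prints the "not found" message, the Java bin folder is probably missing from the Path variable, or the command prompt was opened before the variable was changed.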

2. Download and install Anaconda (Contains Python and Jupyter Notebook)


Download Link-

https://anaconda.org/anaconda/python


Path Setup -

Select the option to add Anaconda to the PATH environment variable while installing Anaconda.


3. Install Spark


Download Link-

http://spark.apache.org/downloads.html


Path Setup -

Extract the downloaded archive and add the bin folder of the extracted Spark folder to the Path system environment variable.
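
To confirm that the Spark folder really ended up on the path, a small Python sketch like the one below can scan the Path entries for the PySpark launcher. The function name `find_pyspark_launcher` is hypothetical, and the sketch assumes the launcher file is named `pyspark.cmd` (Windows) or `pyspark`, as in the standard Spark download.

```python
import os
from pathlib import Path
from typing import Optional

# Launcher file names shipped in the bin folder of a Spark download.
LAUNCHER_NAMES = ("pyspark.cmd", "pyspark")


def find_pyspark_launcher(path_value: Optional[str] = None) -> Optional[Path]:
    """Return the first pyspark launcher found in a PATH-style string, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing separator
        for name in LAUNCHER_NAMES:
            candidate = Path(entry.strip('"')) / name
            if candidate.is_file():
                return candidate
    return None


print(find_pyspark_launcher() or "no pyspark launcher found on PATH")
```

Called without arguments it inspects the current Path; passing a string lets you test a value before saving it in the system settings.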

4. Install Hadoop and download winutils (required for Spark on Windows)

Just create a Hadoop folder on the C drive, with a bin folder inside it.


Download Link-

https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin


Download winutils.exe and save it inside the bin folder of the Hadoop folder created earlier.


Path Setup -


Add the bin folder of the Hadoop folder to the Path system environment variable.
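
A quick way to double-check the winutils step is sketched below in Python. It goes slightly beyond the post: it assumes a HADOOP_HOME environment variable pointing at the Hadoop folder, which is where Hadoop's Windows support usually looks for bin\winutils.exe. The helper name `winutils_problems` is made up for this example.

```python
import os
from pathlib import Path
from typing import List, Optional


def winutils_problems(hadoop_home: Optional[str]) -> List[str]:
    """List what is missing for winutils under the given Hadoop folder."""
    if not hadoop_home:
        return ["HADOOP_HOME is not set"]
    problems = []
    winutils = Path(hadoop_home) / "bin" / "winutils.exe"
    if not winutils.is_file():
        problems.append(f"missing {winutils}")
    return problems


# Assumption: HADOOP_HOME points at the Hadoop folder created on the C drive.
for problem in winutils_problems(os.environ.get("HADOOP_HOME")):
    print(problem)
```

No output means the file is where Spark is expected to look for it.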

5. Start pyspark

pyspark

Open a new command prompt, type the above command, and check that the PySpark shell starts.

6. Set up Spark on Jupyter Notebook

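Set these two variables in the cmd window so that pyspark starts Jupyter Notebook as its driver (cmd uses set rather than export):

set PYSPARK_DRIVER_PYTHON=jupyter
set PYSPARK_DRIVER_PYTHON_OPTS=notebook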
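
If you prefer not to change the variables for the whole session, the same two settings can be applied to a single child process from Python. This is only a sketch: `jupyter_driver_env` is a hypothetical helper, and the default driver `jupyter` assumes the Jupyter launcher that Anaconda installs.

```python
import os
from typing import Dict, Mapping, Optional


def jupyter_driver_env(
    base_env: Optional[Mapping[str, str]] = None,
    driver: str = "jupyter",
) -> Dict[str, str]:
    """Return a copy of the environment with the PySpark driver set to Jupyter."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_DRIVER_PYTHON"] = driver
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    return env


# Show only the two added variables by starting from an empty environment.
print(jupyter_driver_env({}))
```

The returned dictionary can be passed as the env argument of subprocess.run when launching pyspark from a Python script.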

Use the following command to start Spark from the command line.

pyspark --master local[4]

That's it! Just run the above command: Spark's startup messages appear in the cmd window, and Jupyter Notebook opens in a new browser window.
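
To check that the notebook can actually talk to Spark, a small smoke test such as the following sketch can be pasted into the first cell. The names `master_url` and `run_smoke_test` and the app name `install-check` are made up for illustration; `local[4]` mirrors the command above and means Spark runs locally with 4 worker threads.

```python
def master_url(cores: int) -> str:
    """Build a local master URL such as local[4] (4 worker threads)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return f"local[{cores}]"


def run_smoke_test(cores: int = 4) -> int:
    """Sum the numbers 1..100 on Spark; a working install returns 5050."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    # getOrCreate() reuses a session that is already running, if there is one.
    spark = (
        SparkSession.builder
        .master(master_url(cores))
        .appName("install-check")
        .getOrCreate()
    )
    return spark.sparkContext.parallelize(range(1, 101)).sum()


print(master_url(4))
```

Calling run_smoke_test() in a notebook cell should return 5050 if Java, Spark, and winutils are all set up correctly.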


About Author

✔ 13+ years of experience in the software industry, including 6.5+ years on machine learning and deep learning projects.


✔ Hands-on data science practitioner, leading a team of data scientists, Python developers, UI developers, and business analysts on multi-million-dollar projects.


✔ Designed and developed a DevOps-enabled MLOps strategy and components from scratch, which saved clients 80% of data annotation time and 40% of development time.
