Everything I know about Threads

I am going to cover a huge concept here.

  • How python code executes
  • Multitasking
    • Process based multitasking (Multiprocessing)
    • Thread based multitasking (Multithreading)
    • GIL (Global Interpreter Lock)
    • Green Thread

Thread based programming is hard. There are no two ways about it. Every time one thinks he or she understands everything there is to know about how threading works, a new wrinkle is uncovered.

How python code executes
Before understanding threading, we first need to understand how Python code is executed.

python code --> | compiled | --> bytecode --> | interpreted | --> executes

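What does the above process do? Whenever you run Python code, the source is first compiled to bytecode. For imported modules, the bytecode is cached in a file with a .pyc extension (inside a __pycache__ directory in Python 3), and the module is not compiled again unless its source changes; next time onwards the interpreter loads the cached bytecode directly. The script you run directly is compiled in memory each time. The bytecode is then interpreted instruction by instruction and executed.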
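To make the compile and interpret steps visible, here is a minimal sketch in Python 3 using the built-in compile() function and the standard dis module. The tiny source string and the variable names are my own, chosen only for illustration.

```python
import dis

# Hypothetical two-line program used only for illustration.
source = "total = 1 + 2\nprint(total)\n"

# Step 1: compile the source text into a code object that holds bytecode.
code_obj = compile(source, "<example>", "exec")

# The raw bytecode is a bytes object stored on the code object.
raw_bytecode = code_obj.co_code

# Step 2: list the names of the bytecode instructions the interpreter will run.
# The exact instruction names differ between Python versions.
instructions = [ins.opname for ins in dis.get_instructions(code_obj)]
print(instructions)

# Step 3: the interpreter executes the code object.
namespace = {}
exec(code_obj, namespace)
```

Running it prints the instruction names first and then the result of the tiny program, so you can see that compiling and executing really are two separate steps.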

Multitasking

Multitasking is the execution of multiple tasks over the same period of time. We use
multitasking to keep the CPU busy. Multitasking can be achieved in 2 ways:
– Process-based Multitasking (Multiprocessing)
– Thread-based Multitasking (Multithreading)


The threading module uses threads, the multiprocessing module uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. In CPython, the global interpreter lock takes this precaution for the interpreter's own internal state; your own shared data still needs explicit locks.

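Spawning processes is a bit slower than spawning threads. Once they are running, there is not much difference. One disadvantage of Windows is that it has no fork(), so multiprocessing has to start a fresh interpreter for every child process (which is slower), and the code that creates processes must sit under an if __name__ == '__main__': guard. It can still take full advantage of multi-core systems.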

Multiprocessing

Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system, meaning that multiple software processes execute concurrently at any one instant. Usually this happens across separate programs, but it is also achievable through code within the same application, which can speed up CPU-heavy work on multi-core machines.

A process runs independently of, and isolated from, other processes. It cannot directly access data in other processes, so the cost of communication is high.
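Here is a small sketch of what that isolation means in practice. To keep it short I use the standard subprocess module instead of multiprocessing (my own substitution), and the child program and variable names are made up for illustration. The parent cannot hand its list to the child directly; it has to serialize it and push it through a pipe.

```python
import json
import subprocess
import sys

# Data that lives only in the parent process.
numbers = [1, 2, 3, 4]

# Hypothetical child program: reads JSON from stdin, writes the sum to stdout.
child_source = (
    "import json, sys\n"
    "data = json.load(sys.stdin)\n"
    "print(json.dumps(sum(data)))\n"
)

# The child is a separate process with its own memory, so the list must be
# serialized and sent through a pipe instead of being shared directly.
completed = subprocess.run(
    [sys.executable, "-c", child_source],
    input=json.dumps(numbers),
    capture_output=True,
    text=True,
    check=True,
)

# The answer also comes back as text and has to be decoded again.
child_result = json.loads(completed.stdout)
print(child_result)
```

The serialize, send, and decode round trip is the communication cost mentioned above; multiprocessing does the same kind of work for you behind its queues and pipes.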

Multithreading

Multithreading is the ability to run several parts of a program in parallel, so you can subdivide specific operations within a single application into individual threads. Each of the threads can run in parallel.

Threads are so-called lightweight processes which have their own call stack but can access shared data. Threads share the same address space, so the cost of communication between threads is low.
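A minimal Python 3 sketch of the shared address space: every thread writes into one dictionary created by the main thread, and the main thread sees all the results without any copying. The worker function and the names are my own, only for illustration.

```python
import threading

# One dictionary, created in the main thread and visible to every thread.
shared_results = {}

def worker(name):
    # Each thread writes to its own key, so no lock is needed in this sketch.
    shared_results[name] = len(name)

names = ["alpha", "beta", "gamma"]
threads = [threading.Thread(target=worker, args=(n,)) for n in names]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread reads what the workers wrote, with no serialization step.
print(shared_results)
```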

Comparison and a brief intro to the GIL (Global Interpreter Lock)

Processes and threads are both scheduled by the operating system. Processes are typically independent, while threads exist as subsets of a process.

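There is a huge confusion here. People usually say that languages like Python and Ruby do not support multithreading, i.e. two threads cannot execute at the same time because of the GIL (Global Interpreter Lock), which ensures only one thread can execute at a given instant, thereby keeping the interpreter thread safe. It's not quite like that. In Python, Ruby, etc. multithreading does exist, and the threads are real. What the GIL restricts is running interpreter bytecode: only the thread holding the GIL can do that. A thread gives up the GIL while it waits on blocking I/O (a network or disk read, for example), so other threads can run in the meantime. That is why threads still help a lot with I/O-bound work, but not with CPU-bound work.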

The idea is the same as for any lock in any language: while one thread holds it, a different thread that wants it has to wait. The difference is that the GIL guards the interpreter as a whole, not individual resources. A thread's use of the GIL goes through 4 stages, namely
1. Acquire
2. Lock
3. Execute
4. Release

The first thread acquires the GIL, holds it so that no other thread can run Python bytecode in the middle of its execution, executes its code, and then releases the GIL so that other threads can run.
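The same acquire, lock, execute, release cycle is what you write by hand when you protect your own data. The GIL itself cannot be taken from Python code, so this sketch uses an ordinary threading.Lock as a stand-in; the counter and function names are my own.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        counter_lock.acquire()      # 1. Acquire (waits while another thread holds it)
        try:
            # 2. Lock: from here on this thread is the only holder
            counter += 1            # 3. Execute
        finally:
            counter_lock.release()  # 4. Release, so other threads can proceed

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

In everyday code you would normally write `with counter_lock:` instead of the explicit acquire and release calls; I spelled them out here only to line them up with the four stages.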

However, from Python 3.2 onwards the GIL got its biggest update since 1992. As a summary I can say: if one thread holds the GIL and a second thread is waiting for it, the second thread asks for it after a short timeout (the switch interval, 5 milliseconds by default), and the first thread then releases the GIL so that the second thread can run.
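On Python 3.2 and later you can look at this timeout yourself through the sys module. A tiny sketch (the variable name is mine):

```python
import sys

# How long a waiting thread lets the running thread keep the GIL
# before asking it to release, in seconds.
interval = sys.getswitchinterval()
print("switch interval: %.3f s" % interval)
```

There is also sys.setswitchinterval() to change it, although I would treat that as a tuning knob for experiments and not something to touch by default.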

In practice we can have multiple processes, each running multiple threads that communicate within their own process.


The advantage of multiprocessing is isolation. A crashing process will not affect other processes as they do not share the same memory, but a crashing thread can take down all other threads within the process as they share the same memory.

Green Thread

In computer programming, green threads are threads that are scheduled by a runtime library or virtual machine (VM) instead of natively by the underlying operating system. Green threads emulate multithreaded environments without relying on any native OS capabilities, and they are managed in user space instead of kernel space, enabling them to work in environments that do not have native thread support.

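Ruby (before version 1.9) used green threads for its Thread class. CPython and PyPy use native OS threads for the threading module, but green threads are available there through libraries such as greenlet and gevent.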
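To show the idea of scheduling in user space, here is a toy sketch in Python 3 where generators play the role of green threads and a small round-robin loop plays the role of the scheduler. This is my own illustration, not how greenlet or gevent are implemented, and all the names are made up.

```python
from collections import deque

log = []

def task(name, steps):
    for i in range(steps):
        log.append("%s step %d" % (name, i))
        yield  # hand control back to the scheduler

def run_all(tasks):
    # Round-robin scheduler living entirely in user space:
    # the operating system sees only one thread.
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # run the task until its next yield
            queue.append(current)  # not finished, put it back in line
        except StopIteration:
            pass                   # task finished, drop it

run_all([task("A", 2), task("B", 3)])
print(log)
```

The tasks take turns only because each one yields voluntarily. That cooperative switching is the main difference from OS threads, which can be interrupted at any point.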


As an ending note, I would like to write some Python code to show you the performance difference between sequential code and multithreading and multiprocessing code.

benchmark_test.py

# Python 2 script (urllib2 and the print statement are Python 2 only)
import urllib2, datetime
import threading
from multiprocessing.dummy import Pool as ThreadPool

# ############ urls definition ################
urls = [
    'http://www.python.org',
    'http://www.python.org/about/',
    'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
    'http://www.python.org/doc/',
    'http://www.python.org/download/',
    'http://www.python.org/getit/',
    'http://www.python.org/community/',
    'https://wiki.python.org/moin/',
    'http://planet.python.org/',
    'https://wiki.python.org/moin/LocalUserGroups',
    'http://www.python.org/psf/',
    'http://docs.python.org/devguide/',
    'http://www.python.org/community/awards/'
]

# ##########  crawl function definition  ##############
def crawl(url):
    urllib2.urlopen(url)

# ##########  usual for loop #######################
mk1 = datetime.datetime.now()
# fetch the urls one after another
for i in urls:
    r = urllib2.urlopen(i)

mk2 = datetime.datetime.now()
print 'usual for loop took %s' % str(mk2 - mk1)

# ##########  list comprehension ###################
a = [urllib2.urlopen(i) for i in urls]
mk3 = datetime.datetime.now()

print 'list comprehention took %s sec' % str(mk3 - mk2)

# ########## map ##############################
b = map(urllib2.urlopen, urls)
mk4 = datetime.datetime.now()

print 'map took %s sec' % str(mk4 - mk3)

# ######## multithreading ########################
threads = []

# start one thread per url
for n in urls:
    thread = threading.Thread(target=crawl, args=[n])
    thread.start()

    threads.append(thread)

# wait for every thread to finish
for thread in threads:
    thread.join()

mk5 = datetime.datetime.now()
print 'multithreading took %s sec' % str(mk5 - mk4)

# ###### multiprocessing ##########################
# NOTE: multiprocessing.dummy.Pool is a pool of threads, not processes
pool = ThreadPool(4)
results = pool.map(urllib2.urlopen, urls)
pool.close()
pool.join()
mk6 = datetime.datetime.now()

print 'multiprocessing took %s sec' % str(mk6 - mk5)

I am using a 4-core Intel i5 processor with 8 GB RAM on an Ubuntu system. I did not see any big spikes in the htop output while executing this program. Finally, the output was:

binayr@binay-u12:~$ python   benchmark_test.py
usual for loop took 0:02:17.804874
list comprehention took 0:02:24.816333
map took 0:02:15.912677
multithreading took 0:00:17.750087
multiprocessing took 0:00:22.988438

Although the output depends on the internet speed, the multithreading and multiprocessing code is much faster (roughly 6 to 8 times in this run). Note that multiprocessing.dummy.Pool is backed by threads, not processes, so the "multiprocessing" figure above is really for a pool of 4 threads. I have also tried with 10 workers and the result was even faster than the above result.
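If you want to reproduce the effect without depending on the network, here is a minimal Python 3 sketch. It is not the benchmark above: fake_fetch is a made-up function where time.sleep stands in for network latency, and the thread pool comes from concurrent.futures.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(item):
    # Stand-in for a network request: the thread just waits,
    # and it does not hold the GIL while sleeping.
    time.sleep(0.05)
    return item * 2

items = list(range(8))

# Sequential version: each wait happens one after the other.
start = time.perf_counter()
sequential = [fake_fetch(i) for i in items]
sequential_time = time.perf_counter() - start

# Threaded version: up to 4 waits overlap at any moment.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(fake_fetch, items))
threaded_time = time.perf_counter() - start

print("sequential: %.2f s, threaded: %.2f s" % (sequential_time, threaded_time))
```

Both versions return the same results; only the waiting overlaps. With real URLs the numbers will of course depend on the connection, just like in my run.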

 

My REFERENCES are really awesome:

http://jeffknupp.com/blog/2012/03/31/pythons-hardest-problem/
https://en.wikipedia.org/wiki/Green_threads
http://stackoverflow.com/questions/6022629/difference-between-multitasking-multithreading-and-multiprocessing
http://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python
