KWoC Project Report

What is KWoC?

Kharagpur Winter of Code (KWoC) is a 5 week long Open Source coding event hosted by IIT-KGP in (you guessed it...) winter. Students from across the country who are new to the realm of Open Source Software contribute to projects involving Machine Learning, DBMS, Cyber Security, Compiler Design, and more! I highly recommend KWoC to new programmers who want to get into Open Source Development.

Free and Open Source Software (FOSS) is basically software whose source code can be viewed, edited, and used freely by anyone who wants to.

The purpose of FOSS is to come up with solutions to real world problems, and to get people from all over the world to contribute by adding new features, discovering and fixing bugs, and documenting pre-existing code.

Selecting a Project -

KWoC had an array of amazing projects to choose from, but Sim-C really caught my eye, since it involved Compiler Design, and was written in Python, a language I am familiar with (aren't we all?).


Sim-C is basically a dynamically typed front end for the C language, which lets budding C programmers write high level code without worrying about the complexities that come along with C-lang.
I highly recommend it to new programmers, especially since it can be installed with a single pip command. (Official GitHub Page)

An Overview of how Sim-C (and most Compilers) work -

The user's code is saved with a '.simc' extension. This code is first passed on to the Lexical Analyzer (or Tokenizer). The tokenizer breaks the Sim-C code into tokens by removing whitespace, identifying keywords, and processing numeric values.
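
To make the tokenizing step a bit more concrete, here is a rough sketch in Python. It is not Sim-C's actual lexer: the tokenize function, the token names, and the tiny keyword set are all made up for illustration, and it only understands a handful of symbols.

```python
import re

# Hypothetical keyword set and token names, chosen only for this sketch.
KEYWORDS = {"var", "print", "input"}

TOKEN_PATTERN = re.compile(
    r"""
    (?P<number>\d+(?:\.\d+)?)   # integer or decimal literal
  | (?P<string>"[^"\n]*")       # double-quoted string literal
  | (?P<name>[A-Za-z_]\w*)      # keyword or identifier
  | (?P<symbol>[=(),+\-*/])     # single-character operators and punctuation
  | (?P<space>\s+)              # whitespace, dropped below
    """,
    re.VERBOSE,
)


def tokenize(source):
    """Return a list of (kind, text) pairs for a line of Sim-C-like code."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"Unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "space":
            continue  # whitespace only separates tokens, so it is thrown away
        if kind == "name":
            # A name is either a reserved keyword or a user-defined identifier.
            kind = "keyword" if text in KEYWORDS else "id"
        tokens.append((kind, text))
    return tokens


print(tokenize('var a = 10'))
```

Calling tokenize on the input( ) line shown later in this post would give a keyword, an identifier, a few symbols, and two string tokens, which is roughly the kind of stream a parser consumes.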

The generated tokens are then passed on to the Parser, which creates the Operational Code (OpCode) if the tokens are syntactically legitimate. Basically, the parser checks whether the grammatical rules have been followed. In most compilers, the Parser also stores the expressions it encounters in trees, but in Sim-C, once the type inference of the defined variables is done, it passes the OpCode directly to the compiler.

Finally, the OpCode is passed on to the main compiler (transpiler in this case). The transpiler generates the required C code from the OpCode. It adds any include statements to the final C code as well. A transpiler is basically a source-to-source compiler that converts code written in one language to another.
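
As a toy illustration of the transpiling step, the sketch below turns a list of opcodes into C source text. The opcode format here (plain tuples named "declare" and "print") is an assumption invented for this example and is not how Sim-C represents its OpCodes internally.

```python
def transpile(opcodes):
    """Turn a list of (operation, *args) tuples into C source text.

    The tuple layout is hypothetical and exists only for this sketch.
    """
    includes = set()
    body = []
    for op, *args in opcodes:
        if op == "declare":
            c_type, name, value = args
            body.append(f"    {c_type} {name} = {value};")
        elif op == "print":
            fmt, name = args
            includes.add("#include <stdio.h>")  # printf lives in stdio.h
            body.append(f'    printf("{fmt}", {name});')
        else:
            raise ValueError(f"Unknown opcode: {op}")
    # Include statements go first, followed by a main() that wraps the body.
    lines = sorted(includes) + ["", "int main() {"] + body + ["    return 0;", "}"]
    return "\n".join(lines)


print(transpile([("declare", "int", "a", "10"), ("print", "%d", "a")]))
```

The point to notice is that the include line is only added because a print opcode was seen, which mirrors the idea of the transpiler adding include statements as needed.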


The Type Inference Engine basically identifies the constants and variables (along with their data types) and stores all this information in a Symbol Table. The symbol table is filled by the tokenizer and parser. In the transpiler phase, it supplies the data needed to convert Sim-C's dynamically typed code to C-lang's statically typed code.
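
A very small sketch of the idea, assuming a symbol table that is just a Python dict from variable names to C types. The infer_type and declare helpers are hypothetical names for this example, and Sim-C's real engine handles far more cases than literals.

```python
def infer_type(literal):
    """Guess a C type from the text of a literal value."""
    if len(literal) >= 2 and literal.startswith('"') and literal.endswith('"'):
        return "char*"
    try:
        int(literal)
        return "int"
    except ValueError:
        pass
    try:
        float(literal)
        return "float"
    except ValueError:
        raise TypeError(f"Cannot infer a type for {literal!r}") from None


def declare(symbol_table, name, literal):
    """Record the inferred type of `name` and return a typed C declaration."""
    symbol_table[name] = infer_type(literal)
    return f"{symbol_table[name]} {name} = {literal};"


symbols = {}
print(declare(symbols, "a", "10"))
print(declare(symbols, "pi", "3.14"))
print(symbols)
```

The dynamically typed source never mentions int or float, yet the generated declaration does, because the type was looked up from what the symbol table recorded.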

My first Contribution - 

When the coding period began on 6th December, I started to familiarize myself with the project by replicating the code provided in the Sim-C documentation. It was then that I found my first bug!
The thing is that in C-lang, we take a string input using the following lines of code:

char* string_name;
printf( "Enter a string: " );
scanf( "%s", &string_name );

The corresponding Sim-C code that should generate these statements looks like this:
var string_name = input( "Enter a string: ", "s" )

As we can see, the Sim-C equivalent is much easier to remember and implement, which is what motivated my mentor, Siddhartha, to build this compiler.
The bug I found was that the scanf( ) statement in the generated C code did not receive the address of the string. Instead, it received the string itself. Basically, instead of the scanf( ) statement mentioned above, we got this:

scanf( "%s", string_name );

Note how there is no "&" before string_name.

I quickly filed a bug using GitHub's trusty resources and then later realised that the exact same bug had already been reported and documented. There go my attempts at being a newbie hero... (Issue #183)

Anyways, the way to solve issues in open source development is to head over to the particular issue and comment in the thread, asking if you can work on it. If one of the Project Admins gives you the go-ahead, you can start coding the solution. In my opinion, this small bit of communication goes a long way, because it would be a terrible waste of everyone's time if you started working on an issue, came up with a genuine solution, and offered your code, only to find out that someone else had already been assigned to it. (This happens more often than you'd expect, so be wary.)

What I thought was an easy fix turned out to take much longer than expected. I mean, I just had to add an ampersand symbol before the variable name, right? (Spoiler alert: this assumption was wrong.)

I started by comparing the way integers and strings get accepted in Sim-C. The token sequence for both of them looked the same, so I figured that the lexical analyzer processed things correctly. I then navigated to the parser (where the OpCode is generated), and added an extra condition which added the '&' character before the variable, if the variable was a string. Locating the block where I had to add this took longer than fixing the bug itself. (That's basically coding in a nutshell) 
Later, I realised that this should have been corrected in the tokenizer itself, but my mentor was happy with this change, so we went ahead with it. 

Snippet of the changes I made


I then filed a Pull Request, which is basically what you need to do to get your code merged into the official project. It is in this phase of development that the admins go through your code, run it against test cases, and ask for changes until the code is ready for merging. Mathias pointed out that I had some leftover print statements that got in the way. (An old debugging habit of mine xD)

After fixing that, my code officially became part of the Sim-C master branch. What a relief!

Solving More Issues -

I then gave a quick solution to one of the test cases mentioned in issue #379. This involved writing Sim-C code to find the area of a circle circumscribing a square. Easy-peasy.
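
For the curious, the maths behind that test case is short: the circle's diameter equals the square's diagonal, so for a side of length s the radius is s√2 / 2 and the area works out to π s² / 2. My actual solution was written in Sim-C; the snippet below is just the same arithmetic in Python, with a function name picked for this example.

```python
import math


def circumscribed_circle_area(side):
    """Area of the circle passing through all four corners of a square."""
    # The square's diagonal (side * sqrt(2)) is the circle's diameter.
    radius = side * math.sqrt(2) / 2
    return math.pi * radius ** 2


print(circumscribed_circle_area(2.0))
```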

For my third issue, I picked Issue #402, in which the problem was to check for balanced braces { }, brackets [ ], and parentheses ( ). Although this is a very basic problem in Computer Science, it was the first time I got to implement a theoretical algorithm in a real world scenario. Basically, we need to return an error if the code's bracket sequence is unbalanced. For reference,
The sequence: { [ ] ( { } ) }   is balanced.
The sequence: { ] ( ) } (       is unbalanced.

Algorithm:

-To solve this, we need to implement a stack using an array (or a list, since we're coding in Python).
-Each time an open bracket, brace, or parenthesis is encountered, push that character onto the stack.
-If we encounter a closing bracket, brace, or parenthesis while the stack is empty (an 'underflow'), report an error, since there are too many closing braces.
-Otherwise, if the element at the top of the stack is the matching opening character, pop it. If it is not, report an error, since the brackets are mismatched.
-If the stack still has elements in it once the whole code has been processed, report an error, since there are too many open braces.
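
The steps above can be sketched in a few lines of Python. This is a standalone illustration and not the code that was merged into Sim-C: the check_brackets name and the error messages are made up, it also checks for anything left on the stack at the end, and it ignores the fact that a real lexer would need to skip brackets inside string literals.

```python
# Maps each closing character to the opening character it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}


def check_brackets(code):
    """Return None if the brackets in `code` are balanced, else an error message."""
    stack = []
    for ch in code:
        if ch in PAIRS.values():
            stack.append(ch)  # remember every opening character
        elif ch in PAIRS:
            if not stack:
                # Underflow: a closer arrived with nothing left to close.
                return f"Too many closing brackets: unexpected '{ch}'"
            if stack[-1] != PAIRS[ch]:
                return f"Mismatched '{ch}': the last opened bracket was '{stack[-1]}'"
            stack.pop()
    if stack:
        # Leftovers on the stack were opened but never closed.
        return f"Too many opening brackets: '{stack[-1]}' was never closed"
    return None


print(check_brackets("{ [ ] ( { } ) }"))
print(check_brackets("{ ] ( ) } ("))
```

Each character is looked at exactly once, so the check runs in time proportional to the length of the code.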

While implementing this in Sim-C, I thoroughly messed up. The code was logically 'correct', but from a software point of view, the lines of code were redundant and unnecessarily blew up the time complexity. After some collaboration with my mentor, I figured it out, and made it as compact as possible.

After solving 3 issues with the 'easy' tag, I realised that I would pass the Mid Evaluations without much trouble. This pushed me to go for a 'medium' level issue (which actually came under the 'feature' category). I decided to implement the if cascade feature (Issue #408) in Sim-C. For the first time in this event, I was clueless as to where to start. After some brainstorming with Siddhartha, we came to the conclusion that the Dynamic Memory Allocation feature (Issue #33, which happened to have a 'difficult' tag) had to be implemented first in order to add this feature. Due to my lack of experience, I could not complete this during KWoC.

I then decided to tackle issue #433, in which we had to implement the expression( ) function from simc_parser.py in array_parser.py, by removing redundant code in the latter file. Yet again, I didn't know where to start. It took me about a day just to figure out the logic behind it, and a few more days after that to implement it. Despite all that, my code still didn't pass all the test cases. 
(I tested the changes using the Sim-C Testing Suite, which is this cool and convenient piece of software that runs your changes against about 105 test files.)
KWoC's coding period ended on 4th January, 2021, and I could not solve this issue by then. I did not feel like dropping the task, so I decided to finish it and get my code merged into the official project.

After some help from Siddhartha, my mentor, I added the required condition blocks, along with documentation in case someone works on the same section of code in the future. My final contribution for KWoC 20-21 got merged. Since it got merged after the coding period, it wouldn't come under the final stats, but I am happy with it nonetheless :)

Conclusion and Final Thoughts - 

I know my attempts to make this section as non-cliché as possible have failed miserably, but I would like to say that I genuinely enjoyed KWoC 2020-21, as it gave me a taste of what 'real' software development looks like (and how systematic, and somewhat tedious it can be). 

I am really grateful to Mathias for helping me out whenever I got stuck, whether it be better documentation, or more efficient logic.

A special thanks to my mentor and friend, Siddhartha, for teaching me the intricacies of coding, and for all the fun discussions we had after completing the assigned tasks.
 
A huge thanks to the Kharagpur Open Source Society for letting me be part of this event, and hosting it successfully. I highly recommend KWoC to budding programmers who want to dive into the world of Open Source.

Also, dear reader, if you've made it this far, thanks a lot for taking the time to read my experience. You're the best!

This is Chasmiccoder, signing out...

Final Stats and Closed Pull Requests -



