Friday, March 8, 2013

Ruby Pass By Value? Pass By Reference? Pass By Sharing!!

Updated at Dec 27, 2013

There has been a long-running debate on the Internet about whether Ruby passes method arguments by value or by reference. Some people claim that they are passed by value and offer the following code as proof.

str_1 = "pass_by_value"

def pass_by_value(str)
  str = "pass_by_reference"
end

pass_by_value(str_1)
str_1
=> "pass_by_value"

It looks convincing at first glance, as the method pass_by_value didn't change the value of str_1. But if you change the code a little, you will find that this assertion is wrong.

str_1 = "pass_by_value"
str_1.object_id
=> 70332873209820

def pass_by_value(str)
  puts str.object_id
end

pass_by_value(str_1)
70332873209820
 => nil
In the method pass_by_value, when we call object_id on the object referenced by the variable str, it returns the same id as the object referenced by str_1. If Ruby passed arguments by value, the two ids should NOT be the same, because the value would be copied to a new memory location and the local variable str would point to the new object.

Now we can conclude that Ruby does not pass arguments by value. But it does not pass them by reference either, because if it did, str_1 would have been changed in our first code snippet. So Ruby must use another evaluation strategy. This third strategy is called pass by sharing. Here is how Wikipedia describes it:

"The semantics of call by sharing differ from call by reference in that assignments to function arguments within the function aren't visible to the caller (unlike by reference semantics), so e.g. if a variable was passed, it is not possible to simulate an assignment on that variable in the caller's scope. However since the function has access to the same object as the caller (no copy is made), mutations to those objects, if the objects are mutable, within the function are visible to the caller, which may appear to differ from call by value semantics."

Let's take a look at the following code snippet.

str_1 = 'abc'
str_2 = 'abc'

def mutate_str(str)
  str << 'efg'
end

def assign_str(str)
  str = 'abcefg'
end

If we call mutate_str on str_1 and assign_str on str_2, we find that str_1 is changed to 'abcefg', while str_2 remains 'abc'.
> mutate_str(str_1)
=> "abcefg" 
> str_1
=> "abcefg" 
> assign_str(str_2)
=> "abcefg" 
> str_2
=> "abc" 
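The practical consequence is that a method can only protect the caller's object from mutation if it receives a copy. The following is a minimal sketch of that idea, not a definitive recipe; the helper names append_suffix and received_same_object? are made up for illustration, and only core String methods are used.

```ruby
# Helper names are hypothetical; only core String methods are used.
def append_suffix(str)
  str << "efg"                 # mutates the object shared with the caller
end

def received_same_object?(str, caller_id)
  str.object_id == caller_id   # true when no copy was made
end

original = String.new("abc")   # String.new gives an unfrozen string

shared = received_same_object?(original, original.object_id)

append_suffix(original.dup)    # the method only sees a copy
after_copy_call = original.dup # snapshot of the caller's string at this point

append_suffix(original)        # the method sees the caller's own object
after_direct_call = original

puts shared
puts after_copy_call
puts after_direct_call
```

Passing original.dup leaves the caller's string untouched, while passing original itself lets the mutation show through.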

When to Use "self" Keyword Explicitly

In Ruby, when you are defining an instance method, the keyword "self" can be used to refer to the current object.

class Name < String
  def write_name_in_capital_letter
    self.upcase
  end
end
name = Name.new('david')
name.write_name_in_capital_letter
=> "DAVID"

In the example above, you can remove the self keyword and the method still works. That's because when you call a method without an explicit receiving object, the method message is implicitly sent to "self". In this case, upcase is sent to "self".

class Name < String
  def write_name_in_capital_letter
    upcase
  end
end
name = Name.new('david')
name.write_name_in_capital_letter
=> "DAVID"
This is just a Ruby idiom that saves programmers some typing. However, relying on it everywhere may lead to unexpected results. There are two situations where "self" is needed to get the desired result. The first is when you attempt to call the method "class".

class Name < String
  def show_my_class_name
    puts class.name
  end
end
It actually gives the following error.

SyntaxError: (irb):49: syntax error, unexpected '.'
puts class.name
That's because the Ruby interpreter treats "class" as the class keyword instead of the class method.

The second situation is when you attempt to call an attribute's setter method.

class Name < String
  attr_accessor :an_attr

  def change_attr
    an_attr = 'hohoho'
  end
end
n = Name.new
n.an_attr = 'hahaha'
n.change_attr
n.an_attr
 => "hahaha"
We can see that no error is thrown, but "an_attr" is not changed at all. That's because "an_attr =" in the method change_attr is not interpreted as a call to the setter method "an_attr=". Instead, it is treated as an assignment to a local variable. If the Ruby interpreter took it as a setter call, there would be no way left for you to define a local variable named "an_attr".

So in these two situations you need to use "self" explicitly.
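For completeness, here is a sketch of both methods with the explicit receiver added. It reuses the Name class from the examples above and adds nothing beyond "self."; the return values are printed with puts only for illustration.

```ruby
class Name < String
  attr_accessor :an_attr

  def show_my_class_name
    self.class.name          # explicit receiver, so "class" is parsed as a method call
  end

  def change_attr
    self.an_attr = 'hohoho'  # explicit receiver, so the setter an_attr= is called
  end
end

n = Name.new('david')
n.an_attr = 'hahaha'
n.change_attr

puts n.show_my_class_name
puts n.an_attr
```

This time the attribute really changes, because the assignment goes through the setter instead of creating a local variable.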

References
http://www.jimmycuadra.com/posts/self-in-ruby

Wednesday, March 6, 2013

Yaml Basics and Some Simple Examples (Part 2)

Sometimes you need certain values to appear several times in your yaml file. Instead of copying and pasting, you can use node anchors, which mark a node for future reference. An anchor is denoted by the "&" indicator and a reference to it (an alias) is denoted by the "*" indicator. For example, in the test_3.yml file in Part 1, if we want team_3 to have the same members as team_1, we can use a node anchor.

test_3.yml
team_1: &team_1_members
  - Brian
  - David
  - Tom

team_2: 
  - Mike
  - Bob
  - Sam

team_3: 
  *team_1_members 
 => {"team_1"=>["Brian", "David", "Tom"], "team_2"=>["Mike", "Bob", "Sam"], "team_3"=>["Brian", "David", "Tom"]}
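To try this without creating a file, the same document can be loaded from a string. This is only a sketch: it assumes a reasonably recent Ruby, where YAML.safe_load accepts the aliases: keyword, and it uses safe_load rather than the load_file call used elsewhere in these posts, because newer versions of the YAML library reject aliases unless they are explicitly enabled.

```ruby
require "yaml"

# A shortened version of test_3.yml, kept in a string for the example.
yaml_text = <<~YAML
  team_1: &team_1_members
    - Brian
    - David
    - Tom

  team_3:
    *team_1_members
YAML

# aliases: true is required because safe_load refuses aliases by default.
teams = YAML.safe_load(yaml_text, aliases: true)

puts teams["team_3"].inspect
puts teams["team_3"] == teams["team_1"]
```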

Tuesday, March 5, 2013

Yaml Basics and Some Simple Examples (Part 1)

Yaml is a great tool for storing configuration data. Just check out the config folder of your rails app and you will find plenty of files ending with .yml. You can load a yaml file by calling the method YAML.load_file. Once a yaml file is loaded into memory, it presents itself as either an Array or a Hash, depending on how the original yaml file is written.

In a yaml file, indentation is not just there for ease of reading; it is what indicates parent and child nodes. Here is how yaml's official site (http://www.yaml.org/spec/1.2/spec.html#id2777534) puts it: "Each node must be indented further than its parent node. All sibling nodes must use the exact same indentation level. However the content of each sibling node may be further indented independently."

Below, I will show you four yaml files. Two of them present themselves as arrays after they are read into memory, and the other two present themselves as hashes.

test_1.yml
- Abraham Lincoln 
- George W. Bush
- Barack Obama
=> ["Abraham Lincoln", "George W. Bush", "Barack Obama"]

test_2.yml
name: David
age: 25
gender: male 
 => {"name"=>"David", "age"=>25, "gender"=>"male"}

test_3.yml
team_1:
  - Brian
  - David
  - Tom

team_2: 
  - Mike
  - Bob
  - Sam

team_3: 
  - Adam
  - Aron
  - Bruce 
 => {"team_1"=>["Brian", "David", "Tom"], "team_2"=>["Mike", "Bob", "Sam"], "team_3"=>["Adam", "Aron", "Bruce"]}

test_4.yml
-
 name: david
 age: 25
 gender: male
-
 name: sam
 age: 22
 gender: male
-
 name: bill
 age: 29
 gender: male 
=> [{"name"=>"david", "age"=>25, "gender"=>"male"}, {"name"=>"sam", "age"=>22, "gender"=>"male"}, {"name"=>"bill", "age"=>29, "gender"=>"male"}]

We see two symbols here, the dash and space ("- ") and the colon and space (": "). The former is used to mark the start of an array member and the latter is used to mark a key-value pair. So for the four test files, after we load them into memory, we get an array, a hash, a hash whose values are arrays and an array of hashes, respectively.
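The four shapes can also be checked from Ruby directly. The sketch below parses shortened versions of the four files from strings with YAML.safe_load instead of reading them with YAML.load_file, purely so that it runs without any files on disk; the variable names are made up for the example.

```ruby
require "yaml"

# Shortened versions of test_1.yml .. test_4.yml, kept in strings.
presidents = YAML.safe_load(<<~YAML)
  - Abraham Lincoln
  - George W. Bush
YAML

person = YAML.safe_load(<<~YAML)
  name: David
  age: 25
YAML

teams = YAML.safe_load(<<~YAML)
  team_1:
    - Brian
    - David
  team_2:
    - Mike
    - Bob
YAML

people = YAML.safe_load(<<~YAML)
  -
   name: david
   age: 25
  -
   name: sam
   age: 22
YAML

puts presidents.class      # array
puts person.class          # hash
puts teams["team_1"].class # hash whose values are arrays
puts people.first.class    # array whose members are hashes
```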

Monday, March 4, 2013

Experimenting with Built-in Transactions

One important feature of a reliable database system is atomicity, i.e. either a transaction succeeds as a whole or none of its statements takes effect. To put it another way, if one of its statements fails, none of them may leave a trace in the database. Rails has a nice, clean implementation of transactions: just wrap all the rails code that you want to run within a transaction in a transaction call,

  A_CLASS.transaction do
    # ...
  end

All the code within the block that touches the database is guaranteed to be executed as a whole or not at all. One thing worth noting, however, is that atomicity is guaranteed only for the database; the model objects involved are changed regardless. If you want to restore the models involved in the transaction when a database statement fails, you need to add them as parameters to the transaction method.

What I'm going to play with today, though, is a special kind of transaction called the built-in transaction. A built-in transaction guarantees the atomicity of a save that spans parent and child tables. For example, when saving a parent record, the built-in transaction makes sure that either the parent record and all its related child records are saved to the database, or nothing is saved at all.

Let's create two very simple classes, Order and OrderLineItem, and define a one-to-many relationship between them.
rails generate model Order total:decimal description:text
rails generate model OrderLineItem subtotal:decimal description:text order:references

class Order < ActiveRecord::Base
  attr_accessible :description, :total
  has_many :order_line_items
end

We also require that each line item's subtotal must be greater than 0.
class OrderLineItem < ActiveRecord::Base
  belongs_to :order
  attr_accessible :description, :subtotal

  # validations
  validates_numericality_of :subtotal,
    :greater_than => 0
end

Now let's launch the console. We create an order and an order line item with a subtotal of -1, and then assign the line item to the order. We know the line item is invalid. We also know that saving the order saves the line item too. So let's call the save method on the order object and see what happens. As expected, the save fails with an error saying the line item's subtotal must be greater than zero. If we check the database, we can see that neither an order record nor an order line item record has been added. Then we change the subtotal of the line item to 1 and save the order again; now both the order and the line item are saved to the database.
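The all-or-nothing behaviour seen in the console can be imitated in plain Ruby. The sketch below is NOT ActiveRecord and does not show how Rails implements it; FakeDatabase, InvalidRecord and the simplified Order and OrderLineItem classes are hypothetical stand-ins, written only to illustrate the idea that a failed child record undoes the parent insert as well.

```ruby
# Plain-Ruby stand-in for the behaviour described above; not ActiveRecord.
class InvalidRecord < StandardError; end

class FakeDatabase
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Run the block; if it raises InvalidRecord, restore the earlier rows.
  def transaction
    snapshot = @rows.dup
    yield
    true
  rescue InvalidRecord
    @rows = snapshot
    false
  end

  def insert(row)
    @rows << row
  end
end

OrderLineItem = Struct.new(:subtotal) do
  def valid?
    subtotal > 0
  end
end

class Order
  attr_reader :line_items

  def initialize
    @line_items = []
  end

  # Store the order and all of its line items inside one transaction.
  def save(db)
    db.transaction do
      db.insert([:order, self])
      line_items.each do |item|
        raise InvalidRecord, "subtotal must be greater than 0" unless item.valid?
        db.insert([:line_item, item])
      end
    end
  end
end

db = FakeDatabase.new
order = Order.new
item = OrderLineItem.new(-1)
order.line_items << item

first_attempt = order.save(db)      # invalid line item, so nothing is kept
rows_after_failure = db.rows.size

item.subtotal = 1
second_attempt = order.save(db)     # valid now, so both rows are kept
rows_after_success = db.rows.size

puts first_attempt, rows_after_failure, second_attempt, rows_after_success
```

Note that the order row is inserted before the line item is checked, and it is still gone after the failed save; that is the property the built-in transaction provides.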