[Stoq-devel] Support for Distributed Units/Stores

Gustavo Barbieri barbieri at gmail.com
Tue May 17 11:13:29 BRT 2005


On 5/16/05, Evandro Vale Miquelito <evandro at async.com.br> wrote:
> 
> > Hello,
> 
> Hello Gustavo, nice to see you here!
> 
> > A common problem with multi-unit/store companies is that they probably
> > want to access data from other units, but since internet connections are
> > still not very reliable, the system should tolerate failures. Assuming
> > the other unit's DB is always accessible is a mistake.
> >
> > Does Stoq have any plans for that?  I want to help with this.
> 
> Actually, the Stoq team has had many discussions on this topic and, for
> now, it is not our first priority. Of course, the sooner we
> start planning and implementing this feature, the better for the project.

Great... I've already told you, you guys rock. I don't know of any other
software here in Brazil that has ever thought about it.

 
> We would also welcome your help here, and I think we could start planning
> right now how to bring to Stoq all the functionality stores need.

Sure!

 
> For now we have plans to support two data transfer models:
>      1- A centralized database server with all the stores connected to it.
>         This is the easiest way to implement, but unfortunately it is not
>         cheap for most companies here in Brazil.
>      2- Decentralized database: each store has its own database and
>         every day sends its transactions to a server
>         and later receives an updated version of the whole data set. We think
>         this is the most common approach here in Brazil, and companies here
>         usually use dial-up connections to transfer data.
> 
> To implement the second approach we plan to treat each store as an
> independent company with its own stock. If the company has a warehouse, we
> will share its contents among the stores and associate each of them with a
> logical stock, that is, the quantity in the warehouse reserved for
> that store.

Great.

> Of course, some kinds of data cannot be
> independent, such as customers. For these we plan to implement
> mechanisms to handle the conflicts that arise when the same
> customer's data is changed by two stores on the same day (this is rare
> but possible).

Excellent.

> Once we treat all the stores as independent, we can
> easily send their data to a server every day, process and validate the
> contents, and send the result back from the server to all the stores.
> This idea also has some problems. Usually we will need far more than one
> data transfer per day if we want consistent data. For example, in most
> stores there is a purchase limit for every customer, calculated by credit
> analysis. If we have more than one store, we must always know whether a
> certain customer has already exceeded his credit in another store, to avoid
> problems with bad clients. We may also sometimes need to know how many items
> another store has, and of course it's not safe to get this
> information using just a single transfer per day.

Sure.  If we go with a dispatcher queue, we can always show the user a message:

"This data is from %X %x."
"Do you want to update data?"
 [Yes][No]

If the user hits Yes, we launch pppd, connect, transfer and disconnect.
The data can be compressed with bzip2 to shorten transmission times.
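
To give an idea of the compression step, here is a minimal sketch using
Python's standard bz2 module. The JSON encoding and the shape of an
operation (table/product/delta) are only assumptions for illustration;
nothing like this exists in Stoq yet.

```python
import bz2
import json


def pack_batch(operations):
    """Serialize a list of queued operations and compress it with bzip2."""
    payload = json.dumps(operations).encode("utf-8")
    return bz2.compress(payload)


def unpack_batch(blob):
    """Reverse of pack_batch: decompress and decode the operations."""
    return json.loads(bz2.decompress(blob).decode("utf-8"))


# Hypothetical batch: 500 stock movements waiting to be sent over dial-up.
batch = [{"table": "stock", "product": "PRODUCT", "delta": -1}] * 500
raw_size = len(json.dumps(batch).encode("utf-8"))
blob = pack_batch(batch)
```

How much is saved depends on the data; repetitive batches like the one
above compress very well, random-looking data much less.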

Actually, we could go with a simpler solution that just transfers
the data the user needs, but I think using a dispatcher queue makes the
process easier and avoids future transmissions.

Also, this message could have a threshold before it is shown: some screens
could accept data 10 minutes old, others 1 day old, and others need fresh
data.
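
A small sketch of how such thresholds could be checked. The kinds of data
and their limits below are made up for the example (only the 10 minutes
and 1 day figures come from the text above), and needs_refresh is a
hypothetical helper, not an existing Stoq function.

```python
from datetime import datetime, timedelta

# Hypothetical maximum age accepted for each kind of data.
MAX_AGE = {
    "customer_registration": timedelta(days=1),
    "stock_level": timedelta(minutes=10),
    "account_closing": timedelta(0),  # always requires fresh data
}


def needs_refresh(kind, last_sync, now):
    """Return True when data of this kind has reached its maximum age."""
    return now - last_sync >= MAX_AGE[kind]


now = datetime(2005, 5, 17, 11, 0)
stock_is_stale = needs_refresh("stock_level", now - timedelta(minutes=30), now)
```

The "Do you want to update data?" dialog would then be shown only when
needs_refresh returns True for the data on screen.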



> There are many other related details that we can discuss in the next e-mail.

Ok.


> > There are many ways to address this problem; the easiest IMO is
> > Passive Replication: each Unit is a Coordinator for its own DB and a
> > slave for the other DBs.
> >   If a slave loses the connection to its coordinator, data may be
> > outdated, but this is not a huge problem in most cases (I'll
> > elaborate on this later).
> >   Updates to a remote database should go through a dispatcher queue and
> > should be expressed as add/subtract operations, so if a coordinator is
> > down for some time, when it's back the operations will be performed in
> > order and data will not lose consistency.
> >   For example: a user owes 100.00 and pays 100.00. The operation
> > should be "value = value - 100.00" instead of "value = 0", since if
> > the user spends another 40.00 before the payment is committed, the
> > balance will be 40.00 at the end, not 0.00 (as it would be with
> > "value = 0.00").
> 
> It seems you are assuming here that stores always have a broadband
> connection, which is not true in Brazil.

Not really. If you use the dispatcher queue, every operation you do on
the Coordinator is queued; when it connects to the Slaves, it goes through
the queue and sends the events in order.  The Coordinator would be the
store the user is at, and a slave would be any other store that wants to
know about that store's data.
   For example, consider the stock of STORE1: if STORE2 wants to
know how many units of PRODUCT there are in STORE1, it can be a slave of
the STORE1 stock DB.  If STORE1 removes one item from its DB, it puts this
information on the dispatcher queue; when STORE2 is able to connect to
STORE1, it will receive this data and process it.
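
Roughly, the STORE1/STORE2 case could look like the sketch below. The
DispatcherQueue class, its method names and the (product, delta) tuples
are all invented for this example; it is an in-memory toy, not a proposal
for the real storage or network layer.

```python
from collections import deque


class DispatcherQueue:
    """Coordinator-side backlog of operations, one queue per slave."""

    def __init__(self, slaves):
        self.pending = {name: deque() for name in slaves}

    def publish(self, operation):
        """Record an operation for every slave, connected or not."""
        for queue in self.pending.values():
            queue.append(operation)

    def drain(self, slave):
        """Called when a slave connects: return its backlog in order."""
        queue = self.pending[slave]
        operations = list(queue)
        queue.clear()
        return operations


def apply_operations(stock, operations):
    """Apply (product, delta) operations to a slave's copy of the stock."""
    for product, delta in operations:
        stock[product] = stock.get(product, 0) + delta


# STORE1 is the coordinator of its own stock; STORE2 keeps a copy.
store1_queue = DispatcherQueue(["STORE2"])
store2_copy = {"PRODUCT": 10}

store1_queue.publish(("PRODUCT", -1))  # STORE1 sells one item
store1_queue.publish(("PRODUCT", -2))  # and later two more

# Later, STORE2 manages to connect and catches up.
apply_operations(store2_copy, store1_queue.drain("STORE2"))
```

Until STORE2 connects, its copy simply stays at the old value, which is
where the "this data may be outdated" message comes in.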

BTW, I just realized we can't call this Passive Replication. :-P
In Passive Replication the coordinator just sends the new state ("Y=X"),
not operations like "Subtract X from Y". Well... it's
just a naming issue, I don't think you mind. My mistake!
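
To make the difference concrete, here is the 100.00 / 40.00 example from
my first mail as a small sketch. The ("add", amount) and ("set", amount)
tuples and the replay function are just an illustration of operations
versus state, not a real protocol.

```python
from decimal import Decimal


def replay(balance, operations):
    """Apply queued operations to a customer's debt, in order."""
    for kind, amount in operations:
        if kind == "add":      # operation style: "value = value + amount"
            balance += amount
        elif kind == "set":    # state style: "value = amount"
            balance = amount
        else:
            raise ValueError("unknown operation: %r" % (kind,))
    return balance


debt = Decimal("100.00")
purchase = ("add", Decimal("40.00"))          # spent before the payment commits
payment_as_delta = ("add", Decimal("-100.00"))
payment_as_state = ("set", Decimal("0.00"))

with_operations = replay(debt, [purchase, payment_as_delta])
with_state = replay(debt, [purchase, payment_as_state])
```

Here with_operations ends at 40.00, while with_state ends at 0.00 and the
40.00 purchase is lost. The add/subtract form also gives the same result
if the two operations arrive in the opposite order.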


> If we have a high-speed connection, why not, instead of using a passive
> replication method, think about a single communication channel between the
> stores and a server? A server here could also be a store that hosts the
> main office. In this case each store would have to send all its
> transactions to the server from time to time, and the server, after
> receiving the data from all the stores, would send back to them an
> updated version of the data. We can also implement a
> special mechanism to avoid problems when a certain store cannot send its
> data at the right time.

Yes, you're right... but I think that mechanism would be something
like a dispatcher queue :)


 
> > Why outdated data is not a huge issue:
> >   One common case is when USER is registered at STORE1 and goes to
> > STORE2 to shop; STORE2 needs to know whether he is registered. This can
> > tolerate a huge delay, maybe one day.
> >   Then USER leaves a debt at STORE2, which must be recorded at STORE1,
> > his "default" store. This also tolerates a huge delay.
> >   USER wants to pay his debts at STORE2; his debts come from
> > STORE1. This also tolerates a huge delay.
> >   These operations are OK with outdated data. The user should be
> > informed with something like "This data is from yyyy-mm-dd hh:mm and
> > may be outdated", but the operations may proceed.
> 
> Yes, I agree here. Informing users when data is outdated is mandatory
> when working with many stores.
>
> >   But some kinds of operations must act on official data, such as when a
> > user wants to close his account: he must have no debt. These should
> > not be allowed in the presence of outdated data.
> 
> Agreed.
> 
> > Other kinds of communication between stores, like product transfers,
> > must be addressed in a similar way, so the system doesn't need to stop
> > for changes to take place.
> 
> Ditto.
> 
> > I have no idea if Stoq already supports anything on this front, but I
> > want it to and I want to help. Any comments are welcome, since I
> > have never done this in real life and I'm just studying Distributed
> > Systems now.
> 
> As I told you before, for now we don't have support for this in Stoq, only
> ideas. Would you like to help us start the wiki documentation of how we
> should implement this in Stoq? Or even file some bugs and send some patches?
> I can give you all the support you need. We can create a special branch for
> that in Async's subversion repository and a user for you. You can also
> come to the #async IRC channel and feel free to talk about implementation
> details.

Sure, I don't think you should implement it atm, but it's great to know you
are thinking about it. I'm really interested in this area, so I want to
help with Wiki + Code, but I need some help to get started with Stoq
internals, where to look, etc...

For now I will start a wiki page about the topic and try to gather what
we need; then I can do some experiments with a prototype and check that it
works. Then we can think about how to integrate both.


> Again, I and the whole Stoq team appreciate your contribution here, and we
> look forward to you becoming part of the success of this project as soon as
> possible.

Thanks... I really want to help you guys, since it will help me too :-)

My father is on the CC, so he can start reading about this stuff;
maybe he has some cases to contribute :)

-- 
Gustavo Sverzut Barbieri
---------------------------------------
Computer Engineer 2001 - UNICAMP
GPSL - Grupo Pro Software Livre
Cell..: +55 (19) 9165 8010
Jabber: gsbarbieri at jabber.org
  ICQ#: 17249123
   MSN: barbieri at gmail.com
 Skype: gsbarbieri
   GPG: 0xB640E1A2 @ wwwkeys.pgp.net


