

Steve
I know your IP address
Consultant
join:2001-03-10
Yorba Linda, CA
kudos:5

reply to yaplej

Re: How to deal with a lot of data?

said by yaplej:

There are 2^1280 possible unique patterns (20-160 Bytes) in a VoIP packet excluding headers. Can that many records be created in a table?

If you know how to type 2^1280, you know how to figure out how much data that actually is, right?


yaplej
Premium
join:2001-02-10
White City, OR

It's huge, massive, gargantuan. I'm just asking how it might be done. Distributed tables or databases? I mean, people simulate climate change and storms' effects on the world's topology with huge amounts of data. How could you effectively store data like this?

It was just an idea; if it's not feasible, no big deal.



yaplej
Premium
join:2001-02-10
White City, OR

It would probably be better just to insert the entire 160-byte payload as a blob and count the occurrences somehow. If the table gets full, just start a new table. With multiple tables, you would take the top occurrences from each and then compare those.

It would only use space as particular records are inserted, but it's going to use 160 bytes for each record. I read that there's a 64-terabyte limit on table size, though, so that's a lot of data.

Still, it would be a neat challenge to deal with a huge amount of data like that.
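
A minimal sketch of the counting idea above, in Python rather than SQL. The names `count_payloads` and `merge_top` are made up for illustration, and each `Counter` stands in for one "table". It merges the full counts instead of only each table's top entries, since a payload that is moderately common in every table could otherwise be missed.

```python
from collections import Counter

def count_payloads(payloads):
    """Count identical payloads; memory grows only with distinct payloads seen."""
    return Counter(bytes(p) for p in payloads)

def merge_top(tables, n=3):
    """Sum the per-table counts, then take the n most common payloads overall."""
    total = Counter()
    for table in tables:
        total.update(table)
    return total.most_common(n)

# Two hypothetical "tables" of captured 160-byte payloads.
silence = bytes(160)
tone = bytes([0x55]) * 160
table_a = count_payloads([silence, silence, tone])
table_b = count_payloads([silence, tone, tone, tone])
top = merge_top([table_a, table_b], n=2)
print(top[0][1], top[1][1])
```

Space is only used for payloads that actually show up, which is the point made above; the number of possible payloads is irrelevant until they are seen.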
--
sk_buff what?

Open Source Network Accelerators
»www.trafficsqueezer.org
»www.opennop.org



cdru
Go Colts
Premium,MVM
join:2003-05-14
Fort Wayne, IN
kudos:7

said by yaplej:

Still, it would be a neat challenge to deal with a huge amount of data like that.

There are tradeoffs with any type of compression: quality, speed, size. Pick two. When you pick a compression level in a program like WinZip, WinRAR, or 7-Zip, you get perfect quality, but either slow compression for a smaller file or fast compression for a larger file. For a video codec or a lossy image format such as JPEG, you are balancing all three.

While doing an "analysis" the way you suggest is theoretically possible, in the end I doubt it would be very useful for VoIP data. VoIP needs to be as close to real time as possible, meaning you can't do a lot with an individual packet because you're limited in processing time. Each packet also needs to be independent of the others, since delivery is not guaranteed and lost packets can't be retransmitted. Certain data payloads may come up more often than others, but I'd be surprised if it was statistically significant enough to make exceptions for those packets in an effort to "compress" them.
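
To make the speed-versus-size half of that tradeoff concrete, here is a small sketch using Python's standard `zlib` module. The sample data is invented for the example; real results depend heavily on the input.

```python
import zlib

# Repetitive sample text so the level setting has something to work with.
data = b"".join(
    b"packet %d: the quick brown fox jumps over the lazy dog\n" % i
    for i in range(5000)
)

fast = zlib.compress(data, 1)   # level 1: fastest, usually the larger output
small = zlib.compress(data, 9)  # level 9: slowest, usually the smaller output

# Lossless at every level: decompression returns the exact input.
assert zlib.decompress(fast) == data
assert zlib.decompress(small) == data
print(len(data), len(fast), len(small))
```

Quality never changes here because the format is lossless; only time and size move.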


Steve
I know your IP address
Consultant
join:2001-03-10
Yorba Linda, CA
kudos:5

reply to yaplej

said by yaplej:

It was just an idea; if it's not feasible, no big deal.

If you're a CCNA, you should be able to do some back-of-the-envelope calculations and figure out for yourself that the number is so large that you cannot even have a discussion about databases or storage.

2^1280 is around 10^385, a number which dwarfs the number of atoms in the Earth (10^50).
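
The back-of-the-envelope calculation can be checked in a few lines of Python. This is only a sketch: it counts the 160-byte (1280-bit) payloads, which dominate the total, and assumes one 160-byte record per pattern.

```python
import math

bits = 160 * 8                    # largest payload: 160 bytes = 1280 bits
patterns = 2 ** bits              # Python integers are arbitrary precision
exponent = bits * math.log10(2)   # log10(2^1280)
digits = len(str(patterns))       # number of decimal digits in 2^1280

print(f"2^{bits} ~ 10^{exponent:.1f} ({digits} decimal digits)")

# Even one 160-byte record per pattern is far beyond any storage system.
total_bytes = patterns * 160
print(f"bytes needed ~ 10^{math.log10(total_bytes):.1f}")
```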


cdru
Go Colts
Premium,MVM
join:2003-05-14
Fort Wayne, IN
kudos:7

said by Steve:

2^1280 is around 10^385, a number which dwarfs the number of atoms in the Earth (10^50).

Not to mention the number of atoms in the universe (10^81). In fact, if every atom in the universe contained a universe of atoms, each of which contained a universe of atoms, each of which contained a universe of atoms (4 nested universes), you'd still only have ~10^324 atoms.
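
The nested-universe arithmetic is just exponent addition. A quick sketch, taking the 10^81 figure quoted above as given:

```python
import math

atoms_exp = 81                        # ~10^81 atoms in the universe, as quoted
nested_exp = atoms_exp * 4            # four nested levels: (10^81)^4 = 10^324
patterns_exp = 1280 * math.log10(2)   # 2^1280 expressed as a power of ten

shortfall = patterns_exp - nested_exp  # orders of magnitude still missing
print(f"nested: 10^{nested_exp}, patterns: 10^{patterns_exp:.0f}, "
      f"gap: 10^{shortfall:.0f}")
```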


yaplej
Premium
join:2001-02-10
White City, OR

Had no idea about how many atoms are in the universe. Never counted them.

So let's take this in another direction. How about analyzing each Wireshark stream individually for any possible patterns, and only storing those, if any? Then analyze the collection of matched patterns for the most common ones.

It's a lot less storage if you only find a few patterns per session. There might not be any patterns found in a VoIP session, in which case there is nothing to store. Packet captures of test calls generally are not that big, so they might be able to upload the capture, analyze that call, save any patterns, and be done.

Seems like if you had 1000 calls to analyze you could get an idea quickly if there are any common patterns for a particular codec.
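
A rough sketch of that two-stage idea, with invented names (`repeated_ngrams`, `common_across_streams`) and a fixed pattern length of 4 bytes chosen only for illustration. It assumes the payloads have already been extracted from the capture; no Wireshark API is used here.

```python
from collections import Counter

def repeated_ngrams(payloads, n=4):
    """Return the n-byte substrings that occur more than once within one stream."""
    counts = Counter()
    for payload in payloads:
        for i in range(len(payload) - n + 1):
            counts[payload[i:i + n]] += 1
    return {gram for gram, c in counts.items() if c > 1}

def common_across_streams(streams, n=4, top=5):
    """Count in how many streams each repeated n-gram shows up."""
    seen_in = Counter()
    for payloads in streams:
        seen_in.update(repeated_ngrams(payloads, n))
    return seen_in.most_common(top)

# Toy data: two hypothetical calls, each a list of payload bytes.
call_1 = [b"\x01\x02\x03\x04AAAA", b"\x01\x02\x03\x04BBBB"]
call_2 = [b"\x01\x02\x03\x04CCCC", b"zz\x01\x02\x03\x04"]
result = common_across_streams([call_1, call_2])
print(result)
```

Only patterns that repeat inside a stream are kept, so a call with no repeats contributes nothing to storage.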


jfmezei
Premium
join:2007-01-03
Pointe-Claire, QC
kudos:22

You use RAM to do the pattern matching and then a database to store number of occurrences of patterns that are of interest.
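
One way this split could look, sketched with Python's `collections.Counter` for the in-RAM part and the standard `sqlite3` module for storage. The table name, column names, and the "seen more than once" threshold are assumptions made for the example.

```python
import sqlite3
from collections import Counter

# Pattern matching happens in RAM: count payloads as they are examined.
in_ram = Counter()
for payload in [b"\x00" * 20, b"\x00" * 20, b"\xff" * 20]:
    in_ram[payload] += 1

# Only patterns of interest (here: seen more than once) reach the database.
db = sqlite3.connect(":memory:")  # a file path would be used in practice
db.execute(
    "CREATE TABLE patterns (payload BLOB PRIMARY KEY, hits INTEGER NOT NULL)"
)
db.executemany(
    "INSERT INTO patterns (payload, hits) VALUES (?, ?)",
    [(p, c) for p, c in in_ram.items() if c > 1],
)
db.commit()
rows = db.execute(
    "SELECT payload, hits FROM patterns ORDER BY hits DESC"
).fetchall()
print(rows)
```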

Remember that you should decompress the data in RAM before you do the pattern matching. Trying to compress compressed data often results in more data.
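
The second point is easy to demonstrate. In this sketch, seeded random bytes stand in for an already-compressed 160-byte codec payload (an assumption; real codec output is not truly random, but it is similarly hard to compress further).

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
# Random bytes stand in for an already-compressed 160-byte codec payload.
payload = bytes(rng.getrandbits(8) for _ in range(160))

recompressed = zlib.compress(payload, 9)
print(len(payload), len(recompressed))
```

The output comes out larger than the input because zlib still has to add its own header, block framing, and checksum even when it finds nothing to shrink.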



cdru
Go Colts
Premium,MVM
join:2003-05-14
Fort Wayne, IN
kudos:7

reply to yaplej

said by yaplej:

So let's take this in another direction. How about analyzing each Wireshark stream individually for any possible patterns, and only storing those, if any? Then analyze the collection of matched patterns for the most common ones.

You realize you are describing almost any general-purpose data compression algorithm out there. For reference, see PKZIP, zlib, DEFLATE, etc.

It's a lot less storage if you only find a few patterns per session.

So after a few packets you notice a pattern that could be substituted. First, it's a lousy codec if that pattern is obvious and not a function of just the particular voice characteristics of that call. But even besides that, by the time you realize "Hey, there's a pattern here", the packets should have already been sent. If they haven't, they are now useless as part of the conversation.

But even if you do realize that 001010...1010 is a repeating pattern, you now must tell the other end that this particular pattern repeats and that if they see a particular bit-pattern "key" come across, it should be replaced with the expanded value. Of course, this takes some extra data across the line. But it might save us some in the long run.

But that particular bit pattern that we've noticed repeating...may never repeat again. We don't know if it will or won't, as it's a "real time" protocol and we can't look ahead or go back into the past. We can only look at a very small slice of time. And the overhead of dictionary-based compression (such as system resources, sharing the dictionary across the link, and encoding the substituted values) becomes far more than what we would save.
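
A toy cost model of that argument, counting bytes on the wire only. The function name `net_savings` and the 4-byte framing overhead are assumptions for illustration; it ignores CPU cost and the case where the announcement itself is lost, both of which make things worse.

```python
def net_savings(pattern_len, key_len, repeats, announce_overhead=4):
    """Bytes saved (negative means bytes lost) by sending a key for a pattern.

    Toy model: the sender first ships the pattern, its key, and some framing
    so the receiver can expand it, then sends the key instead of the pattern
    for each later repeat.
    """
    announce_cost = pattern_len + key_len + announce_overhead
    saved_per_repeat = pattern_len - key_len
    return repeats * saved_per_repeat - announce_cost

# A 20-byte pattern replaced by a 2-byte key:
never_again = net_savings(20, 2, repeats=0)    # the pattern never recurs
one_repeat = net_savings(20, 2, repeats=1)
three_repeats = net_savings(20, 2, repeats=3)
print(never_again, one_repeat, three_repeats)
```

Under these numbers the substitution only pays off from the second repeat onward, and a real-time sender has no way of knowing in advance whether those repeats will come.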

Seems like if you had 1000 calls to analyze you could get an idea quickly if there are any common patterns for a particular codec.

I would suggest reading up on data compression algorithms first. I have a feeling that there is a lot more to compression than what you might know already.
